html - Load nested links via Java with jsoup
I am working on a crawler using jsoup. I want to display the links of all categories of the Asian e-shop https://world.taobao.com/. My code is able to find links on the page with:

    Elements links = doc.select("a[href]");
    System.out.println("total results: " + links.size());

but not all of them. I need to show the links of the categories, which are nested in many <div> tags.
Here is my code:

    package jsoup;

    import java.io.IOException;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class Crawler {

        public static final String CLS_NAME = "Crawler";
        public static final String URL_SOURCE = "https://world.taobao.com/";

        public static void main(String[] args) throws IOException {
            // Load the document
            Document doc = Jsoup.connect(URL_SOURCE).get();

            // Select every <a> tag that has an "href" attribute
            Elements links = doc.select("a[href]");
            System.out.println("total results: " + links.size());

            // Print the link text and the absolute URL of each link
            for (Element url : links) {
                System.out.println(String.format("* [%s] : %s ", url.text(), url.attr("abs:href")));
            }
        }
    }

Could you please help me with this problem?
This has nothing to do with your code.
The site generates parts of its content using JavaScript. jsoup can parse the static parts of the page, but it does not execute JavaScript, so it won't be able to crawl the generated parts easily.
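One rough way to confirm this is to count the links in the markup exactly as the server sends it and compare that number with what you see in the browser. The sketch below uses only the JDK (Java 11 or later); the class name `StaticLinkCheck`, its helper methods, and the sample markup are made up for illustration, and the regular expression is a quick diagnostic, not a replacement for a real parser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of jsoup or of the question's code.
class StaticLinkCheck {

    // Matches the opening of an <a> tag that carries an href attribute.
    // Rough diagnostic only; use a parser such as jsoup for real extraction.
    private static final Pattern ANCHOR =
            Pattern.compile("<a\\s[^>]*href\\s*=", Pattern.CASE_INSENSITIVE);

    /** Counts the <a ... href=...> tags present in the raw markup. */
    static int countStaticAnchors(String html) {
        Matcher matcher = ANCHOR.matcher(html);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    /** Downloads the page as the server sends it; no JavaScript is executed. */
    static String fetchRawHtml(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Made-up page: one static link, plus a script that would add a
        // category link only when it runs inside a browser.
        String sample = "<html><body>"
                + "<a href=\"/home\">Home</a>"
                + "<div id=\"categories\"></div>"
                + "<script>"
                + "var a = document.createElement('a');"
                + "a.setAttribute('href', '/category/shoes');"
                + "document.getElementById('categories').appendChild(a);"
                + "</script>"
                + "</body></html>";
        System.out.println("Static links in sample: " + countStaticAnchors(sample));

        // Optionally pass a URL as the first argument to check a live page.
        if (args.length > 0) {
            String raw = fetchRawHtml(args[0]);
            System.out.println("Static links in " + args[0] + ": " + countStaticAnchors(raw));
        }
    }
}
```

For the sample, only the "Home" link is counted, because the category link does not exist until the script runs. If the count for a live page is close to what jsoup reports but far below what the browser shows, the missing links are most likely generated by JavaScript.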
You can still use tools such as Selenium for that, which execute the JavaScript code inside a browser.
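A minimal sketch of that approach, assuming Selenium 4 for Java is on the classpath and that Chrome with a matching driver is available on the machine. The class name `RenderedCrawler`, the 10-second timeout, and the `READY_SELECTOR` value are placeholders, not something taken from the site: replace the selector with one that matches the category links after inspecting the page in the browser's developer tools.

```java
import java.time.Duration;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

// Hypothetical class name; the URL is the one from the question.
public class RenderedCrawler {

    public static final String URL_SOURCE = "https://world.taobao.com/";

    // Placeholder: "a[href]" also matches static links, so the wait below
    // returns as soon as any link exists. Use a selector for the generated
    // category links once you know their markup.
    public static final String READY_SELECTOR = "a[href]";

    public static void main(String[] args) {
        ChromeOptions options = new ChromeOptions();
        // Assumption: a recent Chrome that supports the new headless mode.
        options.addArguments("--headless=new");

        WebDriver driver = new ChromeDriver(options);
        try {
            // The browser loads the page and runs its JavaScript.
            driver.get(URL_SOURCE);

            // Wait (up to an assumed 10 seconds) for the content to appear.
            new WebDriverWait(driver, Duration.ofSeconds(10))
                    .until(ExpectedConditions.presenceOfElementLocated(
                            By.cssSelector(READY_SELECTOR)));

            // Query the live DOM, which includes script-generated elements.
            List<WebElement> links = driver.findElements(By.cssSelector("a[href]"));
            System.out.println("total results: " + links.size());
            for (WebElement link : links) {
                System.out.println(String.format("* [%s] : %s ",
                        link.getText(), link.getAttribute("href")));
            }
        } finally {
            // Always close the browser, even if the wait times out.
            driver.quit();
        }
    }
}
```

Querying the live DOM through `findElements` avoids re-parsing the page source. If you would rather keep the existing jsoup selectors, one option is to pass `driver.getPageSource()` to `Jsoup.parse(html, URL_SOURCE)`, although the Selenium documentation does not guarantee that the returned source reflects changes made by scripts.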
