java - Parse a table from HTML using jsoup


I've got a problem scraping HTML text. Here's a sample of what I'm trying to extract from:

<table class="scripture">
  <tbody>
    <tr>
      <td class="verse" valign="top">
        <a name="2:1"></a><a class="vers" href="javascript:getparallel('luk', 2, 1);" title="klik om grondtekst en sv te zien">&nbsp;1&nbsp;</a>
      </td>
      <td class="content">
        <span class="main">en het geschiedde in die dagen dat er een gebod uitging van keizer augustus dat heel de wereld ingeschreven moest worden.</span>
      </td>
    </tr>
  </tbody>
</table>
<table class="scripture">
  <tbody>
    <tr>
      <td class="verse" valign="top">
        <a name="2:2"></a><a class="vers" href="javascript:getparallel('luk', 2, 2);" title="klik om grondtekst en sv te zien">&nbsp;2&nbsp;</a>
      </td>
      <td class="content">
        <span class="main">deze eerste inschrijving vond plaats toen cyrenius on syrië stadhouder was.</span>
      </td>
    </tr>
  </tbody>
</table>

This is similar to the problem in the linked question, but I want both the verse text and the scripture content. How can I achieve this?

So far I've tried:

Element table = doc.select("table[class=scripture]").first();
Log.e("bb", "passage1: " + table.ownText());

But it doesn't display anything. Any help is appreciated. Thanks.

Assuming you want the content of the span in the table that contains verse 2:2, you can get it with:

String verse = "2:2";
// Select the span of class "main" located inside a table of class "scripture"
// that contains a td of class "verse" with a link whose name attribute equals the verse.
Element p = doc.select(
    String.format("table.scripture:has(td.verse a[name=%s]) span.main", verse)
).first();
System.out.println(p.text());

Output:

deze eerste inschrijving vond plaats toen cyrenius on syrië stadhouder was.
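If you need every verse rather than a single one, looping over the tables is one option. The following is only a sketch under some assumptions, not part of the answer above: the class name VerseExtractor, the method extractVerses and the inline SAMPLE_HTML string are made up for illustration, the href values are shortened to "#", and selectFirst assumes jsoup 1.11.1 or later. It also hints at why the attempt in the question printed nothing: ownText() only returns text that sits directly inside the element, and here all of the table's text lives in nested elements, so text() is the call to reach for.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

class VerseExtractor {

    // Hypothetical inline copy of the sample markup (hrefs shortened to "#").
    static final String SAMPLE_HTML =
        "<table class=\"scripture\"><tbody><tr>"
        + "<td class=\"verse\" valign=\"top\"><a name=\"2:1\"></a>"
        + "<a class=\"vers\" href=\"#\">&nbsp;1&nbsp;</a></td>"
        + "<td class=\"content\"><span class=\"main\">en het geschiedde in die dagen "
        + "dat er een gebod uitging van keizer augustus dat heel de wereld "
        + "ingeschreven moest worden.</span></td>"
        + "</tr></tbody></table>"
        + "<table class=\"scripture\"><tbody><tr>"
        + "<td class=\"verse\" valign=\"top\"><a name=\"2:2\"></a>"
        + "<a class=\"vers\" href=\"#\">&nbsp;2&nbsp;</a></td>"
        + "<td class=\"content\"><span class=\"main\">deze eerste inschrijving vond "
        + "plaats toen cyrenius on syri\u00eb stadhouder was.</span></td>"
        + "</tr></tbody></table>";

    // Maps each verse reference (the anchor's name attribute, e.g. "2:1")
    // to the text of the span.main found in the same table.
    static Map<String, String> extractVerses(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, String> verses = new LinkedHashMap<>();
        for (Element table : doc.select("table.scripture")) {
            Element anchor = table.selectFirst("td.verse a[name]");
            Element content = table.selectFirst("td.content span.main");
            if (anchor == null || content == null) {
                continue; // skip tables that do not match the expected layout
            }
            verses.put(anchor.attr("name"), content.text());
        }
        return verses;
    }

    public static void main(String[] args) {
        // ownText() ignores text in descendants; text() includes it.
        Element first = Jsoup.parse(SAMPLE_HTML).selectFirst("table.scripture");
        System.out.println("ownText: [" + first.ownText() + "]");
        System.out.println("text:    [" + first.text() + "]");

        for (Map.Entry<String, String> e : extractVerses(SAMPLE_HTML).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Keying on the name attribute instead of the visible link text avoids having to trim the non-breaking spaces around the verse number.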
