python - Formatting the output with html2text library


I need to retrieve the HTML table data (rows and columns) from an API and populate it for other teams.

```python
import requests
import json
import html2text
#from bs4 import BeautifulSoup

headers = {
    'authorization': 'bearer hmy0w2ltszfxeysnq8cbjzfcyr4kzfk5k9a0vfca.t',
    'content-type': 'application/json',
}
data = '{}'
response = requests.get('https://sandbox.jiveon.com/api/core/v3/contents/436669', headers=headers, data=data)
data = response.json()
print(data['content']['text'])
```

For converting the HTML to text:

```python
# Configure html2text, then convert the HTML body to Markdown-style text
converter = html2text.HTML2Text()
converter.ignore_links = True
converter.bypass_tables = False
#converter.ignore_tables = True
converter.wrap_links = True
converter.ignore_images = True
converter.ignore_emphasis = True
print(converter.handle(data['content']['text']))
```
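html2text is designed to produce Markdown-style text, and cells that contain several `<p>` or `<br/>` elements do not map cleanly onto a single Markdown table row, so they tend to get flattened. If the goal is the row and column data itself, one option is to parse the table directly. The sketch below uses only the standard library's `html.parser`; `TableExtractor`, `table_to_records` and `SAMPLE_HTML` are hypothetical names, and the sample is a trimmed copy of the table HTML returned by the API in this question.

```python
from html.parser import HTMLParser

# Hypothetical sample: trimmed copy of the question's table (styles/classes removed).
SAMPLE_HTML = (
    "<table><thead><tr><th>release version</th><th>refdb_id</th>"
    "<th>svn url</th></tr></thead><tbody><tr><td>3.7.3</td>"
    "<td><p>3710002</p><p>3710003 <br/>3710005 <br/>3710007 <br/>"
    "3710009<br/>3710011</p></td>"
    '<td><p><a href="http://svnurl.com">http://svnurl1.com&#160;</a></p>'
    '<p><a href="http://svnurl2.com">http://svnurl2.com</a></p></td>'
    "</tr></tbody></table>"
)


class TableExtractor(HTMLParser):
    """Collects each <tr> as a list of cells; every cell is a list of text
    fragments, split wherever a <p> or <br/> starts or a </p> ends."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &#160; into '\xa0'
        self.rows = []      # finished rows
        self._row = None    # cells of the row being built
        self._cell = None   # fragments of the cell being built
        self._buf = []      # raw text of the current fragment

    def _flush(self):
        # End the current fragment; keep it only if it has visible text.
        text = "".join(self._buf).strip()  # strip() also removes '\xa0'
        if text:
            self._cell.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell, self._buf = [], []
        elif tag in ("p", "br") and self._cell is not None:
            self._flush()  # <br/> also lands here via handle_startendtag

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._flush()
            self._row.append(self._cell)
            self._cell = None
        elif tag == "p" and self._cell is not None:
            self._flush()
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a <td>/<th> is collected.
        if self._cell is not None:
            self._buf.append(data)


def table_to_records(html):
    """Map each body row to {header: [fragments]}, using the first row as header."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    header = [" ".join(cell) for cell in parser.rows[0]]
    return [dict(zip(header, row)) for row in parser.rows[1:]]


if __name__ == "__main__":
    for record in table_to_records(SAMPLE_HTML):
        for column, values in record.items():
            print(column, "->", values)
```

Keeping each cell as a list of fragments means the six `refdb_id` values and the two SVN URLs stay separate instead of being run together, which makes it easy to print one value per line or hand the data to another team as JSON.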

Output of the above code snippets:

```html
<body><!-- [documentbodystart:756f88b6-eed4-4030-ada9-f74dc8e4418b] --><div class="jive-rendered-content"><p>db release&#160;</p><p style="min-height: 8pt; padding: 0px;">&#160;</p><div class="j-rte-table"><table class="j-table jiveborder" style="border: 1px solid #c6c6c6;" width="100%"><thead><tr style="background-color: #efefef;"><th style="width: 11%;">release version</th><th style="width: 10%;">refdb_id</th><th style="width: 160%;">svn url</th></tr></thead><tbody><tr><td style="width: 11%;">3.7.3</td><td style="width: 10%;"><p style="background-color: #ffffff; border: 0px; padding: 0px;">3710002</p><p style="background-color: #ffffff; border: 0px; padding: 0px;">3710003 <br/>3710005 <br/>3710007 <br/>3710009<br/>3710011</p></td><td style="width: 160%;"><p style="background-color: #ffffff; border: 0px; padding: 0px;"><a class="jive-link-external-small" href="http://svnurl.com" rel="nofollow">http://svnurl1.com&#160;</a></p><p style="background-color: #ffffff; border: 0px; padding: 0px;"><a class="jive-link-external-small" href="http://svnurl2.com" rel="nofollow">http://svnurl2.com</a></p></td></tr></tbody></table></div></div><!-- [documentbodyend:756f88b6-eed4-4030-ada9-f74dc8e4418b] --></body>
```

```
db release

release version| refdb_id| svn url
---|---|---
3.7.3|  3710002  3710003 3710005 3710007 3710009 3710011  |  http://svnurl1.com  http://svnurl2.com
```

The expected output, however, is different (it was shown as an image in the original post).

I found a solution that filters the data based on a command-line argument:

```python
from sys import argv

import requests
from bs4 import BeautifulSoup

headers = {
    'authorization': 'bearer hmy0w2ltszfxeysnq8cbjzfcyr4kzfk5k9a0vfca.t',
    'content-type': 'application/json',
}
data = '{}'
response = requests.get('https://sandbox.jiveon.com/api/core/v3/contents/436669', headers=headers, data=data)
response.raise_for_status()  # stop early on an HTTP error
data = response.json()
html_doc = data['content']['text']
soup = BeautifulSoup(html_doc, 'html.parser')

# Release version to look for, e.g. `python script.py 3.7.4`
release = argv[1] if len(argv) > 1 else '3.7.4'

mytag = []
mydata = []
finaldata = []

# Keep the <td> cells of the row that mentions the requested release
for row in soup.find_all('tr'):
    if release in row.get_text():
        mytag = row.find_all('td')

# Collect the text of each cell
for val in mytag:
    mydata.append(val.get_text())

# A URL cell holds several URLs run together; split them apart on '.com'
for val in mydata:
    if str(val).startswith('http:'):
        for part in str(val).split('.com'):
            part = part.strip()  # drop the non-breaking space (&#160;)
            if part:
                finaldata.append(part + '.com')
    else:
        finaldata.append(val)

for val in finaldata:
    print(val)
```
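Splitting on `'.com'` only works as long as every URL in the table ends in `.com`, and `get_text()` with no separator runs the `<p>`/`<br/>` fragments of a cell together (for example `3710002` and `3710003` become `37100023710003`). A sketch of an alternative keeps each fragment as a separate value and selects the row by its `release version` column. It assumes the rows have already been turned into dictionaries mapping column name to a list of fragments (for example with BeautifulSoup's `stripped_strings`, or a small table parser); `values_for_release` and `SAMPLE_RECORDS` are hypothetical names, and the sample mirrors the 3.7.3 row from the question.

```python
import sys

# Hypothetical sample: one row of the question's table as {column: [fragments]}.
SAMPLE_RECORDS = [
    {
        "release version": ["3.7.3"],
        "refdb_id": ["3710002", "3710003", "3710005",
                     "3710007", "3710009", "3710011"],
        "svn url": ["http://svnurl1.com", "http://svnurl2.com"],
    },
]


def values_for_release(records, version):
    """Return all fragments of the first row whose 'release version' column
    contains `version`, flattened in column order; [] if no row matches."""
    for record in records:
        if version in record.get("release version", []):
            return [value for cell in record.values() for value in cell]
    return []


if __name__ == "__main__":
    # e.g. `python script.py 3.7.3`; defaults to 3.7.3 when no argument is given
    version = sys.argv[1] if len(sys.argv) > 1 else "3.7.3"
    for value in values_for_release(SAMPLE_RECORDS, version):
        print(value)
```

Matching on the column value rather than searching the whole row's HTML also avoids accidental hits, such as a version string that happens to appear inside a URL or an attribute.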
