Printing HTML entities using lxml in Python

April 15, 2012

I'm trying to make a div element from the string below, which contains HTML entities. Because the string contains HTML entities, the & reserved character in each entity is escaped as &amp; in the output, so the entities are displayed as plain text. How can I avoid this so that the HTML entities are rendered properly?

    from lxml import etree
    import lxml.html

    s = 'actress adamari l&#243;pez , amgen launch spanish-language chemotherapy: myths or facts&#8482; website , resources'
    div = etree.Element("div")
    div.text = s

    lxml.html.tostring(div)

Output:

    <div>actress adamari l&amp;#243;pez , amgen launch spanish-language chemotherapy: myths or facts&amp;#8482; website , resources</div>

Parse the string with fromstring() so that the entities are decoded, and specify the encoding when calling tostring():

    >>> from lxml.html import fromstring, tostring
    >>> s = 'actress adamari l&#243;pez , amgen launch spanish-language chemotherapy: myths or facts&#8482; website , resources'
    >>> div = fromstring(s)
    >>> print(tostring(div, encoding='unicode'))
    <p>actress adamari lópez , amgen launch spanish-language chemotherapy: myths or facts™ website , resources</p>
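Note that fromstring() wraps bare text in a <p> element rather than the <div> from the question. If the <div> matters, one possible approach is to decode the entities yourself before assigning to .text. The sketch below assumes Python 3.4 or later, where the standard library provides html.unescape(); that function is not part of the original answer.

```python
import html

from lxml import etree
import lxml.html

s = ('actress adamari l&#243;pez , amgen launch spanish-language '
     'chemotherapy: myths or facts&#8482; website , resources')

div = etree.Element("div")
# Decode the character references first, so .text receives the real
# characters (ó and ™) instead of literal "&#243;" and "&#8482;" text.
div.text = html.unescape(s)

# encoding='unicode' returns a str instead of ASCII-encoded bytes.
out = lxml.html.tostring(div, encoding='unicode')
print(out)
```

Assigning to .text always treats the value as literal text, which is why the undecoded string in the question came out with &amp; in it.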

As a side note, you should use lxml.html.tostring() when dealing with HTML data:

Note that you should use lxml.html.tostring and not lxml.tostring. lxml.tostring(doc) will return the XML representation of the document, which is not valid HTML. In particular, things like <script src="..."></script> will be serialized as <script src="..." />, which confuses browsers.
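The difference can be seen by serializing the same tree both ways. This is a minimal sketch: the file name app.js is made up for illustration, and etree.tostring() stands in for the XML serializer that the quoted note calls lxml.tostring.

```python
from lxml import etree
import lxml.html

# "app.js" is a hypothetical script name used only for this example.
doc = lxml.html.fromstring('<div><script src="app.js"></script></div>')

# XML serialization collapses the empty element into a self-closing tag.
as_xml = etree.tostring(doc, encoding='unicode')

# HTML serialization keeps the explicit closing tag that browsers expect.
as_html = lxml.html.tostring(doc, encoding='unicode')

print(as_xml)
print(as_html)
```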

See also:

Serialising to Unicode strings
