Printing HTML entities using lxml in Python
I'm trying to make a div element from the string below, which contains HTML entities. Because the string contains HTML entities, the reserved character & in each entity is escaped as &amp; in the output, so the entities are displayed as plain text. How can I avoid this so that the HTML entities are rendered properly?
    from lxml import etree
    import lxml.html

    s = 'actress adamari l&#243;pez , amgen launch spanish-language chemotherapy: myths or facts&#8482; website , resources'
    div = etree.Element("div")
    div.text = s
    lxml.html.tostring(div)

Output:

    <div>actress adamari l&amp;#243;pez , amgen launch spanish-language chemotherapy: myths or facts&amp;#8482; website , resources</div>
Parse the string with fromstring() so that the entities are decoded into characters, then specify the encoding when calling tostring():
    >>> from lxml.html import fromstring, tostring
    >>> s = 'actress adamari l&#243;pez , amgen launch spanish-language chemotherapy: myths or facts&#8482; website , resources'
    >>> div = fromstring(s)
    >>> print(tostring(div, encoding='unicode'))
    <p>actress adamari lópez , amgen launch spanish-language chemotherapy: myths or facts™ website , resources</p>

As a side note, you should use lxml.html.tostring() when dealing with HTML data:
Note that you should use lxml.html.tostring and not lxml.tostring. lxml.tostring(doc) will return the XML representation of the document, which is not valid HTML. In particular, things like <script src="..."></script> will be serialized as <script src="..." />, which confuses browsers.
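The difference between the two serializers can be seen with a small sketch. The markup and the file name app.js are made up for illustration, and etree.tostring stands in here for the XML serializer that the note refers to.

```python
from lxml import etree
import lxml.html

# Hypothetical fragment with an empty <script> element.
doc = lxml.html.fromstring('<div><script src="app.js"></script></div>')

# HTML serialization keeps the explicit closing tag.
as_html = lxml.html.tostring(doc, encoding='unicode')

# XML serialization collapses the empty element into a self-closing tag.
as_xml = etree.tostring(doc, encoding='unicode')

print(as_html)
print(as_xml)
```

The first line printed still contains </script>, while the second uses the self-closing form that browsers do not handle well for script elements.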
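If the div has to be built with etree.Element as in the question, one possible alternative is to decode the entities before assigning the text. This is only a sketch and assumes Python 3.4 or later, where html.unescape is available in the standard library. The shortened sample string is made up for illustration.

```python
import html

from lxml import etree
import lxml.html

# Shortened, hypothetical sample string containing two numeric entities.
s = 'l&#243;pez , myths or facts&#8482;'

# Assigning the raw string: the "&" of each entity is escaped on output.
escaped_div = etree.Element("div")
escaped_div.text = s
escaped_html = lxml.html.tostring(escaped_div, encoding='unicode')

# Decoding the entities first: the text holds the real characters.
decoded_div = etree.Element("div")
decoded_div.text = html.unescape(s)
decoded_html = lxml.html.tostring(decoded_div, encoding='unicode')

print(escaped_html)
print(decoded_html)
```

With encoding='unicode' the decoded characters are written as-is. Without it, lxml.html.tostring returns ASCII bytes and writes non-ASCII characters as numeric character references, which browsers still render correctly.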