Printing HTML entities using lxml in Python
I'm trying to make a div element from the string below, which contains HTML entities. Because the string contains HTML entities, the & character, which is reserved in HTML, is being escaped as &amp; in the output, so the entities are displayed as plain text. How can I avoid this so that the HTML entities are rendered properly?
    from lxml import etree
    import lxml.html

    s = 'actress adamari l&#243;pez , amgen launch spanish-language chemotherapy: myths or facts&#8482; website , resources'
    div = etree.Element("div")
    div.text = s
    lxml.html.tostring(div)

Output:

    <div>actress adamari l&amp;#243;pez , amgen launch spanish-language chemotherapy: myths or facts&amp;#8482; website , resources</div>
You can specify the encoding when calling tostring():
    >>> from lxml.html import fromstring, tostring
    >>> s = 'actress adamari l&#243;pez , amgen launch spanish-language chemotherapy: myths or facts&#8482; website , resources'
    >>> div = fromstring(s)
    >>> print(tostring(div, encoding='unicode'))
    <p>actress adamari lópez , amgen launch spanish-language chemotherapy: myths or facts™ website , resources</p>
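If you want to keep building the div by hand as in the question, one possible approach (a sketch, not part of the original answer) is to decode the entities before assigning the text, so that lxml never sees a literal "&". The example below assumes Python 3 and uses the standard library's html.unescape; the shortened sample string is a stand-in for the one in the question.

```python
import html

import lxml.html
from lxml import etree

# Shortened stand-in for the question's string; the entities are the same.
s = 'l&#243;pez facts&#8482;'

# Assigning the raw string: lxml treats "&" as literal text and escapes it.
raw_div = etree.Element("div")
raw_div.text = s
escaped = lxml.html.tostring(raw_div)

# Decoding the entities first gives lxml the real characters to serialize.
div = etree.Element("div")
div.text = html.unescape(s)
rendered = lxml.html.tostring(div, encoding='unicode')

print(escaped)
print(rendered)
```

With encoding='unicode' the result is a text string containing the actual characters rather than a byte string.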
As a side note, you should use lxml.html.tostring() when dealing with HTML data:
Note that you should use lxml.html.tostring and not lxml.tostring. lxml.tostring(doc) will return the XML representation of the document, which is not valid HTML. In particular, things like <script src="..."></script> will be serialized as <script src="..." />, which confuses browsers.
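The difference between the two serializers can be seen in a small sketch like the following. The element tree and the file name app.js are made up for illustration, and etree.tostring stands in here for the XML serializer the note refers to.

```python
import lxml.html
from lxml import etree

# Hypothetical tree: a div holding an empty script element.
div = etree.Element("div")
etree.SubElement(div, "script", src="app.js")

# XML serialization collapses the empty element into a self-closing tag.
as_xml = etree.tostring(div, encoding='unicode')

# HTML serialization keeps the explicit closing tag.
as_html = lxml.html.tostring(div, encoding='unicode')

print(as_xml)
print(as_html)
```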