Python lxml parsing error on Evernote XML

Posted 2024-05-13 03:12:05


I am trying to parse Evernote Markup Language (ENML) with lxml in Python 2.7. ENML is a superset of XHTML.

from StringIO import StringIO
import lxml.etree as etree

if __name__ == '__main__':
    xml_str = StringIO('<?xml version="1.0" encoding="UTF-8"?>\r\n<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">\r\n\r\n<en-note style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">\nA really simple example. &nbsp;Another sentence.\n</en-note>')
    tree = etree.parse(xml_str)

Running this raises a parse error.

How can I parse ENML successfully?


2 Answers

The &nbsp; entity is understood by an HTML parser, not by an XML parser:

from StringIO import StringIO
import lxml.html as LH
if __name__ == '__main__':
    xml_str = StringIO('<?xml version="1.0" encoding="UTF-8"?>\r\n<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">\r\n\r\n<en-note style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">\nA really simple example. &nbsp;Another sentence.\n</en-note>')
    tree = LH.parse(xml_str)
    print(LH.tostring(tree))

You can try replacing the entity name with its numeric value.
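A minimal sketch of that idea, assuming the only named entity in the note is &nbsp; (its numeric form is &#160;). The helper name replace_nbsp is hypothetical, and the style attribute is shortened from the question's sample. The import falls back to the standard library's ElementTree only so the sketch still runs where lxml is not installed.

```python
try:
    import lxml.etree as etree  # the library used in the question
except ImportError:
    # Fallback so the sketch runs without lxml; not part of the original question.
    import xml.etree.ElementTree as etree

# Shortened version of the ENML sample from the question, as bytes.
ENML = (
    b'<?xml version="1.0" encoding="UTF-8"?>\r\n'
    b'<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">\r\n\r\n'
    b'<en-note style="word-wrap: break-word;">\n'
    b'A really simple example. &nbsp;Another sentence.\n'
    b'</en-note>'
)


def replace_nbsp(data):
    """Hypothetical helper: swap the named entity &nbsp; for its numeric form."""
    return data.replace(b'&nbsp;', b'&#160;')


# The numeric character reference needs no DTD, so a plain XML parser accepts it.
root = etree.fromstring(replace_nbsp(ENML))
print(root.tag)
print(repr(root.text))
```

Unlike the HTML parser, this keeps the document as XML, so the root element stays en-note and nothing is wrapped in html/body elements. Other named entities (for example &copy;) would need the same treatment.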

