ExpatError: junk after document element

3 votes

2 answers

16325 views

Asked 2025-04-17 03:54

I really can't figure out what the problem is. I'm getting the following error:

File "C:\Python27\lib\xml\dom\expatbuilder.py", line 223, in parseString
parser.Parse(string, True)
ExpatError: junk after document element: line 5, column 0

I can't see anything wrong! Can anyone help me? I'm going crazy...

text = """<questionaire>
<question>
    <questiontext>Question1</questiontext>
    <answer>Your Answer: 99</answer>
</question>
<question>
    <questiontext>Question2</questiontext>
    <answer>Your Answer: 64</answer>
</question>
<question>
    <questiontext>Question3</questiontext>
    <answer>Your Answer: 46</answer>
</question>
<question>
    <questiontext>Bitte geben</questiontext>
    <answer>Your Answer: 544</answer>
    <answer>Your Answer: 943</answer>
</question>
</questionaire>"""

cleandata = text.split('<questionaire>')
cleandatastring= "".join(cleandata)
stripped = cleandatastring.strip()
planhtml = stripped.split('</questionaire>')[0]
clean= planhtml.strip()


from xml.dom import minidom

doc = minidom.parseString(clean)
for question in doc.getElementsByTagName('question'):
    for answer in question.getElementsByTagName('answer'):
        if answer.childNodes[0].nodeValue.strip() == 'Your Answer: 99':
            question.parentNode.removeChild(question)

print doc.toxml()

Thanks!

error-handling document parsing expaterror

2 Answers

In my case, the problem was caused by a change in libxml2 2.9.11 that made tostring() (from lxml) return more than expected, namely the content that follows the element. For example:

from lxml import etree

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<a>
  <b>
  </b>
</a>
'''
t = etree.fromstring(xml.encode()).getroottree()
print(etree.tostring(
  t.xpath('/a/b')[0],
  encoding=t.docinfo.encoding,
).decode())

Expected output:

<b>
  </b>

Actual output:

<b>
  </b>
</a>

If you pass that result to xml.dom.minidom.parseString(), it raises an error.
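To make that concrete without needing lxml installed, here is a small standard-library sketch. The two strings are copied from the expected and actual outputs above; is_well_formed is a hypothetical helper name of my own, not part of any library. It only reports whether minidom accepts the string and makes no claim about the exact error message.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Strings copied from the "expected" and "actual" outputs above.
expected = "<b>\n  </b>"
actual = "<b>\n  </b>\n</a>\n"


def is_well_formed(xml_string):
    """Hypothetical helper: True if minidom can parse the string."""
    try:
        minidom.parseString(xml_string)
        return True
    except ExpatError:
        return False


print(is_well_formed(expected))
print(is_well_formed(actual))
```

The stray closing tag after the element is what makes the second string unparseable.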

More information can be found here.

To avoid the problem, you need to use libxml2 <= 2.9.10, or use Alpine Linux >= 3.14.

Answered 2025-04-17 by Python大师

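Your original text string is well-formed XML. You then manipulated it (splitting off the <questionaire> start and end tags) and broke it in the process. Just parse the original text and it works.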
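A minimal sketch of that, assuming a shortened version of the question's data (two questions instead of four). It is written so it also runs on Python 3; the remaining list at the end is my own addition, just to make the result easy to check.

```python
from xml.dom import minidom

# Shortened copy of the question's data, with the root element left intact.
text = """<questionaire>
<question>
    <questiontext>Question1</questiontext>
    <answer>Your Answer: 99</answer>
</question>
<question>
    <questiontext>Question2</questiontext>
    <answer>Your Answer: 64</answer>
</question>
</questionaire>"""

# Parse the original string directly: no split/join/strip needed.
doc = minidom.parseString(text)

# Iterate over a copy of the node list, since nodes are removed inside the loop.
for question in list(doc.getElementsByTagName('question')):
    for answer in question.getElementsByTagName('answer'):
        if answer.childNodes[0].nodeValue.strip() == 'Your Answer: 99':
            question.parentNode.removeChild(question)
            break  # this question is gone, no need to check more answers

# Collect the text of the questions that are left.
remaining = [
    q.getElementsByTagName('questiontext')[0].firstChild.nodeValue
    for q in doc.getElementsByTagName('question')
]
print(remaining)
print(doc.toxml())
```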

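XML requires exactly one top-level element. By the time you parse your string, it contains several top-level <question> tags. The XML parser treats the first one as the root element, and is then surprised to find more top-level elements after it.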
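Here is a small sketch that reproduces the failure with a cut-down fragment, then shows that the same fragment parses once it sits inside a single root. The wrapper name root is arbitrary and purely illustrative; in the question's case the original <questionaire> element already plays that role.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Two sibling elements with no enclosing root, like the stripped string
# in the question.
fragment = """<question>
    <questiontext>Question1</questiontext>
</question>
<question>
    <questiontext>Question2</questiontext>
</question>"""

# Parsing the bare fragment fails as soon as the second <question> starts.
try:
    minidom.parseString(fragment)
    error_message = None
except ExpatError as exc:
    error_message = str(exc)
print(error_message)

# Wrapping the fragment in one root element (arbitrary name) makes it
# well-formed again.
wrapped = "<root>" + fragment + "</root>"
doc = minidom.parseString(wrapped)
question_count = len(doc.getElementsByTagName("question"))
print(question_count)
```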

Answered 2025-04-17 by Python大师
