Python: Null character breaks XML output

Posted 2024-06-09 20:11:18


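I have some Python code that processes an input file and dumps certain fields from the input into an XML file. The code breaks when a null character comes through from the input, raising an "invalid token" error: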

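import xml.etree.ElementTree as ET  # assumed: the question does not show its imports
from xml.dom import minidom

def pretty_print_xml(elem):
    # Serialize the element to UTF-8 bytes, then re-parse with minidom to indent it.
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)  # a NUL in the data fails here
    return reparsed.toprettyxml(indent='    ')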

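This surprised me. I would like to know why it breaks, and what else I need to scrub from the input. I thought only XML metacharacters could raise this error, and that minidom was already handling those.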


1 answer
User
#1 · Posted 2024-06-09 20:11:18

Literal NUL characters are not allowed in XML. See the XML standard, version 1.1:

2.2 Characters

[Definition: A parsed entity contains text, a sequence of characters, which may represent markup or character data.] [Definition: A character is an atomic unit of text as specified by ISO/IEC 10646 [ISO/IEC 10646]. Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646. The versions of these standards cited in A.1 Normative References were current at the time this document was prepared. New characters may be added to these standards by amendments or new editions. Consequently, XML processors must accept any character in the range specified for Char.]

[2]       Char       ::=      [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
[2a]      RestrictedChar     ::=      [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]

Note that Char is defined to allow (among other ranges) \x01 through #xD7FF, the first range in production [2], but not \x00.
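A minimal sketch of one way to scrub the input before serializing, assuming the standard-library `xml.etree.ElementTree` and `xml.dom.minidom` from the question. The helper names `strip_illegal_xml_chars` and `build_field` are hypothetical, not part of any library. Python's minidom parses with expat, which implements XML 1.0, and XML 1.0 rejects the other C0 control characters as well as NUL, so the pattern below removes all of them while keeping tab, line feed, and carriage return.

```python
import re
import xml.etree.ElementTree as ET
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Control characters that an XML 1.0 parser such as expat rejects (NUL included).
# Tab (\x09), line feed (\x0a), and carriage return (\x0d) are legal and kept.
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_illegal_xml_chars(text):
    """Hypothetical helper: drop control characters that XML 1.0 forbids."""
    return _ILLEGAL_XML_CHARS.sub("", text)


def pretty_print_xml(elem):
    rough_string = ET.tostring(elem, "utf-8")
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="    ")


def build_field(value):
    """Hypothetical stand-in for the code that copies input fields into XML."""
    elem = ET.Element("field")
    elem.text = value
    return elem


raw = "abc\x00def"

# ElementTree serializes the NUL without complaint; the re-parse is what fails.
try:
    pretty_print_xml(build_field(raw))
    nul_rejected = False
except ExpatError:
    nul_rejected = True

# After scrubbing, the same pipeline succeeds.
cleaned = pretty_print_xml(build_field(strip_illegal_xml_chars(raw)))
```

Dropping the characters silently is only one policy. Replacing them with a visible marker, or rejecting the record outright, may be more appropriate if the NULs signal corrupted input.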


By the way, if your goal is pretty-printing, I suggest using lxml.etree. If the pretty-printing argument on serialization calls does not work out of the box, see the relevant FAQ entry.
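As a rough illustration of the lxml route, the sketch below assumes the third-party `lxml` package is installed. The function name `pretty_print_with_lxml` is hypothetical, and using a parser created with `remove_blank_text=True` is an assumption about what helps when existing whitespace prevents re-indentation, not something stated in the answer.

```python
def pretty_print_with_lxml(xml_bytes):
    """Hypothetical helper: re-indent an XML document using lxml."""
    # Imported lazily so that merely defining this function does not require lxml.
    from lxml import etree

    # Discard whitespace-only text nodes so the serializer is free to indent.
    parser = etree.XMLParser(remove_blank_text=True)
    root = etree.fromstring(xml_bytes, parser)
    return etree.tostring(root, pretty_print=True, encoding="unicode")
```

Calling `pretty_print_with_lxml(b"<a><b>1</b></a>")` returns the document with each nested element on its own indented line. lxml also enforces XML character rules, so the input still has to be free of NUL characters.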
