File is corrupted when writing an XML file in Python
I am trying to write the contents of an xml.dom.minidom object to a file. The idea is simple: use the 'writexml' method:
import codecs
from xml.dom import minidom

def write_xml_native():
    # Building DOM from XML
    xmldoc = minidom.parse('semio2.xml')
    f = codecs.open('codified.xml', mode='w', encoding='utf-8')
    # Using native writexml() method to write
    xmldoc.writexml(f, encoding="utf=8")
    f.close()
The problem is that this garbles the non-Latin text in the file. Another approach is to get the text as a string and write it to the file directly:
def write_xml():
    # Building DOM from XML
    xmldoc = minidom.parse('semio2.xml')
    # Opening file for writing UTF-8, which is XML's default encoding
    f = codecs.open('codified3.xml', mode='w', encoding='utf-8')
    # Writing XML in UTF-8 encoding, as recommended in the documentation
    f.write(xmldoc.toxml("utf-8"))
    f.close()
This produces the following error:
Traceback (most recent call last):
  File "D:\Projects\Semio\semioparser.py", line 45, in <module>
    write_xml()
  File "D:\Projects\Semio\semioparser.py", line 42, in write_xml
    f.write(xmldoc.toxml(encoding="utf-8"))
  File "C:\Python26\lib\codecs.py", line 686, in write
    return self.writer.write(data)
  File "C:\Python26\lib\codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 2064: ordinal not in range(128)
How can I write the XML text to a file? Am I missing something?
Edit: the error was fixed by adding a decode call:
f.write(xmldoc.toxml("utf-8").decode("utf-8"))
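However, the Russian characters are still garbled.
The text looks fine when viewed in the interpreter, but the problem appears once it is written to the file.
2 Answers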
0
Try this:
with open("codified.xml", "w") as f:
    f.write(xmldoc.toxml("utf-8").decode("utf-8"))
This worked for me (although under Python 3).
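For Python 3 specifically, a variant that avoids the encode/decode round trip is to open the file in binary mode and write the bytes returned by toxml() unchanged. This is only a minimal sketch under assumptions: the helper name write_xml_bytes, the sample document, and the temporary output path are made up for illustration and are not from the question.

```python
import os
import tempfile
from xml.dom import minidom


def write_xml_bytes(doc, path):
    """Hypothetical helper: serialize the DOM as UTF-8 bytes and write them as-is."""
    # With an explicit encoding, toxml() returns bytes in Python 3,
    # so the file is opened in binary mode and no text codec is involved.
    with open(path, "wb") as f:
        f.write(doc.toxml(encoding="utf-8"))


# Illustrative input; the question parses 'semio2.xml' instead.
doc = minidom.parseString(u"<ru>Тест</ru>".encode("utf-8"))
out_path = os.path.join(tempfile.mkdtemp(), "codified.xml")
write_xml_bytes(doc, out_path)
```

Because the bytes on disk match the encoding named in the XML declaration, the result does not depend on the platform's default text encoding.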
10
Hmm, this should work:
xml = minidom.parse("test.xml")
with codecs.open("out.xml", "w", "utf-8") as out:
    xml.writexml(out)
You could also try this:
with codecs.open("test.xml", "r", "utf-8") as inp:
    xml = minidom.parseString(inp.read().encode("utf-8"))
with codecs.open("out.xml", "w", "utf-8") as out:
    xml.writexml(out)
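As far as I can tell, the traceback in the question comes from mixing bytes and text: toxml("utf-8") returns an encoded byte string, while a file opened through codecs expects text, so Python 2 tries an implicit ASCII decode first. The sketch below shows the same distinction in Python 3, where the mismatch is rejected with a TypeError instead. The sample document and the variable names are assumptions for illustration.

```python
import io
from xml.dom import minidom

doc = minidom.parseString(u"<ru>Тест</ru>".encode("utf-8"))

as_text = doc.toxml()                   # no encoding given: text (str)
as_bytes = doc.toxml(encoding="utf-8")  # encoding given: encoded bytes

# A text sink accepts the text form...
text_sink = io.StringIO()
text_sink.write(as_text)

# ...but refuses the byte form instead of guessing an encoding.
try:
    text_sink.write(as_bytes)
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

In short, text goes to a text-mode writer and bytes go to a binary-mode file; writexml(out) with a text writer stays on the text side throughout.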
Update: if you build the XML from a string object, you need to encode it before passing it to the minidom parser, like this:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import codecs
import xml.dom.minidom as minidom
xml = minidom.parseString(u"<ru>Тест</ru>".encode("utf-8"))
with codecs.open("out.xml", "w", "utf-8") as out:
    xml.writexml(out)
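To check that the non-Latin text really survives, one option is to parse the written file again and compare the text node with the original. This is a rough sketch for Python 3, assuming the built-in open() with an explicit encoding in place of codecs.open; the temporary path and variable names are illustrative.

```python
import os
import tempfile
from xml.dom import minidom

original_text = u"Тест"
doc = minidom.parseString((u"<ru>%s</ru>" % original_text).encode("utf-8"))

out_path = os.path.join(tempfile.mkdtemp(), "out.xml")

# Text-mode file with an explicit encoding; the encoding argument of
# writexml() only controls what is written in the XML declaration.
with open(out_path, "w", encoding="utf-8") as out:
    doc.writexml(out, encoding="utf-8")

# Re-parse the file from disk and pull the text back out.
reparsed = minidom.parse(out_path)
round_trip_text = reparsed.documentElement.firstChild.data
```

If round_trip_text equals original_text, the file itself is fine, and any remaining garbling is more likely caused by the program used to view it.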