用Python3.2从头开始创建Unicode XML

2024-04-19 13:57:58 发布

您现在位置:Python中文网/ 问答频道 /正文

所以基本上,我想用python字典中的数据生成的元素生成一个XML,其中的标记是字典的键,文本是字典的值。我不需要给这些项赋予属性,我想要的输出如下所示:

<AllItems>

  <Item>
    <some_tag> Hello World </some_tag>
    ...
    <another_tag />
  </Item>

  <Item> ... </Item>
  ...

</AllItems>

我尝试使用xml.etree.Element tree包,方法是创建一个树,将元素“AllItems”设置为根,如下所示:

from xml.etree import ElementTree as et

def dict_to_elem(dictionary):
    item = et.Element('Item')
    for key in dictionary:
        field = et.Element(key.replace(' ',''))
        field.text = dictionary[key]
        item.append(field)
    return item

newtree = et.ElementTree()
root = et.Element('AllItems')
newtree._setroot(root)

root.append(dict_to_elem(  {'some_tag':'Hello World', ...}  )
# Lather, rinse, repeat this append step as needed

with open(  filename  , 'w', encoding='utf-8') as file:
    tree.write(file, encoding='unicode')

在最后两行中,我试着省略open()语句中的编码,省略write()方法中的编码并将其更改为“UTF-8”,我得到的错误是“)is type str is not serializable

所以我的问题-我只想知道我应该如何从头开始用上面的格式创建一个UTF-8xml,是否有一个使用另一个包的更健壮的解决方案,它将允许我正确地处理UTF-8字符?我没有为解决方案与ElementTree结婚,但我更希望不必创建模式。提前感谢您的建议/解决方案!


Tags: keyfielddictionary字典astagsomeroot
2条回答

不必这样做,但是如果工具不理解生成的xml文件,可以显式地添加xml声明:

#!/usr/bin/env python3
from xml.etree import ElementTree as etree

your_dict = {'some_tag': 'Hello World ☺'}

def add_items(root, items):
    for name, text in items:
        elem = etree.SubElement(root, name)
        elem.text = text

root = etree.Element('AllItems')
add_items(etree.SubElement(root, 'Item'),
          ((key.replace(' ', ''), value) for key, value in your_dict.items()))
tree = etree.ElementTree(root)
tree.write('output.xml', xml_declaration=True, encoding='utf-8')

output.xml:

<?xml version='1.0' encoding='utf-8'?>
<AllItems><Item><some_tag>Hello World ☺</some_tag></Item></AllItems>

在我看来,ElementTree是一个不错的选择。如果将来需要更强大的包,可以切换到使用相同接口的第三方lxml模块。

您的问题的答案可以在文档http://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.ElementTree.write中找到

The output is either a string (str) or binary (bytes). This is controlled by the encoding argument. If encoding is "unicode", the output is a string; otherwise, it’s binary. Note that this may conflict with the type of file if it’s an open file object; make sure you do not try to write a string to a binary stream and vice versa.

基本上,你做得对。以文本模式open()文件,这样文件接受字符串,并且需要使用'unicode'参数作为tree.write()。否则,可以以二进制模式打开文件(在open()中没有编码参数),并在tree.write()中使用'utf-8'

稍微清理一下独立工作的代码:

#!python3
from xml.etree import ElementTree as et

def dict_to_elem(dictionary):
    item = et.Element('Item')
    for key in dictionary:
        field = et.Element(key.replace(' ',''))
        field.text = dictionary[key]
        item.append(field)
    return item

root = et.Element('AllItems')     # create the element first...
tree = et.ElementTree(root)       # and pass it to the created tree

root.append(dict_to_elem(  {'some_tag':'Hello World', 'xxx': 'yyy'}  ))
# Lather, rinse, repeat this append step as needed

filename = 'a.xml'
with open(filename, 'w', encoding='utf-8') as file:
    tree.write(file, encoding='unicode')

# The alternative is...    
fname = 'b.xml'
with open(fname, 'wb') as f:
    tree.write(f, encoding='utf-8')

这取决于目的。在这两个问题中,我个人更喜欢第一个解决方案。它清楚地表明您编写了一个文本文件(而XML是一个文本文件)。

但不需要告诉编码的最简单的替代方法是将文件名传递给tree.write,如下所示:

tree.write('c.xml', encoding='utf-8')

它打开文件,使用给定的编码写入内容,然后关闭文件。你读起来很容易,在这里也不会出错。

相关问题 更多 >