Creating an XML tree from a text file with Python

Asked 2025-04-16 04:24

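I need to avoid creating duplicate branches in an XML tree while parsing a text file. Suppose the text file contains the following (the lines can appear in any order):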

branch1:branch11:message11
branch1:branch12:message12
branch2:branch21:message21
branch2:branch22:message22

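So the resulting XML tree should have one root node with two branches under it, and each of those branches should have two sub-branches. This is the Python code I use to parse the text file: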

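import xml.etree.ElementTree as ET

with open('xmlbasic.txt', 'r') as fh:
    allLines = fh.readlines()

root = ET.Element('root')

for line in allLines:
    # strip() removes the trailing newline so it does not end up in the message text
    tempv = line.strip().split(':')
    # A new first-level element is created for every line,
    # even when one with the same tag already exists under root
    branch1 = ET.SubElement(root, tempv[0])
    branch2 = ET.SubElement(branch1, tempv[1])
    branch2.text = tempv[2]

tree = ET.ElementTree(root)
tree.write('xmlbasictree.xml')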

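The problem with this code is that every line of the text file creates a new first-level branch in the XML tree, so branch1 and branch2 each end up appearing twice under the root.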
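To see the duplication concretely, here is a minimal sketch that runs the same loop on the four sample lines. It reads them from a string instead of xmlbasic.txt; the variable SAMPLE is a hypothetical stand-in for the file contents.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the contents of xmlbasic.txt
SAMPLE = """branch1:branch11:message11
branch1:branch12:message12
branch2:branch21:message21
branch2:branch22:message22"""

root = ET.Element('root')
for line in SAMPLE.splitlines():
    tempv = line.split(':')
    # Same logic as the question: always creates a new first-level element
    branch1 = ET.SubElement(root, tempv[0])
    branch2 = ET.SubElement(branch1, tempv[1])
    branch2.text = tempv[2]

# Four first-level elements instead of two: branch1 and branch2 each appear twice
print([child.tag for child in root])
# → ['branch1', 'branch1', 'branch2', 'branch2']
```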

Any suggestions for how to avoid creating a duplicate branch in the XML tree when a branch with that name already exists?

2 Answers

Something like this? You can keep the branch levels you want to reuse in a dictionary.

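# root, allLines and ET are set up as in the question
b1map = {}  # first-level tag -> the element already created under root

for line in allLines:
    tempv = line.strip().split(':')
    branch1 = b1map.get(tempv[0])
    if branch1 is None:
        # First time this tag is seen: create it once and remember it
        branch1 = b1map[tempv[0]] = ET.SubElement(root, tempv[0])
    branch2 = ET.SubElement(branch1, tempv[1])
    branch2.text = tempv[2]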
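For reference, here is a self-contained sketch of the same dictionary idea that runs without the question's setup. It reads from a string rather than xmlbasic.txt and wraps the loop in a function; the name build_tree is hypothetical. It also splits each line at most twice, an assumption beyond the original, so that a colon inside the message stays part of the message.

```python
import xml.etree.ElementTree as ET

def build_tree(text):
    """Build root -> first-level -> second-level elements from 'a:b:message' lines."""
    root = ET.Element('root')
    b1map = {}  # first-level tag -> element already created under root
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        head, sub, message = line.split(':', 2)
        branch1 = b1map.get(head)
        if branch1 is None:
            branch1 = b1map[head] = ET.SubElement(root, head)
        ET.SubElement(branch1, sub).text = message
    return root

root = build_tree("branch1:branch11:message11\n"
                  "branch2:branch21:message21\n"
                  "branch1:branch12:message12\n")
print(ET.tostring(root, encoding='unicode'))
# → <root><branch1><branch11>message11</branch11><branch12>message12</branch12></branch1><branch2><branch21>message21</branch21></branch2></root>
```

Even though the branch1 lines are not adjacent in the input, both of their children end up under a single branch1 element.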

Answered 2025-04-16 by Python大师

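import xml.etree.ElementTree as ET

with open("xmlbasic.txt") as lines_file:
    # splitlines() gives one string per line (iterating over read() would yield characters)
    lines = lines_file.read().splitlines()

root = ET.Element('root')

for line in lines:
    if not line.strip():
        continue  # skip blank lines
    head, subhead, tail = line.strip().split(":")

    # find() returns None when no direct child has that tag.
    # Compare with None explicitly: an Element with no children is falsy.
    head_branch = root.find(head)
    if head_branch is None:
        head_branch = ET.SubElement(root, head)

    subhead_branch = head_branch.find(subhead)
    if subhead_branch is None:
        subhead_branch = ET.SubElement(head_branch, subhead)

    subhead_branch.text = tail

tree = ET.ElementTree(root)
ET.dump(tree)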

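The logic is simple, and you already stated it in the question: before creating a branch, just check whether a branch with that name already exists under its parent.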
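When writing that check, compare the result of find() with None rather than testing its truthiness. An Element behaves like a list of its children, so an element that exists but has no children counts as false, and "if not element:" would then create a duplicate. A minimal demonstration (recent Python versions also warn when an element is used in a boolean test, so the sketch only uses "is None" and len()):

```python
import xml.etree.ElementTree as ET

root = ET.Element('root')
ET.SubElement(root, 'branch1')  # exists, but has no children yet

found = root.find('branch1')
missing = root.find('branch9')

# find() returns None only when no matching direct child exists
print(found is None, missing is None)   # → False True
# The found element has zero children, which is why a truthiness test
# such as `if not found:` would wrongly treat it as missing
print(len(found))                       # → 0
```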

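Note that this can be slow for large inputs: find() scans the parent's direct children one at a time, so each line costs time proportional to the number of branches already created at that level. This is because ElementTree is not designed to enforce unique child tags.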
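A small sketch of the lookup scope behind that cost: a plain tag name passed to find() matches only direct children of the element it is called on, and a path expression is needed to reach deeper levels.

```python
import xml.etree.ElementTree as ET

root = ET.Element('root')
branch1 = ET.SubElement(root, 'branch1')
ET.SubElement(branch1, 'branch11').text = 'message11'

print(root.find('branch11'))                 # → None (a grandchild, not a child)
print(root.find('branch1/branch11').text)    # → message11
print(root.find('.//branch11').text)         # → message11 (searches all descendants)
```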

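If you need speed (for smaller trees you probably don't!), a more efficient approach is to first store the structure in a defaultdict of dicts and then convert it to an ElementTree in a second pass. Dictionary lookups take constant time on average, so the cost per line no longer grows with the number of existing branches.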

import collections
import xml.etree.ElementTree as ET

with open("xmlbasic.txt") as lines_file:
    lines = lines_file.read().splitlines()

# head -> {subhead: tail}; a missing head gets an empty dict automatically
root_dict = collections.defaultdict(dict)
for line in lines:
    if not line.strip():
        continue  # skip blank lines
    head, subhead, tail = line.strip().split(":")
    root_dict[head][subhead] = tail

# Convert the nested dicts into elements: each key becomes exactly one element
root = ET.Element('root')
for head, branch in root_dict.items():
    head_element = ET.SubElement(root, head)
    for subhead, tail in branch.items():
        ET.SubElement(head_element, subhead).text = tail

tree = ET.ElementTree(root)
ET.dump(tree)
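One way to check the defaultdict version without a file is to feed it sample lines in shuffled order and inspect the result. This sketch wraps the same two passes in a function; the name build_tree_from_lines is hypothetical. Because dict (and therefore defaultdict) preserves insertion order in Python 3.7+, branches come out in the order they first appear in the input.

```python
import collections
import xml.etree.ElementTree as ET

def build_tree_from_lines(lines):
    # Pass 1: group messages as {head: {subhead: tail}}; repeated heads merge
    root_dict = collections.defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        head, subhead, tail = line.split(":")
        root_dict[head][subhead] = tail

    # Pass 2: each key becomes exactly one element
    root = ET.Element('root')
    for head, branch in root_dict.items():
        head_element = ET.SubElement(root, head)
        for subhead, tail in branch.items():
            ET.SubElement(head_element, subhead).text = tail
    return root

sample = [
    "branch2:branch22:message22",
    "branch1:branch11:message11",
    "branch2:branch21:message21",
    "branch1:branch12:message12",
]
root = build_tree_from_lines(sample)
print(ET.tostring(root, encoding='unicode'))
```

If the same head:subhead pair occurs twice, the later message overwrites the earlier one, which is the same behavior as the find()-based loop when it reassigns .text.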

Answered 2025-04-16 by Python大师
