Python: Parsing an XML file with multiple attributes

<foo>
  <bar>
    <unit>
      <structure>
        <token word="Rocky" att1="noun" att2="name">Rocky</token>
        <token word="the" att1="article" att2="">the</token>
        <token word="yellow" att1="adjective" att2="color">yellow</token>
        <token word="dog" att1="noun" att2="animal">dog</token>
      </structure>
    </unit>
  </bar>
</foo>

2 Answers

User

Answer 1 · edited 2024-05-29 05:52:01

With the closing tags included, and assuming the text is saved in test.xml, the following works:

import xml.etree.ElementTree

# Parse the file and get the root element (<foo>)
e = xml.etree.ElementTree.parse('test.xml').getroot()

listWord = []
listAtt1 = []
listAtt2 = []

# Visit every <token> element anywhere below the root
for child in e.iter('token'):
    listWord.append(child.attrib['word'])
    listAtt1.append(child.attrib['att1'])
    listAtt2.append(child.attrib['att2'])

print(listWord)
print(listAtt1)
print(listAtt2)

This returns:

['Rocky', 'the', 'yellow', 'dog']
['noun', 'article', 'adjective', 'noun']
['name', '', 'color', 'animal']

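e.iter() lets you iterate over e (the root) and every element below it; passing the tag name token restricts the results to token elements. child.attrib returns a dictionary of that element's attributes, and we append the values we need to the lists.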
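As a variation on the loop above, the attributes of each token can be kept together instead of being split across three parallel lists. This is only a sketch: the XML is inlined from the question so it runs without test.xml, and the names XML, root, tokens and words are illustrative rather than part of the original answer.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question, inlined so the sketch runs without test.xml.
XML = """<foo><bar><unit><structure>
<token word="Rocky" att1="noun" att2="name">Rocky</token>
<token word="the" att1="article" att2="">the</token>
<token word="yellow" att1="adjective" att2="color">yellow</token>
<token word="dog" att1="noun" att2="animal">dog</token>
</structure></unit></bar></foo>"""

root = ET.fromstring(XML)

# One dict per <token>, so word/att1/att2 stay associated with each other.
tokens = [dict(tok.attrib) for tok in root.iter("token")]

# Element.get() returns None for a missing attribute instead of raising KeyError.
words = [tok.get("word") for tok in root.iter("token")]

for t in tokens:
    print(t)
```

Using get() rather than attrib[...] is a defensive choice for files where some token elements might lack an attribute; for the sample in the question both behave the same.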

Edit: Regarding the second part of your question, I think the following (though possibly not best practice) does what you need:

import xml.etree.ElementTree

e = xml.etree.ElementTree.parse('test.xml').getroot()

listWord = []
listAtt1 = []
listAtt2 = []
animal_structs = []

# First pass: keep each <structure> that has a token with att2="animal"
for structure in e.iter('structure'):
    for child in structure.iter('token'):
        # get() returns None when att2 is absent, so no separate key check is needed
        if child.get('att2') == 'animal':
            animal_structs.append(structure)
            break

# Second pass: collect the attributes of every token in those structures
for structure in animal_structs:
    for child in structure.iter('token'):
        listWord.append(child.attrib['word'])
        listAtt1.append(child.attrib['att1'])
        listAtt2.append(child.attrib['att2'])

print(listWord)
print(listAtt1)
print(listAtt2)

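We first build a list of the structure elements that have a token child with att2="animal", and then collect the attributes of every token in each of those structures.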
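The same two-step idea can be written more compactly with any() and list comprehensions. The following is a sketch under stated assumptions: the second structure element in the sample is invented here so that the filter has something to exclude, and has_token_with is a hypothetical helper, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# First <structure> is from the question; the second is made up for illustration.
XML = """<foo><bar><unit>
<structure>
<token word="Rocky" att1="noun" att2="name">Rocky</token>
<token word="the" att1="article" att2="">the</token>
<token word="yellow" att1="adjective" att2="color">yellow</token>
<token word="dog" att1="noun" att2="animal">dog</token>
</structure>
<structure>
<token word="red" att1="adjective" att2="color">red</token>
<token word="car" att1="noun" att2="vehicle">car</token>
</structure>
</unit></bar></foo>"""

root = ET.fromstring(XML)


def has_token_with(structure, attr, value):
    """Hypothetical helper: True if any <token> below structure has attr == value."""
    return any(tok.get(attr) == value for tok in structure.iter("token"))


# Step 1: keep only the structures that contain an att2="animal" token.
animal_structs = [s for s in root.iter("structure")
                  if has_token_with(s, "att2", "animal")]

# Step 2: collect (word, att1, att2) for every token in the kept structures.
rows = [(tok.get("word"), tok.get("att1"), tok.get("att2"))
        for s in animal_structs
        for tok in s.iter("token")]

for row in rows:
    print(row)
```

Because any() stops at the first match, it plays the same role as the break in the explicit loop.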

User

Answer 2 · edited 2024-05-29 05:52:01

I'm not sure I fully understand your question, but here is the part I did understand (using lxml and XPath):

from lxml import etree
tree = etree.fromstring("""<foo>
  <bar>
      <unit>
          <structure>
              <token word="Rocky" att1="noun" att2="name"></token>
              <token word="the" att1="article" att2=""></token>
              <token word="yellow" att1="adjective" att2="color"></token>
              <token word="dog" att1="noun" att2="animal"></token>
          </structure>
      </unit>
  </bar>
</foo>""")


# get a list of all possible words, att1, att2:
listWord = tree.xpath("//token/@word")
listAtt1 = tree.xpath("//token/@att1")
listAtt2 = tree.xpath("//token/@att2")

# get all the tokens with att2="animal"
for token in tree.xpath('//token[@att2="animal"]'):
    print(token.get("word"))  # placeholder: replace with your own processing
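If lxml is not installed, the attribute filter can also be expressed with the standard library, since ElementTree's findall() supports a limited XPath subset that includes the [@attrib='value'] predicate. This is a sketch, not a drop-in replacement for the lxml code: ElementTree paths select elements only, so attribute values are read with get() rather than with an expression like //token/@word. The names XML, root, animal_tokens, animal_words and all_att1 are illustrative.

```python
import xml.etree.ElementTree as ET

# Same sample as in the answer above, inlined so the sketch is self-contained.
XML = """<foo><bar><unit><structure>
<token word="Rocky" att1="noun" att2="name"></token>
<token word="the" att1="article" att2=""></token>
<token word="yellow" att1="adjective" att2="color"></token>
<token word="dog" att1="noun" att2="animal"></token>
</structure></unit></bar></foo>"""

root = ET.fromstring(XML)

# ".//token" searches all descendants; the predicate filters on an attribute value.
animal_tokens = root.findall(".//token[@att2='animal']")
animal_words = [tok.get("word") for tok in animal_tokens]

# Attribute values are read from the matched elements with get().
all_att1 = [tok.get("att1") for tok in root.findall(".//token")]

print(animal_words)
print(all_att1)
```

For queries beyond this subset, such as selecting attribute values directly, lxml's xpath() as used in the answer remains the more capable option.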
