Parsing an XML file with multiple tags, attributes, and values

Posted 2024-05-29 03:08:37

I am trying to parse an XML file with the following structure:

<game gameId="cricket">
    <Period duration="1year" endTime="2017-12-31"/>
    <repPeriod duration="1year"/>
    <player p="1">sachin</player>
    <player p="2">rahul</player>
    <player p="3">saurav</player>
    <player p="4">kapil</player>
    <player p="5">sanjay</player>
    <player p="6">kartik</player>
    <player p="7">michel</player>
    <player p="8">rickey</player>
    <ranking period="2016">
        <r p="1">3</r>
    </ranking>
    <ranking period="DEFAULT">
        <r p="2">4</r>
        <r p="3">16</r>
        <r p="4">16</r>
        <r p="5">6</r>
        <r p="6">3</r>
        <r p="7">7</r>
        <r p="8">7</r>
    </ranking>
</game>

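I can't figure out how to map the player with attribute p="1" to the corresponding ranking value.

The output I want is:

player : rank

sachin : 3

rahul : 4

My code so far: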

from xml.dom import minidom

doc = minidom.parse('report.xml')
node = doc.documentElement
gameinfo = doc.getElementsByTagName("game")

counterlist = ['cricket','football']
for gameid in gameinfo:
    for counter in counterlist:
        if gameid.getAttribute('game') == counter:
            itemlist = counter.getElementsByTagName("player")
            i = len(itemlist)
            j = 1
            while j<=i:
                for itemnumber in itemlist:
                    if itemnumber.getAttribute('p') == j:
                        Playername = gameid.getElementsByTagName("player")[j].childNodes[0].data
                        rankid = gameid.getElementsByTagName("r")[j].childNodes[0].data
                        print (playername : rankid)

                j = j+1

2 answers

Use ElementTree.

For example:

import xml.etree.ElementTree as ET
from collections import defaultdict

tree = ET.parse('report.xml')                 #File name taken from the question
root = tree.getroot()
d = defaultdict(list)

for tag in root.findall(".//*[@p]"):          #Find all tags with 'p' attrib
    d[tag.attrib['p']].append(tag.text)

for i in d.values():
    print("{} : {}".format(i[0], i[1]))

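Output (dictionary order is not guaranteed before Python 3.7, so the lines may appear in a different order):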

sachin : 3
saurav : 16
rahul : 4
sanjay : 6
kapil : 16
michel : 7
kartik : 3
rickey : 7
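
One caveat: the snippet above groups every element that has a `p` attribute, so it relies on each `p` having exactly one `<player>` and one `<r>`. If the same player could appear under several `<ranking>` periods, it may be safer to keep names and rankings separate. Below is a minimal sketch of that idea, not a definitive implementation. The function name `player_rankings` and the inlined, trimmed XML are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question, inlined so the sketch runs on its own.
XML = """
<game gameId="cricket">
    <player p="1">sachin</player>
    <player p="2">rahul</player>
    <ranking period="2016">
        <r p="1">3</r>
    </ranking>
    <ranking period="DEFAULT">
        <r p="2">4</r>
    </ranking>
</game>
""".strip()


def player_rankings(root):
    """Hypothetical helper: join <player> and <r> elements on their 'p' attribute.

    Returns a list of (name, period, rank) tuples in document order.
    """
    # Map each player id to its name, e.g. {'1': 'sachin'}.
    names = {p.get("p"): p.text for p in root.iter("player")}
    rows = []
    for ranking in root.iter("ranking"):
        for r in ranking.findall("r"):
            name = names.get(r.get("p"))
            if name is not None:  # skip rankings that have no matching player
                rows.append((name, ranking.get("period"), r.text))
    return rows


rows = player_rankings(ET.fromstring(XML))
for name, period, rank in rows:
    print("{} : {} ({})".format(name, rank, period))
```

Keeping the period in each tuple means a player ranked in both `2016` and `DEFAULT` would simply produce two rows instead of being silently dropped.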

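The simplest approach is to build a dictionary that stores player IDs and names (i.e. store <player p="1">sachin</player> as { '1': 'sachin' }), then iterate over the rankings and fill in the output using the stored player names.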

from xml.dom import minidom

doc = minidom.parse('report.xml')  # same file as in the question

# collect player IDs and names, e.g. {'1': 'sachin'}
pdic = {}
for item in doc.getElementsByTagName("player"):
    pdic[item.getAttribute('p')] = item.childNodes[0].data

# walk through all the rankings
for r in doc.getElementsByTagName('r'):
    # look up attribute `p` in the dictionary
    pid = r.getAttribute('p')
    if pid in pdic:
        print(pdic[pid] + ": " + r.childNodes[0].data)

Output:

sachin: 3
rahul: 4
saurav: 16
kapil: 16
sanjay: 6
kartik: 3
michel: 7
rickey: 7
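
The code in the question also tries to filter games by id, but it reads the attribute `game` while the XML uses `gameId`. If the report holds several `<game>` elements, the dictionary lookup can be applied per game. The following is a rough sketch under that assumption: the `<report>` wrapper, the second `hockey` game with its placeholder player, and the function name `rankings_by_game` are all made up for illustration and do not come from the original file.

```python
from xml.dom import minidom

# Hypothetical multi-game document; only the cricket entries come from the question.
XML = """<report>
<game gameId="cricket">
    <player p="1">sachin</player>
    <player p="2">rahul</player>
    <ranking period="2016"><r p="1">3</r></ranking>
    <ranking period="DEFAULT"><r p="2">4</r></ranking>
</game>
<game gameId="hockey">
    <player p="1">placeholder</player>
    <ranking period="DEFAULT"><r p="1">1</r></ranking>
</game>
</report>"""


def rankings_by_game(doc, wanted_ids):
    """Hypothetical helper: return {gameId: [(player name, rank), ...]}."""
    result = {}
    for game in doc.getElementsByTagName("game"):
        game_id = game.getAttribute("gameId")  # note: 'gameId', not 'game'
        if game_id not in wanted_ids:
            continue
        # Build the id -> name lookup from this game's players only.
        names = {p.getAttribute("p"): p.firstChild.data
                 for p in game.getElementsByTagName("player")}
        pairs = []
        for r in game.getElementsByTagName("r"):
            pid = r.getAttribute("p")
            if pid in names:
                pairs.append((names[pid], r.firstChild.data))
        result[game_id] = pairs
    return result


doc = minidom.parseString(XML)
result = rankings_by_game(doc, ["cricket", "football"])
for game_id, pairs in result.items():
    for name, rank in pairs:
        print(game_id + " - " + name + ": " + rank)
```

Scoping `getElementsByTagName` to each `game` node, rather than to the whole document, keeps player id `1` in one game from being confused with id `1` in another.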
