How to correctly extract information from this XML

Published 2024-03-28 18:05:05

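I am trying to query an XML document to print the attributes of higher-level elements that are associated with lower-level elements. The results I get do not match the XML structure. This is basically the code I have so far.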

import xml.etree.ElementTree as ET

tree = ET.parse('movies2.xml')
root = tree.getroot()

for child in root:
    print(child.tag, child.attrib)
print()

mov = root.findall("./genre/decade/movie/[year='2000']")
for movie in mov:
    print(child.attrib['category'], movie.attrib['title'])

This produces:

genre {'category': 'Action'}
genre {'category': 'Thriller'}
genre {'category': 'Comedy'}

Comedy X-Men
Comedy American Psycho

The last two lines should show the two different genre attributes that belong to those movie titles, as the XML indicates:

Action X-Men
Thriller American Psycho

Here is the XML for reference:

<?xml version='1.0' encoding='utf8'?>
<collection>
    <genre category="Action">
        <decade years="1980s">
            <movie favorite="True" title="Indiana Jones: The raiders of 
              the lost Ark">
                <format multiple="No">DVD</format>
                <year>1981</year>
                <rating>PG</rating>
                <description>
                'Archaeologist and adventurer Indiana Jones 
                is hired by the U.S. government to find the Ark of the 
                Covenant before the Nazis.'
                </description>
            </movie>
               <movie favorite="True" title="THE KARATE KID">
               <format multiple="Yes">DVD,Online</format>
               <year>1984</year>
               <rating>PG</rating>
               <description>None provided.</description>
            </movie>
            <movie favorite="False" title="Back 2 the Future">
               <format multiple="False">Blu-ray</format>
               <year>1985</year>
               <rating>PG</rating>
               <description>Marty McFly</description>
            </movie>
        </decade>
        <decade years="1990s">
            <movie favorite="False" title="X-Men">
               <format multiple="Yes">dvd, digital</format>
               <year>2000</year>
               <rating>PG-13</rating>
               <description>Two mutants come to a private academy for
                their kind whose resident superhero team must 
                oppose a terrorist organization with similar powers. 
               </description>
            </movie>
            <movie favorite="True" title="Batman Returns">
               <format multiple="No">VHS</format>
               <year>1992</year>
               <rating>PG13</rating>
               <description>NA.</description>
            </movie>
               <movie favorite="False" title="Reservoir Dogs">
               <format multiple="No">Online</format>
               <year>1992</year>
               <rating>R</rating>
               <description>WhAtEvER I Want!!!?!</description>
            </movie>
        </decade>    
    </genre>

    <genre category="Thriller">
        <decade years="1970s">
            <movie favorite="False" title="ALIEN">
                <format multiple="Yes">DVD</format>
                <year>1979</year>
                <rating>R</rating>
                <description>"""""""""</description>
            </movie>
        </decade>
        <decade years="1980s">
            <movie favorite="True" title="Ferris Bueller's Day Off">
                <format multiple="No">DVD</format>
                <year>1986</year>
                <rating>PG13</rating>
                <description>Funny movie about a funny guy</description>
            </movie>
            <movie favorite="FALSE" title="American Psycho">
                <format multiple="No">blue-ray</format>
                <year>2000</year>
                <rating>Unrated</rating>
                <description>psychopathic Bateman</description>
            </movie>
        </decade>
    </genre>

    <genre category="Comedy">
        <decade years="1960s">
            <movie favorite="False" title="Batman: The Movie">
                <format multiple="Yes">DVD,VHS</format>
                <year>1966</year>
                <rating>PG</rating>
                <description>What a joke!</description>
            </movie>
        </decade>
        <decade years="2010s">
            <movie favorite="True" title="Easy A">
                <format multiple="No">DVD</format>
                <year>2010</year>
                <rating>PG--13</rating>
                <description>Emma Stone = Hester Prynne</description>
            </movie>
            <movie favorite="True" title="Dinner for SCHMUCKS">
                <format multiple="Yes">DVD,digital,Netflix</format>
                <year>2011</year>
                <rating>Unrated</rating>
                <description>Tim (Rudd) is a rising executive who 
                 'succeeds' in finding the perfect guest, IRS employee 
                 Barry (Carell), for his boss' monthly event, a so-called 
                 'dinner for idiots,' which offers certain advantages to 
                 the exec who shows up with the biggest buffoon.
                </description>
            </movie>
        </decade>
        <decade years="1980s">
            <movie favorite="False" title="Ghostbusters">
                <format multiple="No">Online,VHS</format>
                <year>1984</year>
                <rating>PG</rating>
                <description>Who ya gonna call?</description>
            </movie>
        </decade>
        <decade years="1990s">
            <movie favorite="True" title="Robin Hood: Prince of Thieves">
                <format multiple="No">Blu_Ray</format>
                <year>1991</year>
                <rating>Unknown</rating>
                <description>Robin Hood slaying</description>
            </movie>
        </decade>
    </genre>
</collection>

1 answer
User
#1 · Posted 2024-03-28 18:05:05

The initial loop:

for child in root:
    print(child.tag, child.attrib)
print()

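leaves child bound to the last child of root, so child.attrib['category'] will always be the category of that last child. In your case, the last child is the Comedy genre. For each movie in the second loop: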

for movie in mov:
    print(child.attrib['category'], movie.attrib['title'])

you are printing the category of the last child found in the first loop, which is why every line prints "Comedy".
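The effect can be reproduced in isolation. The following is a minimal sketch, assuming a trimmed-down stand-in for movies2.xml (the inline XML string and the name last_category are illustrative and not part of the original question):

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for movies2.xml (genres only).
XML = """
<collection>
    <genre category="Action"/>
    <genre category="Thriller"/>
    <genre category="Comedy"/>
</collection>
"""

root = ET.fromstring(XML)

for child in root:
    pass  # the loop body does not matter for this demonstration

# A Python for-loop variable stays bound after the loop finishes,
# so `child` still refers to the last <genre> element here.
last_category = child.attrib["category"]
print(last_category)
```

Because the variable outlives the loop, any later use of child refers to the final genre element, regardless of which movie is being printed.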

Edit: this will at least select the same movies with the correct genre label, though possibly in a different order:

for child in root:
    mov = child.findall("./decade/movie/[year='2000']")
    for movie in mov:
        print(child.attrib['category'], movie.attrib['title'])

Another approach is to use lxml instead of ElementTree:

from lxml import etree as ET

tree = ET.parse('movies2.xml')
root = tree.getroot()

mov = root.findall("./genre/decade/movie/[year='2000']")
for movie in mov:
    print(movie.getparent().getparent().attrib['category'], movie.attrib['title'])
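If installing lxml is not an option, the standard library can approximate getparent() with a child-to-parent dictionary. This is a minimal sketch, assuming a shortened inline copy of the XML; the names parent_of and results are illustrative, and the predicate is written in the attached form movie[year='2000']:

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened copy of movies2.xml for illustration.
XML = """
<collection>
    <genre category="Action">
        <decade years="1990s">
            <movie title="X-Men"><year>2000</year></movie>
            <movie title="Batman Returns"><year>1992</year></movie>
        </decade>
    </genre>
    <genre category="Thriller">
        <decade years="1980s">
            <movie title="American Psycho"><year>2000</year></movie>
        </decade>
    </genre>
</collection>
"""

root = ET.fromstring(XML)

# ElementTree elements do not store a reference to their parent,
# so build a lookup from each element to the element containing it.
parent_of = {child: parent for parent in root.iter() for child in parent}

results = []
for movie in root.findall("./genre/decade/movie[year='2000']"):
    # movie -> decade -> genre
    genre = parent_of[parent_of[movie]]
    results.append((genre.attrib["category"], movie.attrib["title"]))

for category, title in results:
    print(category, title)
```

The map is built once in a single pass over the tree, so looking up the genre of each matched movie is a pair of dictionary lookups.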
