Parsing XML in Python with ElementTree: how do I find the values of elements that share the same name?

Posted 2024-04-18 17:19:40


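Disclaimer: I'm new to Python, to XML, and to programming in general. The code (which I borrowed from the web) works, but there are a few things I can't find answers to or wrap my head around.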

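I'm trying to parse an XML file from the grants.gov XML extract website. The goal is to remove every grant that is not in the "Unrestricted" eligibility category (marked as "99" in the EligibilityCategory tag of the XML) and write the result to a new XML file.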

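The code below correctly removes the FundingOpps I'm not interested in, but it also removes FundingOpps that have several EligibilityCategory elements, one of which is "99". I think this is because .find only finds the first occurrence. I tried findall, but couldn't get it to work. Thanks in advance for your help.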

import xml.etree.ElementTree as etree
tree = etree.parse('sample.xml')
root = tree.getroot()

for FundingOppSynopsis in root.findall('FundingOppSynopsis'):
    # .find() returns only the FIRST EligibilityCategory child of this element
    ID = int(FundingOppSynopsis.find('EligibilityCategory').text)
    if ID != 99:
        root.remove(FundingOppSynopsis)

tree.write("Output/output.xml", xml_declaration=True, encoding='UTF-8', method="xml")
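A minimal sketch confirming the diagnosis: on a synopsis with several EligibilityCategory children, find() returns only the first match, while findall() returns all of them. The XML string is a hypothetical, trimmed excerpt modeled on the BAA07-10 entry in the sample.

```python
import xml.etree.ElementTree as etree

# Hypothetical trimmed excerpt modeled on the BAA07-10 entry in the sample XML
xml_text = """<FundingOppSynopsis>
    <FundingOppNumber>BAA07-10</FundingOppNumber>
    <EligibilityCategory>00</EligibilityCategory>
    <EligibilityCategory>01</EligibilityCategory>
    <EligibilityCategory>99</EligibilityCategory>
</FundingOppSynopsis>"""
synopsis = etree.fromstring(xml_text)

# find() stops at the first matching child
first = synopsis.find('EligibilityCategory').text

# findall() returns every matching child, in document order
all_values = [e.text for e in synopsis.findall('EligibilityCategory')]

print(first)       # 00
print(all_values)  # ['00', '01', '99']
```

Because the first category here is "00", the loop above treats this grant as restricted and removes it, even though it also lists "99".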

Sample XML (heavily trimmed):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Grants SYSTEM "http://apply07.grants.gov/search/dtd/XMLExtract.dtd">
<Grants>
    <FundingOppSynopsis>
        <FundingOppNumber>USDA-RMA-RME-2008-03</FundingOppNumber>
        <ApplicationsDueDate>03242008</ApplicationsDueDate>
        <Office>Risk Management Agency</Office>
        <Agency>Department of Agriculture</Agency>
        <EligibilityCategory>25</EligibilityCategory>
    </FundingOppSynopsis>
    <FundingOppSynopsis>
        <FundingOppNumber>NPS-ARRAWHIS100315</FundingOppNumber>
        <ApplicationsDueDate>11282009</ApplicationsDueDate>
        <Office>National Park Service</Office>
        <Agency>Department of the Interior</Agency>
        <EligibilityCategory>00</EligibilityCategory>
    </FundingOppSynopsis>
    <FundingOppSynopsis>
        <FundingOppNumber>OFDA-FY08-002-APS</FundingOppNumber>
        <ApplicationsDueDate>10102008</ApplicationsDueDate>
        <Office>None</Office>
        <Agency>Agency for International Development</Agency>
        <EligibilityCategory>99</EligibilityCategory>
    </FundingOppSynopsis>
    <FundingOppSynopsis>
        <FundingOppNumber>AK-NOI08-0004</FundingOppNumber>
        <ApplicationsDueDate>07142008</ApplicationsDueDate>
        <Office>Bureau of Land Management</Office>
        <Agency>Department of the Interior</Agency>
        <EligibilityCategory>99</EligibilityCategory>
    </FundingOppSynopsis>
    <FundingOppSynopsis>
        <FundingOppNumber>RD-RBP-BIOMASS-2007-FULL</FundingOppNumber>
        <ApplicationsDueDate>11162007</ApplicationsDueDate>
        <Office>Business and Cooperative Programs</Office>
        <Agency>Department of Agriculture</Agency>
        <EligibilityCategory>06</EligibilityCategory>
        <EligibilityCategory>12</EligibilityCategory>
        <EligibilityCategory>13</EligibilityCategory>
        <EligibilityCategory>20</EligibilityCategory>
        <EligibilityCategory>22</EligibilityCategory>
        <EligibilityCategory>23</EligibilityCategory>
        <EligibilityCategory>25</EligibilityCategory>
    </FundingOppSynopsis>
    <FundingOppSynopsis>
        <FundingOppNumber>BAA07-10</FundingOppNumber>
        <ApplicationsDueDateExplanation>The due dates and times established for the receipt of White Papers and Full Proposals are as indicated in Section IV, Paragraph 3 of the BAA. </ApplicationsDueDateExplanation>
        <Office>Office of Procurement Operations - Grants Division</Office>
        <Agency>Department of Homeland Security</Agency>
        <EligibilityCategory>00</EligibilityCategory>
        <EligibilityCategory>01</EligibilityCategory>
        <EligibilityCategory>02</EligibilityCategory>
        <EligibilityCategory>04</EligibilityCategory>
        <EligibilityCategory>05</EligibilityCategory>
        <EligibilityCategory>06</EligibilityCategory>
        <EligibilityCategory>07</EligibilityCategory>
        <EligibilityCategory>08</EligibilityCategory>
        <EligibilityCategory>11</EligibilityCategory>
        <EligibilityCategory>12</EligibilityCategory>
        <EligibilityCategory>13</EligibilityCategory>
        <EligibilityCategory>20</EligibilityCategory>
        <EligibilityCategory>21</EligibilityCategory>
        <EligibilityCategory>22</EligibilityCategory>
        <EligibilityCategory>23</EligibilityCategory>
        <EligibilityCategory>25</EligibilityCategory>
        <EligibilityCategory>99</EligibilityCategory>
    </FundingOppSynopsis>
</Grants>

2 Answers

You can use an XPath query to do what you want.

import xml.etree.ElementTree as etree
tree = etree.parse('sample.xml')
root = tree.getroot()

# Select only the synopses that have an EligibilityCategory child whose text is '99'
req = tree.findall("./FundingOppSynopsis[EligibilityCategory='99']")

for r in req:
    print(r)

The query I wrote returns a list of all FundingOppSynopsis elements in the document that have an EligibilityCategory child containing the text "99".

More details on XPath queries here.

More details on using XPath in Python here.
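To go from printing the matches to the filtered file the question asks for, one option is to collect the matches with the same predicate and then remove every other FundingOppSynopsis from the root. A minimal sketch follows. The inline SAMPLE string and the helper name keep_unrestricted are hypothetical stand-ins so the example runs without sample.xml.

```python
import xml.etree.ElementTree as etree

# Hypothetical trimmed excerpt of the sample file (three grants only)
SAMPLE = """<Grants>
    <FundingOppSynopsis>
        <FundingOppNumber>USDA-RMA-RME-2008-03</FundingOppNumber>
        <EligibilityCategory>25</EligibilityCategory>
    </FundingOppSynopsis>
    <FundingOppSynopsis>
        <FundingOppNumber>OFDA-FY08-002-APS</FundingOppNumber>
        <EligibilityCategory>99</EligibilityCategory>
    </FundingOppSynopsis>
    <FundingOppSynopsis>
        <FundingOppNumber>BAA07-10</FundingOppNumber>
        <EligibilityCategory>00</EligibilityCategory>
        <EligibilityCategory>99</EligibilityCategory>
    </FundingOppSynopsis>
</Grants>"""

def keep_unrestricted(root):
    """Remove every FundingOppSynopsis that has no EligibilityCategory equal to '99'."""
    # The predicate matches if ANY EligibilityCategory child has the text '99'
    keep = {id(e) for e in root.findall("./FundingOppSynopsis[EligibilityCategory='99']")}
    # findall() returns a new list, so removing children while looping is safe
    for synopsis in root.findall('FundingOppSynopsis'):
        if id(synopsis) not in keep:
            root.remove(synopsis)
    return root

root = keep_unrestricted(etree.fromstring(SAMPLE))
print([s.findtext('FundingOppNumber') for s in root])
# ['OFDA-FY08-002-APS', 'BAA07-10']
```

For the real file, replace etree.fromstring(SAMPLE) with tree.getroot() from etree.parse('sample.xml') and finish with tree.write(...) as in the question.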

You need to use findall to extract the list of categories, then check whether 99 is in that list. You can do that with a list comprehension like this:

for FundingOppSynopsis in root.findall('FundingOppSynopsis'):
    # Collect the value of EVERY EligibilityCategory child, not just the first
    IDs = [int(category.text) for category in FundingOppSynopsis.findall('EligibilityCategory')]
    if 99 not in IDs:
        root.remove(FundingOppSynopsis)
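A variant of the same idea, sketched under one assumption: any() stops at the first "99" instead of building the whole list. Comparing stripped text as strings also avoids int() failing if a category element were ever empty, a case the sample does not show. The helper names is_unrestricted and filter_grants are hypothetical.

```python
import xml.etree.ElementTree as etree

def is_unrestricted(synopsis):
    """True if any EligibilityCategory child of this element has the text '99'."""
    return any((c.text or '').strip() == '99'
               for c in synopsis.findall('EligibilityCategory'))

def filter_grants(root):
    # findall() returns a new list, so removing children while looping is safe
    for synopsis in root.findall('FundingOppSynopsis'):
        if not is_unrestricted(synopsis):
            root.remove(synopsis)

# Hypothetical excerpt: one grant without 99, one with several categories including 99
root = etree.fromstring("""<Grants>
  <FundingOppSynopsis><FundingOppNumber>RD-RBP-BIOMASS-2007-FULL</FundingOppNumber>
    <EligibilityCategory>06</EligibilityCategory>
    <EligibilityCategory>25</EligibilityCategory></FundingOppSynopsis>
  <FundingOppSynopsis><FundingOppNumber>BAA07-10</FundingOppNumber>
    <EligibilityCategory>00</EligibilityCategory>
    <EligibilityCategory>99</EligibilityCategory></FundingOppSynopsis>
</Grants>""")
filter_grants(root)
print([s.findtext('FundingOppNumber') for s in root])  # ['BAA07-10']
```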
