Python XML parser does not return elements from nested branches

Posted 2024-04-20 16:40:20


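I am currently trying to parse a downloaded XML file and write it out to a CSV file, but I am fairly inexperienced with the XML format. I can read values from the first branch, Filing, but I cannot get anything back from the branches below it, such as Registrant.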

Here is the XML I am trying to sift through:

<?xml version='1.0' encoding='UTF-16'?>
<PublicFilings>
<Filing ID="146DC558-FB00-4BAB-A393-EC50483FB7A9" Year="2014" Received="2014-10-08T12:31:59.127" Amount="10000" Type="THIRD QUARTER REPORT" Period="3rd Quarter (July 1 - Sep 30)">
<Registrant xmlns="" RegistrantID="36366" RegistrantName="Sports &amp; Fitness Industry Association" GeneralDescription="Representing Manufacturers, retailers and other interests in sports and fitness business" Address="8505 Fenton Street&#xD;&#xA;Suite 211&#xD;&#xA;Silver Spring, MD 20910" RegistrantCountry="USA" RegistrantPPBCountry="USA" />
<Client xmlns="" ClientName="Sports &amp; Fitness Industry Association" GeneralDescription="" ClientID="12" SelfFiler="TRUE" ContactFullname="WILLIAM H. SELLS III" IsStateOrLocalGov="TRUE" ClientCountry="USA" ClientPPBCountry="USA" ClientState="MARYLAND" ClientPPBState="MARYLAND" />
<Lobbyists>
<Lobbyist xmlns="" LobbyistName="Sells, William Howard III" LobbyistCoveredGovPositionIndicator="NOT COVERED" OfficialPosition="" /></Lobbyists>
<GovernmentEntities>
<GovernmentEntity xmlns="" GovEntityName="Education, Dept of" />
<GovernmentEntity xmlns="" GovEntityName="Health &amp; Human Services, Dept of (HHS)" />
<GovernmentEntity xmlns="" GovEntityName="U.S. Trade Representative (USTR)" /><GovernmentEntity xmlns="" GovEntityName="Consumer Product Safety Commission (CPSC)" />
<GovernmentEntity xmlns="" GovEntityName="Internal Revenue Service (IRS)" />
<GovernmentEntity xmlns="" GovEntityName="Federal Trade Commission (FTC)" />
<GovernmentEntity xmlns="" GovEntityName="Intl Trade Administration (ITA)" />
<GovernmentEntity xmlns="" GovEntityName="Interior, Dept of (DOI)" />
<GovernmentEntity xmlns="" GovEntityName="Centers For Disease Control &amp; Prevention (CDC)" />
<GovernmentEntity xmlns="" GovEntityName="Transportation, Dept of (DOT)" />
<GovernmentEntity xmlns="" GovEntityName="Natl Institutes of Health (NIH)" />
<GovernmentEntity xmlns="" GovEntityName="Justice, Dept of (DOJ)" />
<GovernmentEntity xmlns="" GovEntityName="Commerce, Dept of (DOC)" />
<GovernmentEntity xmlns="" GovEntityName="HOUSE OF REPRESENTATIVES" />
<GovernmentEntity xmlns="" GovEntityName="SENATE" />
<GovernmentEntity xmlns="" GovEntityName="U.S. Customs &amp; Border Protection" />
</GovernmentEntities>
<Issues>
<Issue xmlns="" Code="SPORTS/ATHLETICS" SpecificIssue="Physical Activity, Sports, Recreation, Exercise &amp; Fitness, Sedentary Lifestyles, Pay-to-Play, Title IX, Sports Injuries &amp; Concussions" />
<Issue xmlns="" Code="HEALTH ISSUES" SpecificIssue="Childhood Obesity, Obesity, Chronic Disease, Prevention via Physical Activity, Wellness Benefits of Physical Activity" />
<Issue xmlns="" Code="TRANSPORTATION" SpecificIssue="Trail Development, Park &amp; Recreation Access, Highway Fees, Safe Routes to School" />
<Issue xmlns="" Code="TAXATION/INTERNAL REVENUE CODE" SpecificIssue="Physical Activity Tax Incentives, Duties &amp; Tariffs, Tax Relief, Tax Reform, Internet Sales Tax" />
<Issue xmlns="" Code="COPYRIGHT/PATENT/TRADEMARK" SpecificIssue="Intellectual Property Rights, Rogue Websites, False Markings, Counterfeit &amp; Fake Products, Patent Reform" />
<Issue xmlns="" Code="TRADE (DOMESTIC/FOREIGN)" SpecificIssue="Shipping Act Reform, Intellectual Property Rights Enforcement, Free Trade Agreements, Tariffs, Duties, Quotas, Market Access" />
<Issue xmlns="" Code="TORTS" SpecificIssue="Product Liability, Intellectual Property Rights" />
<Issue xmlns="" Code="REAL ESTATE/LAND USE/CONSERVATION" SpecificIssue="Park &amp; Recreation Development &amp; Maintenance, Land &amp; Water Conservation Fund, Urban Planning, Park &amp; Recreation Access, National Park Preservation" />
<Issue xmlns="" Code="TARIFF (MISCELLANEOUS TARIFF BILLS)" SpecificIssue="Tariffs &amp; Duties on Sporting Goods &amp; Ftiness Products and Equipment, Trade Agreements" />
<Issue xmlns="" Code="EDUCATION" SpecificIssue="Phyical Education Funding, ESEA Reauthorization, Physical Activity, Pay-to-Play School Sports, School Sports Injuries" />
<Issue xmlns="" Code="APPAREL/CLOTHING INDUSTRY/TEXTILES" SpecificIssue="Tariffs, Duties, Free Trade Agreements, Chinese Currency Valuation, Market Access, TPP, TTIP, TPA" />
<Issue xmlns="" Code="CONSUMER ISSUES/SAFETY/PRODUCTS" SpecificIssue="CPSIA Compliance, Product Testing, Product Safety Database, Sports Equipment &amp; Helmet Safety" />
<Issue xmlns="" Code="MANUFACTURING" SpecificIssue="Trade Agreements, Product Safety, Domestic Job Creation, Access to Raw Materials, Restrictions on Product Content, Outsourcing" /></Issues></Filing>

Here is my current Python code, which uses ElementTree:

import xml.etree.ElementTree as ET
import csv
import datetime

e = ET.parse('/Users/Ryan/Downloads/2015_1/2015_1_1_1.xml').getroot()

#filing_elements = ['filing_ID', 'Year', 'Amount', 'Type', 'Period']
#Filing

IDs = []
for atype in e.findall('Filing'):
    IDs.append(atype.get('ID'))
Year = []
for atype in e.findall('Filing'):
    Year.append(atype.get('Year'))
Amount = []
for atype in e.findall('Filing'):
    Amount.append(atype.get('Amount'))
Type = []
for atype in e.findall('Filing'):
    Type.append(atype.get('Type'))
Period = []
for atype in e.findall('Filing'):
    Period.append(atype.get('Period'))
#Registrant 
RegistrantID = []
for ty in e.findall('Registrant'):
    RegistrantID.append(ty.get('RegistrantID'))
RegistrantName = []
for atype in e.findall('Registrant'):
    RegistrantName.append(atype.get('RegistrantName'))
GeneralDescription = []
for atype in e.findall('Registrant'):
    GeneralDescription.append(atype.get('GeneralDescription'))
ClientName = []
for atype in e.findall('Client'):
    ClientName.append(atype.get('ClientName'))

In this example, every loop that searches for Registrant elements returns an empty list.


1 answer
User
Answer 1 · posted 2024-04-20 16:40:20

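What you are doing only works for direct children of the root element (PublicFilings), because findall with a bare tag name matches direct children only. The Registrant tag you want is a grandchild of the root, not a direct child: it sits inside Filing.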
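To make the difference concrete, here is a minimal sketch. It assumes a trimmed-down stand-in for the file in the question (the SAMPLE string and its shortened attribute values are made up for illustration) and compares three search paths:

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for the real file; only the nesting matters.
SAMPLE = """
<PublicFilings>
  <Filing ID="A1" Year="2014">
    <Registrant RegistrantID="36366" RegistrantName="Example Association" />
  </Filing>
</PublicFilings>
"""

root = ET.fromstring(SAMPLE)

direct = root.findall("Registrant")          # direct children of <PublicFilings> only
nested = root.findall("Filing/Registrant")   # children of each <Filing>
anywhere = root.findall(".//Registrant")     # any depth below the root

print(len(direct), len(nested), len(anywhere))  # prints: 0 1 1
```

The first search comes back empty for the same reason the loops in the question do: Registrant is never a direct child of the root.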

To find a tag anywhere in the tree below the root, use the following path in each of your search loops:

e.findall(".//Registrant")

For example:

RegistrantID = []
for ty in e.findall(".//Registrant"):
    RegistrantID.append(ty.get('RegistrantID'))
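Since the goal is a CSV file, it may be easier to walk each Filing once and look up its own Registrant and Client, so that every row stays aligned even when a filing lacks one of them. The following is only a sketch under assumptions: the SAMPLE data, the filing_rows helper, and the chosen column list are hypothetical, and it writes to an in-memory buffer rather than a real file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; the second filing deliberately has no <Client>.
SAMPLE = """
<PublicFilings>
  <Filing ID="A1" Year="2014" Amount="10000">
    <Registrant RegistrantID="36366" RegistrantName="Example Association" />
    <Client ClientName="Example Association" />
  </Filing>
  <Filing ID="B2" Year="2014" Amount="">
    <Registrant RegistrantID="4" RegistrantName="Other Org" />
  </Filing>
</PublicFilings>
"""

FIELDS = ["ID", "Year", "Amount", "RegistrantID", "RegistrantName", "ClientName"]


def attr(element, name):
    """Return an attribute value, or '' if the element or attribute is missing."""
    # Compare with None explicitly: an Element without children is falsy.
    return element.get(name, "") if element is not None else ""


def filing_rows(root):
    """Yield one flat dict per <Filing>, including its Registrant and Client."""
    for filing in root.findall("Filing"):
        registrant = filing.find("Registrant")  # direct child of this Filing
        client = filing.find("Client")
        yield {
            "ID": attr(filing, "ID"),
            "Year": attr(filing, "Year"),
            "Amount": attr(filing, "Amount"),
            "RegistrantID": attr(registrant, "RegistrantID"),
            "RegistrantName": attr(registrant, "RegistrantName"),
            "ClientName": attr(client, "ClientName"),
        }


root = ET.fromstring(SAMPLE)
rows = list(filing_rows(root))

# For a real file, use open(path, "w", newline="") instead of StringIO.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Compared with building one list per attribute, this keeps each filing's values together in a single row, so a missing child element produces an empty cell instead of shifting the columns.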
