Parsing XML with etree
I'm trying to parse the XML response returned by the Amazon Product Advertising API. This is the XML:
<?xml version="1.0" ?>
<ItemLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2010-11-01">
<OperationRequest>
<HTTPHeaders>
<Header Name="UserAgent" Value="TSN (Language=Python)"></Header>
</HTTPHeaders>
<RequestId>96ef9bc3-68a8-4bf3-a2c7-c98b8aeae00f</RequestId>
<Arguments>
<Argument Name="Operation" Value="ItemLookup"></Argument>
<Argument Name="Service" Value="AWSECommerceService"></Argument>
<Argument Name="Signature" Value="gjc4wRNum3YT82app1d06vMIDM7v44fOmZTP8Uh3LqE="></Argument>
<Argument Name="AssociateTag" Value="sneakick-20"></Argument>
<Argument Name="Version" Value="2010-11-01"></Argument>
<Argument Name="ItemId" Value="810056013349,810056013264"></Argument>
<Argument Name="IdType" Value="UPC"></Argument>
<Argument Name="AWSAccessKeyId" Value="AKIAIFMUMJLJOOINRVRA"></Argument>
<Argument Name="Timestamp" Value="2012-01-03T21:26:39Z"></Argument>
<Argument Name="ResponseGroup" Value="ItemIds"></Argument>
<Argument Name="SearchIndex" Value="Apparel"></Argument>
</Arguments>
<RequestProcessingTime>0.0595830000000000</RequestProcessingTime>
</OperationRequest>
<Items>
<Request>
<IsValid>True</IsValid>
<ItemLookupRequest>
<IdType>UPC</IdType>
<ItemId>810056013349</ItemId>
<ItemId>810056013264</ItemId>
<ResponseGroup>ItemIds</ResponseGroup>
<SearchIndex>Apparel</SearchIndex>
<VariationPage>All</VariationPage>
</ItemLookupRequest>
</Request>
<Item>
<ASIN>B000XR4K6U</ASIN>
</Item>
<Item>
<ASIN>B000XR2UU8</ASIN>
</Item>
</Items>
</ItemLookupResponse>
I only care about the Item tags inside Items. Amazon returns the whole XML document to me as a string, and this is how I parse it:
from xml.etree.ElementTree import fromstring
response = "xml string returned by amazon"
parsed = fromstring(response)
items = parsed[1] # This is how I get the Items element
# These were my attempts at getting the Item elements
items.find('Item')
items.findall('Item')
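Here items refers to the Items element, but so far nothing has worked: find() returns None and findall() returns an empty list. Am I missing something, or is there another way to go about this?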
2 Answers
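This is a namespace issue. The root element declares a default namespace, so ElementTree stores every tag as {namespace-URI}name, and an unqualified 'Item' matches nothing. This code works: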
from xml.etree import ElementTree as ET
XML = """<?xml version="1.0" ?>
<ItemLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2010-11-01">
<OperationRequest>
<HTTPHeaders>
<Header Name="UserAgent" Value="TSN (Language=Python)"></Header>
</HTTPHeaders>
<RequestId>96ef9bc3-68a8-4bf3-a2c7-c98b8aeae00f</RequestId>
<Arguments>
<Argument Name="Operation" Value="ItemLookup"></Argument>
<Argument Name="Service" Value="AWSECommerceService"></Argument>
<Argument Name="Signature" Value="gjc4wRNum3YT82app1d06vMIDM7v44fOmZTP8Uh3LqE="></Argument>
<Argument Name="AssociateTag" Value="sneakick-20"></Argument>
<Argument Name="Version" Value="2010-11-01"></Argument>
<Argument Name="ItemId" Value="810056013349,810056013264"></Argument>
<Argument Name="IdType" Value="UPC"></Argument>
<Argument Name="AWSAccessKeyId" Value="AKIAIFMUMJLJOOINRVRA"></Argument>
<Argument Name="Timestamp" Value="2012-01-03T21:26:39Z"></Argument>
<Argument Name="ResponseGroup" Value="ItemIds"></Argument>
<Argument Name="SearchIndex" Value="Apparel"></Argument>
</Arguments>
<RequestProcessingTime>0.0595830000000000</RequestProcessingTime>
</OperationRequest>
<Items>
<Request>
<IsValid>True</IsValid>
<ItemLookupRequest>
<IdType>UPC</IdType>
<ItemId>810056013349</ItemId>
<ItemId>810056013264</ItemId>
<ResponseGroup>ItemIds</ResponseGroup>
<SearchIndex>Apparel</SearchIndex>
<VariationPage>All</VariationPage>
</ItemLookupRequest>
</Request>
<Item>
<ASIN>B000XR4K6U</ASIN>
</Item>
<Item>
<ASIN>B000XR2UU8</ASIN>
</Item>
</Items>
</ItemLookupResponse>"""
NS = "{http://webservices.amazon.com/AWSECommerceService/2010-11-01}"
doc = ET.fromstring(XML)
Item_elems = doc.findall(".//" + NS + "Item") # All Item elements in document
print(Item_elems)
Output:
[<Element '{http://webservices.amazon.com/AWSECommerceService/2010-11-01}Item' at 0xbf0c50>,
<Element '{http://webservices.amazon.com/AWSECommerceService/2010-11-01}Item' at 0xbf0cd0>]
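To see why the unqualified lookup comes back empty, it can help to print the tags ElementTree actually stores. The following is only a minimal sketch: the shortened XML is a stand-in for the real response (an assumption made for brevity), which is why Items sits at index 0 here rather than 1.

```python
from xml.etree import ElementTree as ET

# Trimmed-down stand-in for the Amazon response
# (assumption: same default namespace, far fewer elements)
XML = """<ItemLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2010-11-01">
<Items>
<Item><ASIN>B000XR4K6U</ASIN></Item>
<Item><ASIN>B000XR2UU8</ASIN></Item>
</Items>
</ItemLookupResponse>"""

doc = ET.fromstring(XML)
items = doc[0]  # Items is the first child in this trimmed example

# Each stored tag carries the namespace URI in braces
tags = [child.tag for child in items]
print(tags)

# A bare name only matches elements that have no namespace
unqualified = items.find('Item')
print(unqualified)
```

The first print shows the fully qualified tag names, and the second shows that the bare 'Item' lookup finds nothing.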
A variant closer to your own code:
NS = "{http://webservices.amazon.com/AWSECommerceService/2010-11-01}"
doc = ET.fromstring(XML)
items = doc[1] # Items element
first_item = items.find(NS + 'Item') # First direct Item child
all_items = items.findall(NS + 'Item') # List of all direct Item children
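As an alternative to concatenating the namespace string by hand, find() and findall() also accept a prefix-to-URI mapping. This is a rough sketch rather than part of the answer above: the prefix "aws" and the name NSMAP are arbitrary labels chosen for illustration, and the XML is again a trimmed stand-in for the real response.

```python
from xml.etree import ElementTree as ET

# Trimmed-down stand-in for the Amazon response
# (assumption: same default namespace, far fewer elements)
XML = """<ItemLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2010-11-01">
<Items>
<Item><ASIN>B000XR4K6U</ASIN></Item>
<Item><ASIN>B000XR2UU8</ASIN></Item>
</Items>
</ItemLookupResponse>"""

# "aws" is a made-up prefix; only the URI has to match the document
NSMAP = {"aws": "http://webservices.amazon.com/AWSECommerceService/2010-11-01"}

doc = ET.fromstring(XML)

# Collect the text of every ASIN that sits inside an Item element
asins = [asin.text for asin in doc.findall(".//aws:Item/aws:ASIN", NSMAP)]
print(asins)
```

The prefix in the mapping does not need to appear in the document itself; ElementTree expands it to the URI before matching.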