Trying to get the text inside a tag using lxml
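I'm trying to use lxml to get the text inside the tags <ImageSet><LargeImage><URL>this text</URL></LargeImage></ImageSet>, but my code only returns None, meaning it isn't finding the text under each tag.
Here is the code I wrote: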
# I am trying to get the URL text using lxml
for attr_list in tree.iterfind(".//"+settings.AMAZON_NS+"ImageSet"):
    for image_list in tree.find(".//"+settings.AMAZON_NS+"LargeImage"):
        print(etree.tostring(image_list))
        print(image_list.findtext(".//"+settings.AMAZON_NS+"URL"))  # This is only printing None.
Here is the output of the code:
<URL xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
None
<Height xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">500</Height>
None
<Width xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">349</Width>
None
<URL xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
None
<Height xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">500</Height>
None
<Width xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">349</Width>
None
<URL xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
None
<Height xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">500</Height>
None
<Width xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">349</Width>
None
<URL xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
None
<Height xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">500</Height>
None
<Width xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">349</Width>
None
<URL xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
None
<Height xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">500</Height>
None
<Width xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">349</Width>
None
<URL xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
None
<Height xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">500</Height>
Every None printed right after a <URL> element should be that URL instead.
Edit 1: Let me try to state my problem more clearly...
This is the code I'm using:
for item in tree.iterfind(".//"+settings.AMAZON_NS+"ImageSet"):
    for image_set in item.find(".//"+settings.AMAZON_NS+"LargeImage"):
        print(etree.tostring(image_set))
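This is the output I get:
http://dpaste.com/289187/
How do I get at the contents of the URL tag specifically?
I tried the following (none of them worked, but maybe you can tell from my failed attempts what I'm trying to do):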
for item in tree.iterfind(".//"+settings.AMAZON_NS+"ImageSet"):
    for image_set in item.find(".//"+settings.AMAZON_NS+"LargeImage"):
        for image_url_set in image_set.find(".//"+settings.AMAZON_NS+"URL"):
            print(etree.tostring(image_url_set))
This is the error I get:
    for image_url_set in image_set.find(".//"+settings.AMAZON_NS+"URL"):
TypeError: 'NoneType' object is not iterable
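The TypeError comes from find() itself: it returns the first matching element or None, never a list, and each image_set in that loop is already a child of <LargeImage> (URL, Height, Width) with no <URL> below it. A minimal sketch of calling find() once per ImageSet and checking for None instead, assuming settings.AMAZON_NS holds the namespace in {uri} form. The response is a hypothetical trimmed one: the wrapper <ImageSets> and the empty second <ImageSet> are made up. The sketch falls back to the standard-library ElementTree, which shares these calls, only so it runs without lxml.

```python
# lxml.etree and the standard-library ElementTree share find()/iterfind();
# fall back to the stdlib so this sketch also runs without lxml installed.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Assumption: settings.AMAZON_NS is the namespace in "{uri}" (Clark) form.
AMAZON_NS = "{http://webservices.amazon.com/AWSECommerceService/2009-10-01}"

# Hypothetical, trimmed response; <ImageSets> and the empty second <ImageSet>
# are made up to exercise the None check.
SAMPLE = b"""<ImageSets xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">
  <ImageSet>
    <LargeImage>
      <URL>http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
      <Height Units="pixels">500</Height>
      <Width Units="pixels">349</Width>
    </LargeImage>
  </ImageSet>
  <ImageSet/>
</ImageSets>"""


def large_image_urls(xml_bytes):
    """Collect the LargeImage URL text of every ImageSet, skipping sets without one."""
    root = etree.fromstring(xml_bytes)
    urls = []
    for image_set in root.iterfind(".//" + AMAZON_NS + "ImageSet"):
        # find() returns a single element or None, so test it instead of looping over it.
        url = image_set.find(AMAZON_NS + "LargeImage/" + AMAZON_NS + "URL")
        if url is not None:
            urls.append(url.text)
    return urls


print(large_image_urls(SAMPLE))
```

The path passed to find() walks from each <ImageSet> down to its own <LargeImage>/<URL>, so no nested loop over LargeImage's children is needed.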
My second attempt:
for item in tree.iterfind(".//"+settings.AMAZON_NS+"ImageSet"):
    for image_set in item.find(".//"+settings.AMAZON_NS+"LargeImage"):
        for image_link in image_set.iter(".//"+settings.AMAZON_NS+"URL"):
            print(image_link.text)
This one printed nothing at all.
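A likely reason the second attempt prints nothing: Element.iter() filters on a tag name, not on a path expression, so ".//{ns}URL" is treated as a literal tag that never occurs. iter() also starts with the element it is called on, so passing the bare namespaced tag would match each <URL> child itself. A small sketch, using a hypothetical <LargeImage> fragment built from the question's output and assuming the same {uri} namespace form for settings.AMAZON_NS:

```python
# Prefer lxml; fall back to the stdlib ElementTree, which has the same iter() semantics.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Assumption: settings.AMAZON_NS is the namespace in "{uri}" form.
AMAZON_NS = "{http://webservices.amazon.com/AWSECommerceService/2009-10-01}"

# Hypothetical fragment reconstructed from the elements in the question's output.
SAMPLE = b"""<LargeImage xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">
  <URL>http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
  <Height Units="pixels">500</Height>
  <Width Units="pixels">349</Width>
</LargeImage>"""

large_image = etree.fromstring(SAMPLE)

# iter() compares tag names, so a path such as ".//{ns}URL" matches no element.
with_path = [el.text for child in large_image
             for el in child.iter(".//" + AMAZON_NS + "URL")]

# With the bare Clark-notation tag the same loop works, because iter() yields
# the element it is called on (here the <URL> child) when its tag matches.
with_tag = [el.text for child in large_image
            for el in child.iter(AMAZON_NS + "URL")]

print(with_path)  # []
print(with_tag)
```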
2 Answers
Answer 1 (score 0):
Try replacing
    print(image_list.findtext(".//"+settings.AMAZON_NS+"URL"))
with
    print(image_list.text)
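Why that swap works: looping over the element returned by find() visits its direct children, so in the question's loop image_list is already the <URL> (or <Height>, <Width>) element. findtext(".//…URL") only searches below that element, and a <URL> has no element children, so it returns None; .text is the text directly inside the element. A minimal sketch that reproduces both results side by side, assuming settings.AMAZON_NS is the {uri} namespace prefix and using a hypothetical trimmed <ImageSet>:

```python
# Prefer lxml; the stdlib ElementTree behaves the same for find()/findtext()/.text.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Assumption: settings.AMAZON_NS is the namespace in "{uri}" form.
AMAZON_NS = "{http://webservices.amazon.com/AWSECommerceService/2009-10-01}"

# Hypothetical fragment reconstructed from the elements in the question's output.
SAMPLE = b"""<ImageSet xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">
  <LargeImage>
    <URL>http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
    <Height Units="pixels">500</Height>
    <Width Units="pixels">349</Width>
  </LargeImage>
</ImageSet>"""

tree = etree.fromstring(SAMPLE)
rows = []
# Looping over the element returned by find() visits its children: URL, Height, Width.
for image_list in tree.find(".//" + AMAZON_NS + "LargeImage"):
    # findtext(".//...URL") searches only *below* image_list; a <URL> has no
    # element descendants, so the question's call returns None every time.
    searched = image_list.findtext(".//" + AMAZON_NS + "URL")
    # .text is the text directly inside the element itself.
    rows.append((image_list.tag, searched, image_list.text))

for row in rows:
    print(row)
```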
Answer 2 (score 1):
from io import BytesIO
from lxml import etree

URL_TAG = "{http://webservices.amazon.com/AWSECommerceService/2009-10-01}URL"

# body holds the raw XML response as bytes
tree = etree.fromstring(body)
print(tree.findtext(".//%s" % (URL_TAG,)))  # 1st way: text of the first <URL> only

for ev, el in etree.iterparse(BytesIO(body), tag=URL_TAG):  # 2nd approach: every <URL>
    print(el.text)
Here body is your XML text.
Output:
http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg
http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg
http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg
http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg
http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg
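Two details of this answer are worth spelling out: findtext() returns only the first match, and a bare ".//URL" also picks up URLs of any other image sizes in the response. A minimal sketch of the same idea using a prefix-to-URI map with findall() to restrict matches to LargeImage URLs. The "aws" prefix is an arbitrary local alias, and the response below (including the SmallImage element and the example.com URLs) is hypothetical, made up for illustration:

```python
# Prefer lxml; the stdlib ElementTree also accepts namespaces= in findall()/findtext().
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Only the URI has to match the document; "aws" is just a local alias.
NS = {"aws": "http://webservices.amazon.com/AWSECommerceService/2009-10-01"}

# Hypothetical two-ImageSet response; SmallImage and example.com URLs are made up.
body = b"""<ImageSets xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">
  <ImageSet>
    <SmallImage><URL>http://example.com/small.jpg</URL></SmallImage>
    <LargeImage><URL>http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL></LargeImage>
  </ImageSet>
  <ImageSet>
    <LargeImage><URL>http://example.com/large2.jpg</URL></LargeImage>
  </ImageSet>
</ImageSets>"""

root = etree.fromstring(body)

# findtext() returns only the first matching <URL>, whatever its parent is.
first = root.findtext(".//aws:URL", namespaces=NS)

# findall() with the full path keeps only URLs that sit under LargeImage.
large_urls = [el.text for el in
              root.findall(".//aws:ImageSet/aws:LargeImage/aws:URL", namespaces=NS)]

# A bare ".//aws:URL" also collects URLs of other image sizes.
all_urls = [el.text for el in root.findall(".//aws:URL", namespaces=NS)]

print(first)
print(large_urls)
print(all_urls)
```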