Trying to get the text inside a tag using lxml

Question

I am trying to use lxml to get the text inside the tags <ImageSet><LargeImage><URL>this text</URL></LargeImage></ImageSet>, but my code only returns None, meaning it does not find the text under any of the tags.

This is the code I wrote:

# I am trying to get the URL text using lxml

for attr_list in tree.iterfind(".//"+settings.AMAZON_NS+"ImageSet"):
    for image_list in tree.find(".//"+settings.AMAZON_NS+"LargeImage"):
        print(etree.tostring(image_list))
        print(image_list.findtext(".//"+settings.AMAZON_NS+"URL")) # This is only printing None.

This is the output of the code:

<URL xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
None
<Height xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">500</Height>
None
<Width xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">349</Width>
None
<URL xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
None
<Height xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">500</Height>
None
<Width xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">349</Width>
None
<URL xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
None
<Height xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">500</Height>
None
<Width xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">349</Width>
None
<URL xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
None
<Height xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">500</Height>
None
<Width xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">349</Width>
None
<URL xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
None
<Height xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">500</Height>
None
<Width xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">349</Width>
None
<URL xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
None
<Height xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">500</Height>

Lines 11, 17, 23, and so on should show a URL rather than None.
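A minimal sketch of what seems to be happening, under stated assumptions: the sample document, the Item wrapper element, and the NS constant (a stand-in for settings.AMAZON_NS) are hypothetical; the URL, Height, and Width values are taken from the output above. Iterating over the element returned by find() walks its direct children, so each findtext() call searches below URL, Height, or Width and finds nothing.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback exposing the same calls

# Hypothetical stand-in for settings.AMAZON_NS (namespace copied from the output).
NS = "{http://webservices.amazon.com/AWSECommerceService/2009-10-01}"

# Hypothetical sample; the real response is assumed to contain more elements.
SAMPLE = """
<Item xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">
  <ImageSet>
    <LargeImage>
      <URL>http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
      <Height Units="pixels">500</Height>
      <Width Units="pixels">349</Width>
    </LargeImage>
  </ImageSet>
</Item>
"""
tree = etree.fromstring(SAMPLE)

large = tree.find(".//" + NS + "LargeImage")

# Iterating over an element yields its direct children: URL, Height, Width.
child_tags = [child.tag for child in large]

# Searching *below* each child for a URL element finds nothing, hence None.
from_children = [child.findtext(".//" + NS + "URL") for child in large]

# Searching from LargeImage itself reaches the URL element and returns its text.
from_parent = large.findtext(NS + "URL")

print(child_tags)
print(from_children)
print(from_parent)
```

The inner loop in the first snippet also calls tree.find() rather than attr_list.find(), which would explain why the same LargeImage is printed on every pass.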

Edit 1: Let me try to state my question more clearly...

This is the code I am using:

for item in tree.iterfind(".//"+settings.AMAZON_NS+"ImageSet"):
    for image_set in item.find(".//"+settings.AMAZON_NS+"LargeImage"):
        print(etree.tostring(image_set))

This is the output I get:

http://dpaste.com/289187/

How do I get specifically the contents of the URL tag?

I have tried the following approaches (none of them worked, but perhaps my failed attempts show what I am trying to do):

for item in tree.iterfind(".//"+settings.AMAZON_NS+"ImageSet"):
    for image_set in item.find(".//"+settings.AMAZON_NS+"LargeImage"):
        for image_url_set in image_set.find(".//"+settings.AMAZON_NS+"URL"):
            print(etree.tostring(image_url_set))

This is the error I get:

for image_url_set in image_set.find(".//"+settings.AMAZON_NS+"URL"): TypeError: 'NoneType' object is not iterable

for item in tree.iterfind(".//"+settings.AMAZON_NS+"ImageSet"):
    for image_set in item.find(".//"+settings.AMAZON_NS+"LargeImage"):
        for image_link in image_set.iter(".//"+settings.AMAZON_NS+"URL"):
            print(image_link.text)

With this one, nothing is printed.
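Both failed attempts probably share one cause: image_set is already a child of LargeImage (URL, Height, or Width), so find() for a URL below it returns None, and iterating over None raises the TypeError. In addition, iter() filters by tag name, not by path, so a ".//" prefix matches nothing. Below is one possible way to read the URL text, offered as a sketch rather than a definitive fix. The sample document, the Item wrapper, the second URL, the NS constant (standing in for settings.AMAZON_NS), and the helper name large_image_urls are all hypothetical.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback exposing the same calls

# Hypothetical stand-in for settings.AMAZON_NS.
NS = "{http://webservices.amazon.com/AWSECommerceService/2009-10-01}"

# Hypothetical sample with two image sets; the second URL is invented for illustration.
SAMPLE = """
<Item xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">
  <ImageSet>
    <LargeImage>
      <URL>http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
      <Height Units="pixels">500</Height>
    </LargeImage>
  </ImageSet>
  <ImageSet>
    <LargeImage>
      <URL>http://example.com/second.jpg</URL>
      <Height Units="pixels">500</Height>
    </LargeImage>
  </ImageSet>
</Item>
"""
tree = etree.fromstring(SAMPLE)


def large_image_urls(root):
    """Collect the ImageSet/LargeImage/URL text for every ImageSet under root."""
    urls = []
    for image_set in root.iterfind(".//" + NS + "ImageSet"):
        # Search relative to the current ImageSet, not the whole tree.
        url = image_set.findtext(NS + "LargeImage/" + NS + "URL")
        if url is not None:  # findtext() returns None when nothing matches
            urls.append(url)
    return urls


# iter() takes a plain qualified tag name, so the ".//" prefix is dropped here.
via_iter = [element.text for element in tree.iter(NS + "URL")]

print(large_image_urls(tree))
print(via_iter)
```

The iter() variant collects every URL element in the document, so it only matches the first approach when LargeImage is the only element that contains a URL.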

