Python + XPath: Is it possible to select the next element after the one I actually want?

2 votes

1 answer

1683 views

Asked 2025-04-17 02:33

Suppose I have something like this:

<span class="filesize">File<a href="http://example.com/image.jpg" target="_blank">image.jpg</a>-(1.61 MB, 1000x1542, <span title="what the file is actually called.jpg">what the file is actually called.jpg</span>)</span><br><a href="http://example.com/image.jpg" target="_blank">

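What I want to extract from this is http://example.com/image.jpg and what the file is actually called.jpg. The one fixed part is <span class="filesize">File, which I can find with xpath("span[text()='File']"), but that only gives me the span itself. Is there a way to do something like result += 1 to step forward to the link that follows, and then to the span after it that holds the file name?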

data extraction, xpath, web parsing, element selection, HTML structure

1 Answer

You can use the following-sibling and preceding-sibling XPath "axes" to do the navigation you need. The section on axes in the XPath specification covers them in more detail.
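To see what the two axes do before applying them to the real markup, here is a minimal sketch on a made-up three-element document. The document and the variable names (doc, span, next_links, nearest, labels) are assumptions for illustration only; the xpath() method and the axis syntax are standard lxml and XPath 1.0.

```python
import lxml.etree

# Hypothetical document, invented purely to illustrate the sibling axes.
doc = lxml.etree.XML(
    "<root><span>File</span><br/>"
    "<a href='http://example.com/image.jpg'>link</a></root>"
)

# Start from the fixed anchor element.
span = doc.xpath("span[text()='File']")[0]

# following-sibling:: looks forward from the span among its siblings.
next_links = span.xpath("following-sibling::a/@href")

# A [1] predicate keeps only the nearest following sibling, whatever its tag.
nearest = span.xpath("following-sibling::*[1]")[0].tag

# preceding-sibling:: looks backward, here from the link to the span.
labels = doc.xpath("a/preceding-sibling::span/text()")

print(next_links, nearest, labels)
```

Calling xpath() on an element rather than on the whole tree evaluates the expression relative to that element, which is the closest XPath equivalent to "result += 1".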

Edit:

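Here is an example that gets the result you want using pure XPath. It may not work for you, though, depending on what the surrounding XML looks like. (I also had to complete a few tags to make it "real" XML. You may be able to avoid that by putting your parser into HTML mode.)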

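import lxml.etree

# The fragment from the question, wrapped in a <something> root, with <br/> and the final <a> closed so that it parses as XML.
xml = lxml.etree.XML("""<something><span class="filesize">File<a href="http://example.com/image.jpg" target="_blank">image.jpg</a>-(1.61 MB, 1000x1542, <span title="what the file is actually called.jpg">what the file is actually called.jpg</span>)</span><br/><a href="http://example.com/image.jpg" target="_blank"></a></something>""")

# Select the href of every <a> that has a preceding sibling <span> with a text node equal to 'File'.
print(xml.xpath("a[preceding-sibling::span/text()='File']/@href"))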
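The question also asks for the file name, and the answer mentions an HTML mode without showing it. The following is a rough sketch of both, assuming lxml.html is acceptable and that the fragment looks exactly like the one in the question. The variable names (fragment, root, outer, href, name, next_href) are made up for this example, and anchoring on the filesize class rather than the 'File' text is a choice of this sketch, not something the answer prescribes.

```python
import lxml.html

# The fragment from the question, unmodified: <br> and the trailing <a> stay unclosed.
fragment = (
    '<span class="filesize">File'
    '<a href="http://example.com/image.jpg" target="_blank">image.jpg</a>'
    '-(1.61 MB, 1000x1542, '
    '<span title="what the file is actually called.jpg">'
    'what the file is actually called.jpg</span>)</span><br>'
    '<a href="http://example.com/image.jpg" target="_blank">'
)

# The HTML parser tolerates the unclosed tags; create_parent=True wraps
# the top-level pieces in a <div> so there is a single root to query.
root = lxml.html.fragment_fromstring(fragment, create_parent=True)

# Anchor on the fixed outer span.
outer = root.xpath(".//span[@class='filesize']")[0]

# The link and the file name are children of that span.
href = outer.xpath("a/@href")[0]
name = outer.xpath("span/@title")[0]

# The link after the <br> is the first <a> among the span's following siblings.
next_href = outer.xpath("following-sibling::a[1]/@href")[0]

print(href, name, next_href)
```

On a page with many such blocks, the same relative expressions could be run inside a loop over every match of the outer span instead of taking only the first one.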

Answered 2025-04-17 by Python大师
