Python Beautiful Soup: printing one exact TD tag


Asked 2025-04-18 03:09

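I'm trying to use BS4 (Beautiful Soup 4) to extract one exact TD tag, the one containing AUD/AED, from the example below. I know I can index the results, for example with [-1], which always gives me the last tag, but in other data the TD I want may sit somewhere in the middle. Is there a way to find the AUD/AED tag specifically?

Example: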

<table class="RESULTS" width="100%">
<tr>
<th align="left">Base Currency</th>
<th align="left">Quote Currency</th>
<th align="left">Instrument</th>
<th align="left">Spot Date</th>
</tr>
<tr>
<td>AUD</td>
<td>AED</td>
<td>AUD/AED</td>
<td>Wednesday 23 APR 2014</td>
</tr>
</table>

The code I'm using to get this:

from bs4 import BeautifulSoup

# r holds the HTML text of the page
soup = BeautifulSoup(r, "html.parser")
table = soup.find(attrs={"class": "RESULTS"})
print(table)
days = table.find_all('tr')

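With this I can get the last TR tag (for example via days[-1]), but what I need is the TR tag that contains the TD tag AUD/AED.

I'm looking for something like this: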

if td[2] == <td>AUD/AED</td>:
    print(tr[-1])

3 Answers

Something like this? It assumes soup is your table.

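# Find the AUD/AED cell, then print the cell right after it (the spot date).
cells = soup.find_all('td')
for index, cell in enumerate(cells):
    if cell.get_text(strip=True) == 'AUD/AED':
        if index + 1 < len(cells):
            # the desired cell is the one following the match
            print(cells[index + 1].get_text(strip=True))
        else:
            print("no cell after the match")
        break
else:
    # the loop finished without hitting break
    print("cell not found")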
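
Since the question asks for the TR that contains the AUD/AED cell, the same idea can also be applied row by row. The following is only a sketch: the helper name find_row, its column argument, and the inlined HTML constant are assumptions made for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

HTML = """
<table class="RESULTS" width="100%">
<tr>
<th align="left">Base Currency</th>
<th align="left">Quote Currency</th>
<th align="left">Instrument</th>
<th align="left">Spot Date</th>
</tr>
<tr>
<td>AUD</td>
<td>AED</td>
<td>AUD/AED</td>
<td>Wednesday 23 APR 2014</td>
</tr>
</table>
"""


def find_row(html, wanted, column=2):
    """Return the first <tr> whose cell at `column` equals `wanted`, else None.

    find_row is a hypothetical helper; column=2 assumes the instrument
    sits in the third cell, as in the example table.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="RESULTS")
    if table is None:
        return None
    for row in table.find_all("tr"):
        # The header row only has <th> cells, so this list is empty there.
        cells = row.find_all("td")
        if len(cells) > column and cells[column].get_text(strip=True) == wanted:
            return row
    return None


row = find_row(HTML, "AUD/AED")
if row is not None:
    print(row)
```

Because the match is made on a fixed column inside each row, it does not matter whether the wanted row is first, last, or somewhere in the middle of the table.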

Answered 2025-04-18 by Python大师


I would probably use lxml and XPath:

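from io import StringIO  # Python 3; the top-level StringIO module is Python 2 only
from lxml import etree

# table is the Tag found in the question, so convert it to a string before parsing
tree = etree.parse(StringIO(str(table)), etree.HTMLParser())
# pick the row whose third cell is AUD/AED, then read its fourth cell
d = tree.xpath("//table[@class='RESULTS']/tr[./td[3][text()='AUD/AED']]/td[4]/text()")[0]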

The variable d should then contain the string "Wednesday 23 APR 2014".

If you really want to stick with BeautifulSoup, you can also combine lxml and BeautifulSoup without any trouble.
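
If lxml is not installed, the same "select the row by one of its cells" idea can be tried with the standard library's xml.etree.ElementTree, which supports a limited XPath subset. This is a swapped-in technique, not the lxml call above, and it only works here under the assumption that the table snippet is well-formed XML (real-world HTML often is not). The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Assumption: the snippet is well-formed XML, so ElementTree can parse it.
HTML = """<table class="RESULTS" width="100%">
<tr>
<th align="left">Base Currency</th>
<th align="left">Quote Currency</th>
<th align="left">Instrument</th>
<th align="left">Spot Date</th>
</tr>
<tr>
<td>AUD</td>
<td>AED</td>
<td>AUD/AED</td>
<td>Wednesday 23 APR 2014</td>
</tr>
</table>"""

root = ET.fromstring(HTML)  # root is the <table> element

# [td='AUD/AED'] keeps only rows that have a <td> child with exactly that text.
rows = root.findall("./tr[td='AUD/AED']")

spot_date = None
if rows:
    cells = rows[0].findall("td")
    # Check the third cell explicitly, mirroring td[3] in the lxml XPath.
    if len(cells) > 3 and cells[2].text == "AUD/AED":
        spot_date = cells[3].text

print(spot_date)
```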

Answered 2025-04-18 by Python大师


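This sort of thing would be much easier if you had a CSS selector to help you, but it doesn't look like we can do that here.

The next best thing is to find the tag you want directly: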
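
For what it's worth, a selector-based attempt is possible on newer installations. The sketch below assumes a Beautiful Soup version whose select() is backed by a recent soupsieve, which provides the non-standard :-soup-contains-own() pseudo-class; that assumption may not hold on older setups. Note that the pseudo-class does a substring match, not an exact match.

```python
from bs4 import BeautifulSoup

HTML = """
<table class="RESULTS" width="100%">
<tr>
<th align="left">Base Currency</th>
<th align="left">Quote Currency</th>
<th align="left">Instrument</th>
<th align="left">Spot Date</th>
</tr>
<tr>
<td>AUD</td>
<td>AED</td>
<td>AUD/AED</td>
<td>Wednesday 23 APR 2014</td>
</tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Rows of table.RESULTS that have a direct <td> child whose own text
# contains "AUD/AED" (substring match, an soupsieve extension to CSS).
selector = 'table.RESULTS tr:has(> td:-soup-contains-own("AUD/AED"))'
rows = soup.select(selector)

for row in rows:
    print([td.get_text(strip=True) for td in row.find_all("td")])
```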

# string= is the current name of the older text= argument
soup.find(class_='RESULTS').find(string='AUD/AED')

From there you can navigate using the bs4 API.

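import re

# the matched string's parent is the <td>, and that tag's parent is the <tr>
tr = soup.find(class_='RESULTS').find(string='AUD/AED').parent.parent

# look inside that row for text shaped like "Wednesday 23 APR 2014"
tr.find(string=re.compile(r'\w+ \d{1,2} \w+ \d{4}'))
Out[66]: 'Wednesday 23 APR 2014'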

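This approach doesn't rely on the layout inside the row. It simply looks for a sibling of the AUD/AED tag that looks like a date (according to the regular expression).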
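
Put together as a self-contained script, the approach could look roughly like the sketch below. The function name spot_date_for and the DATE_PATTERN constant are made up for illustration, and find_parent("tr") is used in place of .parent.parent so the code does not assume the text sits exactly two levels below the row.

```python
import re

from bs4 import BeautifulSoup

HTML = """
<table class="RESULTS" width="100%">
<tr>
<th align="left">Base Currency</th>
<th align="left">Quote Currency</th>
<th align="left">Instrument</th>
<th align="left">Spot Date</th>
</tr>
<tr>
<td>AUD</td>
<td>AED</td>
<td>AUD/AED</td>
<td>Wednesday 23 APR 2014</td>
</tr>
</table>
"""

# Same regular expression as in the answer: weekday, day, month, year.
DATE_PATTERN = re.compile(r"\w+ \d{1,2} \w+ \d{4}")


def spot_date_for(html, instrument):
    """Return the date-like text from the row containing `instrument`, or None.

    spot_date_for is a hypothetical helper, not a bs4 function.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find(class_="RESULTS")
    if table is None:
        return None
    label = table.find(string=instrument)  # exact match on the text node
    if label is None:
        return None
    row = label.find_parent("tr")  # climb to the enclosing row
    if row is None:
        return None
    match = row.find(string=DATE_PATTERN)
    return str(match) if match is not None else None


result = spot_date_for(HTML, "AUD/AED")
print(result)
```

Returning None for a missing table, instrument, or date keeps the caller from hitting an AttributeError on a chained .parent lookup.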

Answered 2025-04-18 by Python大师
