BeautifulSoup: how to get the index of an element in a table header

2 votes

1 answer

11814 views

Asked 2025-04-18 15:02

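I want to get the index of an element in a table header so that I can use it to select the matching column in the table body. The number of columns varies, but the headings of the columns I need are always the same.

For example, I want to find that "Third" has index [2] in a header containing <th>First</th>, <th>Second</th>, <th>Third</th>, <th>Fourth</th> and <th>Fifth</th>. I could then use that index to pick the relevant <td> from each of the rows that follow.

This is the code I tried: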

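# Trial: get indexes from table headers
from bs4 import BeautifulSoup

# Parentheses join the two string literals into a single HTML string
html = ('<table><thead><tr class="myClass"><th>A</th>'
        '<th>B</th><th>C</th><th>D</th></tr></thead></table>')
soup = BeautifulSoup(html, 'html.parser')

table = soup.find('table')

for hRow in table.find_all('th'):
    # Tag.index() looks for the given object among this <th>'s own children;
    # the plain str 'A' is not one of them, so this line raises ValueError
    hRow = hRow.index('A')
    print(hRow)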

The result is:

ValueError: Tag.index: element not in tag

Any ideas?

Tags: error-handling, data-extraction, web-scraping, html-parsing, beautifulsoup, table-manipulation, column-selection, element-index

1 Answer

You can find all of the header cells, then look up the position of the one that contains a specific text:

from bs4 import BeautifulSoup

html = """
<table>
    <thead>
        <tr class="myClass">
            <th>A</th>
            <th>B</th>
            <th>C</th>
            <th>D</th>
        </tr>
    </thead>
</table>
"""
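# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, 'html.parser')

# The header row, selected by its class
header_row = soup.select('table > thead > tr.myClass')[0]

headers = header_row.find_all('th')         # all header cells, in document order
header = header_row.find('th', string='A')  # the cell whose text is exactly 'A'
print(headers.index(header))  # prints 0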
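
To go from the header index to the column values the question asks about, one possible approach is to build a mapping from header text to position and reuse it on every body row. The following is only a sketch: the `<tbody>` rows, the `wanted` list and the names `index_of` and `rows` are made up for illustration and are not part of the original question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample table: the body rows are invented for this example.
html = """
<table>
    <thead>
        <tr class="myClass">
            <th>A</th><th>B</th><th>C</th><th>D</th>
        </tr>
    </thead>
    <tbody>
        <tr><td>a1</td><td>b1</td><td>c1</td><td>d1</td></tr>
        <tr><td>a2</td><td>b2</td><td>c2</td><td>d2</td></tr>
    </tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Map each header's text to its column position.
header_cells = soup.select("table > thead > tr > th")
index_of = {th.get_text(strip=True): i for i, th in enumerate(header_cells)}

# The fixed headings we care about (an assumption for this example).
wanted = ["A", "C"]

# For every body row, keep only the cells under the wanted headings.
rows = []
for tr in soup.select("table > tbody > tr"):
    cells = tr.find_all("td")
    rows.append([cells[index_of[name]].get_text(strip=True) for name in wanted])

print(rows)  # [['a1', 'c1'], ['a2', 'c2']]
```

Because the lookup goes through the heading text, it should keep working when extra columns are added or the column order changes, as long as each row has one `<td>` per header cell (no `colspan`).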

Answered 2025-04-18 by Python大师
