How to specify which columns to read with BeautifulSoup

Posted 2024-03-29 11:46:30


I have an HTML file that contains a table. The table has 30 columns, but I only need to read a few of them.

My code so far:

from bs4 import BeautifulSoup

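# Name the parser explicitly and close the file once it has been parsed.
with open("myfile.htm") as f:
    soup = BeautifulSoup(f, "html.parser")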
table = soup.find("table", attrs={"class":"myTable"})

# The first tr contains the field names.
headings = [th.get_text() for th in table.find("tr").find_all("th")]

datasets = []
for row in table.find_all("tr")[1:]:
    # Materialise the pairs: in Python 3, zip() returns a one-shot iterator.
    dataset = list(zip(headings, (td.get_text() for td in row.find_all("td"))))
    datasets.append(dataset)

for dataset in datasets:
    for heading, value in dataset:
        print("{0:<16}: {1}".format(heading, value))

How can I specify which columns to read?
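
One possible approach, offered as a minimal sketch rather than a definitive answer: look up the position of each wanted heading once, then pick only those positions out of every row. The helper name `read_columns`, the sample HTML, and the heading names "Name" and "Price" are hypothetical and only serve the illustration. The sketch assumes every data row has one td per heading (no colspan or rowspan) and that each wanted heading actually appears in the first tr; otherwise `headings.index` raises ValueError.

```python
from bs4 import BeautifulSoup


def read_columns(html, wanted, table_class="myTable"):
    """Return one dict per data row, keeping only the headings in `wanted`.

    Hypothetical helper: assumes one td per heading in every data row.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": table_class})

    # The first tr contains the field names.
    headings = [th.get_text(strip=True) for th in table.find("tr").find_all("th")]

    # Resolve each wanted heading to its column position once, up front.
    positions = [(name, headings.index(name)) for name in wanted]

    rows = []
    for tr in table.find_all("tr")[1:]:
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if not cells:
            continue  # skip rows that contain no td cells
        rows.append({name: cells[i] for name, i in positions})
    return rows


# Made-up sample table standing in for the contents of myfile.htm.
sample_html = """
<table class="myTable">
  <tr><th>Name</th><th>Colour</th><th>Price</th></tr>
  <tr><td>Apple</td><td>Red</td><td>1.20</td></tr>
  <tr><td>Pear</td><td>Green</td><td>0.90</td></tr>
</table>
"""

for row in read_columns(sample_html, ["Name", "Price"]):
    for heading, value in row.items():
        print("{0:<16}: {1}".format(heading, value))
```

Selecting by heading name rather than by a hard-coded index keeps the code readable and means it should keep working if the table's column order changes. For the real file, the same call could be fed the result of reading myfile.htm instead of the sample string.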

