How do I extract table content from an HTML page using BeautifulSoup in Python?

from bs4 import BeautifulSoup
import urllib
import csv
import requests

page_link = 'https://repo.vse.gmu.edu/ait/AIT580/580books.html'
page_response = requests.get(page_link, timeout=5)
page_content = BeautifulSoup(page_response.content, "html.parser")
print(page_content.prettify())
page_content.ul

3 answers

User

#1 · Edited 2024-04-20 10:18:13

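Although I think KunduK's answer offers an elegant solution with pandas, I'd like to show another approach, since you explicitly asked how to continue from your current code (using the csv module and BeautifulSoup).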

from bs4 import BeautifulSoup
import csv
import requests

new_file = '/path/to/new/file.csv'
page_link = 'https://repo.vse.gmu.edu/ait/AIT580/580books.html'
page_response = requests.get(page_link, timeout=5)
page_content = BeautifulSoup(page_response.content, "html.parser")
table = page_content.find('table')

# Open the file once; newline='' prevents blank lines between rows on Windows
with open(new_file, 'w', newline='') as f:
    writer = csv.writer(f)
    for tr in table.find_all('tr'):
        row = []
        # Collect <th> as well as <td>, since header cells are often <th>
        for cell in tr.find_all(['th', 'td']):
            row.append(cell.get_text(strip=True))
        # The first <tr> becomes the header line, the rest are data rows
        writer.writerow(row)

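As you can see, we first grab the whole table, then iterate over its tr elements and, within each row, over its cells (td elements). The first tr supplies the header of the CSV file; every following tr is written to the file as a data row.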
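To make the header/row split explicit, here is a minimal sketch that runs on an inline HTML string instead of the live page. The sample markup, the helper names `table_to_rows` and `rows_to_csv`, and the in-memory buffer are all assumptions for illustration; the real page may be structured differently.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Author</th></tr>
  <tr><td>Book A</td><td>Alice</td></tr>
  <tr><td>Book B</td><td>Bob</td></tr>
</table>
"""


def table_to_rows(html):
    """Return (header, rows) for the first table found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found")
    parsed = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
    # Drop rows that contain no cells at all
    parsed = [row for row in parsed if row]
    if not parsed:
        raise ValueError("table has no rows")
    return parsed[0], parsed[1:]


def rows_to_csv(header, rows):
    """Serialize the header and rows to CSV text with csv.DictWriter."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=header)
    writer.writeheader()
    for row in rows:
        # Pair each cell with its column name
        writer.writerow(dict(zip(header, row)))
    return buffer.getvalue()


header, rows = table_to_rows(SAMPLE_HTML)
csv_text = rows_to_csv(header, rows)
print(csv_text)
```

To use this on the real page, pass `page_response.content` to `table_to_rows` and write the returned text to a file opened with `newline=''`.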

User

#2 · Edited 2024-04-20 10:18:13

A slightly cleaner approach using list comprehensions:

import csv
import requests
from bs4 import BeautifulSoup

page_link = 'https://repo.vse.gmu.edu/ait/AIT580/580books.html'

page_response = requests.get(page_link)
page_content = BeautifulSoup(page_response.content, "html.parser")

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for items in page_content.find('table').find_all('tr'):
        data = [item.get_text(strip=True) for item in items.find_all(['th','td'])]
        print(data)
        writer.writerow(data)

User

#3 · Edited 2024-04-20 10:18:13

You can use the pandas library to export the data to CSV. This is the simplest approach.

import pandas as pd
tables=pd.read_html("https://repo.vse.gmu.edu/ait/AIT580/580books.html")
tables[0].to_csv("output.csv",index=False)

To install pandas, just run

pip install pandas
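Note that `pd.read_html` also needs an HTML parser installed, such as lxml, or beautifulsoup4 together with html5lib. The following sketch shows the same idea on an inline HTML string so it can be tried without network access; the sample markup is made up for illustration and is not the content of the real page.

```python
import io

import pandas as pd

# Hypothetical sample markup standing in for the real page
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Author</th></tr>
  <tr><td>Book A</td><td>Alice</td></tr>
  <tr><td>Book B</td><td>Bob</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds;
# wrapping the string in StringIO avoids passing literal HTML directly
tables = pd.read_html(io.StringIO(SAMPLE_HTML))
df = tables[0]

# Without a path argument, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(index=False)
print(csv_text)
```

If a page contains several tables, the `match` parameter of `pd.read_html` can narrow the result to tables whose text matches a given string or regular expression.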
