Scraping a table built from div elements

import bs4
import requests

res = requests.get('https://www.nascar.com/results/race_center/2018/monster-energy-nascar-cup-series/auto-club-400/stn/race/')
soup = bs4.BeautifulSoup(res.text, 'lxml')
soup.select('.nrwgt-lbh .practiceDataTable')
for i in soup.select('.nrwgt-lbh .practiceDataTable .table-row'):
    print(i.text)

2 answers

User

#1 · edited 2024-04-28 06:43:56

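Inspecting the source returned by urllib.urlopen shows that the site is dynamic: the raw HTML contains no div elements with the table-row class, because those rows are only added after the page's scripts run. You therefore need a browser automation tool such as selenium: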

import re
from bs4 import BeautifulSoup as soup
from selenium import webdriver

d = webdriver.Chrome()
classes = ['position', 'chase', 'car-number', 'driver', 'manufacturer', 'start-position not-mobile', 'laps not-mobile', 'laps-led not-mobile', 'final-status', 'points not-mobile', 'bonus not-mobile']
d.get('https://www.nascar.com/results/race_center/2018/monster-energy-nascar-cup-series/auto-club-400/stn/race/')
# Parse the browser-rendered page and collect the non-empty cell texts of each row
cell_class = re.compile('|'.join(classes))
rows = soup(d.page_source, 'lxml').find_all('div', {'class': 'table-row'})
new_data = [[b.text for b in row.find_all('div', {'class': cell_class}) if b.text] for row in rows]
d.quit()  # close the browser once the page source has been parsed
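
The diagnosis above (no table-row elements in the raw HTML) can be checked before reaching for a browser. The following is a minimal sketch using only the standard library; the helper names `ClassCounter` and `count_class` and the two HTML snippets are hypothetical stand-ins, not taken from the actual page.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Return how many elements in `html` carry `class_name`."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    return parser.count


# Hypothetical stand-ins for a static response and a browser-rendered page.
static_html = '<div class="practiceDataTable"></div>'
rendered_html = (
    '<div class="practiceDataTable">'
    '<div class="table-row"><div class="driver">A</div></div>'
    '<div class="table-row odd"><div class="driver">B</div></div>'
    '</div>'
)

print(count_class(static_html, "table-row"))    # 0
print(count_class(rendered_html, "table-row"))  # 2
```

In practice you would pass the text of the plain HTTP response and then d.page_source; a count of zero for the former and a positive count for the latter suggests the rows are filled in by JavaScript.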

Edit: to install selenium, run pip install selenium, then install the driver that matches your browser:

Chrome driver: https://sites.google.com/a/chromium.org/chromedriver/downloads

Firefox driver: https://github.com/mozilla/geckodriver/releases

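Then, to run the code, create a driver object using the class named after your chosen browser and give it the path to the driver executable. In Selenium 4 the path is wrapped in a Service object rather than passed positionally:

from selenium.webdriver.firefox.service import Service
d = webdriver.Firefox(service=Service("/path/to/driver"))

or

from selenium.webdriver.chrome.service import Service
d = webdriver.Chrome(service=Service("/path/to/driver"))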

Edit

To write the data to a CSV file:

import csv
# newline='' stops the csv module from emitting blank lines between rows on Windows
with open('nascarDrivers.csv', 'w', newline='') as f:
    csv.writer(f).writerows(new_data)  # new_data is the list of lists containing the table data
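
If a header line is wanted in the CSV, the column names can be derived from the `classes` list used earlier. This is only a sketch: it assumes each row keeps one value per column (empty cells stored as '' rather than dropped), and `write_table`, the sample rows, and the in-memory buffer are hypothetical illustrations.

```python
import csv
import io

# Column classes from the answer; " not-mobile" is only a layout hint.
classes = ['position', 'chase', 'car-number', 'driver', 'manufacturer',
           'start-position not-mobile', 'laps not-mobile',
           'laps-led not-mobile', 'final-status', 'points not-mobile',
           'bonus not-mobile']
header = [name.split()[0] for name in classes]


def write_table(fileobj, header, rows):
    """Write a header line followed by the data rows as CSV."""
    writer = csv.writer(fileobj)
    writer.writerow(header)
    writer.writerows(rows)


# Hypothetical rows, one value per column (empty cells kept as '').
sample_rows = [
    ['1', '', '11', 'Driver A', 'Maker X', '3', '200', '50', 'Running', '40', '5'],
    ['2', '', '22', 'Driver B', 'Maker Y', '1', '200', '10', 'Running', '35', '0'],
]

buffer = io.StringIO()  # stands in for open('nascarDrivers.csv', 'w', newline='')
write_table(buffer, header, sample_rows)
print(buffer.getvalue())
```

Note that if empty cells are filtered out before writing, rows can end up shorter than the header and the columns will no longer line up.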

User

#2 · edited 2024-04-28 06:43:56

If you want the text of each table row, you can do the following:

import bs4
import requests

res = requests.get('https://www.nascar.com/results/race_center/2018/monster-energy-nascar-cup-series/auto-club-400/stn/race/')

soup = bs4.BeautifulSoup(res.text, 'lxml')
tds = soup.find_all('div', class_='table-row')
for td in tds:
    print(td.text)
