If a website has multiple tables, how do I scrape a specific one?

from bs4 import BeautifulSoup
from selenium import webdriver

# Options for Chrome Driver (Selenium)
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Program Files\Anaconda\chromedriver\chromedriver.exe')
driver.get("https://www.cmegroup.com/trading/interest-rates/cleared-otc.html")
current_page = driver.page_source

# Grab all the information from website HTML
soup = BeautifulSoup(current_page, 'html.parser')
tbl = soup.find("div", {"id": "table20"})

1 Answer

User

#1 · Posted on 2024-05-23 17:45:22

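Well, I see no reason to use selenium in this case, as it will slow down your task.

The site fires a JavaScript event that renders its data dynamically after the page has loaded.

The requests library cannot render JavaScript. So you can use selenium or requests_html. In fact, there are many modules that can do this.

Now, there is another option on the table: tracking down where the data is rendered from. I was able to find the XHR request that retrieves the data from the back-end and renders it on the user side.

You can find the XHR request by opening the browser's Developer Tools, going to the Network tab, and checking the XHR/JS requests that are made, depending on the type of call (such as fetch).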

import requests
import pandas as pd


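# Call the XHR endpoint that serves the table data directly
r = requests.get("https://www.cmegroup.com/CmeWS/mvc/xsltTransformer.do?xlstDoc=/XSLT/md/irs_settlement_TOTALS.xsl&url=/md/Clearing/IRS?date=03/20/2020&exchange=XCME")
# read_html returns a list of DataFrames: take the second table and drop its last row
df = pd.read_html(r.content, header=0)[1][:-1]

# Save the first five columns to a CSV file
df.iloc[:, :5].to_csv("data.csv", index=False)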
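
The `[1]` in the snippet above picks the second table that `pd.read_html` found. As a minimal sketch of the same idea on a self-contained page, the example below parses an in-memory HTML string with two tables and selects one either by position or by its `id` through the `attrs` argument. The HTML, the table ids, and the column names are made up for illustration; in the question `table20` is the id of a `div`, so the real page may need a different selector. It also assumes that an HTML parser supported by pandas (lxml, or bs4 with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; ids and values are illustrative only.
HTML = """
<html><body>
<table id="table10">
  <tr><th>Product</th><th>Volume</th></tr>
  <tr><td>A</td><td>1</td></tr>
</table>
<table id="table20">
  <tr><th>Currency</th><th>Trades</th></tr>
  <tr><td>USD</td><td>10</td></tr>
  <tr><td>EUR</td><td>5</td></tr>
  <tr><td>Total</td><td>15</td></tr>
</table>
</body></html>
"""

# Option 1: parse every table, then pick one by position (0-based).
all_tables = pd.read_html(StringIO(HTML), header=0)
by_index = all_tables[1][:-1]  # second table, without its trailing "Total" row

# Option 2: let pandas keep only the tables whose attributes match.
matching = pd.read_html(StringIO(HTML), attrs={"id": "table20"}, header=0)
by_id = matching[0][:-1]

print(by_id)
```

Selecting by attribute is usually less brittle than selecting by position, since the index changes whenever the page adds or removes a table.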

Output: view-online
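
The request used in the answer carries the date in its query string (`date=03/20/2020`). A minimal sketch of how that URL could be built for another day follows. `build_irs_url` is a hypothetical helper, and it is only an assumption that the endpoint accepts other dates in the same MM/DD/YYYY form.

```python
from datetime import date

# Request URL copied from the answer; only the date is substituted.
BASE = (
    "https://www.cmegroup.com/CmeWS/mvc/xsltTransformer.do"
    "?xlstDoc=/XSLT/md/irs_settlement_TOTALS.xsl"
    "&url=/md/Clearing/IRS?date={day}&exchange=XCME"
)


def build_irs_url(day: date) -> str:
    """Return the request URL for the given day (hypothetical helper)."""
    return BASE.format(day=day.strftime("%m/%d/%Y"))


print(build_irs_url(date(2020, 3, 20)))
```

The resulting string can be passed to `requests.get` in place of the hard-coded URL.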

