How to scrape an aspx file

Posted 2024-04-29 06:56:13


I am trying to collect data from this aspx file (a French website): https://www.statsf1.com/fr/2021/emilie-romagne/tour-par-tour.aspx

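For each lap, I want to know whether the safety car was deployed. Going by the HTML, that means checking, for each <tr class="lap">, whether it contains a cell like <td title="Safety Car" class="numlap sc">1</td>, and if so, collecting the lap number in a list.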

This is the code I tried, but the laps_ variable is still empty:

import requests
from bs4 import BeautifulSoup

html_data=requests.get('https://www.statsf1.com/fr/2021/emilie-romagne/tour-par-tour.aspx')
soup=BeautifulSoup(html_data.content)
laps_=soup.find_all('td',title_='Safety Car')

P.S.: I looked at "Python - Download a file from aspx form", but it didn't get me any further.


1 Answer
Anonymous user
#1 · Posted 2024-04-29 06:56:13

You need to add a User-Agent to the request headers; then the server returns the HTML. You can also use pandas to parse the table.

For example:

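from io import StringIO

import pandas as pd
import requests

url = "https://www.statsf1.com/fr/2021/emilie-romagne/tour-par-tour.aspx"

# Send a browser-like User-Agent so the server returns the page HTML
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36"
}

html = requests.get(url, headers=headers).text
# Recent pandas versions deprecate passing a literal HTML string, so wrap it in StringIO
tables = pd.read_html(StringIO(html), flavor="bs4")
print(tables[0].head(10))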

Output:

  Unnamed: 0 HAM1 PER2 VER3 LEC4 GAS5  ... ALO15 RAI16 GIO17 MSC18 MAZ19 TSU20
0          1  VER  HAM  LEC  PER  RIC  ...   TSU   MSC   ALO   VET   MAZ   NaN
1          2  VER  HAM  LEC  PER  RIC  ...   MSC   ALO   VET   MAZ   OCO   NaN
2          3  VER  HAM  LEC  PER  RIC  ...   MSC   ALO   MAZ   OCO   VET   NaN
3          4  VER  HAM  LEC  PER  RIC  ...   ALO   MAZ   OCO   VET   MSC   NaN
4          5  VER  HAM  LEC  PER  RIC  ...   ALO   MAZ   OCO   VET   MSC   NaN
5          6  VER  HAM  LEC  PER  RIC  ...   ALO   MAZ   OCO   VET   MSC   NaN
6          7  VER  HAM  LEC  PER  RIC  ...   ALO   OCO   VET   MAZ   MSC   NaN
7          8  VER  HAM  LEC  PER  RIC  ...   ALO   OCO   VET   MAZ   MSC   NaN
8          9  VER  HAM  LEC  PER  RIC  ...   ALO   OCO   VET   MAZ   MSC   NaN
9         10  VER  HAM  LEC  PER  RIC  ...   OCO   VET   ALO   MAZ   MSC   NaN
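
The pandas table gives the running order per lap, but read_html keeps only cell text, so the title="Safety Car" marker is lost. To get the list of safety-car laps the question asks for, a minimal BeautifulSoup sketch could look like the following. It assumes the markup is exactly as described in the question; SAMPLE_HTML and the helper name safety_car_laps are made up for illustration, and the sample is not taken from the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample mirroring the markup described in the question;
# the real page has many more cells per row.
SAMPLE_HTML = """
<table>
  <tr class="lap"><td title="Safety Car" class="numlap sc">1</td><td>VER</td></tr>
  <tr class="lap"><td class="numlap">2</td><td>VER</td></tr>
  <tr class="lap"><td title="Safety Car" class="numlap sc">3</td><td>VER</td></tr>
</table>
"""


def safety_car_laps(html):
    """Return the lap numbers whose lap cell carries title="Safety Car"."""
    soup = BeautifulSoup(html, "html.parser")
    # Filter on the real attribute name "title". Only class_ gets the
    # trailing-underscore treatment in find_all; title_ would be read as an
    # attribute literally named "title_" and match nothing.
    cells = soup.find_all("td", attrs={"title": "Safety Car"})
    return [int(cell.get_text(strip=True)) for cell in cells]


print(safety_car_laps(SAMPLE_HTML))  # prints [1, 3]
```

Against the live page, pass in the text of the response fetched with the User-Agent header, as in the answer's example. Whether the real markup matches the question's description in every row has not been checked here.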
