Python web scraping: getting content from a table

Posted 2024-04-28 02:34:27


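I'm trying to get the contents of a table on FanGraphs. I'm a beginner, so I may be making some mistakes.

URL: https://www.fangraphs.com/standings/playoff-odds

Looking at the page's elements, I can see several tables called "playoff odds table". All of it seems to be wrapped inside id="content".

My code so far: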

`import requests
from bs4 import BeautifulSoup

url = 'https://www.fangraphs.com/standings/playoff-odds'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
soup.find("div", {"id": "content"})`

The output is just:

<div class="playoff-odds-page" id="content"><h1>MLB Playoff Odds</h1><div id="root"></div>

Clearly I'm missing something important here, and I'd love to learn how to pull the table contents in.

Thanks for any help or advice.


2 Answers

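Vin's answer is correct, but I'd add that I would probably use json_normalize to turn the result into a table. That gives nicer output, and you can then sort, filter, and so on: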

import json
import requests
from urllib3.exceptions import InsecureRequestWarning
# json_normalize is exposed at the top level in pandas 1.0 and later.
from pandas import json_normalize

# Silence the warning that verify=False would otherwise trigger.
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

def scrap_playoff_odds():
    # Query parameters passed to the playoff-odds API.
    dateEnd = '2020-07-29'
    dateDelta = ''
    projectionMode = 2
    standingsType = 'div'

    url = 'https://www.fangraphs.com/api/playoff-odds/odds?dateEnd=' + str(dateEnd) + '&dateDelta=' + str(dateDelta) + '&projectionMode=' + str(projectionMode) + '&standingsType=' + str(standingsType)
    session = requests.Session()
    # verify=False skips TLS certificate verification.
    response = session.get(url, verify=False)
    result = json.loads(response.text)

    # Flatten the nested JSON: keys under 'endData' become 'endData.<key>' columns.
    df = json_normalize(result)

    cols = ['shortName','GB','L','W','WCGB','Wpct','division','league',
            'endData.ExpL','endData.ExpW','endData.csWin','endData.div2Title',
            'endData.divTitle','endData.dsWin','endData.poffTitle',
            'endData.rosW','endData.sos','endData.wcTitle','endData.wcWin',
            'endData.wsWin']

    # Keep only the columns of interest, in this order.
    df = df[cols]
    print(df.to_string())

scrap_playoff_odds()

Output:

       shortName    GB  L  W  WCGB               Wpct division league endData.ExpL endData.ExpW endData.csWin   endData.div2Title endData.divTitle endData.dsWin   endData.poffTitle       endData.rosW endData.sos endData.wcTitle endData.wcWin endData.wsWin
0         Angels    -1  4  2    -2  0.333333333333333        W     AL      30.6122      29.3878     0.0457791   0.220615580677986        0.0969181      0.103138   0.495029680430889  0.507181485493978    0.505889        0.177496      0.226515     0.0217596
1        Orioles    -1  2  2    -1                0.5        E     AL      36.9089      22.0911   0.000339993  0.0056998860090971      0.000759985    0.00151997  0.0246794705162756  0.365292739868164    0.516855       0.0182196    0.00655987   7.99984E-05
2        Red Sox    -2  4  2    -2  0.333333333333333        E     AL       30.118       29.882     0.0535189   0.178836420178413        0.0705986      0.117218   0.537369027733803  0.516333332768193    0.499204        0.287934      0.252895     0.0254195
3      White Sox  -2.5  4  2    -2  0.333333333333333        C     AL      29.7695      30.2305     0.0471791   0.225135490298271         0.102058      0.114598   0.581128500401974  0.522787023473669    0.482352        0.253935      0.260355     0.0198996
4        Indians  -0.5  2  4     0  0.666666666666667        C     AL      26.3158      33.6842      0.100678   0.366512656211853         0.364553      0.215716   0.875722661614418  0.549707412719727    0.480981        0.144657      0.440991      0.051059
5         Tigers  -0.5  2  4     0  0.666666666666667        C     AL      33.8049      26.1951    0.00385992  0.0592988133430481        0.0161197     0.0152397   0.180256512016058  0.411020384894477     0.50463        0.104838     0.0570389    0.00125997
6         Royals  -2.5  4  2    -2  0.333333333333333        C     AL      34.3729      25.6271    0.00465991  0.0449991002678871        0.0114198     0.0163997   0.142657202668488  0.437538888719347    0.501574       0.0862383      0.050619    0.00139997
7          Twins     0  1  4     0                0.8        C     AL      25.2144      34.7856      0.116978    0.30405393242836          0.50585      0.241535    0.92080195248127  0.559738159179688    0.478055        0.110898       0.48493     0.0602788
8        Yankees     0  1  3     0               0.75        E     AL      24.8138      35.1862      0.164377   0.343113124370575         0.469091      0.294734   0.933022119104862  0.574753556932722    0.486429        0.120818      0.525629     0.0944981
9      Athletics     0  3  3    -1                0.5        W     AL      27.6891      32.3109     0.0960381   0.368252635002136         0.295094      0.199316   0.788903653621674  0.542794474848994    0.491593        0.125557      0.396992      0.049459
10      Mariners    -1  4  2    -2  0.333333333333333        W     AL      36.3763      23.6237    0.00129997  0.0245595090091228       0.00469991     0.0048799  0.0600388199090958  0.400438873856156       0.515       0.0307794     0.0179596   0.000239995
11          Rays     0  2  4     0  0.666666666666667        E     AL       25.089       34.911      0.149837   0.373012542724609         0.428471      0.274774   0.928520545363426  0.572425912927698    0.481148        0.127037       0.50753     0.0854183
12       Rangers  -0.5  3  2  -1.5                0.4        W     AL      32.7494      27.2506     0.0144797   0.117417648434639        0.0412392     0.0404192   0.276534844189882  0.459101832996715      0.5076        0.117878      0.107138    0.00525989
13     Blue Jays    -1  3  3    -1                0.5        E     AL      31.8524      28.1476     0.0154597  0.0993380099534988        0.0310794     0.0471191   0.343413416296244  0.465696299517596    0.496056        0.212996      0.130857    0.00529989
14  Diamondbacks  -2.5  4  2    -2  0.333333333333333        W     NL      31.7505      28.2495     0.0272995   0.128357440233231        0.0309994     0.0661387   0.336252845823765  0.486101856938115    0.515167        0.176896      0.149117     0.0104798
15        Braves  -0.5  3  3    -1                0.5        E     NL       27.512       32.488      0.104518    0.27667447924614         0.362753      0.210756   0.775884479284287  0.546074054859303     0.49513        0.136457      0.408552      0.047719
16          Cubs     0  2  4     0  0.666666666666667        C     NL      26.5336      33.4664      0.107258   0.261894762516022         0.466751      0.224256   0.844843775033951  0.545674076786748     0.49087        0.116198      0.441051      0.047779
17          Reds    -2  4  2    -2  0.333333333333333        C     NL      29.6682      30.3318     0.0574988   0.233455330133438         0.163757      0.131277   0.570129320025444  0.524662971496582    0.494648        0.172917      0.275314     0.0231195
18       Rockies     0  1  4     0                0.8        W     NL      30.6727      29.3273     0.0263995   0.189116224646568        0.0538989     0.0718386   0.449571132659912  0.460496347600763    0.517909        0.206556      0.182316    0.00819984
19       Marlins     0  1  2     0  0.666666666666667        E     NL      34.3068      24.6932    0.00223996  0.0372792556881905        0.0139997    0.00915982   0.094378056935966  0.405235699244908    0.520411       0.0430991     0.0312594   0.000399992
20        Astros     0  3  3    -1                0.5        W     AL      25.5811      34.4189      0.185516   0.269154608249664         0.562049      0.313394   0.911921977996826  0.581831472891348    0.492167       0.0807184      0.533989      0.112718
21       Dodgers  -0.5  2  4     0  0.666666666666667        W     NL      23.1822      36.8178      0.277714   0.205615893006325         0.708326      0.406472    0.97026077657938  0.607737011379666    0.495463       0.0563189      0.619928      0.165337
22       Brewers    -1  3  3    -1                0.5        C     NL      28.9605      31.0395     0.0648587   0.263134747743607         0.213376      0.143057    0.64656774699688  0.519249986719202    0.498833        0.170057      0.307534     0.0262395
23     Nationals  -1.5  4  2    -2  0.333333333333333        E     NL      29.0866      30.9134     0.0713786   0.250714987516403         0.218776      0.152837    0.63434799015522  0.535433345370822     0.49263        0.164857      0.314754     0.0301194
24          Mets  -0.5  3  3    -1                0.5        E     NL      28.0444      31.9556     0.0878982   0.281094372272491         0.303454      0.185256   0.731425389647484  0.536214828491211    0.497037        0.146877      0.373653     0.0384592
25      Phillies    -1  2  1  -1.5  0.333333333333333        E     NL      31.2683      28.7317     0.0289994   0.154236912727356         0.101018     0.0753785    0.39485190808773  0.486521068372225    0.508316        0.139597      0.175676    0.00931981
26       Pirates    -2  4  2    -2  0.333333333333333        C     NL      35.2019      24.7981    0.00307994  0.0361992754042149        0.0123598     0.0109198  0.0936181750148535   0.42218702810782    0.513259       0.0450591     0.0327593   0.000899982
27     Cardinals  -1.5  3  2  -1.5                0.4        C     NL      30.1584      29.8416     0.0414592   0.205315887928009         0.143757      0.101578   0.514809891581535   0.50621091669256    0.498018        0.165737      0.234415     0.0148797
28        Padres  -0.5  2  4     0  0.666666666666667        W     NL      27.3498      32.6502     0.0949581   0.422831535339355         0.198616      0.194596   0.791184529662132  0.530559257224754    0.502778        0.169737      0.402172     0.0419592
29        Giants  -1.5  3  3    -1                0.5        W     NL      34.0368      25.9632    0.00443991  0.0540789179503918       0.00815984     0.0164797   0.151876960881054  0.425244437323676    0.514944       0.0896382      0.051499    0.00103998
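
To see where dotted column names such as endData.poffTitle come from, here is a minimal standard-library sketch of the kind of flattening json_normalize performs on nested records, followed by a plain-Python filter and sort. The flatten helper is hypothetical (it is not pandas' implementation), and the three sample records only assume the nested endData shape used in this thread, with values rounded from the output above.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the parent key as a prefix.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Sample records (assumed shape; values rounded from the output above).
sample = [
    {"shortName": "Orioles", "league": "AL", "endData": {"poffTitle": 0.0247}},
    {"shortName": "Dodgers", "league": "NL", "endData": {"poffTitle": 0.9703}},
    {"shortName": "Twins", "league": "AL", "endData": {"poffTitle": 0.9208}},
]

rows = [flatten(team) for team in sample]

# Keep AL teams only, highest playoff odds first.
al_by_odds = sorted(
    (row for row in rows if row["league"] == "AL"),
    key=lambda row: row["endData.poffTitle"],
    reverse=True,
)

for row in al_by_odds:
    print(row["shortName"], row["endData.poffTitle"])
```

With a real DataFrame the same filter and sort would normally be done with pandas itself; the sketch is only meant to make the column naming visible.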

Try the approach below. The script uses requests and JSON to fetch the data by calling the API directly.

import json
import requests
from urllib3.exceptions import InsecureRequestWarning

# Silence the warning that verify=False would otherwise trigger.
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

def scrap_playoff_odds():
    # Query parameters passed to the playoff-odds API.
    dateEnd = '2020-07-29'
    dateDelta = ''
    projectionMode = 2
    standingsType = 'div'

    url = 'https://www.fangraphs.com/api/playoff-odds/odds?dateEnd=' + str(dateEnd) + '&dateDelta=' + str(dateDelta) + '&projectionMode=' + str(projectionMode) + '&standingsType=' + str(standingsType)
    session = requests.Session()
    # verify=False skips TLS certificate verification.
    response = session.get(url, verify=False)
    result = json.loads(response.text)

    # Print every field for each team, framed by divider lines.
    for team in result:
        print('-' * 100)
        print(team['GB'],
              team['L'],
              team['W'],
              team['WCGB'],
              team['Wpct'],
              team['division'],
              team['league'],
              team['shortName'],
              team['endData']['ExpL'],
              team['endData']['ExpW'],
              team['endData']['csWin'],
              team['endData']['div2Title'],
              team['endData']['divTitle'],
              team['endData']['dsWin'],
              team['endData']['poffTitle'],
              team['endData']['rosW'],
              team['endData']['sos'],
              team['endData']['wcTitle'],
              team['endData']['wcWin'],
              team['endData']['wsWin'])
        print('-' * 100)

scrap_playoff_odds()

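1. I extracted the API URL from the website and assigned it to the url variable. The URL is dynamic: put the end date and date delta you want into the variables, and it will fetch the data for that period.

2. The script then fetches the result with a GET request to the API and passes it to json.loads, which turns the response text into a proper JSON object.

3. Finally, it prints all the columns for each team, one team at a time (see the screenshot).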
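
The query string described in step 1 could also be built with urllib.parse.urlencode instead of string concatenation. This is a minimal sketch: build_odds_url is a hypothetical helper, and the parameter names and defaults are simply the ones from the script above. Which other values the API accepts is not verified here.

```python
from urllib.parse import urlencode

# Endpoint taken from the script above.
BASE_URL = "https://www.fangraphs.com/api/playoff-odds/odds"


def build_odds_url(date_end, date_delta="", projection_mode=2, standings_type="div"):
    """Return the playoff-odds API URL for the given query parameters."""
    params = {
        "dateEnd": date_end,
        "dateDelta": date_delta,
        "projectionMode": projection_mode,
        "standingsType": standings_type,
    }
    # urlencode joins the pairs with '&' and escapes any unsafe characters.
    return f"{BASE_URL}?{urlencode(params)}"


print(build_odds_url("2020-07-29"))
```

Alternatively, the same dictionary can be passed as params= to session.get, and requests will do the encoding.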
