Scraping NBA advanced stats with Python and BeautifulSoup

Posted 2024-04-26 22:20:51


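I want to scrape the NBA's advanced stats. To start, I just want to be able to pull the team names, but my problem is that the code collects nothing. I may be looking for the wrong thing in the find_all function. Any help is appreciated!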

import requests
from bs4 import BeautifulSoup

url = "https://stats.nba.com/teams/elbow-touch/?sort=ELBOW_TOUCHES&dir=-1"
result = requests.get(url)
c = result.content
soup = BeautifulSoup(c, "html.parser")
title = soup.title.text
print(title)

teams = soup.find_all('td', {'class': 'team'})
for element in teams:
    print(element.text)

The site I want to scrape is the stats.nba.com elbow-touch page requested in the code above.


3 Answers

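The site is dynamic: the table is filled in by JavaScript after the page loads, so the HTML that requests receives does not contain it. You need a browser-driven tool such as selenium: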

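from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup as soup

# Selenium 4 takes the driver path through a Service object
d = webdriver.Chrome(service=Service('/path/to/chromedriver'))
d.get('https://stats.nba.com/teams/elbow-touch/?sort=ELBOW_TOUCHES&dir=-1')
# Parse the rendered page and pick out the stats table
s = soup(d.page_source, 'html.parser').find('table', {'class':'table'})
# Header cell text, then the <td> text of every row after the first (header) row
headers, [_, *data] = [i.text for i in s.find_all('th')], [[i.text for i in b.find_all('td')] for b in s.find_all('tr')]
# Drop rows that have one cell or fewer
final_data = [i for i in data if len(i) > 1]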
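
The one-line unpacking above is dense, so here is a minimal sketch of the same two steps on hand-made rows. The sample values (including "Team A" and "Team B") are invented for illustration and are not scraped data. The first row produced by the `tr` loop is the header row, which has no `td` cells, so `[_, *data]` discards it; the length filter then drops any row with one cell or fewer.

```python
# Stand-in for [[td.text ...] for tr in table.find_all('tr')];
# the values are made up for illustration.
rows = [
    [],                          # header row: only <th> cells, so no <td> text
    ["Team A", "10", "5.0"],
    [""],                        # a spacer row with a single cell
    ["Team B", "12", "4.2"],
]

_, *data = rows                                  # drop the first (header) row
final_data = [r for r in data if len(r) > 1]     # keep rows with at least two cells

print(final_data)
```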

Now final_data holds the results for all teams, one list of cell values per team.

To get the team names:

teams = [a for a, *_ in final_data]

Output:

['Houston Rockets', 'Milwaukee Bucks', 'New York Knicks', 'Charlotte Hornets', 'Detroit Pistons', 'Washington Wizards', 'Atlanta Hawks', 'Brooklyn Nets', 'San Antonio Spurs', 'Boston Celtics', 'Toronto Raptors', 'Portland Trail Blazers', 'Utah Jazz', 'Minnesota Timberwolves', 'Chicago Bulls', 'LA Clippers', 'Miami Heat', 'New Orleans Pelicans', 'Phoenix Suns', 'Oklahoma City Thunder', 'Dallas Mavericks', 'Golden State Warriors', 'Orlando Magic', 'Los Angeles Lakers', 'Denver Nuggets', 'Indiana Pacers', 'Cleveland Cavaliers', 'Philadelphia 76ers', 'Sacramento Kings', 'Memphis Grizzlies']

To get a specific statistic, the simplest approach is to build a list of dictionaries by pairing the header values with each data row:

data_attrs = [dict(zip(headers, i)) for i in final_data]
all_touches = [i['Touches'] for i in data_attrs]
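
As a small self-contained illustration of the zip step, the sketch below uses invented headers and values. Only the 'Touches' key comes from the answer's code; the other column names and all numbers are assumptions. It also converts the values to float, since scraped cell text is always a string.

```python
# Invented sample data; only the "Touches" key is taken from the answer above
headers = ["Team", "GP", "Touches"]
final_data = [
    ["Team A", "60", "410.5"],
    ["Team B", "61", "398.2"],
]

# Pair each header with the value in the same position of every row
data_attrs = [dict(zip(headers, row)) for row in final_data]

# Scraped cell text is a string, so convert before doing arithmetic
all_touches = [float(d["Touches"]) for d in data_attrs]

print(data_attrs[0])
print(all_touches)
```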

Another way is to send a GET request to the site's API and receive a JSON response. By changing the parameters, you can get different results.

You can find where the browser sends its requests in the Network tab of Chrome's developer tools.

import requests

url = "https://stats.nba.com/stats/leaguedashptstats?"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"
}

params = {
    "PerMode": "PerGame",
    "PlayerOrTeam": "Team",
    "PtMeasureType": "ElbowTouch",
    "Season": "2018-19",
    "SeasonType": "Regular Season",
    "StarterBench": "",
    "PlayerPosition": "",
    "PlayerExperience": "",
    "GameScope": "",
    "VsConference": "",
    "VsDivision": "",
    "DateFrom": "",
    "DateTo": "",
    "SeasonSegment": "",
    "Location": "",
    "Outcome": "",
    "LastNGames": "0",
    "Month": "0",
    "OpponentTeamID": "0"
}

r = requests.get(url, params=params, headers=headers)
data = r.json()
results = data['resultSets'][0]['rowSet']

for result in results:
    print(result)
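
Each printed row is a bare list of values. Assuming the result set also carries a `headers` list next to `rowSet` (an assumption about the response shape, so check it against the actual JSON), the rows can be labeled the same way as in the first answer. The payload below is a hand-made stand-in with invented values, `rows_to_dicts` is a hypothetical helper name, and the column names `TEAM_NAME` and `ELBOW_TOUCHES` are placeholders.

```python
def rows_to_dicts(payload, index=0):
    """Label each row of one result set with that set's column names."""
    result_set = payload["resultSets"][index]
    # "headers" is assumed to sit alongside "rowSet" in the response
    return [dict(zip(result_set["headers"], row)) for row in result_set["rowSet"]]


# Hand-made payload shaped like the response used above; values are invented
sample = {
    "resultSets": [
        {
            "headers": ["TEAM_NAME", "ELBOW_TOUCHES"],
            "rowSet": [["Team A", 12.3], ["Team B", 9.8]],
        }
    ]
}

for record in rows_to_dicts(sample):
    print(record["TEAM_NAME"], record["ELBOW_TOUCHES"])
```

With a real response, `rows_to_dicts(r.json())` would take the place of the sample payload.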

A variation on @Ajax1234's answer that loads the whole table into a DataFrame:

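from io import StringIO
import pandas as pd

# s is the table element found in @Ajax1234's answer.
# read_html returns a list of DataFrames, so take the first one.
df = pd.read_html(StringIO(str(s)))[0]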

And there's your table.
