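This code only scrapes one match in the range. I need it to loop over every match ID in the range, scrape the specified data, add it to a df, and then continue until the range is exhausted.
For some reason the code stops after one loop instead of continuing as expected.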
import requests
from bs4 import BeautifulSoup
import json
import pandas as pd

colnames = ["x", "y", "xg", "team"]
df = pd.DataFrame(index=colnames)
for id in range(16376,16379):
    understat = f'https://understat.com/match/{id}'
    res = requests.get(understat)
    # parsing the webpage, use .content
    soup = BeautifulSoup(res.content, "lxml")
    scripts = soup.find_all('script')
    # get only shots data, and strip data so we only have json data
    strings = scripts[1].string
    index_start = strings.index("('") + 2
    index_end = strings.index("')")
    json_data = strings[index_start:index_end]
    json_data = json_data.encode('utf8').decode('unicode_escape')
    data1 = json.loads(json_data)
    x = []
    y = []
    xg = []
    team = []
    data_home = data1["h"]
    data_away = data1["a"]
    for index in range(len(data_home)):
        for key in data_home[index]:
            if key == "X":
                x.append(data_home[index][key])
            if key == "Y":
                y.append(data_home[index][key])
            if key == "xG":
                xg.append(data_home[index][key])
            if key == "h_team":
                team.append(data_home[index][key])
    df_h = (x,y,xg,team)
    for index in range(len(data_away)):
        for key in data_away[index]:
            if key == "X":
                x.append(data_away[index][key])
            if key == "Y":
                y.append(data_away[index][key])
            if key == "xG":
                xg.append(data_away[index][key])
            if key == "a_team":
                team.append(data_away[index][key])
    df_a = (x, y, xg, team)
    continue
# create the df
colnames = ["x", "y", "xg", "team"]
df = pd.DataFrame([x, y, xg, team], index=colnames)
df = df.T
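I think the problem is that you overwrite the lists x, y, xg, team on every iteration and only use the last iteration to create the df. If your end goal is a DataFrame containing the data from all iterations, you should try these steps.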
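Steps:
list_of_dfs = []  # create once, before the loop
list_of_dfs.append(df)  # inside the loop, after building the df for the current match
final_df = pd.concat(list_of_dfs, axis=1)  # after the loop; axis=1 suits the untransposed frames (index=colnames), use axis=0 if each df is already transposed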
The complete code should look like this:
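As a minimal sketch of those steps (an illustration, not the answerer's original listing): the names `shots_to_frame`, `collect_matches` and `fetch_shots` are hypothetical, and the sample dict only imitates the shape of `data1` from the question with made-up values. It assumes every shot dict carries both `h_team` and `a_team`. Because each per-match frame here already has one shot per row, the frames are stacked with the default `axis=0` rather than `axis=1`.

```python
import pandas as pd

COLNAMES = ["x", "y", "xg", "team"]


def shots_to_frame(shots):
    """Turn one match's decoded shots dict ({"h": [...], "a": [...]}) into a DataFrame."""
    rows = []
    for side, team_key in (("h", "h_team"), ("a", "a_team")):
        for shot in shots[side]:
            rows.append([shot["X"], shot["Y"], shot["xG"], shot[team_key]])
    return pd.DataFrame(rows, columns=COLNAMES)


def collect_matches(match_ids, fetch_shots):
    """Build one frame per match and concatenate them once, after the loop."""
    list_of_dfs = []  # created once, before the loop
    for match_id in match_ids:
        # fetch_shots is a hypothetical callable wrapping the question's
        # download + json.loads code; it returns the decoded dict for one match.
        shots = fetch_shots(match_id)
        list_of_dfs.append(shots_to_frame(shots))  # fresh frame per match
    # Frames are already one shot per row, so stack them vertically.
    return pd.concat(list_of_dfs, ignore_index=True)


# Stand-in data shaped like the question's data1 dict; all values are made up.
fake = {
    1: {"h": [{"X": "0.9", "Y": "0.5", "xG": "0.3", "h_team": "A", "a_team": "B"}],
        "a": [{"X": "0.8", "Y": "0.4", "xG": "0.1", "h_team": "A", "a_team": "B"}]},
    2: {"h": [{"X": "0.7", "Y": "0.6", "xG": "0.2", "h_team": "C", "a_team": "D"}],
        "a": []},
}
final_df = collect_matches([1, 2], fake.__getitem__)
print(final_df)
```

To use it against the real site, `fetch_shots` would wrap the request and parsing lines from the question's loop body and return `data1` for a given match ID.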