Writing a for loop that appends data to a single DataFrame and keeps looping over a specified range

Posted 2024-04-28 12:40:59

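This code only scrapes one match from the range. I need it to loop over every match ID in the range, scrape the specified data, append it to a single df, and carry on until the whole range has been processed.

For some reason the code seems to stop after one iteration instead of continuing as expected.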

import requests
from bs4 import BeautifulSoup
import json
import pandas as pd

colnames = ["x", "y", "xg", "team"]
df = pd.DataFrame(index=colnames)

for id in range(16376,16379):
    understat = f'https://understat.com/match/{id}'
    res = requests.get(understat)
    # parsing the webpage, use .content
    soup = BeautifulSoup(res.content, "lxml")
    scripts = soup.find_all('script')

    # get only shots data, and strip data so we only have json data

    strings = scripts[1].string
    index_start = strings.index("('") + 2
    index_end = strings.index("')")
    json_data = strings[index_start:index_end]
    json_data = json_data.encode('utf8').decode('unicode_escape')
    data1 = json.loads(json_data)

    x = []
    y = []
    xg = []
    team = []
    data_home = data1["h"]
    data_away = data1["a"]


    for index in range(len(data_home)):
        for key in data_home[index]:
            if key == "X":
                x.append(data_home[index][key])
            if key == "Y":
                y.append(data_home[index][key])
            if key == "xG":
                xg.append(data_home[index][key])
            if key == "h_team":
               team.append(data_home[index][key])
               df_h = (x,y,xg,team)


    for index in range(len(data_away)):
        for key in data_away[index]:
            if key == "X":
                x.append(data_away[index][key])
            if key == "Y":
                y.append(data_away[index][key])
            if key == "xG":
                xg.append(data_away[index][key])
            if key == "a_team":
               team.append(data_away[index][key])
               df_a = (x, y, xg, team)

            continue
    # create the df
colnames = ["x", "y", "xg", "team"]

df = pd.DataFrame([x, y, xg, team], index=colnames)
df = df.T

1 answer
Anonymous user
#1 · Posted 2024-04-28 12:40:59

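I think the problem is that the lists x, y, xg and team are reset on every iteration, while the df is created only once, after the loop has finished. The loop does visit every match, but only the data from the last iteration ends up in the df.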
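A small offline reproduction of that behaviour, as a sketch only: `fake_matches` is a made-up stand-in for the scraped data, not real Understat output.

```python
import pandas as pd

# Hypothetical stand-in for the scraped data: match id -> x values of its shots.
fake_matches = {
    16376: ["0.1", "0.2"],
    16377: ["0.3"],
    16378: ["0.4", "0.5", "0.6"],
}

for match_id, shots in fake_matches.items():
    x = []  # reset on every iteration, as in the question
    for value in shots:
        x.append(value)

# Built after the loop, so it only sees the list from the last iteration.
df = pd.DataFrame({"x": x})
print(len(df))
```

The loop runs three times, yet the resulting df holds only the shots of the last match.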

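If your end goal is one DataFrame containing the data from every iteration, try the following steps.

Steps:

  1. Before the outer loop, create a list that will hold all the DataFrames: list_of_dfs = []
  2. Create the df at the end of each iteration. Simply move the df creation inside the loop (one level of indentation).
  3. Append the newly created DataFrame to the list, right after creating it: list_of_dfs.append(df)
  4. After the loop, combine all DataFrames in the list into one. Because every df has the same columns, stack them row-wise: final_df = pd.concat(list_of_dfs, axis=0)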
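The four steps can be tried without any network access. This is a minimal sketch that assumes invented shot rows in place of the scraped ones (`fake_matches` and the team names are hypothetical):

```python
import pandas as pd

colnames = ["x", "y", "xg", "team"]

# Hypothetical stand-in for three scraped matches: each row is (x, y, xg, team).
fake_matches = {
    16376: [("0.91", "0.48", "0.35", "Team A"), ("0.80", "0.55", "0.07", "Team B")],
    16377: [("0.88", "0.50", "0.41", "Team C")],
    16378: [("0.75", "0.30", "0.03", "Team D"), ("0.95", "0.52", "0.62", "Team E")],
}

list_of_dfs = []                                 # step 1
for match_id, shots in fake_matches.items():
    df = pd.DataFrame(shots, columns=colnames)   # step 2: build the df inside the loop
    list_of_dfs.append(df)                       # step 3

# step 4: stack the per-match frames on top of each other
final_df = pd.concat(list_of_dfs, axis=0, ignore_index=True)
print(final_df.shape)
```

`ignore_index=True` is an optional addition here that renumbers the rows from 0. With `axis=1` the frames would instead be placed side by side as extra columns, which is usually not what is wanted when every frame has the same columns.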

The complete code should look like this:

import requests
from bs4 import BeautifulSoup
import json
import pandas as pd

colnames = ["x", "y", "xg", "team"]
list_of_dfs = []  # one DataFrame per match

for match_id in range(16376, 16379):
    understat = f'https://understat.com/match/{match_id}'
    res = requests.get(understat)
    # parsing the webpage, use .content
    soup = BeautifulSoup(res.content, "lxml")
    scripts = soup.find_all('script')

    # get only shots data, and strip data so we only have json data
    strings = scripts[1].string
    index_start = strings.index("('") + 2
    index_end = strings.index("')")
    json_data = strings[index_start:index_end]
    json_data = json_data.encode('utf8').decode('unicode_escape')
    data1 = json.loads(json_data)

    # fresh lists for this match only
    x = []
    y = []
    xg = []
    team = []

    # home shots: the team name comes from "h_team"
    for shot in data1["h"]:
        x.append(shot["X"])
        y.append(shot["Y"])
        xg.append(shot["xG"])
        team.append(shot["h_team"])

    # away shots: the team name comes from "a_team"
    for shot in data1["a"]:
        x.append(shot["X"])
        y.append(shot["Y"])
        xg.append(shot["xG"])
        team.append(shot["a_team"])

    # create the df for this match inside the loop and keep it
    df = pd.DataFrame([x, y, xg, team], index=colnames)
    df = df.T
    list_of_dfs.append(df)

# after the loop: stack all per-match DataFrames row-wise
final_df = pd.concat(list_of_dfs, axis=0)
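The JSON-extraction step inside the loop can also be checked in isolation. The sketch below wraps the same string slicing and decoding in a helper (`extract_json` is a name introduced here for illustration) and runs it on a made-up sample string. The real page's script content may look different.

```python
import json


def extract_json(script_text):
    """Return the object embedded between ('  and  ') in a script's text."""
    start = script_text.index("('") + 2
    end = script_text.index("')")
    raw = script_text[start:end]
    # Turn escape sequences such as \x7B back into the characters they stand for.
    decoded = raw.encode("utf8").decode("unicode_escape")
    return json.loads(decoded)


# Made-up sample for illustration only, not copied from the real site.
sample = r"var shotsData = JSON.parse('\x7B\x22h\x22\x3A\x5B\x5D,\x22a\x22\x3A\x5B\x5D\x7D')"
data = extract_json(sample)
print(data)
```

Note that `unicode_escape` treats bytes that are not part of an escape sequence as Latin-1, so a payload containing raw non-ASCII characters may need different handling.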
