JSON decode error in Python when using the requests module

Posted 2024-04-26 20:45:58


import requests
import pandas as pd

base_url = "https://github.com/statsbomb/open-data/tree/master/data/"

comp_url =  base_url + "matches/{}/{}.json"
match_url = base_url + "events/{}.json"

This is the link that contains the data.

I use a function to parse the different kinds of data in it:

def parsing_data(comp_id,season_id):
    matches = requests.get(url= comp_url.format(comp_id,season_id)).json()
    match_ids =  [m['match_id'] for m in matches]

    for id in match_ids:
        events = requests.get(url= match_url.format(id)).json()
        shots = [x for x in events if x['type']['name'] == 'Shot']

        all_events = []
        for s in shots:
            attribute = {
               'Match_ID' : id,
               'Team' : s['possession_team']['name'],
               'Player': s['player']['name'],
               'Minute': s['minute'],
               'X_shot': s['location'][0],
               'Y_shot': s['location'][1],
               'Shot_with': s['body_part']['name'],
               'Outcome': s['outcome']['name']
            }
            all_events.append(attribute)

    return pd.DataFrame(all_events)

But when I call the function, I get "JSONDecodeError: Expecting value: line 6 column 1 (char 5)":

comp_id = 43
season_id = 3

df = parsing_data(comp_id,season_id)

Can anyone help me?
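The reported position is consistent with json.loads receiving an HTML page instead of JSON. Below is a minimal offline sketch. It assumes, without having checked, that the page served at the github.com/tree URL starts with five newlines before its first tag. The json module skips leading whitespace and then fails on the first "<", which gives exactly line 6, column 1, char 5. The html_body string and the try_decode helper are hypothetical illustrations.

```python
import json

# Hypothetical stand-in for an HTML response body: five newlines, then markup.
# Assumption: the github.com/tree page starts similarly; this was not verified.
html_body = '\n\n\n\n\n<!DOCTYPE html>\n<html lang="en">...</html>'


def try_decode(text):
    """Return the parsed JSON value, or the JSONDecodeError raised while parsing."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        return err


err = try_decode(html_body)
print(err)  # Expecting value: line 6 column 1 (char 5)

# A genuine JSON body, like the raw match files, decodes normally.
print(try_decode('[{"match_id": 8656}]'))  # [{'match_id': 8656}]
```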


2 Answers

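You need to change base_url so that it returns the raw JSON content. There are also two errors, in Shot_with and Outcome: both fields are nested under 'shot'.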

This script:

import requests
import pandas as pd


# changed the base_url to get raw content:
base_url = "https://raw.githubusercontent.com/statsbomb/open-data/master/data/"

comp_url =  base_url + "matches/{}/{}.json"
match_url = base_url + "events/{}.json"

def parsing_data(comp_id,season_id):
    url = comp_url.format(comp_id,season_id)
    matches = requests.get(url=url).json()
    match_ids =  [m['match_id'] for m in matches]

    for id in match_ids:
        events = requests.get(url= match_url.format(id)).json()
        shots = [x for x in events if x['type']['name'] == 'Shot']

        all_events = []
        for s in shots:
            attribute = {
               'Match_ID' : id,
               'Team' : s['possession_team']['name'],
               'Player': s['player']['name'],
               'Minute': s['minute'],
               'X_shot': s['location'][0],
               'Y_shot': s['location'][1],
               'Shot_with': s['shot']['body_part']['name'], # <  added 'shot'
               'Outcome': s['shot']['outcome']['name']      # <  added 'shot'
            }
            all_events.append(attribute)

    return pd.DataFrame(all_events)

comp_id = 43
season_id = 3

df = parsing_data(comp_id,season_id)
print(df)

Prints:

    Match_ID     Team                     Player  Minute  X_shot  Y_shot   Shot_with  Outcome
0       8656  England            Kieran Trippier       4    96.0    43.0  Right Foot     Goal
1       8656  England              Harry Maguire      13   111.0    37.0        Head    Off T
2       8656  Croatia               Ivan Perišić      18    94.0    20.0  Right Foot    Off T
3       8656  Croatia                 Ante Rebić      20    98.0    41.0   Left Foot  Blocked
4       8656  Croatia               Ivan Perišić      22    87.0    26.0  Right Foot    Off T
5       8656  Croatia                 Ante Rebić      31   101.0    50.0   Left Foot    Saved
6       8656  England              Jesse Lingard      35   102.0    41.0  Right Foot    Off T
7       8656  England  Raheem Shaquille Sterling      36   104.0    52.0   Left Foot  Blocked
8       8656  Croatia              Šime Vrsaljko      42    88.0    51.0  Right Foot    Off T
9       8656  England              Jesse Lingard      55    96.0    45.0   Left Foot  Blocked
10      8656  Croatia               Ivan Rakitić      60    97.0    34.0   Left Foot    Off T
11      8656  Croatia               Ivan Perišić      64   103.0    41.0  Right Foot  Blocked
12      8656  England                 Harry Kane      66   118.0    56.0  Right Foot    Off T
13      8656  Croatia               Ivan Perišić      67   114.0    40.0   Left Foot     Goal
14      8656  Croatia               Ivan Perišić      71   112.0    30.0   Left Foot     Post
15      8656  Croatia                 Ante Rebić      71   111.0    44.0   Left Foot    Saved
16      8656  Croatia           Marcelo Brozović      72    98.0    48.0  Right Foot    Off T
17      8656  England              Jesse Lingard      76   115.0    55.0  Right Foot  Wayward
18      8656  England     Jordan Brian Henderson      77    95.0    45.0  Right Foot    Off T
19      8656  Croatia            Mario Mandžukić      82   113.0    52.0  Right Foot    Saved
20      8656  Croatia               Ivan Perišić      83   113.0    24.0  Right Foot    Off T
21      8656  Croatia               Dejan Lovren      89    89.0    57.0  Right Foot    Off T
22      8656  England                 Harry Kane      91   113.0    33.0        Head    Off T
23      8656  England                  Eric Dier      97    92.0    51.0  Right Foot  Blocked
24      8656  England                John Stones      98   113.0    49.0        Head  Blocked
25      8656  Croatia            Andrej Kramarić     101   106.0    58.0   Left Foot  Blocked
26      8656  Croatia            Andrej Kramarić     105   101.0    34.0   Left Foot  Blocked
27      8656  Croatia            Mario Mandžukić     106   114.0    39.0  Right Foot    Saved
28      8656  Croatia           Marcelo Brozović     107   111.0    27.0   Left Foot    Off T
29      8656  Croatia            Mario Mandžukić     108   114.0    33.0   Left Foot     Goal
30      8656  Croatia               Ivan Perišić     113   107.0    32.0   Left Foot  Blocked
31      8656  Croatia           Marcelo Brozović     115    97.0    22.0  Right Foot    Saved
32      8656  Croatia            Andrej Kramarić     119   109.0    52.0  Right Foot    Off T
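In the script above, all_events = [] sits inside the "for id in match_ids:" loop. The list is therefore emptied at the start of every match, and the returned DataFrame holds only the shots of the last match processed. That would explain why every row above has Match_ID 8656.

Here is a minimal sketch of one way to fix it. The per-match extraction moves into hypothetical helpers, shot_rows and build_shots_frame, so they can be checked on invented sample events instead of live requests. The sample data is made up and is not StatsBomb data.

```python
import pandas as pd


def shot_rows(match_id, events):
    """Turn one match's event list into one dict per shot (field names as in the answer)."""
    rows = []
    for s in events:
        if s["type"]["name"] != "Shot":
            continue  # skip passes, carries, etc. before touching shot-only keys
        rows.append({
            "Match_ID": match_id,
            "Team": s["possession_team"]["name"],
            "Player": s["player"]["name"],
            "Minute": s["minute"],
            "X_shot": s["location"][0],
            "Y_shot": s["location"][1],
            "Shot_with": s["shot"]["body_part"]["name"],
            "Outcome": s["shot"]["outcome"]["name"],
        })
    return rows


def build_shots_frame(events_by_match):
    """Accumulate shots from every match; the list is created once, outside the loop."""
    all_events = []
    for match_id, events in events_by_match.items():
        all_events.extend(shot_rows(match_id, events))
    return pd.DataFrame(all_events)


# Invented sample events for two matches (not real StatsBomb data).
sample = {
    1: [
        {"type": {"name": "Pass"}},
        {"type": {"name": "Shot"}, "possession_team": {"name": "A"},
         "player": {"name": "P1"}, "minute": 4, "location": [96.0, 43.0],
         "shot": {"body_part": {"name": "Right Foot"}, "outcome": {"name": "Goal"}}},
    ],
    2: [
        {"type": {"name": "Shot"}, "possession_team": {"name": "B"},
         "player": {"name": "P2"}, "minute": 13, "location": [111.0, 37.0],
         "shot": {"body_part": {"name": "Head"}, "outcome": {"name": "Saved"}}},
    ],
}

df = build_shots_frame(sample)
print(df[["Match_ID", "Player", "Outcome"]])
```

Inside parsing_data, you could gather the events with one request per match, for example events_by_match = {id: requests.get(url=match_url.format(id)).json() for id in match_ids}, and then pass them to build_shots_frame(events_by_match).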

You are fetching the GitHub page link. Instead, you need to fetch the link to the raw data of the GitHub file, such as

https://raw.githubusercontent.com/statsbomb/open-data/master/data/
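A github.com/.../tree/... address serves GitHub's HTML page for browsing, while raw.githubusercontent.com serves the file bytes themselves. Below is a minimal sketch that rewrites one into the other. It assumes the https://github.com/<owner>/<repo>/(tree|blob)/<ref>/<path> layout seen in the question and a branch name without slashes. to_raw_url is a hypothetical helper and is not part of requests.

```python
from urllib.parse import urlsplit


def to_raw_url(github_url):
    """Rewrite https://github.com/<owner>/<repo>/(tree|blob)/<ref>/<path> to its raw form."""
    parts = urlsplit(github_url)
    if parts.netloc != "github.com":
        raise ValueError("not a github.com URL: " + github_url)
    segments = parts.path.split("/")  # ['', owner, repo, 'tree'|'blob', ref, *path]
    if len(segments) < 5 or segments[3] not in ("tree", "blob"):
        raise ValueError("expected a /tree/ or /blob/ URL: " + github_url)
    owner, repo, ref = segments[1], segments[2], segments[4]
    rest = "/".join(segments[5:])  # keeps a trailing slash if the input had one
    return "https://raw.githubusercontent.com/{}/{}/{}/{}".format(owner, repo, ref, rest)


print(to_raw_url("https://github.com/statsbomb/open-data/tree/master/data/"))
# https://raw.githubusercontent.com/statsbomb/open-data/master/data/
```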

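Another thing: instead of .json(), you can also retrieve the data with requests.get(url="").content
and convert it to a JSON object with json.loads(string).

One more thing: in the data, body_part and outcome are inside shot.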
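json.loads accepts the raw bytes from .content directly, because Python 3.6+ detects UTF-8/16/32. For UTF-8 files like these, json.loads(response.content) and response.json() should therefore give the same result. Below is a minimal offline sketch using an invented payload shaped like one shot event (not real StatsBomb data). It also shows that body_part and outcome live under the shot key.

```python
import json

# Invented bytes shaped like one StatsBomb shot event (not real data).
payload = (
    '[{"type": {"name": "Shot"}, "minute": 4, '
    '"shot": {"body_part": {"name": "Right Foot"}, "outcome": {"name": "Goal"}}}]'
).encode("utf-8")

events = json.loads(payload)  # bytes in, Python list out

shot = events[0]
print(shot["shot"]["body_part"]["name"])  # Right Foot
print(shot["shot"]["outcome"]["name"])    # Goal

# The top-level lookup used in the question fails: the keys are nested under "shot".
print("body_part" in shot)  # False
```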

Then you can write the code as:

import requests
import pandas as pd
import json

base_url = "https://raw.githubusercontent.com/statsbomb/open-data/master/data/"

comp_url =  base_url + "matches/{}/{}.json"
match_url = base_url + "events/{}.json"

def parsing_data(comp_id,season_id):
    matches = json.loads(requests.get(url=comp_url.format(comp_id,season_id)).content)
    match_ids =  [m['match_id'] for m in matches]

    all_events = []  # create once, so shots from every match are kept
    for id in match_ids:
        events = requests.get(url= match_url.format(id)).json()
        shots = [x for x in events if x['type']['name'] == 'Shot']

        for s in shots:
            attribute = {
               'Match_ID' : id,
               'Team' : s['possession_team']['name'],
               'Player': s['player']['name'],
               'Minute': s['minute'],
               'X_shot': s['location'][0],
               'Y_shot': s['location'][1],
               'Shot_with': s['shot']['body_part']['name'],  # nested under 'shot'
               'Outcome': s['shot']['outcome']['name']       # nested under 'shot'
            }
            all_events.append(attribute)

    return pd.DataFrame(all_events)

comp_id = 43
season_id = 3

df = parsing_data(comp_id,season_id)

Thanks, everyone.
