How can I create new columns by combining data from existing columns?

id         Price  Service  Rater Name  Cleanliness
401013357      5        3  A                     1
401014972      2        1  A                     5
401022510      3        4  B                     2
401022510      5        1  C                     9
401022510      3        1  D                     4
401022510      2        2  E                     2

Here's the code and data I'm actually using:

import requests
from pandas import DataFrame
import pandas as pd

linesinfo_url = 'https://api.collegefootballdata.com/lines?year=2018&seasonType=regular'
linesresp = requests.get(linesinfo_url)

dflines = DataFrame(linesresp.json())
#nesteddata in lines like game info
#setting game ID as index
dflines.set_index('id', inplace=True)

a = linesresp.json()
#defining a as the response to our get request for this data, in JSON format
buf = []
#i believe this creates a receptacle for nested data I'm extracting from json
for game in a:
    for line in game['lines']:
        game_dict = dict(id=game['id'])
        for cat in ('provider', 'spread','formattedSpread', 'overUnder'):
            game_dict[cat] = line[cat]
        buf.append(game_dict)

dflinestable = pd.DataFrame(buf)
dflinestable.set_index(['id', 'provider'])

                             formattedSpread  overUnder  spread
id        provider
401013357 consensus                UMass -21       68.0   -21.0
401014972 consensus               Rice -22.5       58.5   -22.5
401022510 Caesars       Colorado State -17.5       57.5   -17.5
          consensus       Colorado State -17       57.5   -17.0
          numberfire      Colorado State -17       58.5   -17.0
          teamrankings    Colorado State -17       58.0   -17.0
401013437 numberfire              Wyoming -5       47.0     5.0
          teamrankings            Wyoming -5       47.0     5.0
401020671 consensus         Ball State -19.5       61.5   -19.5
401019470 Caesars                  UCF -22.5        NaN    22.5
          consensus                UCF -22.5        NaN    22.5
          numberfire                 UCF -24       70.0    24.0
          teamrankings               UCF -24       70.0    24.0
401013328 numberfire         Minnesota -21.5       47.0   -21.5
          teamrankings       Minnesota -21.5       49.0   -21.5

         formattedSpread
0              UMass -21
1             Rice -22.5
2   Colorado State -17.5
3     Colorado State -17
4     Colorado State -17
5     Colorado State -17
6             Wyoming -5
7             Wyoming -5
8       Ball State -19.5
9              UCF -22.5
10             UCF -22.5
11               UCF -24
12               UCF -24

1 answer

User

#1 · Posted on 2024-04-28 07:07:30

*Edited, based on the edited question*

Assuming your DataFrame is df:

df = df.set_index(['id', 'Rater Name']) # Make it a Multi Index
df_unstacked = df.unstack()
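
As a rough illustration of what these two lines do, the sketch below rebuilds the small sample table from the question by hand and unstacks it. The values are copied from the question; typing them into a dictionary like this is only an assumption made so the snippet runs on its own.

```python
import pandas as pd

# Sample rows copied from the question, one row per (game id, rater).
df = pd.DataFrame({
    "id": [401013357, 401014972, 401022510, 401022510, 401022510, 401022510],
    "Price": [5, 2, 3, 5, 3, 2],
    "Service": [3, 1, 4, 1, 1, 2],
    "Rater Name": ["A", "A", "B", "C", "D", "E"],
    "Cleanliness": [1, 5, 2, 9, 4, 2],
})

# Move id and rater into a two-level row index ...
df = df.set_index(["id", "Rater Name"])

# ... then pivot the inner level (the rater) into the columns.
# The result has one row per id and one column per (value, rater) pair.
df_unstacked = df.unstack()
print(df_unstacked)
```

Combinations that never occur in the data, such as rater B for id 401013357, show up as NaN in the unstacked frame.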

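The problem with your edited code is that the result of dflinestable.set_index(['id', 'provider']) is never assigned to anything. set_index returns a new DataFrame by default, so when you then call dflinestable.unstack(), you are unstacking the original dflinestable, which still has its default index.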
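
A small sketch of that pitfall, using a made-up three-row frame (the ids, providers, and spreads here are hypothetical and only stand in for the real API data):

```python
import pandas as pd

# Hypothetical stand-in for dflinestable.
df = pd.DataFrame({
    "id": [1, 1, 2],
    "provider": ["a", "b", "a"],
    "spread": [-3.0, -3.5, 7.0],
})

# The return value is discarded, so df itself is unchanged.
df.set_index(["id", "provider"])
print(df.index.names)  # the default index, with no level names

# Unstacking a frame without a MultiIndex returns a Series, not a wide table.
wrong = df.unstack()

# Keeping the result (or passing inplace=True) gives the intended index.
indexed = df.set_index(["id", "provider"])
right = indexed.unstack()
print(right)
```

Either assigning the result or passing inplace=True works; the point is only that the indexed frame has to be the one that gets unstacked.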

So your whole code should be:

import requests
import pandas as pd


linesinfo_url = 'https://api.collegefootballdata.com/lines?year=2018&seasonType=regular'
linesresp = requests.get(linesinfo_url)

dflines = pd.DataFrame(linesresp.json())
#nesteddata in lines like game info
#setting game ID as index
dflines.set_index('id', inplace=True)

a = linesresp.json()
#defining a as the response to our get request for this data, in JSON format 
buf = []
#i believe this creates a receptacle for nested data I'm extracting from json
for game in a:
    for line in game['lines']:
        game_dict = dict(id=game['id'])
        for cat in ('provider', 'spread','formattedSpread', 'overUnder'):
            game_dict[cat] = line[cat]
        buf.append(game_dict)

dflinestable = pd.DataFrame(buf)
dflinestable.set_index(['id', 'provider'], inplace=True) # Add inplace=True
dflinestable_unstacked = dflinestable.unstack()  # unstack (you could also reassign to the same df)

# Flatten columns to single level, in the order as described
dflinestable_unstacked.columns = [f'{j}_{i}' for i, j in dflinestable_unstacked.columns]

This gives you a DataFrame like the following (abbreviated):

          Caesars_formattedSpread  ... teamrankings_spread
id                                 ...                    
401012246             Alabama -24  ...               -23.5
401012247            Arkansas -34  ...                 NaN
401012248               Auburn -1  ...                -1.5
401012249                     NaN  ...                 NaN
401012250             Georgia -44  ...                 NaN
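
The column-flattening step can also be tried without calling the API. The sketch below uses three rows typed in by hand (taken from the table in the question, with formattedSpread left out for brevity), so the exact set of columns is an assumption of this example rather than what the full data produces.

```python
import pandas as pd

# Three hand-typed rows in the same shape as the entries appended to buf.
rows = [
    {"id": 401022510, "provider": "Caesars", "spread": -17.5, "overUnder": 57.5},
    {"id": 401022510, "provider": "consensus", "spread": -17.0, "overUnder": 57.5},
    {"id": 401013357, "provider": "consensus", "spread": -21.0, "overUnder": 68.0},
]

wide = pd.DataFrame(rows).set_index(["id", "provider"]).unstack()

# After unstack, each column label is a tuple: (value column, provider).
# Join the two parts into a single string, provider first.
wide.columns = [f"{provider}_{value}" for value, provider in wide.columns]
print(wide)
```

Putting the provider first in the joined name mirrors the Caesars_formattedSpread style shown above; swapping the two parts in the f-string would give spread_Caesars instead.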
