I have a dataset with 5 columns:
id         Price  Service  Rater Name  Cleanliness
401013357  5      3        A           1
401014972  2      1        A           5
401022510  3      4        B           2
401022510  5      1        C           9
401022510  3      1        D           4
401022510  2      2        E           2
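I want only one row per ID. To get that, I need a separate column for each combination of rater name and rating category (for example, RaterName_Price, RaterName_Service, RaterName_Cleanliness). Thanks a lot.
I've looked into groupby, but I don't know how to turn the groups into new columns. Thank you!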
Here's the code and data I'm actually using:
import requests
from pandas import DataFrame
import pandas as pd
linesinfo_url = 'https://api.collegefootballdata.com/lines?year=2018&seasonType=regular'
linesresp = requests.get(linesinfo_url)
dflines = DataFrame(linesresp.json())
# nested data in lines, like game info
# setting game ID as index
dflines.set_index('id', inplace=True)
a = linesresp.json()
# defining a as the response to our get request for this data, in JSON format
buf = []
# I believe this creates a receptacle for the nested data I'm extracting from the JSON
for game in a:
    for line in game['lines']:
        game_dict = dict(id=game['id'])
        for cat in ('provider', 'spread', 'formattedSpread', 'overUnder'):
            game_dict[cat] = line[cat]
        buf.append(game_dict)
dflinestable = pd.DataFrame(buf)
dflinestable.set_index(['id', 'provider'])
From this, I get:
                        formattedSpread       overUnder  spread
id        provider
401013357 consensus     UMass -21                  68.0   -21.0
401014972 consensus     Rice -22.5                 58.5   -22.5
401022510 Caesars       Colorado State -17.5       57.5   -17.5
          consensus     Colorado State -17         57.5   -17.0
          numberfire    Colorado State -17         58.5   -17.0
          teamrankings  Colorado State -17         58.0   -17.0
401013437 numberfire    Wyoming -5                 47.0     5.0
          teamrankings  Wyoming -5                 47.0     5.0
401020671 consensus     Ball State -19.5           61.5   -19.5
401019470 Caesars       UCF -22.5                   NaN    22.5
          consensus     UCF -22.5                   NaN    22.5
          numberfire    UCF -24                    70.0    24.0
          teamrankings  UCF -24                    70.0    24.0
401013328 numberfire    Minnesota -21.5            47.0   -21.5
          teamrankings  Minnesota -21.5            49.0   -21.5
The result I'm looking for is three columns for each of the 4 providers, so: caesars_formattedSpread, caesars_overUnder, caesars_spread, numberfire_formattedSpread, numberfire_overUnder, numberfire_spread, and so on.
When I run it as suggested, I don't get what I expected. Instead, I get:
formattedSpread  0    UMass -21
                 1    Rice -22.5
                 2    Colorado State -17.5
                 3    Colorado State -17
                 4    Colorado State -17
                 5    Colorado State -17
                 6    Wyoming -5
                 7    Wyoming -5
                 8    Ball State -19.5
                 9    UCF -22.5
                10    UCF -22.5
                11    UCF -24
                12    UCF -24
*Edited based on the edited question*
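Assuming your DataFrame is called df, the approach is to move id and Rater Name into the index and then call unstack(), which turns the innermost index level into columns.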
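For the first dataset, a minimal sketch of that approach might look like this. It rebuilds the example table from the question inline; the name wide and the "A_Price" style of column naming are illustrative choices, not something pandas requires.

```python
import pandas as pd

# The example data from the top of the question.
df = pd.DataFrame({
    "id": [401013357, 401014972, 401022510, 401022510, 401022510, 401022510],
    "Price": [5, 2, 3, 5, 3, 2],
    "Service": [3, 1, 4, 1, 1, 2],
    "Rater Name": ["A", "A", "B", "C", "D", "E"],
    "Cleanliness": [1, 5, 2, 9, 4, 2],
})

# Move id and Rater Name into the index, then unstack the rater level
# so each (category, rater) pair becomes its own column.
wide = df.set_index(["id", "Rater Name"]).unstack("Rater Name")

# Flatten the (category, rater) column MultiIndex into names like "A_Price".
wide.columns = [f"{rater}_{category}" for category, rater in wide.columns]
print(wide)
```

Raters who did not rate a given id show up as NaN in that row, which is also why the integer ratings come back as floats.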
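The problem with your edited code is that you never assign the result of dflinestable.set_index(['id', 'provider']) to anything. set_index returns a new DataFrame and leaves dflinestable unchanged (unless you pass inplace=True), so when you then call dflinestable.unstack(), you are unstacking the original dflinestable, which still has its default integer index. Unstacking a DataFrame whose index is not a MultiIndex returns a Series keyed by (column name, row number), which is exactly the output you got. So assign the result back, dflinestable = dflinestable.set_index(['id', 'provider']), and only then call dflinestable.unstack().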
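A minimal sketch of the corrected flow. So that it runs without the network request, it replaces linesresp.json() with a few games hand-copied from the output in the question, assuming the API returns numbers for spread and overUnder (as that output suggests); wide is a hypothetical name for the result.

```python
import pandas as pd

# A few games copied from the output in the question, shaped like the
# JSON from the API (each game has a nested list of "lines").
# In the real script, a = linesresp.json() replaces this sample.
a = [
    {"id": 401013357, "lines": [
        {"provider": "consensus", "spread": -21.0,
         "formattedSpread": "UMass -21", "overUnder": 68.0},
    ]},
    {"id": 401022510, "lines": [
        {"provider": "Caesars", "spread": -17.5,
         "formattedSpread": "Colorado State -17.5", "overUnder": 57.5},
        {"provider": "consensus", "spread": -17.0,
         "formattedSpread": "Colorado State -17", "overUnder": 57.5},
        {"provider": "numberfire", "spread": -17.0,
         "formattedSpread": "Colorado State -17", "overUnder": 58.5},
        {"provider": "teamrankings", "spread": -17.0,
         "formattedSpread": "Colorado State -17", "overUnder": 58.0},
    ]},
]

# One flat row per (game, provider) line, as in the question.
buf = []
for game in a:
    for line in game["lines"]:
        game_dict = dict(id=game["id"])
        for cat in ("provider", "spread", "formattedSpread", "overUnder"):
            game_dict[cat] = line[cat]
        buf.append(game_dict)
dflinestable = pd.DataFrame(buf)

# The fix: set_index returns a new DataFrame, so assign it back before
# unstacking. unstack() then moves the innermost index level (provider)
# into the columns.
dflinestable = dflinestable.set_index(["id", "provider"])
wide = dflinestable.unstack()
print(wide)
```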
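This gives you one row per id, with a two-level column index of (field, provider), such as (spread, Caesars), (spread, consensus), and so on.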
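To get the flat names asked for in the question (caesars_spread, numberfire_overUnder, and so on), one option is to join the two column levels with a list comprehension. This sketch assumes the long table already has id, provider, formattedSpread, overUnder and spread columns; the sample values are copied from the question's output for game 401013437.

```python
import pandas as pd

# A long table with one row per (id, provider), values copied from the
# question's output for game 401013437.
dflinestable = pd.DataFrame({
    "id": [401013437, 401013437],
    "provider": ["numberfire", "teamrankings"],
    "formattedSpread": ["Wyoming -5", "Wyoming -5"],
    "overUnder": [47.0, 47.0],
    "spread": [5.0, 5.0],
})

wide = dflinestable.set_index(["id", "provider"]).unstack()

# unstack() leaves a two-level column index of (field, provider).
# Join the levels provider-first to get names like "numberfire_spread".
wide.columns = [f"{provider}_{field}" for field, provider in wide.columns]

# Optional: sort so each provider's three columns sit next to each other.
wide = wide[sorted(wide.columns)]
print(wide)
```

If you want all-lowercase names like caesars_spread, use provider.lower() inside the comprehension.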