How do I create new columns by combining data from existing columns?

Posted 2024-04-28 07:07:30

I have a dataset with 5 columns:

id         Price  Service  Rater Name  Cleanliness
401013357  5      3        A           1
401014972  2      1        A           5
401022510  3      4        B           2
401022510  5      1        C           9
401022510  3      1        D           4
401022510  2      2        E           2

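I want only one row per ID. To get there, I need a separate column for every combination of rater name and rating category (for example, RaterName_Price, RaterName_Service, RaterName_Cleanliness). Many thanks.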

I have looked into groupby, but I can't figure out how to turn the groups into new columns. Thank you!

Here's the code and data I'm actually using:

import requests
from pandas import DataFrame
import pandas as pd


linesinfo_url = 'https://api.collegefootballdata.com/lines?year=2018&seasonType=regular'
linesresp = requests.get(linesinfo_url)

dflines = DataFrame(linesresp.json())
#nesteddata in lines like game info
#setting game ID as index
dflines.set_index('id', inplace=True)

a = linesresp.json()
#defining a as the response to our get request for this data, in JSON format 
buf = []
#i believe this creates a receptacle for nested data I'm extracting from json
for game in a:
    for line in game['lines']:
        game_dict = dict(id=game['id'])
        for cat in ('provider', 'spread','formattedSpread', 'overUnder'):
            game_dict[cat] = line[cat]
        buf.append(game_dict)

dflinestable = pd.DataFrame(buf)
dflinestable.set_index(['id', 'provider'])

From this, I get:

                              formattedSpread  overUnder  spread
id        provider                                                
401013357 consensus                   UMass -21       68.0   -21.0
401014972 consensus                  Rice -22.5       58.5   -22.5
401022510 Caesars          Colorado State -17.5       57.5   -17.5
          consensus          Colorado State -17       57.5   -17.0
          numberfire         Colorado State -17       58.5   -17.0
          teamrankings       Colorado State -17       58.0   -17.0
401013437 numberfire                 Wyoming -5       47.0     5.0
          teamrankings               Wyoming -5       47.0     5.0
401020671 consensus            Ball State -19.5       61.5   -19.5
401019470 Caesars                     UCF -22.5        NaN    22.5
          consensus                   UCF -22.5        NaN    22.5
          numberfire                    UCF -24       70.0    24.0
          teamrankings                  UCF -24       70.0    24.0
401013328 numberfire            Minnesota -21.5       47.0   -21.5
          teamrankings          Minnesota -21.5       49.0   -21.5

The result I'm looking for has three columns for each of the 4 providers: caesars_formattedSpread, caesars_overUnder, caesars_spread, numberfire_formattedSpread, numberfire_overUnder, numberfire_spread, and so on.

When I run it as suggested, I don't get what I expect. Instead, I get:

formattedSpread  0                  UMass -21
                 1                 Rice -22.5
                 2       Colorado State -17.5
                 3         Colorado State -17
                 4         Colorado State -17
                 5         Colorado State -17
                 6                 Wyoming -5
                 7                 Wyoming -5
                 8           Ball State -19.5
                 9                  UCF -22.5
                 10                 UCF -22.5
                 11                   UCF -24
                 12                   UCF -24

1 answer

Answer 1 · posted 2024-04-28 07:07:30

*Edited, based on the edited question*

Assuming your DataFrame is df:

df = df.set_index(['id', 'Rater Name']) # Make it a Multi Index
df_unstacked = df.unstack()
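
As a minimal sketch of what those two lines do, here they are applied to the six sample rows from the question's first table. The variable name `wide` and the `rater_category` column naming are illustrative choices, not something the question specifies.

```python
import pandas as pd

# Sample rows copied from the question's five-column table.
df = pd.DataFrame({
    "id": [401013357, 401014972, 401022510, 401022510, 401022510, 401022510],
    "Price": [5, 2, 3, 5, 3, 2],
    "Service": [3, 1, 4, 1, 1, 2],
    "Rater Name": ["A", "A", "B", "C", "D", "E"],
    "Cleanliness": [1, 5, 2, 9, 4, 2],
})

# Move 'Rater Name' from the row index into the columns: one row per id,
# one column per (category, rater) pair. Missing pairs become NaN.
wide = df.set_index(["id", "Rater Name"]).unstack()

# Collapse the two column levels into single names such as 'B_Price'.
wide.columns = [f"{rater}_{category}" for category, rater in wide.columns]

print(wide)
```

Note that unstack() raises an error if the same (id, Rater Name) pair appears more than once, so this assumes each rater scores a given id at most once.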

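The problem with your edited code is that you never assign the result of dflinestable.set_index(['id', 'provider']) to anything. set_index returns a new DataFrame by default and leaves the original untouched, so when you then call dflinestable.unstack(), you are unstacking the original dflinestable, which still has its default integer index. That is why you get one long Series keyed by column name and row number.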
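
The difference can be seen on a few rows taken from the question's output. This is only a small sketch; the names `table`, `indexed`, `long_series`, and `wide` are made up for the illustration.

```python
import pandas as pd

# Three rows taken from the question's provider table.
rows = [
    {"id": 401013357, "provider": "consensus", "spread": -21.0},
    {"id": 401022510, "provider": "Caesars", "spread": -17.5},
    {"id": 401022510, "provider": "consensus", "spread": -17.0},
]
table = pd.DataFrame(rows)

# Without assignment (or inplace=True) the result is simply thrown away.
table.set_index(["id", "provider"])
names_before = list(table.index.names)    # still the default, unnamed index

# Keeping the returned frame gives the two-level index that unstack() needs.
indexed = table.set_index(["id", "provider"])
names_after = list(indexed.index.names)

# On a single-level index, unstack() returns a long Series;
# on the two-level index it returns the wide DataFrame.
long_series = table.unstack()
wide = indexed.unstack()

print(names_before, names_after)
print(type(long_series).__name__, type(wide).__name__)
```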

So your whole code should be:

import requests
import pandas as pd


linesinfo_url = 'https://api.collegefootballdata.com/lines?year=2018&seasonType=regular'
linesresp = requests.get(linesinfo_url)

dflines = pd.DataFrame(linesresp.json())
#nesteddata in lines like game info
#setting game ID as index
dflines.set_index('id', inplace=True)

a = linesresp.json()
#defining a as the response to our get request for this data, in JSON format 
buf = []
#i believe this creates a receptacle for nested data I'm extracting from json
for game in a:
    for line in game['lines']:
        game_dict = dict(id=game['id'])
        for cat in ('provider', 'spread','formattedSpread', 'overUnder'):
            game_dict[cat] = line[cat]
        buf.append(game_dict)

dflinestable = pd.DataFrame(buf)
dflinestable.set_index(['id', 'provider'], inplace=True)  # inplace=True modifies dflinestable itself
dflinestable_unstacked = dflinestable.unstack()  # unstack (you could also reassign to the same df)

# Flatten columns to single level, in the order as described
dflinestable_unstacked.columns = [f'{j}_{i}' for i, j in dflinestable_unstacked.columns]

This will give you a DataFrame like the following (abbreviated):

          Caesars_formattedSpread  ... teamrankings_spread
id                                 ...                    
401012246             Alabama -24  ...               -23.5
401012247            Arkansas -34  ...                 NaN
401012248               Auburn -1  ...                -1.5
401012249                     NaN  ...                 NaN
401012250             Georgia -44  ...                 NaN
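
After unstack(), the columns are ordered by category first (every provider's formattedSpread, then every overUnder, and so on). If you would rather have each provider's three columns sit next to each other, as the question describes, one possible sketch is to swap the column levels and sort them before flattening. The sample rows below come from the question's output, and the name `wide` is illustrative.

```python
import pandas as pd

# A few rows from the question's provider table.
buf = [
    {"id": 401013357, "provider": "consensus",
     "spread": -21.0, "formattedSpread": "UMass -21", "overUnder": 68.0},
    {"id": 401022510, "provider": "Caesars",
     "spread": -17.5, "formattedSpread": "Colorado State -17.5", "overUnder": 57.5},
    {"id": 401022510, "provider": "consensus",
     "spread": -17.0, "formattedSpread": "Colorado State -17", "overUnder": 57.5},
    {"id": 401022510, "provider": "numberfire",
     "spread": -17.0, "formattedSpread": "Colorado State -17", "overUnder": 58.5},
    {"id": 401022510, "provider": "teamrankings",
     "spread": -17.0, "formattedSpread": "Colorado State -17", "overUnder": 58.0},
]

wide = pd.DataFrame(buf).set_index(["id", "provider"]).unstack()

# Put the provider level first, then sort so each provider's columns are adjacent.
wide = wide.swaplevel(axis=1).sort_index(axis=1)

# Flatten to names such as 'Caesars_formattedSpread'.
wide.columns = [f"{provider}_{category}" for provider, category in wide.columns]

print(wide.columns.tolist())
```

The sort is a plain string sort, so capitalized provider names such as Caesars come before lowercase ones.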
