How to apply groupby keys to their corresponding groups

Published 2024-05-23 14:26:55


I have a DataFrame that I group with groupby as follows:

Name      Nationality    age
Peter     UK             28
John      US             29 
Wiley     UK             28 
Aster     US             29 

grouped = self_ex_df.groupby(['Nationality', 'age'])
  1. Now I want to attach a unique ID to each row, based on the group it belongs to

I am trying the following, but I'm not sure whether it works:

uniqueID = 'ID_'+ grouped.groups.keys().astype(str)

    uniqueID    Name      Nationality    age
     ID_UK28    Peter       UK             28
     ID_US29    John        US             29 
     ID_UK28    Wiley       UK             28 
     ID_US29    Aster       US             29 
  2. I then want to combine this into a new DataFrame, something like this:

     uniqueID   Nationality    age   Text
      ID_UK28     UK           28    Peter and Wiley have a combined age of 56
      ID_US29     US           29    John and Aster have a combined age of 58
    

How can I achieve this?


2 Answers

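You don't need groupby to create the uniqueID. You can build it directly from the columns and group by uniqueID afterwards to get the groups based on age and nationality. I used a custom function to build the text string. Here is one way to do it: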

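import pandas as pd

# Sample data from the question
df = pd.DataFrame({'Name': ['Peter', 'John', 'Wiley', 'Aster'],
                   'Nationality': ['UK', 'US', 'UK', 'US'],
                   'age': [28, 29, 28, 29]})

# Build the ID directly from the columns; no groupby is needed for this step
df1 = df.assign(uniqueID='ID_' + df.Nationality + df.age.astype(str))

def myText(x):
    # Use a name other than `str` so the built-in is not shadowed
    text = ' and '.join(x.Name)
    text += ' have a combined age of {}.'.format(x.age.sum())
    return text

df2 = df1.groupby(['uniqueID', 'Nationality', 'age']).apply(myText).reset_index().rename(columns={0: 'Text'})
print(df2)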

Output:

  uniqueID Nationality  age                                        Text
0  ID_UK28          UK   28  Peter and Wiley have a combined age of 56.
1  ID_US29          US   29   John and Aster have a combined age of 58.
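
If the ID only has to be unique per group rather than readable, a numeric label can stand in for the concatenated string. The following is a minimal sketch using the question's sample data. The column name `groupNo` is an assumption made for illustration and does not appear in the original post.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Name': ['Peter', 'John', 'Wiley', 'Aster'],
    'Nationality': ['UK', 'US', 'UK', 'US'],
    'age': [28, 29, 28, 29],
})

# Readable ID built directly from the columns, as in the answer above
df['uniqueID'] = 'ID_' + df['Nationality'] + df['age'].astype(str)

# ngroup() labels every row with the integer number of its group.
# `groupNo` is a hypothetical column name chosen for this sketch.
df['groupNo'] = df.groupby(['Nationality', 'age']).ngroup()

print(df)
```

Both columns identify the same two groups for this sample. The string form is easier to read, while the integer form avoids any ambiguity that string concatenation could introduce with other data.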

Hopefully this is close enough; I could not get the average age:

import pandas as pd

#create dataframe
df = pd.DataFrame({'Name': ['Peter', 'John', 'Wiley', 'Aster'], 'Nationality': ['UK', 'US', 'UK', 'US'], 'age': [28, 29, 28, 29]})

#make uniqueID
df['uniqueID'] = 'ID_' + df['Nationality'] + df['age'].astype(str)

#groupby has agg method that can take dict and perform multiple aggregations
df = df.groupby(['uniqueID', 'Nationality']).agg({'age': 'sum', 'Name': lambda x: ' and '.join(x)})

#to get text you just combine new Name and sum of age
df['Text'] = df['Name'] + ' have a combined age of ' + df['age'].astype(str)
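
In the snippet above, the `age` column ends up holding the summed age and the group keys move into the index. To get closer to the table the question asks for, one possible variation uses named aggregation and keeps the per-person age next to the sum. This is a sketch only, and the intermediate names `names`, `combined_age` and `result` are assumptions introduced here.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Name': ['Peter', 'John', 'Wiley', 'Aster'],
    'Nationality': ['UK', 'US', 'UK', 'US'],
    'age': [28, 29, 28, 29],
})
df['uniqueID'] = 'ID_' + df['Nationality'] + df['age'].astype(str)

# Named aggregation: each keyword becomes an output column.
# as_index=False keeps the group keys as ordinary columns.
result = (
    df.groupby(['uniqueID', 'Nationality'], as_index=False)
      .agg(
          age=('age', 'first'),          # age shared by the whole group
          names=('Name', lambda s: ' and '.join(s)),
          combined_age=('age', 'sum'),
      )
)

# Build the sentence from the joined names and the summed age
result['Text'] = (
    result['names'] + ' have a combined age of '
    + result['combined_age'].astype(str)
)

# Keep only the columns shown in the question's desired output
result = result[['uniqueID', 'Nationality', 'age', 'Text']]
print(result)
```

Taking `'first'` for `age` is safe here only because every row in a group has the same age by construction of `uniqueID`.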
