Add a certain number of variables from one group to another group

Posted 2024-06-11 06:50:42


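I have a pandas DataFrame in which I split the objects of each type into groups of a fixed size (for example, 3). For instance, group ball_1 contains 3 unique objects of the same type: soccer, basket, and bouncy. The remaining objects go into group ball_2, which in this example holds only 1 object, tennis.

For groups with fewer than 3 unique objects, I want to pad them with the first k unique objects of the first group. For example, group ball_2 would hold tennis, followed by soccer and basket from group ball_1. The goal is for every group to have the same number of unique objects.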

import pandas as pd

# chunk each type's unique objects into groups of N
N = 3
g = df.groupby('type')['object'].transform(lambda x: pd.factorize(x)[0]) // N + 1
df['group'] = df['type'].str.cat(g.astype(str), sep='_')

# identify which groups need more objects
for name, batch in df.groupby('group'):
    need = N - batch['object'].nunique()
    if need <= 0:
        continue
    print('{} needs {} more objects'.format(name, need))
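The same check can also be done without an explicit loop. This is only a sketch on a hand-made frame that holds just the ball rows; the variable name `need` and the reduced set of columns are assumptions for illustration.

```python
import pandas as pd

N = 3  # target number of unique objects per group

# Hypothetical reduced frame: only the ball rows, only the columns needed here.
df = pd.DataFrame({
    "object": ["soccer", "soccer", "basket", "bouncy", "tennis", "tennis"],
    "group":  ["ball_1"] * 4 + ["ball_2"] * 2,
})

# Unique objects per group, turned into "how many are missing".
need = N - df.groupby("group")["object"].nunique()
need = need[need > 0]  # keep only the under-filled groups
print(need.to_dict())
```

The resulting Series is indexed by group name, so it can be used later to look up how many objects each group still needs.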

Current df (this toy dataset shows selected columns only; the real dataset has more columns):

     type  object  index    group
0    ball  soccer      1   ball_1
1    ball  soccer      2   ball_1
2    ball  basket      1   ball_1
3    ball  bouncy      1   ball_1
4    ball  tennis      1   ball_2
5    ball  tennis      2   ball_2
6   chair  office      1  chair_1
7   chair  office      2  chair_1
8   chair  office      3  chair_1
9   chair  lounge      1  chair_1
10  chair  dining      1  chair_1
... ...    ...         ......
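To try the snippets locally, the frame shown above can be rebuilt roughly as follows. This sketch covers only the rows that are visible in the table; the elided rows are not reconstructed, and the variable name `codes` is an assumption.

```python
import pandas as pd

# Toy data: only the rows visible in the table (elided rows are left out).
df = pd.DataFrame({
    "type":   ["ball"] * 6 + ["chair"] * 5,
    "object": ["soccer", "soccer", "basket", "bouncy", "tennis", "tennis",
               "office", "office", "office", "lounge", "dining"],
    "index":  [1, 2, 1, 1, 1, 2, 1, 2, 3, 1, 1],
})

N = 3  # unique objects per group

# Number the unique objects within each type (0, 1, 2, ...) in order of appearance,
# then cut that numbering into chunks of N to get the group suffix.
codes = df.groupby("type")["object"].transform(lambda x: pd.factorize(x)[0])
df["group"] = df["type"].str.cat((codes // N + 1).astype(str), sep="_")
print(df)
```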

Desired df (objects have been added to group ball_2):

     type  object  index    group
0    ball  soccer      1   ball_1
1    ball  soccer      2   ball_1
2    ball  basket      1   ball_1
3    ball  bouncy      1   ball_1
4    ball  tennis      1   ball_2
5    ball  tennis      2   ball_2
6    ball  soccer      1   ball_2
7    ball  soccer      2   ball_2
8    ball  basket      1   ball_2
9   chair  office      1  chair_1
10  chair  office      2  chair_1
11  chair  office      3  chair_1
12  chair  lounge      1  chair_1
13  chair  dining      1  chair_1
... ...    ...         ......


1 Answer
Answer 1 · Posted 2024-06-11 06:50:42

You can try this:

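import pandas as pd

N = 3

def add_first_group(x):
    # x holds the rows of one under-filled group
    need = N - x['object'].nunique()
    # rows of the first group of the same type, e.g. ball_1
    first = df[df['group'].eq(f"{x['type'].iloc[0]}_1")]
    # keep the rows belonging to the first `need` unique objects of that group
    msk = first['object'].factorize()[0] < need
    # relabel the borrowed rows so they belong to the under-filled group
    return pd.concat([x, first[msk].assign(group=x['group'].iloc[0])])

# iterate over the groups explicitly instead of groupby.apply, which avoids
# relying on how apply treats the grouping column across pandas versions
out = pd.concat(
    [add_first_group(x) if x['object'].nunique() < N else x
     for _, x in df.groupby('group')],
    ignore_index=True,
)
print(out)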

Output:

     type  object  index    group
0    ball  soccer      1   ball_1
1    ball  soccer      2   ball_1
2    ball  basket      1   ball_1
3    ball  bouncy      1   ball_1
4    ball  tennis      1   ball_2
5    ball  tennis      2   ball_2
6    ball  soccer      1   ball_2
7    ball  soccer      2   ball_2
8    ball  basket      1   ball_2
9   chair  office      1  chair_1
10  chair  office      2  chair_1
11  chair  office      3  chair_1
12  chair  lounge      1  chair_1
13  chair  dining      1  chair_1
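One way to sanity-check a result like this is to count the unique objects per group and compare against the target size. The sketch below hard-codes the ball rows copied from the output above; the names `out`, `counts`, and `all_full` are assumptions for illustration.

```python
import pandas as pd

N = 3  # target number of unique objects per group

# Hand-copied from the ball rows of the output (object and group columns only).
out = pd.DataFrame({
    "object": ["soccer", "soccer", "basket", "bouncy",
               "tennis", "tennis", "soccer", "soccer", "basket"],
    "group":  ["ball_1"] * 4 + ["ball_2"] * 5,
})

# Every group should now hold exactly N unique objects.
counts = out.groupby("group")["object"].nunique()
all_full = bool((counts == N).all())
print(all_full)
```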
