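I have a pandas DataFrame in which I split the objects of each type into groups of a fixed size (for example, 3). For instance, group ball_1 contains 3 unique objects of the same type: soccer, basket, and bouncy. The remaining objects go into group ball_2, which in this example has only 1 object, tennis.
For groups that contain fewer than 3 unique objects, I want to fill them with the first k unique objects from the first group. For example, group ball_2 would contain tennis and then be filled with soccer and basket from group ball_1. The goal is for all groups to have the same number of unique objects.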
import pandas as pd

# chunk each type's unique objects into groups of N
N = 3
g = df.groupby('type')['object'].transform(lambda x: pd.factorize(x)[0]) // N + 1
df['group'] = df['type'].str.cat(g.astype(str), sep='_')

# identify which groups need more objects
for name, batch in df.groupby('group'):
    # number of unique objects this group is missing
    need = N - batch['object'].nunique()
    if need <= 0:
        continue
    print('{} needs {} more objects'.format(name, need))
Current df (this toy dataset shows only selected columns; the actual dataset has more):
     type  object  index    group
0    ball  soccer      1   ball_1
1    ball  soccer      2   ball_1
2    ball  basket      1   ball_1
3    ball  bouncy      1   ball_1
4    ball  tennis      1   ball_2
5    ball  tennis      2   ball_2
6   chair  office      1  chair_1
7   chair  office      2  chair_1
8   chair  office      3  chair_1
9   chair  lounge      1  chair_1
10  chair  dining      1  chair_1
      ...     ...    ...      ...
Desired df (objects have been added to group ball_2):
     type  object  index    group
0    ball  soccer      1   ball_1
1    ball  soccer      2   ball_1
2    ball  basket      1   ball_1
3    ball  bouncy      1   ball_1
4    ball  tennis      1   ball_2
5    ball  tennis      2   ball_2
6    ball  soccer      1   ball_2
7    ball  soccer      2   ball_2
8    ball  basket      1   ball_2
9   chair  office      1  chair_1
10  chair  office      2  chair_1
11  chair  office      3  chair_1
12  chair  lounge      1  chair_1
13  chair  dining      1  chair_1
      ...     ...    ...      ...
You can try this:
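A minimal sketch of one way to do the padding, under a few assumptions: the donor is always the first group of each type (the one whose rows appear first, i.e. the `_1` group), every row of a borrowed object is copied (so both soccer rows move together), and the toy frame contains only the rows shown in the question. `pad_groups` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

N = 3  # target number of unique objects per group

# Toy data: only the rows shown in the question (the real data has more)
df = pd.DataFrame({
    'type':   ['ball'] * 6 + ['chair'] * 5,
    'object': ['soccer', 'soccer', 'basket', 'bouncy', 'tennis', 'tennis',
               'office', 'office', 'office', 'lounge', 'dining'],
    'index':  [1, 2, 1, 1, 1, 2, 1, 2, 3, 1, 1],
})

# Chunk each type's unique objects into groups of N, as in the question
g = df.groupby('type')['object'].transform(lambda x: pd.factorize(x)[0]) // N + 1
df['group'] = df['type'].str.cat(g.astype(str), sep='_')


def pad_groups(frame, n):
    """Top up groups with fewer than n unique objects from their type's first group."""
    pieces = []
    for _, type_df in frame.groupby('type', sort=False):
        # Assumption: the first row of a type always belongs to its first group
        first_group = type_df['group'].iloc[0]
        donors = type_df[type_df['group'] == first_group]
        donor_objects = donors['object'].unique()
        for group_name, batch in type_df.groupby('group', sort=False):
            pieces.append(batch)
            need = n - batch['object'].nunique()
            if need > 0 and group_name != first_group:
                # Copy every row of the first `need` donor objects into this group
                extra = donors[donors['object'].isin(donor_objects[:need])]
                pieces.append(extra.assign(group=group_name))
    return pd.concat(pieces, ignore_index=True)


result = pad_groups(df, N)
print(result)
```

On the toy data this appends the soccer and basket rows to ball_2 and leaves ball_1 and chair_1 unchanged. A type whose only group is short is left as-is, since there is no earlier group to borrow from.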
Output: