Getting a random sample from a Pandas DataFrame, but with only one row per value

sample_fem = pd.DataFrame
total = 0
while total <= 31:
    sample = female_dec.sample(n=1, replace=False)
    sample = sample.reset_index()
    if sample["AthleteName"][0] not in sample_fem["AthleteName"]:
        sample_fem.append(sample)
        total +=1

  File "<ipython-input-561-249bb5b47652>", line 6, in <module>
    if sample["AthleteName"][0] not in sample_fem["AthleteName"]:
TypeError: 'type' object is not subscriptable

1 answer

User

Answer 1 · posted 2024-05-16 19:04:04

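It sounds like the "random sample" you want consists of:

every record for an athlete who appears only once in the data
a single, randomly chosen record for each athlete who appears two or more times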

To do this, we first build a DataFrame and flag whether each record appears more than once:

import pandas as pd
import numpy as np


df = pd.DataFrame({'a':[0,1,2,3,4,4,5,6,2]})
df['dup_flag'] = df.duplicated(keep=False)
df
    a   dup_flag
0   0   False
1   1   False
2   2   True
3   3   False
4   4   True
5   4   True
6   5   False
7   6   False
8   2   True
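The toy frame above has a single column, so `duplicated()` compares whole rows. For data with several columns, such as the athlete table in the question, the flag would probably need to look at the name column only. A minimal sketch, assuming a column called `AthleteName` as in the question (the `Score` column and all values are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data; only the AthleteName column name comes from the question.
athletes = pd.DataFrame({
    "AthleteName": ["Ann", "Bea", "Ann", "Cat", "Bea", "Dee"],
    "Score": [10, 20, 30, 40, 50, 60],
})

# Flag every row whose AthleteName occurs more than once,
# ignoring the values in the other columns.
athletes["dup_flag"] = athletes.duplicated(subset="AthleteName", keep=False)
print(athletes)
```

With `keep=False`, all occurrences of a repeated name are flagged, not just the second and later ones.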

Next, using the flag column we just created, we split the data into "uniques" and "dups":

uniques = df.loc[df.dup_flag == False]
dups = df.loc[df.dup_flag == True]

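Now we only need to put the rows of the dups DataFrame in a random order before calling drop_duplicates on it. drop_duplicates keeps the first occurrence of each value, so after shuffling, the record that survives is a random one and varies between runs. Then we can combine the results: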

random_order_idx = np.random.permutation(np.arange(len(dups)))
random_dups_deduped = dups.iloc[random_order_idx].drop_duplicates()

pd.concat([uniques, random_dups_deduped])
    a   dup_flag
0   0   False
1   1   False
3   3   False
6   5   False
7   6   False
5   4   True
2   2   True
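The same idea can probably be written more compactly: shuffle the whole frame, then keep the first row for each name. The sketch below is an alternative to the uniques/dups split, not the method used in the answer above. The helper name `one_row_per_value`, its `seed` parameter, and the toy data are hypothetical.

```python
import pandas as pd

def one_row_per_value(frame, column, seed=None):
    """Return one randomly chosen row for each distinct value in `column`."""
    # Shuffle all rows, then keep the first row seen for each value.
    shuffled = frame.sample(frac=1, random_state=seed)
    # sort_index() only restores the original row order for readability.
    return shuffled.drop_duplicates(subset=column).sort_index()

# Hypothetical toy data; only the AthleteName column name comes from the question.
athletes = pd.DataFrame({
    "AthleteName": ["Ann", "Bea", "Ann", "Cat", "Bea", "Dee"],
    "Score": [10, 20, 30, 40, 50, 60],
})

picked = one_row_per_value(athletes, "AthleteName", seed=0)
print(picked)
```

Names that occur once pass through unchanged, so no separate flag column is needed. If only a fixed number of athletes is wanted, the result can be sampled again with `picked.sample(n=...)`. As for the traceback in the question: `pd.DataFrame` without parentheses is the class itself rather than an empty frame, which is why indexing it raises `TypeError: 'type' object is not subscriptable`.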
