Subset a DataFrame by unique values and return all rows for each unique value

2 answers

User

Answer 1 · edited 2024-06-16 12:59:33

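If it doesn't matter which 3 IDs you get, you can use unique_3 = df['ID'].unique()[:3] and then select the matching rows with df_id = df[df["ID"].isin(unique_3)]. Note that unique is a method and has to be called before slicing.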
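
A minimal sketch of this approach. The sample DataFrame is an assumption: it is reconstructed from the rows visible in the outputs of the second answer, and the real data may contain more rows.

```python
import pandas as pd

# Hypothetical sample data, rebuilt from the rows shown in this thread's outputs.
df = pd.DataFrame({
    "Text": ["bla", "blu", "ble", "bli", "bly", "bln", "blt", "blk", "blv", "blw"],
    "ID": [1, 1, 1, 3, 3, 2, 2, 2, 2, 6],
})

# unique() returns the distinct values in order of first appearance,
# so the slice takes the first three IDs encountered in the column.
unique_3 = df["ID"].unique()[:3]

# Keep every row whose ID is one of the three selected values.
df_id = df[df["ID"].isin(unique_3)]
print(df_id)
```

Because unique() preserves order of appearance, this always picks the same three IDs for a given DataFrame; it is deterministic, not random.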

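User

Answer 2 · edited 2024-06-16 12:59:33

Use np.random.RandomState with a seed for reproducibility, call its choice method with replace=False to pick distinct elements from the candidates produced by pd.Series.unique, and use pd.Series.isin to mask the rows belonging to the chosen three IDs: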

import numpy as np

def get_unique_id_subset(df, k=3, seed=51):
    # Draw k distinct IDs with a seeded generator, then keep all of their rows.
    id_list = np.random.RandomState(seed).choice(df.ID.unique(), k, replace=False)
    return df[df.ID.isin(id_list)]

Usage:

>>> get_unique_id_subset(df)
  Text  ID
0  bla   1
1  blu   1
2  ble   1
3  bli   3
4  bly   3
9  blw   6

>>> get_unique_id_subset(df)  # same result as before
  Text  ID
0  bla   1
1  blu   1
2  ble   1
3  bli   3
4  bly   3
9  blw   6

>>> get_unique_id_subset(df, seed=19)  # changed the seed
  Text  ID
0  bla   1
1  blu   1
2  ble   1
5  bln   2
6  blt   2
7  blk   2
8  blv   2
9  blw   6

>>> get_unique_id_subset(df, seed=19)  # result consistent with the seed
  Text  ID
0  bla   1
1  blu   1
2  ble   1
5  bln   2
6  blt   2
7  blk   2
8  blv   2
9  blw   6
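
As a variation on the function above, the same idea can be sketched with NumPy's newer Generator API. The function name get_unique_id_subset_rng and the sample DataFrame are hypothetical; the data is reconstructed from the rows visible in the outputs above, and np.random.default_rng is swapped in for np.random.RandomState.

```python
import numpy as np
import pandas as pd

def get_unique_id_subset_rng(df, k=3, seed=51):
    """Hypothetical variant: return all rows for k randomly chosen distinct IDs."""
    rng = np.random.default_rng(seed)  # seeded generator for reproducibility
    id_list = rng.choice(df["ID"].unique(), size=k, replace=False)
    return df[df["ID"].isin(id_list)]

# Hypothetical sample data, rebuilt from the rows shown in the outputs above.
df = pd.DataFrame({
    "Text": ["bla", "blu", "ble", "bli", "bly", "bln", "blt", "blk", "blv", "blw"],
    "ID": [1, 1, 1, 3, 3, 2, 2, 2, 2, 6],
})

first = get_unique_id_subset_rng(df)
second = get_unique_id_subset_rng(df)  # same seed, same rows
print(first.equals(second))
```

default_rng and RandomState use different bit generators, so for the same seed this variant will generally select different IDs than the outputs shown above.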
