Keep the first N occurrences

Asked 2025-04-18 09:29

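The code below (of course) keeps only the first occurrence of each 'Item1' value after sorting by 'Date'. Any suggestions for keeping, say, the first 5 occurrences instead?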

## Sort the dataframe by 'Date' and keep only the earliest appearance of each 'Item1' value
## drop_duplicates considers only the 'Item1' column and keeps the first occurrence

coocdates = data.sort_values('Date').drop_duplicates(subset=['Item1'])

2 Answers

Use groupby() and nth():

According to the pandas documentation for nth():

If n is an int, take the nth row from each group; if n is a list of ints, take the subset of rows at those positions.
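A minimal sketch of those two cases, using a small hypothetical frame with columns `key` and `val`. It assumes pandas 2.0 or later, where `nth()` behaves as a filter and keeps the original row index; older versions indexed the result by the group key instead.

```python
import pandas as pd

# Hypothetical data: three rows in group 'x', one row in group 'y'
df = pd.DataFrame({"key": ["x", "x", "x", "y"], "val": [10, 20, 30, 40]})

g = df.groupby("key")

# Integer n: row n (0-based) of each group; groups with fewer than n+1 rows contribute nothing
second = g.nth(1)

# List of ints: every listed position that exists in each group
first_two = g.nth([0, 1])
```

Group `y` has a single row, so it appears in `first_two` but not in `second`.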

So all you need is:

# Sort by 'Date', then keep positions 0-4 within each 'Item1' group
df.sort_values('Date').groupby('Item1').nth([0, 1, 2, 3, 4])
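Applied to the question's setup, a sketch might look like this. The frame `data` and the constant `N` are hypothetical stand-ins for the asker's data, and pandas 2.0 or later is assumed, where `nth()` returns the selected rows with their original index and columns, so no `reset_index()` call is needed.

```python
import pandas as pd

N = 5  # hypothetical: number of earliest occurrences to keep per 'Item1' value

# Hypothetical data in the shape the question implies: a 'Date' column and an 'Item1' column
data = pd.DataFrame({
    "Date": pd.to_datetime([
        "2024-01-07", "2024-01-01", "2024-01-03", "2024-01-02",
        "2024-01-05", "2024-01-04", "2024-01-06", "2024-01-08",
    ]),
    "Item1": ["a", "a", "a", "a", "a", "a", "b", "b"],
})

# Sort so each group's rows are in date order, then keep positions 0..N-1 of each group.
# kind="stable" preserves the original order among rows that share a date.
earliest = data.sort_values("Date", kind="stable").groupby("Item1").nth(list(range(N)))
```

Item `a` has six rows, so its latest one (2024-01-07) is dropped; item `b` keeps both of its rows.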

Answered 2025-04-18 by Python大师

You want the head method, which you can call either directly on the DataFrame or on a groupby:

In [11]: df = pd.DataFrame([[1, 2], [1, 4], [1, 6], [2, 8]], columns=['A', 'B'])

In [12]: df
Out[12]:
   A  B
0  1  2
1  1  4
2  1  6
3  2  8

In [13]: df.head(2)  # the first two rows
Out[13]:
   A  B
0  1  2
1  1  4

In [14]: df.groupby('A').head(2)  # the first two rows in each group
Out[14]:
   A  B
0  1  2
1  1  4
3  2  8

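Note: the behavior of groupby's head changed in 0.14 (previously it did not act like a filter but modified the index), so if you are using an earlier version you will need to reset the index.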
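For the question itself, the same `head` call can be applied after sorting. This sketch uses a hypothetical `data` frame with `Date` and `Item1` columns and a hypothetical `N`. It also shows a `cumcount()`-based boolean mask, an alternative that is not part of this answer, which selects the same rows and makes the filter behavior of `head` explicit.

```python
import pandas as pd

N = 5  # hypothetical: how many earliest occurrences of each 'Item1' value to keep

# Hypothetical data with the question's column names
data = pd.DataFrame({
    "Date": pd.to_datetime([
        "2024-01-07", "2024-01-01", "2024-01-03", "2024-01-02",
        "2024-01-05", "2024-01-04", "2024-01-06", "2024-01-08",
    ]),
    "Item1": ["a", "a", "a", "a", "a", "a", "b", "b"],
})

ranked = data.sort_values("Date", kind="stable")

# Keep at most N rows from each 'Item1' group; groupby head acts as a filter,
# so all columns and the original row index are kept
by_head = ranked.groupby("Item1").head(N)

# Alternative: cumcount() numbers rows within each group 0, 1, 2, ...
by_cumcount = ranked[ranked.groupby("Item1").cumcount() < N]
```

Both results keep rows in date order, with the 2024-01-07 row of item `a` removed because it is that item's sixth occurrence.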

Answered 2025-04-18 by Python大师