How to update values whose count is greater than x

2 Answers

User

Answer 1 · edited 2024-04-20 11:18:31

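I suspect there is a more efficient way to do this, but the simple approach is to build a dict of counts and then prune the values whose count falls below the threshold. Taking this df as an example: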

import pandas as pd

df = pd.DataFrame([12,11,4,15,6,12,4,7],columns=['foo'])

    foo
0   12
1   11
2   4
3   15
4   6
5   12
6   4
7   7

# make a dict with counts
count_dict = {d:(df['foo']==d).sum() for d in df.foo.unique()}
# assign that dict to a column
df['bar'] = [count_dict[d] for d in df.foo]
# loc in the 'pruned' tag
df.loc[df.bar < 2, 'foo']='pruned'

This returns the desired result:

    foo bar
0   12      2
1   pruned  1
2   4       2
3   pruned  1
4   pruned  1
5   12      2
6   4       2
7   pruned  1

(Of course, you can change the 2 to a 5 and drop the bar column if you need to.)
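As for the more efficient way suspected above, one option is to let pandas compute the per-row counts itself. The following is only a sketch, not part of the original answer: it assumes the same `df` and threshold of 2, and it uses `groupby(...).transform('size')` together with `Series.mask`, neither of which appears in the thread.

```python
import pandas as pd

df = pd.DataFrame([12, 11, 4, 15, 6, 12, 4, 7], columns=['foo'])

# Count how often each value occurs, aligned row by row with the original column.
counts = df.groupby('foo')['foo'].transform('size')

# Cast to object so the column can hold both integers and the 'pruned' string,
# then replace every value that occurs fewer than 2 times.
df['foo'] = df['foo'].astype(object).mask(counts < 2, 'pruned')

print(df)
```

This avoids both the Python-level dict comprehension and the helper column, since the counts live in a separate Series that is never attached to the frame.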

Update

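In response to the request for an in-place version, here is a one-liner that does it without assigning another column or building the dict directly (thanks to @trumonaminima for the value_counts() tip):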

df = pd.DataFrame([12,11,4,15,6,12,4,7],columns=['foo'])
print(df)
# note: value_counts() is recomputed for every row here, which is fine for a small frame
df.foo = df.foo.apply(lambda row: 'pruned' if (df.foo.value_counts() < 2)[row] else row)
print(df)

This again returns the desired result.

User
Answer 2 · edited 2024-04-20 11:18:31

This is the solution I ended up using, based on the answer above.
import pandas as pd

df = pd.DataFrame([12,11,4,15,6,12,4,7],columns=['foo'])
# make a dict with counts
count_dict = dict(df.foo.value_counts())
# assign that dict to a column
df['temp_count'] = [count_dict[d] for d in df.foo]
# loc in the 'pruned' tag
df.loc[df.temp_count < 2, 'foo']='pruned'
df = df.drop(["temp_count"], axis=1)
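If the threshold needs to change (2 here, 5 in the original question), the same steps can be wrapped in a small helper. This is a rough sketch rather than the poster's code: the function name `prune_rare` and its `min_count` and `label` parameters are made up for illustration, and it maps the `value_counts()` result onto the column instead of creating and dropping `temp_count`.

```python
import pandas as pd


def prune_rare(series, min_count=2, label='pruned'):
    """Return a copy of `series` where values seen fewer than `min_count` times become `label`.

    Hypothetical helper, not part of pandas.
    """
    # Look up each row's value in the value_counts() table to get its frequency.
    counts = series.map(series.value_counts())
    # Cast to object so the result can mix the original values with the label.
    return series.astype(object).mask(counts < min_count, label)


df = pd.DataFrame([12, 11, 4, 15, 6, 12, 4, 7], columns=['foo'])
df['foo'] = prune_rare(df['foo'], min_count=2)
print(df)
```

Calling `prune_rare(df['foo'], min_count=5)` on the original column would apply the stricter threshold without touching the rest of the code.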
