Counting repetitions under a condition

Posted 2024-04-25 23:54:09


Hello. First, here is my data:

^{tb1}$

Note that rows in this data can be repeated (my use case requires this, and I want to keep them).

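What I want is a column that tells me how many children each person has (people without children will obviously get NA, but that is fine), so the result would look like this: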

^{tb2}$

How can I do this?

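Warning: I did not show this in my example, but in my real data rows are repeated and I must not delete them. Imagine another row for Sara containing the same information.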


2 Answers

If the original data is stored in a DataFrame (df), you can use:

df = df.replace('Na', pd.NA)
df['children'] = df['registered'].map(df.drop_duplicates(subset='registered')['Daughter of'].value_counts().to_dict())
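The one-liner above can be sketched as a self-contained script. This is only an illustration: the sample rows are borrowed from the other answer, and the column name "Daughter of" is assumed to match the one used in the one-liner, since the original table is not reproduced here.

```python
import pandas as pd

# Sample rows (assumed for illustration), including exact duplicates that must be kept.
rows = [
    ["7D", "Sara", "8A"],
    ["8A", "Rosa", "Na"],
    ["4D", "Jess", "8A"],
    ["6B", "Veronica", "Na"],
    ["8L", "Sophia", "6B"],
    ["7N", "iria", "Na"],
    ["7D", "Sara", "8A"],
    ["8A", "Rosa", "Na"],
]
df = pd.DataFrame(rows, columns=["registered", "name", "Daughter of"])

# Treat the "Na" placeholder as missing so it is never counted as a parent.
df = df.replace("Na", pd.NA)

# Count each child once (one row per registered ID); value_counts skips missing values.
child_counts = df.drop_duplicates(subset="registered")["Daughter of"].value_counts()

# Look up each person's own ID in the counts; IDs that are nobody's parent stay missing.
df["children"] = df["registered"].map(child_counts)

print(df)
```

The duplicates are dropped only in the temporary frame used for counting, so the original df keeps all of its rows.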

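You can build an auxiliary DataFrame that holds the number of children for each "registered" value, and then merge it with the original DataFrame. It looks like this: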

import pandas as pd

# Setting the data
all_rows = [["7D","Sara","8A"],
            ["8A","Rosa","Na"],
            ["4D","Jess","8A"],
            ["6B","Veronica","Na"],
            ["8L","Sophia","6B"],
            ["7N","iria","Na"],
            ["7D","Sara","8A"],
            ["8A","Rosa","Na"]] 

df = pd.DataFrame(all_rows, columns=["registered","name","daughter_of"])

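# Auxiliary DataFrame: drop exact duplicate rows, then count children per parent ID
df_grouped = df.drop_duplicates().groupby(["daughter_of"])["daughter_of"].count().reset_index(name="children")

# Rename the parent ID column to "registered" so the merge key matches the original frame
df_grouped.columns = ["registered", "children"]

# Left merge keeps every original row (duplicates included); the "Na" placeholder group is excluded
df = pd.merge(df, df_grouped[df_grouped["registered"] != "Na"], on=["registered"], how="left")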

This is the output I get:

  registered      name daughter_of  children
0         7D      Sara          8A       NaN
1         8A      Rosa          Na       2.0
2         4D      Jess          8A       NaN
3         6B  Veronica          Na       1.0
4         8L    Sophia          6B       NaN
5         7N      iria          Na       NaN
6         7D      Sara          8A       NaN
7         8A      Rosa          Na       2.0

Duplicate rows are counted only once per "registered" value.
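To see why the duplicates have to be dropped before counting, here is a minimal sketch that compares the counts with and without drop_duplicates(). It reuses the sample rows from the answer; the names raw_counts and dedup_counts are introduced only for this illustration.

```python
import pandas as pd

rows = [
    ["7D", "Sara", "8A"],
    ["8A", "Rosa", "Na"],
    ["4D", "Jess", "8A"],
    ["6B", "Veronica", "Na"],
    ["8L", "Sophia", "6B"],
    ["7N", "iria", "Na"],
    ["7D", "Sara", "8A"],
    ["8A", "Rosa", "Na"],
]
df = pd.DataFrame(rows, columns=["registered", "name", "daughter_of"])

# Counting on the raw frame: Sara's repeated row is counted again for parent 8A.
raw_counts = df["daughter_of"].value_counts()

# Counting after dropping exact duplicate rows: each child contributes once.
dedup_counts = df.drop_duplicates()["daughter_of"].value_counts()

print("8A without de-duplication:", raw_counts["8A"])
print("8A with de-duplication:", dedup_counts["8A"])
```

With this sample, parent 8A gets 3 on the raw frame and 2 after de-duplication, which matches the children column in the output above.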
