Get selected rows from a grouped DataFrame

Posted 2024-06-16 10:03:04


I have the DataFrame below. I want to filter the groups and then take only selected rows from the groups that remain.

  state  city  voting_majority_status_fk  other
0     A    A1                          4   True
1     A    A1                          4   True
2     A    A1                          2  False
3     A    A2                          3   True
4     B    B2                          4  False
5     B    B2                          2   True
6     C    C1                          4   True
7     C    C1                          4   True
8     C    C1                          2  False

I want to group it and take only one row from each group that passes the filter.

The final result I want is just:

2     A    A1                          2  False
8     C    C1                          2  False

My code so far:

import pandas as pd


class VotingDataFetcher:
    @staticmethod
    def my_filter(group):
        # filter() keeps the whole group if this returns True and drops it if False
        if 3 in group.voting_majority_status_fk.unique():
            return False
        if 2 not in group.voting_majority_status_fk.unique():
            return False
        if 4 in group.voting_majority_status_fk.unique():
            majority = group[group.voting_majority_status_fk == 4].head(1)
            if not majority.other.tolist()[0]:
                return False
            else:
                minority = group[group.voting_majority_status_fk == 2]
                tt = minority.head(1)  # <= I only want those lines.
                return True
        return False


columns = ['state', ' city', 'voting_majority_status_fk', 'other']
data = [['A', 'A1', 4, True],
        ['A', 'A1', 4, True],
        ['A', 'A1', 2, False],
        ['A', 'A2', 3, True],
        ['B', 'B2', 4, False],
        ['B', 'B2', 2, True],
        ['C', 'C1', 4, True],
        ['C', 'C1', 4, True],
        ['C', 'C1', 2, False],
        ['C', 'C3', 2, False]]

df = pd.DataFrame(data=data, columns=columns)
grouped_df = df.groupby(['state', ' city'])
filtered_data = grouped_df.filter(VotingDataFetcher.my_filter)

I get the output below: the entire groups come back, but I only need the selected rows from each group.

0     A    A1                          4   True
1     A    A1                          4   True
2     A    A1                          2  False <= only this one
6     C    C1                          4   True
7     C    C1                          4   True
8     C    C1                          2  False <= and this one

1 answer
Answer #1 · Posted 2024-06-16 10:03:04

Use apply with a custom function that returns tt:

import pandas as pd

# data and columns as defined in the question
def my_filter(group):
    vuniq = group.voting_majority_status_fk.unique()
    if (4 in vuniq) and (2 in vuniq) and (3 not in vuniq):
        majority = group[group.voting_majority_status_fk == 4].head(1)
        if majority.other.tolist()[0]:
            minority = group[group.voting_majority_status_fk == 2]
            tt = minority.head(1)  # <= I only want those lines.
            return tt
    # Non-qualifying groups return None; apply drops those when combining results
    return None

df = pd.DataFrame(data=data, columns=columns)
grouped_df = df.groupby(['state', ' city'])
filtered_data = grouped_df.apply(my_filter).reset_index(drop=True)
print(filtered_data)
  state  city  voting_majority_status_fk  other
0     A    A1                          2  False
1     C    C1                          2  False
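For comparison, here is a sketch of the same selection without apply, assuming the group rules are exactly the ones encoded in my_filter (statuses 4 and 2 present, status 3 absent, and the first status-4 row has other == True). The names keys, flags, first_4, ok_pairs and qualifies are illustrative, not from the original answer.

```python
import pandas as pd

# Sample data from the question (note the leading space in the ' city' column name).
columns = ['state', ' city', 'voting_majority_status_fk', 'other']
data = [['A', 'A1', 4, True],
        ['A', 'A1', 4, True],
        ['A', 'A1', 2, False],
        ['A', 'A2', 3, True],
        ['B', 'B2', 4, False],
        ['B', 'B2', 2, True],
        ['C', 'C1', 4, True],
        ['C', 'C1', 4, True],
        ['C', 'C1', 2, False],
        ['C', 'C3', 2, False]]
df = pd.DataFrame(data=data, columns=columns)

keys = ['state', ' city']
status = df['voting_majority_status_fk']

# Per-group "contains status X" flags, broadcast back onto every row of the group.
flags = pd.DataFrame({'has_2': status.eq(2),
                      'has_3': status.eq(3),
                      'has_4': status.eq(4)})
flags = flags.groupby([df[k] for k in keys]).transform('any')

# First status-4 row of each group; keep the (state, city) pairs where its `other` is True.
first_4 = df[status.eq(4)].drop_duplicates(subset=keys)
ok_pairs = pd.MultiIndex.from_frame(first_4.loc[first_4['other'], keys])
first_4_true = df.set_index(keys).index.isin(ok_pairs)

# Same rules as my_filter: 4 and 2 present, 3 absent, first status-4 row has other == True.
qualifies = flags['has_2'] & flags['has_4'] & ~flags['has_3'] & first_4_true

# Within qualifying groups, keep only the first status-2 row.
result = (df[qualifies & status.eq(2)]
          .drop_duplicates(subset=keys)
          .reset_index(drop=True))
print(result)
```

Because transform returns one value per original row, the final step is an ordinary boolean mask; with this sample data it selects the A/A1 and C/C1 status-2 rows, the same two rows as the apply version.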

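You can't use filter here, because the function passed to it returns True or False for each group, and filter only uses that to decide whether to keep or drop the whole group.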

You can check this with:

filtered_data = grouped_df.apply(my_filter)
print(filtered_data)
state   city
A      A1        True
       A2       False
B      B2       False
C      C1        True
       C3        None
dtype: object
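If you would rather keep a filter step, one option (a sketch, not the only approach) is to let filter drop whole groups first and then pick rows inside the surviving groups with a boolean mask plus groupby(...).head(1). keep_group below is a hypothetical rewrite of the question's my_filter that only returns a boolean.

```python
import pandas as pd

# Sample data from the question (note the leading space in the ' city' column name).
columns = ['state', ' city', 'voting_majority_status_fk', 'other']
data = [['A', 'A1', 4, True],
        ['A', 'A1', 4, True],
        ['A', 'A1', 2, False],
        ['A', 'A2', 3, True],
        ['B', 'B2', 4, False],
        ['B', 'B2', 2, True],
        ['C', 'C1', 4, True],
        ['C', 'C1', 4, True],
        ['C', 'C1', 2, False],
        ['C', 'C3', 2, False]]
df = pd.DataFrame(data=data, columns=columns)
keys = ['state', ' city']


def keep_group(group):
    # Hypothetical boolean predicate with the same group-level rules as my_filter.
    statuses = set(group['voting_majority_status_fk'])
    if 3 in statuses or 2 not in statuses or 4 not in statuses:
        return False
    # The group survives only if its first status-4 row has other == True.
    first_4 = group[group['voting_majority_status_fk'] == 4].iloc[0]
    return bool(first_4['other'])


# Step 1: filter keeps or drops whole groups (all of a group's rows, or none).
kept = df.groupby(keys).filter(keep_group)

# Step 2: inside the surviving groups, keep the first status-2 row per group.
result = kept[kept['voting_majority_status_fk'] == 2].groupby(keys).head(1)
print(result)
```

Both filter and head keep the original index labels, so the result carries labels 2 and 8, as in the desired output in the question.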
