Pandas: return rows from a DataFrame where multiple column subsets are not z

3 answers

User

#1 · Edited 2024-04-19 15:35:22

I created a CSV file containing the sample data.

Sample input:

ID  a1  a2  a3  a4  a5  a6  a7  a8  a9
1   1   1   1   1   1   1   1   1   1
2   0   0   0   1   0   0   0   1   0
3   0   1   0   0   0   0   1   0   0
4   0   0   0   0   1   0   1   0   1
5   1   1   0   1   1   1   1   0   1
6   0   0   0   0   1   0   0   1   0
7   1   0   1   1   1   0   1   1   1
8   1   1   1   0   1   1   1   0   1
9   0   0   0   1   0   1   0   0   0
10  0   0   1   0   0   0   0   0   0
11  1   0   1   0   1   1   0   1   1
12  1   1   0   1   0   1   1   0   1

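import pandas as pd
# Load the sample data shown above
df = pd.read_csv('check.csv')
# Row-wise sum of each three-column group
df['sumA'] = df.a1 + df.a2 + df.a3
df['sumB'] = df.a4 + df.a5 + df.a6
df['sumC'] = df.a7 + df.a8 + df.a9
# Keep rows where every group sum exceeds 1 (with 0/1 data, > 0 would require only one non-zero value per group)
new_df = df[(df.sumA > 1) & (df.sumB > 1) & (df.sumC > 1)]
# Drop the helper columns again
new_df = new_df.drop(['sumA', 'sumB', 'sumC'], axis=1)

Output: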

    ID  a1  a2  a3  a4  a5  a6  a7  a8  a9
0   1   1   1   1   1   1   1   1   1   1
4   5   1   1   0   1   1   1   1   0   1
6   7   1   0   1   1   1   0   1   1   1
7   8   1   1   1   0   1   1   1   0   1
10  11  1   0   1   0   1   1   0   1   1
11  12  1   1   0   1   0   1   1   0   1
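
The same filter can be written without helper columns. The following is only a sketch: the helper name `filter_by_group_sum`, the `groups` list and the inline copy of the sample data are illustrative assumptions, not part of the answer above. It also compares the `> 1` threshold with `> 0` on this sample.

```python
import io

import pandas as pd

# Inline copy of the sample input, so the sketch runs without check.csv
SAMPLE = """ID a1 a2 a3 a4 a5 a6 a7 a8 a9
1 1 1 1 1 1 1 1 1 1
2 0 0 0 1 0 0 0 1 0
3 0 1 0 0 0 0 1 0 0
4 0 0 0 0 1 0 1 0 1
5 1 1 0 1 1 1 1 0 1
6 0 0 0 0 1 0 0 1 0
7 1 0 1 1 1 0 1 1 1
8 1 1 1 0 1 1 1 0 1
9 0 0 0 1 0 1 0 0 0
10 0 0 1 0 0 0 0 0 0
11 1 0 1 0 1 1 0 1 1
12 1 1 0 1 0 1 1 0 1
"""
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

# Hypothetical grouping of the columns into three subsets
groups = [["a1", "a2", "a3"], ["a4", "a5", "a6"], ["a7", "a8", "a9"]]


def filter_by_group_sum(frame, groups, threshold):
    """Keep rows whose row-wise sum exceeds `threshold` in every group."""
    mask = pd.Series(True, index=frame.index)
    for cols in groups:
        mask &= frame[cols].sum(axis=1) > threshold
    return frame[mask]


strict = filter_by_group_sum(df, groups, 1)  # same condition as the answer
loose = filter_by_group_sum(df, groups, 0)   # at least one non-zero per group

print(strict["ID"].tolist())  # [1, 5, 7, 8, 11, 12]
print(loose["ID"].tolist())   # [1, 5, 7, 8, 11, 12]
```

On this particular sample both thresholds select the same rows, but they are different conditions in general, and a sum-based test only stands in for "any non-zero" when the values cannot be negative.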

User

#2 · Edited 2024-04-19 15:35:22

Setup

import numpy as np
import pandas as pd

np.random.seed([3, 1415])

df = pd.DataFrame(
    np.random.randint(2, size=(10, 9)),
    columns=[f"col{i + 1}" for i in range(9)]
)

df

   col1  col2  col3  col4  col5  col6  col7  col8  col9
0     0     1     0     1     0     0     1     0     1
1     1     1     1     0     1     1     0     1     0
2     0     0     0     0     0     0     0     0     0
3     1     0     1     1     1     1     0     0     0
4     0     0     1     1     1     1     1     0     1
5     1     1     0     1     1     1     1     1     1
6     1     0     1     0     0     0     1     1     0
7     0     0     0     0     0     1     0     1     0
8     1     0     1     0     1     0     0     1     1
9     1     0     1     0     0     1     0     1     0

Solution

Create a dictionary that maps each column to its group label:

m = {
    **dict.fromkeys(['col1', 'col2', 'col3'], 'A'),
    **dict.fromkeys(['col4', 'col5', 'col6'], 'B'),
    **dict.fromkeys(['col7', 'col8', 'col9'], 'C'),
}

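Then group the columns by that mapping with groupby along axis=1, check that each group has any non-zero value, and keep the rows where all groups pass:

df[df.groupby(m, axis=1).any().all(axis=1)]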

   col1  col2  col3  col4  col5  col6  col7  col8  col9
0     0     1     0     1     0     0     1     0     1
1     1     1     1     0     1     1     0     1     0
4     0     0     1     1     1     1     1     0     1
5     1     1     0     1     1     1     1     1     1
8     1     0     1     0     1     0     0     1     1
9     1     0     1     0     0     1     0     1     0

Note the rows that did not pass:

   col1  col2  col3  col4  col5  col6  col7  col8  col9
2     0     0     0     0     0     0     0     0     0
3     1     0     1     1     1     1     0     0     0
6     1     0     1     0     0     0     1     1     0
7     0     0     0     0     0     1     0     1     0
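
Newer pandas versions warn that `groupby(..., axis=1)` is deprecated. A possible workaround, shown here only as a sketch, is to transpose the frame and group its index instead. The data below is typed in from the table above rather than regenerated from the seed, and the names `per_group` and `keep` are illustrative.

```python
import pandas as pd

# Values copied from the example table above
rows = [
    [0, 1, 0, 1, 0, 0, 1, 0, 1],
    [1, 1, 1, 0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 0, 1],
    [1, 1, 0, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 0, 1, 0, 1, 0],
]
df = pd.DataFrame(rows, columns=[f"col{i + 1}" for i in range(9)])

m = {
    **dict.fromkeys(["col1", "col2", "col3"], "A"),
    **dict.fromkeys(["col4", "col5", "col6"], "B"),
    **dict.fromkeys(["col7", "col8", "col9"], "C"),
}

# After transposing, the original columns form the index, so the mapping
# groups index labels and no axis argument is needed.
per_group = df.T.groupby(m).any()  # index: A, B, C; columns: original row labels
keep = per_group.all()             # True where every group has a non-zero value
result = df[keep]

print(result.index.tolist())  # [0, 1, 4, 5, 8, 9]
```

Transposing copies the data, so for very wide or very large frames a per-group mask may be the cheaper option.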

You can also define the column groups as a list of lists and build the mapping from it:

cols = [['col1', 'col2', 'col3'], ['col4', 'col5', 'col6'], ['col7', 'col8', 'col9']]
m = {k: v for v, c in enumerate(cols) for k in c}

Then run the same groupby.
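
To see what that comprehension produces, here is a small standalone sketch. It only prints the mapping; the group labels become the integers 0, 1 and 2 instead of letters, which makes no difference to the grouping.

```python
cols = [
    ["col1", "col2", "col3"],
    ["col4", "col5", "col6"],
    ["col7", "col8", "col9"],
]

# enumerate() numbers each sub-list; every column name in a sub-list
# is mapped to that sub-list's position.
m = {k: v for v, c in enumerate(cols) for k in c}

for name, label in m.items():
    print(name, label)
```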

User

#3 · Edited 2024-04-19 15:35:22

Try the following:

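import numpy as np
# A, B and C are each a list of column names, one list per subset
column_groups = [A, B, C]
# One boolean mask per subset: True where the row has any non-zero value in it
masks = [(df[cols] != 0).any(axis=1) for cols in column_groups]
# A row is kept only if the mask of every subset is True
full_mask = np.logical_and.reduce(masks)
full_df = df[full_mask]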
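
A complete, runnable version of this approach might look like the sketch below. The small frame and the definitions of `A`, `B` and `C` are made up for illustration, since the answer does not define them. The last row contains a negative value to show that the `!= 0` test does not depend on the data being 0/1.

```python
import numpy as np
import pandas as pd

# Hypothetical data: four rows, nine columns
df = pd.DataFrame(
    [
        [1, 0, 0, 0, 1, 0, 0, 0, 1],   # one non-zero value in every subset
        [0, 0, 0, 1, 1, 1, 1, 1, 1],   # subset A is all zero
        [1, 1, 1, 1, 1, 1, 0, 0, 0],   # subset C is all zero
        [0, 2, 0, 0, 0, -1, 3, 0, 0],  # non-binary values, every subset non-zero
    ],
    columns=[f"col{i + 1}" for i in range(9)],
)

# Assumed column subsets
A = ["col1", "col2", "col3"]
B = ["col4", "col5", "col6"]
C = ["col7", "col8", "col9"]

column_groups = [A, B, C]
masks = [(df[cols] != 0).any(axis=1) for cols in column_groups]
full_mask = np.logical_and.reduce(masks)  # element-wise AND across the masks
full_df = df[full_mask]

print(full_df.index.tolist())  # [0, 3]
```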
