How to do a boolean search across multiple columns in pandas

68 votes

5 answers

146457 views

Asked 2025-04-17 23:12

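I have a pandas DataFrame and want to do something similar to this SQL:

SELECT * FROM df WHERE column1 = 'a' OR column2 = 'b' OR column3 = 'c' etc.

Right now, this works for a single column and a single value:

foo = df.loc[df['column']==value]

However, I'm not sure how to extend it to multiple column/value pairs.

To be clear, each column is matched against a different value.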

Tags: pandas, DataFrame, SQL, boolean search, multi-column query

5 Answers

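The query() method is simple and intuitive. You write the condition as a string, as in this example:

df = df.query("columnNameA <= @x or columnNameB == @y")

Here, x and y are variables you have already defined; prefix them with @ to reference them inside the query string.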
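
Applied to the question's three-column case, this might look like the sketch below. The sample data and the variable names v1, v2 and v3 are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "column1": ["a", "x", "x", "x"],
    "column2": ["y", "b", "y", "y"],
    "column3": ["z", "z", "c", "z"],
})

# Local variables are referenced inside the query string with '@'.
v1, v2, v3 = "a", "b", "c"
result = df.query("column1 == @v1 or column2 == @v2 or column3 == @v3")

# Rows 0, 1 and 2 each satisfy one condition; row 3 satisfies none.
print(result)
```

Passing the values through @ keeps the query string fixed while the values change, which avoids building the string by hand.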

Answered 2025-04-17 by Python大师

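Everything @EdChum pointed out in 2014 still holds, but the pandas.DataFrame.ix indexer has been deprecated since pandas 0.20.0. From the official documentation:

Warning: Starting in 0.20.0, the .ix indexer is deprecated, in favor of the more strict .iloc and .loc indexers.

In later pandas versions, .ix was replaced by the pandas.DataFrame.loc and pandas.DataFrame.iloc indexers (it was removed in pandas 1.0).

For more detail, a comparison of these methods can be found in this post.

In short, as of now (and this does not appear to change in upcoming pandas versions), the answer to the question is:

foo = df.loc[(df['column1'] == value) | (df['column2'] == 'b') | (df['column3'] == 'c')]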
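
When the list of column/value pairs is long or only known at runtime, the same idea can be written as a loop. This is only a sketch: the targets dict and the sample data are hypothetical, and functools.reduce is one of several ways to combine the masks.

```python
from functools import reduce
import operator

import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "column1": ["a", "x", "x", "x"],
    "column2": ["y", "b", "y", "y"],
    "column3": ["z", "z", "c", "z"],
})

# Hypothetical mapping of column name -> value to match in that column.
targets = {"column1": "a", "column2": "b", "column3": "c"}

# Build one boolean Series per column, then OR them together element-wise.
masks = [df[col] == val for col, val in targets.items()]
mask = reduce(operator.or_, masks)

foo = df.loc[mask]
print(foo)
```

Swapping operator.or_ for operator.and_ would give the SQL AND version of the same filter.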

Answered 2025-04-17 by Python大师

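A more concise approach, though not necessarily a faster one, is to use DataFrame.isin() together with DataFrame.any(). Note that passing a list to isin() tests every column against the same set of values.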

In [26]: from numpy.random import randint; from pandas import DataFrame

In [27]: n = 10

In [28]: df = DataFrame(randint(4, size=(n, 2)), columns=list('ab'))

In [29]: df
Out[29]:
   a  b
0  0  0
1  1  1
2  1  1
3  2  3
4  2  3
5  0  2
6  1  2
7  3  0
8  1  1
9  2  2

[10 rows x 2 columns]

In [30]: df.isin([1, 2])
Out[30]:
       a      b
0  False  False
1   True   True
2   True   True
3   True  False
4   True  False
5  False   True
6   True   True
7  False  False
8   True   True
9   True   True

[10 rows x 2 columns]

In [31]: df.isin([1, 2]).any(axis=1)
Out[31]:
0    False
1     True
2     True
3     True
4     True
5     True
6     True
7    False
8     True
9     True
dtype: bool

In [32]: df.loc[df.isin([1, 2]).any(axis=1)]
Out[32]:
   a  b
1  1  1
2  1  1
3  2  3
4  2  3
5  0  2
6  1  2
8  1  1
9  2  2

[8 rows x 2 columns]
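
Since the question wants a different value per column, it may help to know that DataFrame.isin() also accepts a dict mapping column names to the values allowed in that column. A minimal sketch, with made-up data and a hypothetical targets dict:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"a": [0, 1, 2, 3], "b": [3, 3, 0, 1]})

# Hypothetical per-column targets: keep rows where a == 1 OR b == 0.
targets = {"a": [1], "b": [0]}

# With a dict, each column is tested only against its own list of values.
mask = df.isin(targets).any(axis=1)
result = df.loc[mask]
print(result)
```

Row 3 is not selected even though b is 1 there, because 1 is only a target for column a.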

Answered 2025-04-17 by Python大师

115

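You need to wrap each condition in parentheses because of operator precedence, and use the bitwise operators & (and) and | (or):

foo = df[(df['column1'] == value) | (df['column2'] == 'b') | (df['column3'] == 'c')]

If you use and or or, pandas will complain that the comparison is ambiguous: it is unclear whether each value in the condition should be compared, and what it should mean if only one value matches or if all of them do. That is why you should use the bitwise operators, or numpy's np.all or np.any, to state the matching criterion explicitly.

There is also the query method: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.query.html

It has some limitations, though, mainly because column names and index values can be confused.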
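
To see the difference, here is a small sketch with made-up data. It assumes nothing beyond standard pandas behavior: Python's or tries to turn a whole Series into a single True/False, which pandas refuses to do, while | works element by element.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"column1": ["a", "x", "x"], "column2": ["y", "b", "y"]})

left = df["column1"] == "a"
right = df["column2"] == "b"

try:
    left or right  # `or` calls bool() on the whole Series
except ValueError as exc:
    ambiguous = True
    print("ValueError:", exc)
else:
    ambiguous = False

# The bitwise operator combines the two masks row by row instead.
mask = left | right
print(df[mask])
```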

Answered 2025-04-17 by Python大师

-1

The simplest way

If this helped, please upvote. Thanks!

import numpy as np
import pandas as pd

students = [('jack1', 'Apples1', 341),
            ('Riti1', 'Mangos1', 311),
            ('Aadi1', 'Grapes1', 301),
            ('Sonia1', 'Apples1', 321),
            ('Lucy1', 'Mangos1', 331),
            ('Mike1', 'Apples1', 351),
            ('Mik', 'Apples1', np.nan)]
# Create a DataFrame object
df = pd.DataFrame(students, columns=['Name1', 'Product1', 'Sale1'])
print(df)


    Name1 Product1  Sale1
0   jack1  Apples1  341.0
1   Riti1  Mangos1  311.0
2   Aadi1  Grapes1  301.0
3  Sonia1  Apples1  321.0
4   Lucy1  Mangos1  331.0
5   Mike1  Apples1  351.0
6     Mik  Apples1    NaN

# Select rows where the 'Product1' column equals 'Apples1'
subset = df[df['Product1'] == 'Apples1']
print(subset)

    Name1 Product1  Sale1
0   jack1  Apples1  341.0
3  Sonia1  Apples1  321.0
5   Mike1  Apples1  351.0
6     Mik  Apples1    NaN

# Select rows where 'Product1' equals 'Apples1' AND 'Sale1' is not null
subsetx = df[(df['Product1'] == 'Apples1') & (df['Sale1'].notnull())]
print(subsetx)

    Name1 Product1  Sale1
0   jack1  Apples1  341.0
3  Sonia1  Apples1  321.0
5   Mike1  Apples1  351.0

# Select rows where 'Product1' equals 'Apples1' AND 'Sale1' equals 351
subsetx = df[(df['Product1'] == 'Apples1') & (df['Sale1'] == 351)]
print(subsetx)

   Name1 Product1  Sale1
5  Mike1  Apples1  351.0

# Another example: rows where 'Product1' is either 'Mangos1' or 'Grapes1'
subsetData = df[df['Product1'].isin(['Mangos1', 'Grapes1'])]
print(subsetData)

   Name1 Product1  Sale1
1  Riti1  Mangos1  311.0
2  Aadi1  Grapes1  301.0
4  Lucy1  Mangos1  331.0

This is the original source I found; I edited it slightly: https://thispointer.com/python-pandas-select-rows-in-dataframe-by-conditions-on-multiple-columns/

Answered 2025-04-17 by Python大师
