Selecting rows that contain a specific value from a pandas DataFrame

Posted 2024-06-06 00:16:23


I have a pandas DataFrame whose entries are all strings:

   A     B      C
1 apple  banana pear
2 pear   pear   apple
3 banana pear   pear
4 apple  apple  pear

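and so on. I want to select all rows that contain a certain string, say "banana". I don't know in advance which column it will appear in. I could of course write a for loop and iterate over all the rows, but is there an easier or faster way to do this?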


3 answers

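For a single search value:

df[df.values == "banana"]

or

df[df.isin(['banana'])]

For multiple search terms:

df[(df.values == "banana") | (df.values == "apple")]

or (note that isin keeps every row and replaces the non-matching cells with NaN):

df[df.isin(['banana', 'apple'])]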

  #         A       B      C
  #  1   apple  banana    NaN
  #  2     NaN     NaN  apple
  #  3  banana     NaN    NaN
  #  4   apple   apple    NaN
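
The isin output above keeps the frame's shape and blanks out the non-matching cells. If whole rows are wanted instead, one possible sketch is to reduce the cell-wise mask to one flag per row with any(axis=1). The sample frame is rebuilt from the question, and the names row_mask and banana_rows are only illustrative:

```python
import pandas as pd

# Sample frame rebuilt from the question (index 1..4).
df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    },
    index=[1, 2, 3, 4],
)

# isin() returns a cell-wise boolean frame;
# any(axis=1) collapses it to one True/False flag per row.
row_mask = df.isin(["banana"]).any(axis=1)

# Indexing with a per-row boolean Series keeps the matching rows intact.
banana_rows = df[row_mask]
print(banana_rows)
```

Passing several values to isin, for example ["banana", "apple"], would select rows that contain at least one of them.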

From Divakar's answer: only rows that contain both values are returned.

select_rows(df,['apple','banana'])

 #         A       B     C
 #   0  apple  banana  pear

With NumPy, this can be vectorized to search for any number of strings, like so:

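import numpy as np

def select_rows(df, search_strings):
    # Map every cell to the ID of its value in the sorted array of unique values
    unq, IDs = np.unique(df, return_inverse=True)
    # Look up each search string's ID (assumes every search string occurs in df)
    unqIDs = np.searchsorted(unq, search_strings)
    return df[((IDs.reshape(df.shape) == unqIDs[:, None, None]).any(-1)).all(0)]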

Sample run:

In [393]: df
Out[393]: 
        A       B      C
0   apple  banana   pear
1    pear    pear  apple
2  banana    pear   pear
3   apple   apple   pear

In [394]: select_rows(df,['apple','banana'])
Out[394]: 
       A       B     C
0  apple  banana  pear

In [395]: select_rows(df,['apple','pear'])
Out[395]: 
       A       B      C
0  apple  banana   pear
1   pear    pear  apple
3  apple   apple   pear

In [396]: select_rows(df,['apple','banana','pear'])
Out[396]: 
       A       B     C
0  apple  banana  pear
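
Since np.searchsorted returns an insertion position even for a string that is not in the frame, a variant that compares values directly may be safer when a search string could be absent. The following is only a sketch: select_rows_all is a hypothetical helper, not part of the answer above, and it assumes a non-empty list of search strings:

```python
import numpy as np
import pandas as pd

def select_rows_all(df, search_strings):
    """Hypothetical helper: keep rows that contain every string in search_strings.

    Assumes search_strings is a non-empty list.
    """
    values = df.to_numpy()
    # One boolean row mask per search string:
    # True where the string occurs anywhere in that row.
    masks = [(values == s).any(axis=1) for s in search_strings]
    # A row is kept only if every mask is True for it.
    return df[np.logical_and.reduce(masks)]

# Sample frame matching the run above (default index 0..3).
df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    }
)

print(select_rows_all(df, ["apple", "banana"]))
print(select_rows_all(df, ["apple", "kiwi"]))  # "kiwi" is absent, so no rows match
```

This trades the single np.unique pass for one comparison per search string, which is usually fine for a handful of strings.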

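You can create a boolean mask by comparing the entire df against your string, then call dropna with how='all' to drop the rows in which the string does not appear in any column: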

In [59]:
df[df == 'banana'].dropna(how='all')

Out[59]:
        A       B    C
1     NaN  banana  NaN
3  banana     NaN  NaN

To test for multiple values, you can use multiple masks:

In [90]:
banana = df[(df=='banana')].dropna(how='all')
banana

Out[90]:
        A       B    C
1     NaN  banana  NaN
3  banana     NaN  NaN

In [91]:    
apple = df[(df=='apple')].dropna(how='all')
apple

Out[91]:
       A      B      C
1  apple    NaN    NaN
2    NaN    NaN  apple
4  apple  apple    NaN

You can then use index.intersection to select only the rows whose index values are common to both:

In [93]:
df.loc[apple.index.intersection(banana.index)]

Out[93]:
       A       B     C
1  apple  banana  pear
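
As a possible alternative to intersecting the two indexes, the per-row masks can be combined directly with &. This is a sketch on the question's sample frame, and the names has_apple, has_banana and both are illustrative:

```python
import pandas as pd

# Sample frame rebuilt from the question (index 1..4).
df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    },
    index=[1, 2, 3, 4],
)

# One boolean flag per row for each search value.
has_apple = (df == "apple").any(axis=1)
has_banana = (df == "banana").any(axis=1)

# Element-wise AND keeps only the rows that contain both values.
both = df[has_apple & has_banana]
print(both)
```

Swapping & for | would keep the rows that contain either value.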
