Selecting DataFrame rows by matching a string across multiple columns

import pandas

d = {'id': ["1", "2", "3", "4"],
     'title': ["Horses are good", "Cats are bad", "Frogs are nice", "Turkeys are the best"],
     'description': ["Horse epitome", "Cats bad but horses good", "Frog fancier", "Turkey tome, not about horses"],
     'tags': ["horse, cat, frog, turkey", "horse, cat, frog, turkey", "horse, cat, frog, turkey", "horse, cat, frog, turkey"],
     'date': ["2019-01-01", "2019-10-01", "2018-08-14", "2016-11-29"]}
dataframe = pandas.DataFrame(d)

Sample data:

id  title              description                  tag              date
1   "Horses are good"  "Horse epitome"              "horse, cat"     2019-01-01
2   "Cats are bad"     "Cats bad"                   "horse, cat"     2019-10-01
3   "Frogs are nice"   "Frog fancier, horses good"  "horse, frog"    2018-08-14
4   "Turkey are best"  "Turkey tome"                "turkey, horse"  2016-11-29

Desired output:

id  title              description                  tag              date
1   "Horses are good"  "Horse epitome"              "horse, cat"     2019-01-01
3   "Frogs are nice"   "Frog fancier, horses good"  "horse, frog"    2018-08-14

2 answers

User

#1 · edited 2024-06-01 00:41:42

Build one boolean Series per column with str.contains and combine them with the element-wise OR operator |:

filtered = df[df['title'].str.contains('horse', case=False) | 
              df['description'].str.contains('horse', case=False)]
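A self-contained sketch of the same filter, rebuilding the sample table from the question as a DataFrame (the values are copied from that table; naming the column `tags` to match the question's dict is an assumption):

```python
import pandas as pd

# Rows from the sample table in the question
df = pd.DataFrame({
    'id': ["1", "2", "3", "4"],
    'title': ["Horses are good", "Cats are bad", "Frogs are nice", "Turkey are best"],
    'description': ["Horse epitome", "Cats bad", "Frog fancier, horses good", "Turkey tome"],
    'tags': ["horse, cat", "horse, cat", "horse, frog", "turkey, horse"],
    'date': ["2019-01-01", "2019-10-01", "2018-08-14", "2016-11-29"],
})

# One boolean mask per column, combined element-wise with |;
# case=False makes "Horses" and "horses" both match "horse"
filtered = df[df['title'].str.contains('horse', case=False) |
              df['description'].str.contains('horse', case=False)]

print(filtered['id'].tolist())  # ['1', '3']
```

This keeps ids 1 and 3, matching the desired output.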

With many columns, fold the per-column masks together with functools.reduce:

import functools
import operator

colnames = ['title', 'description']
mask = functools.reduce(operator.or_, (df[col].str.contains('horse', case=False) for col in colnames))
filtered = df[mask]
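If the filter is needed in several places, the reduce step can be wrapped in a small function. `rows_containing` below is a hypothetical helper, not a pandas API; it forwards keyword arguments such as `case` and `na` to `Series.str.contains`. Passing `na=False` is a reasonable default when a column may contain missing values, because otherwise those rows produce a missing value in the mask instead of False.

```python
import functools
import operator

import pandas as pd


def rows_containing(frame, columns, pattern, **contains_kwargs):
    """Hypothetical helper: keep rows where ANY of `columns` contains `pattern`.

    Extra keyword arguments are passed straight to Series.str.contains.
    """
    masks = (frame[col].str.contains(pattern, **contains_kwargs) for col in columns)
    mask = functools.reduce(operator.or_, masks)  # mask_1 | mask_2 | ...
    return frame[mask]


# Hypothetical data with a missing title in the last row
df = pd.DataFrame({
    'title': ["Horses are good", "Cats are bad", None],
    'description': ["Horse epitome", "Cats bad", "Horse whisperer"],
})

# na=False treats the missing title as "no match" for that column
result = rows_containing(df, ['title', 'description'], 'horse', case=False, na=False)
print(result.index.tolist())  # [0, 2]
```

Adding `regex=False` treats the pattern as a literal string, which matters if it contains characters such as `.` or `(`.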

User

#2 · edited 2024-06-01 00:41:42

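If you want to choose which columns to test, one possible solution is to join those columns into a single string Series and test it with str.contains and case=False: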

# join with a space so a match cannot span the boundary between the two columns
s = dataframe['title'] + ' ' + dataframe['description']
df = dataframe[s.str.contains('horse', case=False)]
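Joining the columns with no separator can produce a false match that straddles the two columns. A minimal sketch with a hypothetical row in which neither column mentions horses on its own:

```python
import pandas as pd

# Hypothetical row: "horse" appears in neither column by itself
frame = pd.DataFrame({'title': ["Anchor"], 'description': ["Seaweed"]})

# Plain concatenation gives "AnchorSeaweed", which contains "horse" across the boundary
joined = frame['title'] + frame['description']
print(joined.str.contains('horse', case=False).tolist())  # [True]

# str.cat with sep=' ' gives "Anchor Seaweed", so no cross-column match
separated = frame['title'].str.cat(frame['description'], sep=' ')
print(separated.str.contains('horse', case=False).tolist())  # [False]
```

For more than two columns, `str.cat` also accepts a list of Series as `others`.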

Or build a condition for each column and chain them with bitwise OR |:

df = dataframe[dataframe['title'].str.contains('horse', case=False) | 
               dataframe['description'].str.contains('horse', case=False)]

To additionally require that a column does NOT match, chain the conditions with bitwise AND & and invert that column's condition with ~:

df = dataframe[s.str.contains('horse', case=False) &
               ~dataframe['tags'].str.contains('horse', case=False)]
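With the dict from the question every `tags` value contains "horse", so this filter returns no rows there. A sketch with hypothetical rows whose tags differ shows the effect of `&` and `~`:

```python
import pandas as pd

# Hypothetical rows with differing tags
dataframe = pd.DataFrame({
    'title': ["Horses are good", "Horse racing", "Cats are bad"],
    'description': ["Horse epitome", "Track notes", "Cats bad"],
    'tags': ["cat, frog", "horse, sport", "cat"],
})

s = dataframe['title'] + ' ' + dataframe['description']

# & requires both conditions; ~ inverts the tags mask ("tags must NOT mention horse")
df = dataframe[s.str.contains('horse', case=False) &
               ~dataframe['tags'].str.contains('horse', case=False)]

print(df['title'].tolist())  # ['Horses are good']
```

The second row matches on title but is dropped because its tags mention horse; the third row never matches.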

For the second solution, wrap the OR-chained conditions in parentheses, because & binds more tightly than |:

df = dataframe[(dataframe['title'].str.contains('horse', case=False) |
                dataframe['description'].str.contains('horse', case=False)) &
               ~dataframe['tags'].str.contains('horse', case=False)]
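Leaving the parentheses out does not raise an error; it silently changes the filter. A minimal sketch with hand-made boolean Series (hypothetical values standing in for the three `str.contains` masks):

```python
import pandas as pd

title_hit = pd.Series([True, False])
desc_hit = pd.Series([False, True])
tag_hit = pd.Series([True, True])

# Without parentheses Python reads this as title_hit | (desc_hit & ~tag_hit)
unparenthesized = title_hit | desc_hit & ~tag_hit
# With parentheses the OR-chain is evaluated first, then AND-ed with ~tag_hit
parenthesized = (title_hit | desc_hit) & ~tag_hit

print(unparenthesized.tolist())  # [True, False]
print(parenthesized.tolist())    # [False, False]
```

The first row is kept by the unparenthesized version even though its tags match, which is exactly what the exclusion was meant to prevent.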

Edit:

As @WeNYoBen commented, you can append .copy() at the end to prevent a SettingWithCopyWarning if you later modify the filtered result, for example:

s = dataframe['title'] + ' ' + dataframe['description']
df = dataframe[s.str.contains('horse', case=False)].copy()
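A sketch of why the `.copy()` helps: after it, assigning to a column of the filtered frame modifies an independent DataFrame, so pandas has no reason to warn. The check below records warnings and matches SettingWithCopyWarning by name, so it does not depend on where (or whether) a given pandas version defines that class; the data is hypothetical.

```python
import warnings

import pandas as pd

dataframe = pd.DataFrame({
    'title': ["Horses are good", "Cats are bad"],
    'description': ["Horse epitome", "Cats bad"],
})
s = dataframe['title'] + ' ' + dataframe['description']

# .copy() gives df its own data instead of a possible view of dataframe
df = dataframe[s.str.contains('horse', case=False)].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    df['title'] = df['title'].str.upper()  # modify the filtered copy

copy_warnings = [w for w in caught if w.category.__name__ == 'SettingWithCopyWarning']
print(len(copy_warnings))       # 0
print(df['title'].tolist())     # ['HORSES ARE GOOD']
print(dataframe['title'][0])    # Horses are good (original unchanged)
```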
