pandas DataFrame str.contains() with an AND operation

Posted 2024-04-29 15:58:48


df (a pandas DataFrame) has three rows.

col_name
"apple is delicious"
"banana is delicious"
"apple and banana both are delicious"

df.col_name.str.contains("apple|banana")

matches all of the rows:

"apple is delicious",
"banana is delicious",
"apple and banana both are delicious".

How can I apply an AND operator with str.contains so that it matches only the strings that contain both apple and banana?

"apple and banana both are delicious"

I would like to match strings that contain 10-20 different words (grape, watermelon, berry, orange, and so on).


3 answers

You can also do this with a regular expression that uses lookaheads:

df[df['col_name'].str.contains(r'^(?=.*apple)(?=.*banana)')]

You can then build the regex string from your list of words, like this:

base = r'^{}'
expr = '(?=.*{})'
words = ['apple', 'banana', 'cat']  # example
base.format(''.join(expr.format(w) for w in words))

which renders:

'^(?=.*apple)(?=.*banana)(?=.*cat)'

That way you can build the pattern dynamically and pass it to str.contains.
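Putting the pieces above together, here is a minimal end-to-end sketch. It assumes the column is called `col_name` as in the question. The `re.escape` call is an addition not present in the answer, included so that words containing regex metacharacters are matched literally.

```python
import re

import pandas as pd

df = pd.DataFrame({"col_name": ["apple is delicious",
                                "banana is delicious",
                                "apple and banana both are delicious"]})

words = ["apple", "banana"]  # illustrative word list; extend as needed

# One zero-width lookahead per word, anchored at the start of the string.
# re.escape (an assumption, not in the original answer) keeps each word literal.
pattern = "^" + "".join("(?=.*{})".format(re.escape(w)) for w in words)

# A row matches only if every lookahead succeeds, i.e. every word is present.
mask = df["col_name"].str.contains(pattern, regex=True)
result = df[mask]
print(result)
```

Note that `.` does not match a newline by default, so multi-line cell values may need a different pattern or extra flags.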

You can do it like this:

df[(df['col_name'].str.contains('apple')) & (df['col_name'].str.contains('banana'))]

You can also check each row with plain Python membership tests:

import pandas as pd

df = pd.DataFrame({'col': ["apple is delicious",
                           "banana is delicious",
                           "apple and banana both are delicious"]})

targets = ['apple', 'banana']

# At least one word from `targets` is present in the sentence.
>>> df.col.apply(lambda sentence: any(word in sentence for word in targets))
0    True
1    True
2    True
Name: col, dtype: bool

# All words from `targets` are present in the sentence.
>>> df.col.apply(lambda sentence: all(word in sentence for word in targets))
0    False
1    False
2     True
Name: col, dtype: bool
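For the 10-20 word case mentioned in the question, the chained `&` approach can be generalized without a regex. The sketch below is one possible way to do it, not something taken from the answers: it builds one boolean mask per word with `str.contains(..., regex=False)` and combines them with `functools.reduce`. The names `masks`, `all_present`, and `matching` are illustrative.

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({"col": ["apple is delicious",
                           "banana is delicious",
                           "apple and banana both are delicious"]})

targets = ["apple", "banana"]  # illustrative; could hold 10-20 words

# One boolean Series per word; regex=False treats each word as a literal substring.
masks = [df["col"].str.contains(word, regex=False) for word in targets]

# Element-wise AND across all masks: True only where every word is present.
all_present = reduce(operator.and_, masks)
matching = df[all_present]
print(matching)
```

With an empty `targets` list `reduce` raises a `TypeError`, so add a guard if the list can be empty.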
