Python: search a DataFrame for strings from a list

User

Answer 1 · edited 2024-04-26 13:22:37

One approach is:

def get_word(my_string):
    # Return the first word from search_list that occurs in my_string (case-insensitive).
    for word in search_list:
        if word.lower() in my_string.lower():
            return word
    return None

new_df["c"] = new_df["b"].apply(get_word)
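To try this end to end, here is a minimal sketch. The sample rows are hypothetical (the thread does not show the original frame), and search_list is borrowed from Answer 2.

```python
import pandas as pd

# Hypothetical sample data; the thread does not show the original frame.
new_df = pd.DataFrame({
    "a": [123, 456, 789],
    "b": ["Blah Blah Steel", "Blah Blah Blah", "Blah Blah Gold"],
})
search_list = ["STEEL", "IRON", "GOLD", "SILVER"]

def get_word(my_string):
    """Return the first entry of search_list found in my_string, else None."""
    for word in search_list:
        if word.lower() in my_string.lower():
            return word
    return None

new_df["c"] = new_df["b"].apply(get_word)

# Keep only the rows where a word was found.
matched = new_df[new_df["c"].notna()]
print(matched)
```

Note that this is a substring test, so "IRON" would also match inside a word such as "environment". If that matters, a whole-word approach may be a better fit.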

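You can also do it as follows:

[code snippet missing from the source]

With the first approach, you can add column c to the DataFrame first and then filter out the rows with no match (where get_word returns None), whereas the second raises an error if the string does not contain any of the words.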

See also this question: Get the first item from an iterable that matches a condition

Applying the approach from the top-voted answer there:

new_df["c"] = new_df["b"].apply(lambda my_string: next(word for word in search_list if word.lower() in my_string.lower()))
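One caveat: next() raises StopIteration when the generator yields nothing, so a row without any match would make the call fail. A sketch that passes None as the default instead, using hypothetical sample data and a helper name (first_match) chosen for illustration:

```python
import pandas as pd

# Hypothetical sample data for illustration.
new_df = pd.DataFrame({"b": ["Blah Blah Steel", "Blah Blah Blah", "Blah Blah Gold"]})
search_list = ["STEEL", "IRON", "GOLD", "SILVER"]

def first_match(my_string):
    # The second argument to next() is returned when nothing matches.
    lowered = my_string.lower()
    return next((word for word in search_list if word.lower() in lowered), None)

new_df["c"] = new_df["b"].apply(first_match)
print(new_df)
```

Rows without a match then hold a missing value in c and can be dropped with new_df[new_df["c"].notna()].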

User

Answer 2 · edited 2024-04-26 13:22:37

You can use extract and then filter out the NaN values (i.e. the rows with no match):

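import re
import pandas as pd

search_list = ['STEEL','IRON','GOLD','SILVER']

# Build one alternation pattern and capture the first match, ignoring case.
df['c'] = df.b.str.extract('({0})'.format('|'.join(search_list)), flags=re.IGNORECASE, expand=False)
result = df[~pd.isna(df.c)]

print(result)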

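Output:

[output missing from the source]

Note that you must import the re module to use the re.IGNORECASE flag. Alternatively, you can pass 2 directly, which is the value of re.IGNORECASE.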

Update

As @user3483203 pointed out, you can avoid the import by putting the flag inline in the pattern:

df['c'] = df.b.str.extract('(?i)({0})'.format('|'.join(search_list)))
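For reference, a self-contained sketch of the inline-flag variant. The sample frame is hypothetical, and re.escape is an addition not in the original answer (it brings the re import back, but protects against search terms that contain regex metacharacters).

```python
import re
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "a": [123, 456, 789],
    "b": ["Blah Blah Steel", "Blah Blah Blah", "Blah Blah Gold"],
})
search_list = ["STEEL", "IRON", "GOLD", "SILVER"]

# (?i) makes the whole pattern case-insensitive; re.escape guards the terms.
pattern = "(?i)({0})".format("|".join(re.escape(w) for w in search_list))

# expand=False returns a Series for a single capture group.
df["c"] = df["b"].str.extract(pattern, expand=False)
result = df[df["c"].notna()]
print(result)
```

Unlike the apply-based approach, extract returns the text as it appears in column b (here "Steel" and "Gold"), not the entry from search_list.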

User

Answer 3 · edited 2024-04-26 13:22:37

You can use set.intersection to find the words that occur in column b:

search_list = set(['STEEL','IRON','GOLD','SILVER'])
df['c'] = df['b'].apply(lambda x: set.intersection(set(x.upper().split(' ')), search_list))

Output:

[output missing from the source]

To drop the rows with no match, use df[df['c'].astype(bool)]:

     a                b        c
0  123  Blah Blah Steel  {STEEL}
2  789   Blah Blah Gold   {GOLD}
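Put together, the set-based approach might look like the sketch below. The sample frame is hypothetical, and the name search_set and the & operator (equivalent to set.intersection here) are choices made for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "a": [123, 456, 789],
    "b": ["Blah Blah Steel", "Blah Blah Blah", "Blah Blah Gold"],
})
search_set = {"STEEL", "IRON", "GOLD", "SILVER"}

# split() with no argument splits on any run of whitespace.
df["c"] = df["b"].apply(lambda x: set(x.upper().split()) & search_set)

# Empty sets are falsy, so astype(bool) marks rows with at least one match.
result = df[df["c"].astype(bool)]
print(result)
```

Because the text is split on whitespace, this matches whole words only: a token such as "Stainless-Steel" would not match "STEEL", whereas the substring and regex approaches above would find it.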
