Finding similar matching keywords in a Python DataFrame

User

Reply #1 · edited 2024-06-16 14:25:39

from difflib import get_close_matches

def closeMatches(patterns, word):
    print(get_close_matches(word, patterns))

list_patterns = joined_Gravity1[joined_Gravity1["Comments"].str.contains("ender", na=False)]

word = 'Sender'
patterns = list_patterns
closeMatches(patterns, word)

User

Reply #2 · edited 2024-06-16 14:25:39

I don't see any reason why contains with regex=True would not work here:

joined_Gravity1[joined_Gravity1["Comments"].str.contains(pat="ender|snder|bnder", na=False, regex=True)]

I only used "ender|snder|bnder" here. You can put all such words in a list, say list_words, and pass pat='|'.join(list_words) to the contains call above.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html
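As a minimal sketch of the list_words idea, the snippet below builds the alternation pattern from a list and checks it against a few sample strings. The helper name build_pattern and the sample comments are assumptions made for illustration; re.escape is added so that keywords containing regex metacharacters are treated literally. Plain re.search stands in for the pandas call here so the example runs with the standard library only.

```python
import re

def build_pattern(words):
    """Join literal keywords into one alternation pattern.

    re.escape guards against keywords that contain regex metacharacters.
    (build_pattern is a hypothetical helper, not part of pandas.)
    """
    return '|'.join(re.escape(w) for w in words)

list_words = ['ender', 'snder', 'bnder']
pattern = build_pattern(list_words)

# Sample comments, made up for this illustration.
comments = ['hi there i am a sender',
            'I dont wanna be a bnder',
            'can i be the snder?',
            'i think i am a nerd']

# Keep only the comments in which any keyword occurs as a substring.
hits = [c for c in comments if re.search(pattern, c)]
print(pattern)
print(hits)
```

The same pattern string should be usable as the pat argument of str.contains with regex=True, as in the call shown earlier in this reply.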

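User

Reply #3 · edited 2024-06-16 14:25:39

Words like these can have a huge number of possible letter combinations. What you are trying to do is fuzzy matching between two strings. I suggest the following approach: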

#!pip install fuzzywuzzy
from fuzzywuzzy import fuzz, process

word = 'sender'
others = ['bnder', 'snder', 'sender', 'hello']

process.extractBests(word, others)

[('sender', 100), ('snder', 91), ('bnder', 73), ('hello', 18)]

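Based on these scores you can choose a threshold and then mark the candidates that score above it as matches (using the code above).

Here is a way to do this for your problem statement with a single function: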
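To make the thresholding step concrete, here is a small sketch that swaps in the standard library's difflib.SequenceMatcher for fuzzywuzzy, so it runs without extra installs. This is a substitute technique, not the fuzzywuzzy API, and its scores are not guaranteed to agree with fuzzywuzzy's on other inputs. The function names similarity and above_threshold are made up for this example.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a 0-100 similarity score based on difflib's ratio."""
    return round(100 * SequenceMatcher(None, a, b).ratio())

def above_threshold(word, candidates, threshold=70):
    """Keep the candidates whose score against word exceeds threshold."""
    return [c for c in candidates if similarity(word, c) > threshold]

word = 'sender'
others = ['bnder', 'snder', 'sender', 'hello']

# Show each score, then the candidates that pass the cut-off.
for candidate in others:
    print(candidate, similarity(word, candidate))

matches = above_threshold(word, others, threshold=70)
print(matches)
```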

import pandas as pd

df = pd.DataFrame(['hi there i am a sender', 
                   'I dont wanna be a bnder', 
                   'can i be the snder?', 
                   'i think i am a nerd'], columns=['text'])

#s = sentence, w = match word, t = match threshold
def get_match(s,w,t):
    ss = process.extractBests(w,s.split())
    return any([i[1]>t for i in ss])

#What it's doing - match each word of each row in df.text against
#the word 'sender' and check whether any of the words has a match
#score greater than the threshold of 70.
df['match'] = df['text'].apply(get_match, w='sender', t=70)
print(df)

                      text  match
0   hi there i am a sender   True
1  I dont wanna be a bnder   True
2      can i be the snder?   True
3      i think i am a nerd  False

Raise t from 70 to 80 if you want stricter matching, or lower it below 70 if you want looser matching.
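The effect of moving the threshold can be seen in the following sketch. It again uses difflib from the standard library in place of fuzzywuzzy (an assumption made so the example has no third-party dependencies, so scores may differ from fuzzywuzzy's), and best_score is a hypothetical helper that scores a sentence by its best-matching word after stripping surrounding punctuation.

```python
from difflib import SequenceMatcher
import string

def best_score(sentence, word):
    """Return the highest 0-100 score of any word in sentence against word."""
    tokens = [t.strip(string.punctuation).lower() for t in sentence.split()]
    scores = [round(100 * SequenceMatcher(None, word, t).ratio())
              for t in tokens if t]
    return max(scores, default=0)

sentences = ['hi there i am a sender',
             'I dont wanna be a bnder',
             'can i be the snder?',
             'i think i am a nerd']

# Compare which sentences are flagged at a looser and a stricter threshold.
flagged = {}
for t in (70, 80):
    flagged[t] = [s for s in sentences if best_score(s, 'sender') > t]
    print(t, flagged[t])
```

With these sample sentences, the stricter threshold drops the row whose closest word is 'bnder', while the rows containing 'sender' and 'snder' stay flagged.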

Finally, you can filter down to the matching rows:

df[df['match']][['text']]

                      text
0   hi there i am a sender
1  I dont wanna be a bnder
2      can i be the snder?
