How to group words whose Levenshtein distance is more than 80% in Python

Posted 2024-04-20 02:09:12


Suppose I have a list:

person_name = ['zakesh', 'oldman LLC', 'bikash', 'goldman LLC', 'zikash','rakesh']

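I am trying to group the list so that strings with a high Levenshtein similarity to each other end up together. To compute the ratio between two words, I am using the Python package fuzzywuzzy.

Example: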

^{pr2}$

My end goal:

Group the words so that the Levenshtein ratio between them (as returned by fuzz.ratio) is more than 80 percent.

My list should look like this:

person_name = ['bikash', 'zikash', 'rakesh', 'zakesh', 'goldman LLC', 'oldman LLC']

The similarity ratio between `bikash` and `zikash` is very high, so they should be together.

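Code:

I am trying to achieve this by sorting, with fuzz.ratio as the key function. The code below does not work, but this is the angle from which I am approaching the problem.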

from fuzzywuzzy import fuzz
combined_list = ['rakesh', 'zakesh', 'bikash', 'zikash', 'goldman LLC', 'oldman LLC']
combined_list.sort(key=lambda x, y: fuzz.ratio(x, y))
print combined_list
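
For context, the snippet above fails because `list.sort` and `sorted` call the `key` function with a single element, while `fuzz.ratio` needs a pair. The sketch below reproduces the error and shows what a one-argument key looks like. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for `fuzz.ratio`, and the `reference` word is a hypothetical choice made only for illustration.

```python
from difflib import SequenceMatcher

names = ['rakesh', 'zakesh', 'bikash']

error = None
try:
    # The key callable receives ONE element at a time, so a
    # two-argument lambda raises TypeError on the first call.
    sorted(names, key=lambda x, y: 0)
except TypeError as exc:
    error = exc
    print('TypeError:', exc)

# A valid key takes one argument, for example the similarity of each
# word to a fixed reference word (hypothetical choice for this sketch).
reference = 'rakesh'
by_similarity = sorted(
    names,
    key=lambda w: SequenceMatcher(None, reference, w).ratio(),
    reverse=True,
)
print(by_similarity)
```

Sorting against one reference word only orders the list relative to that word; it does not group every pair, which is why a grouping loop is a better fit for this problem.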

Could anyone help me to combine the words so that Levenshtein distance between them is more than 80 percent?


1 answer

Answer 1 · posted 2024-04-20 02:09:12

This will group the names:

from fuzzywuzzy import fuzz

combined_list = ['rakesh', 'zakesh', 'bikash', 'zikash', 'goldman LLC', 'oldman LLC']
combined_list.append('bakesh')
print('input names:', combined_list)

grs = list() # groups of names with distance > 80
for name in combined_list:
    for g in grs:
        if all(fuzz.ratio(name, w) > 80 for w in g):
            g.append(name)
            break
    else:
        grs.append([name, ])

print('output groups:', grs)
outlist = [el for g in grs for el in g]
print('output list:', outlist)

This produces:

^{2}$

As you can see, the names are grouped correctly, but the order may not be the order you want.
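
If fuzzywuzzy is not installed, the same greedy grouping can be sketched with the standard library alone. This is a minimal sketch, not the answer's exact method: `similarity` and `group_similar` are hypothetical helper names, and `difflib.SequenceMatcher` is used as a stand-in for `fuzz.ratio`, so scores may differ slightly from fuzzywuzzy's, especially when fuzzywuzzy uses python-Levenshtein as its backend.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score (stand-in for fuzz.ratio; an assumption)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def group_similar(names, threshold=80):
    """Greedily put each name in the first group where it scores above
    `threshold` against every member; otherwise start a new group."""
    groups = []
    for name in names:
        for group in groups:
            if all(similarity(name, member) > threshold for member in group):
                group.append(name)
                break
        else:
            # No existing group accepted the name.
            groups.append([name])
    return groups


names = ['rakesh', 'zakesh', 'bikash', 'zikash', 'goldman LLC', 'oldman LLC']
groups = group_similar(names)
print('groups:', groups)
print('flat list:', [name for group in groups for name in group])
```

Because the grouping is greedy, the result depends on input order: a name joins the first group that accepts it, and groups appear in the order their first member was seen.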
