How to find the similarity between strings in a list in Python

Posted 2024-04-20 12:18:39


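I am comparing two dataframe columns in Python, with the goal of finding, for each element of the first column, the best match in the second column. The first column contains 19,000 rows, and for every string in it I need to find the best match in the second column. So 19,000 rows have to be checked, 19,000 times each, with the constraint that the match must be a different string, not the string itself.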

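I started with a simple comparison, looking up a single string in a list, and that worked. Then I applied it to a list to compare the two, but, evidently because a string is being compared with a list, I get the error "TypeError: expected string or bytes-like object". Finally, I tried writing a loop, but the error is the same. Is there a way to build a list with the expected results? Maybe there is a better way to do this with another library, but so far I have not found anything. Here is the code so far: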

#simple example
from fuzzywuzzy import process
string = "appl"
compare = ["adfad.","apple","asple","tab"]
Ratios = process.extract(string,compare)
print(Ratios)
[('apple', 89), ('asple', 67), ('tab', 29), ('adfad.', 22)]

highest = process.extractOne(string,compare)
print(highest)
('apple', 89)

#data frame
from fuzzywuzzy import process
dataframecolumn = ["appl","tb"]
compare = ["adfad.","apple","asple","tab"]
Ratios = process.extract(dataframecolumn,compare)
TypeError: expected string or bytes-like object

#expected (but I need a list)
highest = process.extractOne(dataframecolumn[0],compare)
print(highest)
('apple', 89)
highest = process.extractOne(dataframecolumn[1],compare)
print(highest)
('tab', 80)

#Result expected
results = ["apple, 89","tab, 80"]

#Error
myl = ["appl","tb"]
compare = ["adfad.","apple","asple","tab"]
results = []
for x in myl:
    results.append(process.extractOne(myl,compare)[1])
TypeError: expected string or bytes-like object


1 answer
User
#1 · posted 2024-04-20 12:18:39
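from operator import itemgetter

from fuzzywuzzy import process

dataframecolumn = ["appl", "tb"]
compare = ["adfad.", "apple", "asple", "tab"]
# Call extract once per query string instead of passing the whole list
Ratios = [process.extract(x, compare) for x in dataframecolumn]
# Keep the highest-scoring (match, score) pair for each query
print([max(ratios, key=itemgetter(1)) for ratios in Ratios])

# Or as a one-liner
# Ratios = [max(process.extract(x, compare), key=itemgetter(1)) for x in dataframecolumn]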

If extract always returns its results sorted by score, we can avoid calling max and take the first element of each result list instead.

Output:

[('apple', 89), ('tab', 80)]
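
The point about sorted results can be illustrated with a small, self-contained sketch. It does not use fuzzywuzzy: `extract_sorted` is a hypothetical stand-in for `process.extract`, built on the standard library's `difflib.SequenceMatcher`, so its scores are not guaranteed to equal fuzzywuzzy's. It only shows that when each result list is sorted best-first, the best match is the element at index 0.

```python
from difflib import SequenceMatcher


def extract_sorted(query, choices):
    """Hypothetical stand-in for process.extract: (choice, score) pairs, best first."""
    scored = [
        (choice, round(100 * SequenceMatcher(None, query, choice).ratio()))
        for choice in choices
    ]
    # Sort by score, highest first, so index 0 is always the best match
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


dataframecolumn = ["appl", "tb"]
compare = ["adfad.", "apple", "asple", "tab"]

# One sorted result list per query; keep only the first (best) pair of each
best = [extract_sorted(x, compare)[0] for x in dataframecolumn]
print(best)
```

With fuzzywuzzy itself, the same result should be obtainable by calling `process.extractOne(x, compare)` for each `x`, since the question already shows `extractOne` working on a single string.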

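If you want to skip exact matches and get only fuzzy matches, skip any match with a score of 100 and take the first match that is not 100, since the results are already sorted.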

dataframecolumn = ["apple", "tb"]
compare = ["adfad", "apple", "asple", "tab"]
Ratios = [process.extract(x, compare) for x in dataframecolumn]
result = []
for ratio in Ratios:
    # Assumes matches are sorted by score, so the first one below 100 is the best fuzzy match
    for match in ratio:
        if match[1] != 100:
            result.append(match)
            break
print(result)
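
An alternative way to meet the "must be a different string" requirement is to drop the query itself from the candidates before scoring, rather than filtering on the score afterwards. This may be more robust if a scorer can return 100 for strings that are not strictly identical. The sketch below is only an illustration under that assumption: `score` and `best_other_match` are hypothetical helpers, and the scoring uses the standard library's `difflib.SequenceMatcher` as a stand-in, not fuzzywuzzy.

```python
from difflib import SequenceMatcher


def score(a, b):
    """Stand-in similarity score in the range 0-100 (not fuzzywuzzy's scorer)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_other_match(query, choices):
    """Return the best (choice, score) pair, ignoring choices equal to the query."""
    candidates = [c for c in choices if c != query]
    if not candidates:
        return None  # nothing left to compare against
    return max(((c, score(query, c)) for c in candidates), key=lambda pair: pair[1])


dataframecolumn = ["apple", "tb"]
compare = ["adfad", "apple", "asple", "tab"]

result = [best_other_match(x, compare) for x in dataframecolumn]
print(result)
```

For 19,000 strings compared against 19,000 candidates this is still a full pairwise comparison, so the run time grows with the product of the two column lengths.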
