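I am comparing two dataframe columns in Python, with the goal of finding, for each element of the first column, its best match in the second column. The first column has 19,000 rows, and for each string in it I need to find the best match in the second column. That means 19,000 rows, each checked against 19,000 candidates, with the constraint that the match must be a different string, not the identical one.
I started with a simple comparison, finding one string in a list, and that worked. I then applied it to a list, just to compare the two, but, obviously, because that compares a list with a list, I get the error "TypeError: expected string or bytes-like object". Finally, I tried writing a loop, but the error is the same. Is there a way to build a list with the expected results? Perhaps there is a better way to do this with another library, but so far I have found nothing. Here is the code so far: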
#simple example
from fuzzywuzzy import process
string = "appl"
compare = ["adfad.","apple","asple","tab"]
Ratios = process.extract(string,compare)
print(Ratios)
[('apple', 89), ('asple', 67), ('tab', 29), ('adfad.', 22)]
highest = process.extractOne(string,compare)
print(highest)
('apple', 89)
#data frame
from fuzzywuzzy import process
dataframecolumn = ["appl","tb"]
compare = ["adfad.","apple","asple","tab"]
Ratios = process.extract(dataframecolumn,compare)
TypeError: expected string or bytes-like object
#expected (but I need a list)
highest = process.extractOne(dataframecolumn[0],compare)
print(highest)
('apple', 89)
highest = process.extractOne(dataframecolumn[1],compare)
print(highest)
('tab', 80)
#Result expected
results = ["apple, 89","tab, 80"]
#Error
myl = ["appl","tb"]
compare = ["adfad.","apple","asple","tab"]
results = []
for x in myl:
    results.append(process.extractOne(myl,compare)[1])
TypeError: expected string or bytes-like object
If extract always returns sorted results, then we can avoid calling max.
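A minimal sketch of a loop that yields one result per input string, assuming the TypeError comes from passing the whole list myl, rather than the loop variable x, as the query. The try/except fallback built on the standard-library difflib is a hypothetical stand-in so the sketch still runs where fuzzywuzzy is not installed; it is not part of fuzzywuzzy, and its scores may differ.

```python
from difflib import SequenceMatcher

try:
    # The library used in the question.
    from fuzzywuzzy import process
    extract_one = process.extractOne
except ImportError:
    # Hypothetical stand-in (an assumption, not fuzzywuzzy's algorithm):
    # score every choice with difflib and return the best (choice, score) pair.
    def extract_one(query, choices):
        scored = [
            (choice, round(100 * SequenceMatcher(None, query, choice).ratio()))
            for choice in choices
        ]
        return max(scored, key=lambda pair: pair[1])

myl = ["appl", "tb"]
compare = ["adfad.", "apple", "asple", "tab"]

# Pass each element x as the query, never the whole list myl.
results = [extract_one(x, compare) for x in myl]
print(results)
```

Each entry of results is a (match, score) tuple; if plain strings such as "apple, 89" are wanted, they could be built with "{}, {}".format(match, score).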
Output:
[('apple', 89), ('tab', 80)]
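If you want to skip exact matches and get only fuzzy matches, skip the matches with a score of 100 and take the first match below 100, since the results are already sorted.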
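The idea of skipping exact matches can be sketched as a small helper. The function name first_non_exact is hypothetical, and the sample list is hand-written in the shape of a sorted (choice, score) result list; its scores are illustrative, not computed by fuzzywuzzy.

```python
def first_non_exact(ranked):
    """Return the first (choice, score) pair scoring below 100, or None.

    `ranked` is assumed to be sorted by score, highest first.
    """
    return next((pair for pair in ranked if pair[1] < 100), None)


# Hand-written sample; the scores are illustrative only.
ranked = [("apple", 100), ("appl", 89), ("asple", 67)]

print(first_non_exact(ranked))            # ('appl', 89)
print(first_non_exact([("apple", 100)]))  # None: every candidate was an exact match
```

A score of 100 may not strictly mean the strings are identical if the scorer normalizes case or punctuation before comparing, so filtering on choice != query could be a safer test when only the identical string should be excluded.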