Compare each element of one column against all elements of a second column in a pandas DataFrame

0 votes
1 answer
30 views
Asked 2025-04-12 12:06

I have a pandas DataFrame like this:

df = pd.DataFrame([{'A':'horses','B':'car crash'},
                       {'A':'red cars in street','B':'One horse'},
                       {'A':'Lionel Messi','B':'an octopus in a bag'},
                       {'A':'white octopus in red box','B':'messi'},
                       {'A':'Estudiantes de La Plata','B':''}])

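I need to use the difflib module to compare every value in column A against every value in column B. In other words, I need to compare horses with car crash, One horse, and so on, and then find which name gets the highest score in the difflib comparison.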

I don't know where to start...

1 Answer

0
import pandas as pd
from difflib import get_close_matches

df = pd.DataFrame([{'A':'horses','B':'car crash'},
                       {'A':'red cars in street','B':'One horse'},
                       {'A':'Lionel Messi','B':'an octopus in a bag'},
                       {'A':'white octopus in red box','B':'messi'},
                       {'A':'Estudiantes de La Plata','B':''}])

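# For each value in A, take the single best difflib match among all values in B.
# cutoff=0.0 means no candidate is filtered out, so indexing with [0] is safe.
s = pd.Series([get_close_matches(word, df["B"], n=1, cutoff=0.0)[0] for word in df["A"]])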
print(pd.DataFrame({"A": df["A"], "closest_match": s}))

Returns:

                          A        closest_match
0                    horses            One horse
1        red cars in street            car crash
2              Lionel Messi                messi
3  white octopus in red box  an octopus in a bag
4   Estudiantes de La Plata  an octopus in a bag
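Since the question also asks about the score itself, here is a minimal sketch that keeps the similarity ratio next to each match. It uses difflib.SequenceMatcher directly instead of get_close_matches. The helper name best_match is hypothetical, and both lower-casing the strings and skipping empty values in B are assumptions on my part, not something the question requires.

```python
from difflib import SequenceMatcher

import pandas as pd

df = pd.DataFrame({
    "A": ["horses", "red cars in street", "Lionel Messi",
          "white octopus in red box", "Estudiantes de La Plata"],
    "B": ["car crash", "One horse", "an octopus in a bag", "messi", ""],
})


def best_match(word, candidates):
    """Return (candidate, score) for the candidate with the highest ratio.

    Hypothetical helper. Lower-casing both sides is an assumption, made so
    that "Messi" and "messi" are compared as the same characters.
    """
    scored = [
        (cand, SequenceMatcher(None, word.lower(), cand.lower()).ratio())
        for cand in candidates
        if cand  # assumption: empty strings are not useful candidates
    ]
    return max(scored, key=lambda pair: pair[1])


candidates = df["B"].tolist()
matches = [best_match(word, candidates) for word in df["A"]]

# One row per value of A, with its best candidate from B and the ratio in [0, 1].
result = df[["A"]].assign(
    closest_match=[m[0] for m in matches],
    score=[m[1] for m in matches],
)
print(result)
```

With the score column available you can filter out weak matches afterwards, for example with result[result["score"] >= 0.5]; the 0.5 threshold is only an illustration and would need tuning for real data.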
