Comparing each element of one column against all elements of a second column in a pandas DataFrame

0 votes

1 answer

30 views

Asked 2025-04-12 12:06

I have a table like this in pandas:

import pandas as pd

df = pd.DataFrame([{'A': 'horses', 'B': 'car crash'},
                   {'A': 'red cars in street', 'B': 'One horse'},
                   {'A': 'Lionel Messi', 'B': 'an octopus in a bag'},
                   {'A': 'white octopus in red box', 'B': 'messi'},
                   {'A': 'Estudiantes de La Plata', 'B': ''}])

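I need to use the difflib module to compare each value in column A against every value in column B. That is, I need to compare horses with car crash, One horse, and so on, and then find the entry in B that scores highest in the difflib comparison.

I don't know where to start...

1 Answer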

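import pandas as pd
from difflib import get_close_matches

df = pd.DataFrame([{'A': 'horses', 'B': 'car crash'},
                   {'A': 'red cars in street', 'B': 'One horse'},
                   {'A': 'Lionel Messi', 'B': 'an octopus in a bag'},
                   {'A': 'white octopus in red box', 'B': 'messi'},
                   {'A': 'Estudiantes de La Plata', 'B': ''}])

# For each value in A, take the single best candidate from B.
# n=1 keeps only the top match; cutoff=0.0 disables the similarity
# threshold, so a match is returned even when the strings barely overlap.
s = pd.Series([get_close_matches(word, df["B"], n=1, cutoff=0.0)[0] for word in df["A"]])
print(pd.DataFrame({"A": df["A"], "closest_match": s}))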

                          A        closest_match
0                    horses            One horse
1        red cars in street            car crash
2              Lionel Messi                messi
3  white octopus in red box  an octopus in a bag
4   Estudiantes de La Plata  an octopus in a bag
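Note that get_close_matches compares strings case-sensitively and returns only the matched strings, not their scores. If the score itself is needed, one possible approach is to call difflib.SequenceMatcher directly, which is the class get_close_matches uses internally. The following is a minimal sketch, not the only way to do it: the helper name best_match, the lowercasing step, and the decision to skip empty strings in B are all assumptions added for illustration and are not part of the original answer.

```python
from difflib import SequenceMatcher

import pandas as pd

df = pd.DataFrame({
    "A": ["horses", "red cars in street", "Lionel Messi",
          "white octopus in red box", "Estudiantes de La Plata"],
    "B": ["car crash", "One horse", "an octopus in a bag", "messi", ""],
})


def best_match(word, candidates):
    """Hypothetical helper: return (candidate, score) with the highest ratio.

    Both strings are lowercased before comparison (an assumption; drop the
    .lower() calls to keep difflib's default case-sensitive behaviour).
    """
    scored = [
        (cand, SequenceMatcher(None, word.lower(), cand.lower()).ratio())
        for cand in candidates
    ]
    return max(scored, key=lambda pair: pair[1])


# Assumption: empty strings in B are not meaningful candidates, so skip them.
candidates = [b for b in df["B"] if b]

results = [best_match(a, candidates) for a in df["A"]]
out = pd.DataFrame({
    "A": df["A"],
    "closest_match": [match for match, _ in results],
    "score": [round(score, 3) for _, score in results],
})
print(out)
```

The score column holds SequenceMatcher.ratio(), a value between 0 and 1, so rows with a weak best match can be filtered out afterwards with an ordinary boolean mask on that column.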

Answered 2025-04-12 by Python大师
