Joining DataFrames in Python based on partial string matching

import pandas as pd

df1 = pd.DataFrame({
    'Product_Name1': ['Mini Wireless Bluetooth Sports Stereo Headset',
                      'VR Box 3D Smart Glass With Remote Controller',
                      'OnePlus 6 Sandstone Protective Case'],
    'Price1': [40000, 50000, 42000]})
df2 = pd.DataFrame({
    'Product_Name2': ['Mini Wireless Sports Stereo Headset',
                      'VR Box 3D Smart Glass With Remote Controller',
                      'OnePlus 6 1Sandstone Protective Case'],
    'Price2': [40000, 50000, 42000]})

df1set = df1.set_index('Product_Name1')
df2set = df2.set_index('Product_Name2')
df3 = df1set.join(df2set, how='inner')

df3
df1
df2

1 Answer

User

#1 · Posted on 2024-04-24 07:58:39

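What you need is fuzzy matching. Fuzzy matching is used to compare strings that are very similar to one another but not identical. You can use the fuzzywuzzy library for this.

Fuzzy matching example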

from fuzzywuzzy import process
process.extractOne('Mini Wireless Bluetooth Sports Stereo Headset', df2.Product_Name2)

('Mini Wireless Sports Stereo Headset', 95, 0)

The returned tuple holds the best-matching string, its score (a 95% match here), and its index in df2.
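To make the idea of a similarity score concrete, here is a rough sketch that uses the standard library's difflib.SequenceMatcher in place of fuzzywuzzy. It is a different scoring method, so its numbers will not equal the 95 shown above; the helper name similarity is made up for illustration.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score (hypothetical helper, not part of fuzzywuzzy)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())

query = 'Mini Wireless Bluetooth Sports Stereo Headset'
candidates = [
    'Mini Wireless Sports Stereo Headset',
    'VR Box 3D Smart Glass With Remote Controller',
    'OnePlus 6 1Sandstone Protective Case',
]

# Score the query against every candidate and keep the highest-scoring name
scores = {name: similarity(query, name) for name in candidates}
best = max(scores, key=scores.get)
print(best, scores[best])
```

The point is only that "closest match" means "candidate with the highest score"; fuzzywuzzy's process.extractOne does that selection for you.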

I changed the row order of df2 for this demonstration, so that its rows no longer line up one-to-one with df1.
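The reordered df2 itself is not shown in this answer. A minimal sketch of what it may have looked like, assuming the row order implied by the match column (0, 2, 1) and by the merged output further down; this frame is an inference, not the original code:

```python
import pandas as pd

# Assumed reconstruction: the OnePlus row sits at index 1 and the VR Box row at index 2,
# which is the only order consistent with the match values 0, 2, 1 shown later.
df2 = pd.DataFrame({
    'Product_Name2': [
        'Mini Wireless Sports Stereo Headset',
        'OnePlus 6 1Sandstone Protective Case',
        'VR Box 3D Smart Glass With Remote Controller',
    ],
    'Price2': [40000, 42000, 50000],
})
print(df2)
```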

Now we write a function that compares each value of df1's Product_Name1 against every value of df2's Product_Name2 and returns the index in df2 of the highest-scoring match.

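def fuzzy(x):
    # extractOne returns (best_match, score) for a plain array; keep only the string
    closest_match = process.extractOne(x, df2.Product_Name2.values)[0]
    # Look up where that string sits in df2
    index = pd.Index(df2.Product_Name2).get_loc(closest_match)
    return index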
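One caveat, sketched below with pandas only: get_loc returns a plain integer only when the name is unique in the index. If Product_Name2 contained duplicate names, it would return a slice or a boolean mask instead, and the match column would no longer hold simple positions. The sample names here are invented for illustration.

```python
import pandas as pd

# Unique names: get_loc returns a single integer position
unique_names = pd.Index(['Headset', 'Case', 'Glass'])
unique_pos = unique_names.get_loc('Case')

# Duplicated, unsorted names: get_loc returns a boolean mask instead
dup_names = pd.Index(['Headset', 'Case', 'Headset'])
dup_pos = dup_names.get_loc('Headset')

print(unique_pos)
print(dup_pos)
```

If duplicates are possible in your data, it may be safer to deduplicate df2 first or to decide explicitly which duplicate should win.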

We use apply to get the result:

df1['match'] = df1['Product_Name1'].apply(fuzzy)
df1

   Product_Name1                                   Price1  match
0  Mini Wireless Bluetooth Sports Stereo Headset   40000   0
1  VR Box 3D Smart Glass With Remote Controller    50000   2
2  OnePlus 6 Sandstone Protective Case             42000   1

Since you did not post your expected output, I will simply merge the two frames on that match index.

pd.merge(df1, df2, left_on='match', right_on=df2.index)

   Product_Name1                                   Price1  match  Product_Name2                                 Price2
0  Mini Wireless Bluetooth Sports Stereo Headset   40000   0      Mini Wireless Sports Stereo Headset           40000
1  VR Box 3D Smart Glass With Remote Controller    50000   2      VR Box 3D Smart Glass With Remote Controller  50000
2  OnePlus 6 Sandstone Protective Case             42000   1      OnePlus 6 1Sandstone Protective Case          42000
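The approach above always picks some row of df2, even for a product that has no real counterpart. A possible variation, sketched here under several assumptions: it uses difflib.get_close_matches from the standard library rather than fuzzywuzzy, the 0.8 cutoff is an arbitrary choice, the helper closest_name and the column matched_name are made-up names, and the 'USB-C Charging Cable' row is a hypothetical product added to show the unmatched case.

```python
from difflib import get_close_matches

import pandas as pd

df1 = pd.DataFrame({
    'Product_Name1': ['Mini Wireless Bluetooth Sports Stereo Headset',
                      'VR Box 3D Smart Glass With Remote Controller',
                      'OnePlus 6 Sandstone Protective Case',
                      'USB-C Charging Cable'],  # hypothetical row with no counterpart
    'Price1': [40000, 50000, 42000, 9000],
})
df2 = pd.DataFrame({
    'Product_Name2': ['Mini Wireless Sports Stereo Headset',
                      'OnePlus 6 1Sandstone Protective Case',
                      'VR Box 3D Smart Glass With Remote Controller'],
    'Price2': [40000, 42000, 50000],
})

def closest_name(name, choices, cutoff=0.8):
    """Return the most similar choice, or None if nothing reaches the cutoff."""
    hits = get_close_matches(name, choices, n=1, cutoff=cutoff)
    return hits[0] if hits else None

choices = df2['Product_Name2'].tolist()
df1['matched_name'] = df1['Product_Name1'].apply(lambda name: closest_name(name, choices))

# A left merge keeps every df1 row; rows without a match get NaN in the df2 columns
result = df1.merge(df2, how='left', left_on='matched_name', right_on='Product_Name2')
print(result)
```

Merging on the matched name instead of a positional index also means the result does not depend on the row order of df2.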

Let me know if this works for you.
