Fuzzy-match a column across two data frames and, when the score is above a threshold, copy a value from a third column

Posted 2024-05-23 15:41:07


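I have two data frames, lindt and coles, each with a column called "Name". The product names differ slightly between the two, so I need to fuzzy-match one column against the other. When the fuzzy score is above a threshold, I want to take the barcode from the lindt data frame and put it into the GTIN column of the coles data frame.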

lindt = 

Barcode         Name
3046920029780   Lindt Excellence Dark Chocolate 70% Smooth
8013108697005   Lindt Milk No Sugar Added Block
8013108697012   Lindt Dark No Sugar Added Block
9323966405971   Lindt Hot Chocolate Flakes Milk
3046920022569   Lindt Creation Dark Chocolate Sumptuous Orange
3046920022545   Lindt Creation Milk Chocolate Divine Hazelnut
3046920044752   Lindt Creation Dark Chocolate Sublime Mint
3046920022538   Lindt Creation Milk Chocolate Heavenly Creme
9542005955      Lindt Excellence Dark Chocolate Orange Intense
9542005979      Lindt Excellence Dark Chocolate A Touch Of Sea

coles = 

     Name
lindt dessert premium 70% cocoa cooking chocolate
lindt excellence milk sea salt caramel milk chocolate
lindt chocolates creations 170g
lindt excellence dark hazelnut chocolate block
lindt coconut excellence dark chocolate block 
lindt excellence extra fine 70% cocoa dark chocolate
lindt excellence mint intense chocolate block 
lindt excellence sea salt caramel dark chocolate
lindt excellence roasted almond dark chocolate
lindt creation devine hazelnut chocolate block
lindt creation sumptuous orange chocolate block
lindt creation creme brulee chocolate block 100g
lindt creation salted caramel sundae milk chocolate

# Assumed import: the post does not say which library `fuzz` comes from.
# thefuzz (formerly fuzzywuzzy) provides fuzz.token_sort_ratio.
from thefuzz import fuzz

coles["GTIN"] = "NA"  # default for rows with no match above the threshold

for idx, coles_name in coles["Name"].items():
    for barcode, lindt_name in zip(lindt["Barcode"], lindt["Name"]):
        score = fuzz.token_sort_ratio(coles_name, lindt_name)
        if score > 80:
            # Write the barcode into the matching coles row only.
            coles.loc[idx, "GTIN"] = barcode
            break  # keep the first lindt name that clears the threshold


Expected output:

Name                                                     GTIN
lindt dessert premium 70% cocoa cooking chocolate        3046920029780
lindt excellence milk sea salt caramel milk chocolate    xxxxxx
lindt chocolates creations 170g                          xxxxxx
lindt excellence dark hazelnut chocolate block           xxxxxx
lindt coconut excellence dark chocolate block            xxxxxx
lindt excellence extra fine 70% cocoa dark chocolate     ......
lindt excellence mint intense chocolate block            ......
lindt excellence sea salt caramel dark chocolate         ......
lindt excellence roasted almond dark chocolate           ......
lindt creation devine hazelnut chocolate block           ......
lindt creation sumptuous orange chocolate block          ......
lindt creation creme brulee chocolate block 100g         ......
lindt creation salted caramel sundae milk chocolate      ..so on
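
One way to get a table of this shape is to pick, for each coles name, the single best-scoring lindt name and accept it only if the score clears the threshold. The sketch below is a minimal illustration of that idea, not the original poster's code. To stay self-contained it swaps in the standard library's difflib.SequenceMatcher for fuzz.token_sort_ratio, so the scores will not be identical and the threshold of 80 may need retuning. The helper names token_sort_score and best_barcode are hypothetical, and the sample rows are a subset of the data shown in the question.

```python
from difflib import SequenceMatcher


def token_sort_score(a: str, b: str) -> float:
    """Rough stand-in for a token-sort ratio (assumption, not the fuzz API).

    Lowercases both strings, sorts their words, and compares the results.
    Returns a similarity between 0 and 100.
    """
    norm_a = " ".join(sorted(a.lower().split()))
    norm_b = " ".join(sorted(b.lower().split()))
    return 100 * SequenceMatcher(None, norm_a, norm_b).ratio()


def best_barcode(query, candidates, threshold=80):
    """Return the barcode of the best-scoring candidate name.

    candidates is an iterable of (barcode, name) pairs. Returns None when
    there are no candidates or the best score is not above the threshold.
    """
    scored = [(token_sort_score(query, name), code) for code, name in candidates]
    if not scored:
        return None
    score, code = max(scored, key=lambda pair: pair[0])
    return code if score > threshold else None


# A few rows taken from the question's data.
lindt_rows = [
    ("3046920022569", "Lindt Creation Dark Chocolate Sumptuous Orange"),
    ("3046920022545", "Lindt Creation Milk Chocolate Divine Hazelnut"),
    ("3046920044752", "Lindt Creation Dark Chocolate Sublime Mint"),
]
coles_names = [
    "lindt creation devine hazelnut chocolate block",
    "lindt creation sumptuous orange chocolate block",
    "lindt chocolates creations 170g",
]

# Map every coles name to a barcode, or "NA" when nothing clears the threshold.
gtin = {name: best_barcode(name, lindt_rows) or "NA" for name in coles_names}

for name, code in gtin.items():
    print(f"{name:<50} {code}")
```

With the data frames from the question, the same helper could probably be applied as coles["GTIN"] = coles["Name"].map(lambda n: best_barcode(n, zip(lindt["Barcode"], lindt["Name"])) or "NA"). Taking the best match instead of the first one above the threshold avoids a later, weaker match overwriting an earlier, stronger one.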
