String matching and retrieving multiple columns

Posted 2024-06-16 12:34:56


I need to match Name in df1 against Item_Name in df2. Wherever a name matches, I also need the corresponding Item_ID and Material_Name from df2.

I have two dataframes:

df1:

(the original df1 has 1000+ names)

   Id    Name
    1     Paper
    2     Paper Bag
    3     Scissors
    4     Mat
    5     Cat
    6     Good Cat

df2:

(the original df2 has 1000+ item names)

Item_ID   Item_Name    Material_Name
1         Paper Bag      Office
2         wallpaper      Decor
3         paper          Office
4         cat cage       Animal Misc
5         good cat       Animal

Expected output:

Id  Name        Item_ID   Material_Name
1   Paper       1,2,3     Office,Decor,Office
2   Paper Bag   1,2,3     Office,Decor,Office
3   Scissors    NA        NA
4   Mat         NA        NA
5   Cat         4,5       Animal Misc,Animal
6   Good Cat    4,5       Animal Misc,Animal
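Reading the expected output, the matching rule seems to be word-level: a df1 name matches a df2 item when any whitespace-separated word of the name, lowercased, occurs as a substring of the lowercased Item_Name (so "Paper" also hits "wallpaper", while "Mat" hits nothing). A minimal sketch of that rule on the sample rows, assuming this reading; the helper is_match and the matched_ids dict are hypothetical names used only for illustration:

```python
import pandas as pd

# Sample frames copied from the tables above (the real ones have 1000+ rows).
df1 = pd.DataFrame({'Id': [1, 2, 3, 4, 5, 6],
                    'Name': ['Paper', 'Paper Bag', 'Scissors', 'Mat', 'Cat', 'Good Cat']})
df2 = pd.DataFrame({'Item_ID': [1, 2, 3, 4, 5],
                    'Item_Name': ['Paper Bag', 'wallpaper', 'paper', 'cat cage', 'good cat'],
                    'Material_Name': ['Office', 'Decor', 'Office', 'Animal Misc', 'Animal']})

def is_match(name, item_name):
    """Hypothetical helper: True if any word of `name` occurs case-insensitively in `item_name`."""
    item = item_name.lower()
    return any(word in item for word in name.lower().split())

# Item_IDs that each df1 name picks up under this rule.
matched_ids = {name: [int(iid) for iid, item in zip(df2['Item_ID'], df2['Item_Name'])
                      if is_match(name, item)]
               for name in df1['Name']}
print(matched_ids)
```

Under this rule the IDs line up with the expected output: Paper and Paper Bag get 1,2,3, Cat and Good Cat get 4,5, and Scissors and Mat get nothing.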

Code:

def matcher(query):
    # Item_IDs of every df2 row whose Item_Name contains any word of the query (case-insensitive)
    matches = [i['Item_ID'] for i in df2[['Item_ID', 'Item_Name']].to_dict('records')
               if any(q in i['Item_Name'].lower() for q in query.lower().split())]
    if matches:
        return ','.join(map(str, matches))
    else:
        return 'NA'

df1['Item_ID'] = df1['Name'].apply(matcher)

This works fine when I need a single column, but right now I have to run this code twice to get both Item_ID and Material_Name.

Question:

Is there a way to get 2 or 3 columns at once without running the function twice?


2 answers

Here is one way using pd.Series.str.contains and reusing a Boolean mask:

import numpy as np
import pandas as pd

def matcher(x):

    # construct 2-way mask: the whole name as a substring, or any single word as a substring
    m1 = df2['Item_Name'].str.contains(x, regex=False, case=False)
    m2 = np.array([any(w in i.lower() for w in x.lower().split()) for i in df2['Item_Name']])
    mask = m1 | m2  # combine once, then reuse for every output column

    # apply 2-way mask
    res_id = df2.loc[mask, 'Item_ID']
    res_mat = df2.loc[mask, 'Material_Name']

    return ','.join(res_id.astype(str)), ','.join(res_mat.astype(str))

df1[['Item_ID', 'Material_Name']] = pd.DataFrame(df1['Name'].apply(matcher).tolist(), index=df1.index)

print(df1)

   Id       Name Item_ID        Material_Name
0   1      Paper   1,2,3  Office,Decor,Office
1   2  Paper Bag   1,2,3  Office,Decor,Office
2   3   Scissors                             
3   4        Mat                             
4   5        Cat     4,5   Animal Misc,Animal
5   6   Good Cat     4,5   Animal Misc,Animal
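The same mask idea can be extended to any number of df2 columns in one pass. A minimal sketch, assuming the word-level rule from the question: because every word of a name is a substring of the name itself, the word test already covers the whole-name test, so only the word mask is built here. The helper match_columns, the cols list and the 'NA' fill for unmatched names (taken from the question's expected output) are illustrative choices, not part of this answer. Returning a dict per name lets pd.DataFrame build all requested columns at once:

```python
import pandas as pd

df1 = pd.DataFrame({'Id': [1, 2, 3, 4, 5, 6],
                    'Name': ['Paper', 'Paper Bag', 'Scissors', 'Mat', 'Cat', 'Good Cat']})
df2 = pd.DataFrame({'Item_ID': [1, 2, 3, 4, 5],
                    'Item_Name': ['Paper Bag', 'wallpaper', 'paper', 'cat cage', 'good cat'],
                    'Material_Name': ['Office', 'Decor', 'Office', 'Animal Misc', 'Animal']})

# Lowercase the item names once instead of once per query.
item_names = df2['Item_Name'].str.lower()

def match_columns(name, cols):
    """Hypothetical helper: comma-join each column in `cols` over the matching df2 rows."""
    words = name.lower().split()
    # Boolean mask over df2: True where any word of `name` is a substring of Item_Name.
    mask = item_names.apply(lambda item: any(w in item for w in words))
    if not mask.any():
        return {c: 'NA' for c in cols}
    return {c: ','.join(df2.loc[mask, c].astype(str)) for c in cols}

cols = ['Item_ID', 'Material_Name']  # append a third df2 column here if needed
matched = pd.DataFrame([match_columns(n, cols) for n in df1['Name']], index=df1.index)
result = df1.join(matched)
print(result)
```

Adding a column only means extending cols; the mask is still computed once per name.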

You can return Item_ID and Material_Name together from the query as a tuple, then use [i[0] for i in matches] and [i[1] for i in matches] to build the respective columns:

def matcher(query):
    # collect (Item_ID, Material_Name) pairs for every item whose name contains a query word
    matches = [(i['Item_ID'], i['Material_Name'])
               for i in df2[['Item_ID', 'Item_Name', 'Material_Name']].to_dict('records')
               if any(q in i['Item_Name'].lower() for q in query.lower().split())]

    if matches:
        ids = ','.join(map(str, [i[0] for i in matches]))
        materials = ','.join(map(str, [i[1] for i in matches]))
        return ids, materials
    else:
        return 'NA', 'NA'

# one pass over df1; unzip the returned tuples into two columns
df1['Item_ID'], df1['Material_Name'] = zip(*df1['Name'].apply(matcher))
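A different technique from either answer, sketched under the same word-level rule: pair every name with every item via a cross join, filter the pairs, then aggregate all wanted columns in a single groupby. This assumes pandas 1.2 or later (for how='cross'); the variable names are illustrative. The cross join has len(df1) * len(df2) rows, roughly a million for the 1000+ x 1000+ sizes mentioned, so memory grows quadratically:

```python
import pandas as pd

df1 = pd.DataFrame({'Id': [1, 2, 3, 4, 5, 6],
                    'Name': ['Paper', 'Paper Bag', 'Scissors', 'Mat', 'Cat', 'Good Cat']})
df2 = pd.DataFrame({'Item_ID': [1, 2, 3, 4, 5],
                    'Item_Name': ['Paper Bag', 'wallpaper', 'paper', 'cat cage', 'good cat'],
                    'Material_Name': ['Office', 'Decor', 'Office', 'Animal Misc', 'Animal']})

# Every (name, item) pair; df2 row order is preserved within each df1 row.
pairs = df1.merge(df2, how='cross')

# Keep pairs where any word of Name occurs inside Item_Name (case-insensitive).
keep = [any(w in item.lower() for w in name.lower().split())
        for name, item in zip(pairs['Name'], pairs['Item_Name'])]
hits = pairs[keep]

# Aggregate every wanted column in one groupby; extend the column list for more.
joined = (hits.astype({'Item_ID': str})
              .groupby('Id', sort=False)[['Item_ID', 'Material_Name']]
              .agg(','.join))

# Left-join back on Id so names without any match get 'NA'.
result = df1.join(joined, on='Id').fillna('NA')
print(result)
```

The design trades memory for simplicity: one filter and one groupby replace the per-name Python function, and any extra df2 column is handled by adding it to the groupby selection.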
