Search a different set of keywords for each ID and flag whether a string contains one

    IDs              Name  Indicator
0  1234    APPLE ABCD ONE       True
1  5346        APPLE ABCD      False
2  1234    STRAWBERRY YES      False
3  8793  ORANGE AVAILABLE      False
4  8793     TEA AVAILABLE      False
5  8793        TEA COFFEE       True

3 answers

User

Answer 1 · edited 2024-04-19 10:35:42

You can combine merge with groupby and a lambda, as follows:

>>> df.merge(df2).groupby(['IDs','Name']).apply(lambda x: any(x['Name'].str.contains('|'.join(x['Keywords'])))).rename('Indicator').reset_index()
    IDs              Name  Indicator
0  1234        APPLE ABCD       True
1  1234    STRAWBERRY YES      False
2  5346        APPLE ABCD      False
3  8793  ORANGE AVAILABLE      False
4  8793     TEA AVAILABLE       True
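
The merge idea can also be written step by step, which makes it easier to see what each stage does. The following is only a minimal sketch, not the answer author's code: the frame names `kw_df` and `name_df` and the helper column `hit` are assumptions, and the sample data is reconstructed from the tables shown elsewhere in this thread. It uses a literal substring test instead of a regular expression, so keywords containing regex metacharacters need no escaping.

```python
import pandas as pd

# Sample data reconstructed from the thread (frame names are assumptions).
kw_df = pd.DataFrame({
    "IDs": [1234, 1234, 1234, 5346, 5346, 5346, 8793],
    "Keywords": ["APPLE ABCD", "ORANGE", "LEMONS",
                 "ORANGE", "STRAWBERRY", "BLUEBERRY", "TEA COFFEE"],
})
name_df = pd.DataFrame({
    "IDs": [1234, 5346, 1234, 8793, 8793, 8793],
    "Name": ["APPLE ABCD ONE", "APPLE ABCD", "STRAWBERRY YES",
             "ORANGE AVAILABLE", "TEA AVAILABLE", "TEA COFFEE"],
})

# Pair every name with every keyword registered for the same ID.
# reset_index() keeps the original row label in a column called "index";
# how="left" keeps names whose ID has no keywords at all.
pairs = name_df.reset_index().merge(kw_df, on="IDs", how="left")

# Literal substring test per (name, keyword) pair; missing keywords count as no match.
pairs["hit"] = [
    isinstance(kw, str) and kw in name
    for name, kw in zip(pairs["Name"], pairs["Keywords"])
]

# Collapse back to one flag per original row: True if any keyword matched.
name_df["Indicator"] = pairs.groupby("index")["hit"].any()
print(name_df)
```

Grouping by the original row label rather than by `IDs` and `Name` keeps one result row per input row, even when the same name appears more than once.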

User

Answer 2 · edited 2024-04-19 10:35:42

You need something like this:

# create a list of tuples from 1st dataframe
kw = list(zip(df1.IDs, df1.Keywords))

def func(ids, name):
    # True when the (ID, first word of the name) pair is a known keyword pair
    return (ids, name.split(" ")[0]) in kw

df2['Indicator'] = df2.apply(lambda x: func(x['IDs'],x['Names']), axis=1)

Edit

Create a list of tuples combining each ID with its keywords:

kw = list(zip(df1.IDs, df1.Keywords))
# [(1234, 'APPLE ABCD'), (1234, 'ORANGE'), (1234, 'LEMONS'), (5346, 'ORANGE'), (5346, 'STRAWBERRY'), (5346, 'BLUEBERRY'), (8793, 'TEA COFFEE')]

unique_kw = list(df1['Keywords'].unique())
# ['APPLE ABCD', 'ORANGE', 'LEMONS', 'STRAWBERRY', 'BLUEBERRY', 'TEA COFFEE']

def samp(x):
    for u in unique_kw:
        if u in x:
            return u
    return None

# Extract the first known keyword found in each name; it is used for the comparison below
df2['indicator'] = df2['Names'].apply(samp)

# True only when the (ID, keyword) pair exists in kw
df2['indicator'] = df2.apply(lambda x: (x['IDs'], x['indicator']) in kw, axis=1)

Output:

    IDs     Names               indicator
0   1234    APPLE ABCD ONE      True
1   5346    APPLE ABCD          False
2   1234    NO STRAWBERRY YES   False
3   8793    ORANGE AVAILABLE    False
4   8793    TEA AVAILABLE       False
5   8793    TEA COFFEE          True

User

Answer 3 · edited 2024-04-19 10:35:42

You can do this mostly with `pandas` operations, which should also be more efficient.

# Let there be two DataFrames: kw_df, name_df

# Group all keywords of each ID in a list, associate it with the names
kw_df = kw_df.groupby('IDs').aggregate({'Keywords': list})
merge_df = name_df.join(kw_df, on='IDs')

# Check if any keyword is in the name
def is_match(name, kws):
    return any(kw in name for kw in kws)

merge_df['Indicator'] = merge_df.apply(lambda row: is_match(row['Name'], row['Keywords']), axis=1)
print(merge_df)

This produces the following output:

    IDs              Name                         Keywords  Indicator
0  1234    APPLE ABCD ONE     [APPLE ABCD, ORANGE, LEMONS]       True
1  5346        APPLE ABCD  [ORANGE, STRAWBERRY, BLUEBERRY]      False
2  1234    STRAWBERRY YES     [APPLE ABCD, ORANGE, LEMONS]      False
3  8793  ORANGE AVAILABLE                     [TEA COFFEE]      False
4  8793     TEA AVAILABLE                     [TEA COFFEE]      False
5  8793        TEA COFFEE                     [TEA COFFEE]       True
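
For anyone who wants to run this approach end to end, here is a self-contained sketch. The sample data is reconstructed from the output above; the variable names `kw_lists` and `merged` are assumptions, and the `isinstance` guard is an addition that covers IDs with no keywords, which the snippet above does not handle.

```python
import pandas as pd

# Sample data reconstructed from the output table above.
kw_df = pd.DataFrame({
    "IDs": [1234, 1234, 1234, 5346, 5346, 5346, 8793],
    "Keywords": ["APPLE ABCD", "ORANGE", "LEMONS",
                 "ORANGE", "STRAWBERRY", "BLUEBERRY", "TEA COFFEE"],
})
name_df = pd.DataFrame({
    "IDs": [1234, 5346, 1234, 8793, 8793, 8793],
    "Name": ["APPLE ABCD ONE", "APPLE ABCD", "STRAWBERRY YES",
             "ORANGE AVAILABLE", "TEA AVAILABLE", "TEA COFFEE"],
})

# One list of keywords per ID, as a Series indexed by IDs.
kw_lists = kw_df.groupby("IDs")["Keywords"].agg(list)

# Attach each row's keyword list by looking up its ID.
merged = name_df.join(kw_lists, on="IDs")

# An ID without keywords yields NaN instead of a list, so guard before iterating.
merged["Indicator"] = [
    isinstance(kws, list) and any(kw in name for kw in kws)
    for name, kws in zip(merged["Name"], merged["Keywords"])
]
print(merged)
```

The list comprehension does the same row-wise check as `apply(..., axis=1)` but avoids building a Series object for every row.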
