I have two large dataframes (1000 rows) that I need to match by substring, for example:
df1:
Id Title
1 The house of pump
2 Where is Andijan
3 The Joker
4 Good bars in Andijan
5 What a beautiful house
df2:
Keyword
house
andijan
joker
The expected output is:
Id  Title                   Keyword
1   The house of pump       house
2   Where is Andijan        andijan
3   The Joker               joker
4   Good bars in Andijan    andijan
5   What a beautiful house  house
So far I have written a very inefficient way to do the matching, and at the real size of the dataframes it runs for a very long time:
# assumes numpy is imported as np and df1['Keyword'] already exists
for keyword in df2['Keyword']:
    df1['Keyword'] = np.where(df1['Title'].str.contains(keyword, case=False), keyword, df1['Keyword'])
I am sure there is a cleaner, more efficient way to do this that runs in a reasonable amount of time.
Let's try findall
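The code for this answer is not shown here, so the following is only a minimal sketch of how a findall-based approach might look, not necessarily what was originally posted. It assumes the column names Id, Title and Keyword from the sample data, joins all keywords into one regex alternation, and lower-cases the match so it lines up with the expected output. The use of re.escape and the variable name pattern are my own additions.

```python
import re
import pandas as pd

df1 = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "Title": [
        "The house of pump",
        "Where is Andijan",
        "The Joker",
        "Good bars in Andijan",
        "What a beautiful house",
    ],
})
df2 = pd.DataFrame({"Keyword": ["house", "andijan", "joker"]})

# Build a single alternation pattern; re.escape protects against
# regex metacharacters that might appear in real keywords.
pattern = "|".join(re.escape(k) for k in df2["Keyword"])

# findall returns a list of matches per row; .str[0] keeps the first one
# (NaN when nothing matched), and .str.lower() normalises the case.
df1["Keyword"] = (
    df1["Title"]
    .str.findall(pattern, flags=re.IGNORECASE)
    .str[0]
    .str.lower()
)
print(df1)
```

This scans each title once with one compiled pattern instead of once per keyword, which is probably where most of the speed-up over the loop comes from.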
Building on @BENY's solution so that each title can get multiple keywords:
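The original snippet is missing from this copy, so here is a hedged sketch of one way this could be done. It keeps every match from findall instead of only the first, then shows two possible shapes for the result: a comma-joined string per title, or one row per (title, keyword) pair via explode. The second sample title, the Keywords column and the names matches and long_df are hypothetical and only there to illustrate a title with several hits.

```python
import re
import pandas as pd

df1 = pd.DataFrame({
    "Id": [1, 2],
    # The second title is a made-up example containing several keywords.
    "Title": ["The house of pump", "The Joker house in Andijan"],
})
df2 = pd.DataFrame({"Keyword": ["house", "andijan", "joker"]})

pattern = "|".join(re.escape(k) for k in df2["Keyword"])

# All case-insensitive matches per title, lower-cased and de-duplicated
# while keeping the order in which they appear in the title.
matches = (
    df1["Title"]
    .str.findall(pattern, flags=re.IGNORECASE)
    .apply(lambda found: list(dict.fromkeys(m.lower() for m in found)))
)

# Option A: one comma-separated string per title.
df1["Keywords"] = matches.str.join(", ")

# Option B: one row per (title, keyword) pair; titles without a match get NaN.
long_df = (
    df1[["Id", "Title"]]
    .assign(Keyword=matches)
    .explode("Keyword")
    .reset_index(drop=True)
)
print(df1)
print(long_df)
```

Which shape is better depends on what happens next: the joined string is easier to read, while the exploded frame is easier to group or merge on.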