Consider two DataFrames:
import pandas as pd
df1 = pd.DataFrame(['apple and banana are sweet fruits','how fresh is the banana','cherry from japan'],columns=['fruits_names'])
df2 = pd.DataFrame([['apple','red'],['banana','yellow'],['cherry','black']],columns=['fruits','colors'])
Then the code:
colors = []
for f in df1.fruits_names.str.split().apply(set):  # turn each row into a set of its words
    color = [df2[df2['fruits'].isin(f)]['colors']]  # colors of the fruits found among those words
    colors.append(color)
I can then easily insert the colors into df1:
df1['color'] = colors
Output:
                        fruits_names            color
0  apple and banana are sweet fruits  [[red, yellow]]
1            how fresh is the banana       [[yellow]]
2                  cherry from japan        [[black]]
The problem is what happens when the 'fruits' column holds alternative names, for example:
df2 = pd.DataFrame([[['green apple|opal apple'],'red'],[['banana|cavendish banana'],'yellow'],['cherry','black']],columns=['fruits','colors'])
How can I keep this code working?
My last attempt was to create a new column containing the separate fruit names:
df2['Types'] = cf['fruits'].str.split('|')
and then to apply tuple here:
color = [df[df['Types'].apply(tuple).isin(f)]['colors']]
But nothing matches.
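I think you need three steps:
Use split and df.explode() to give each fruit name its own row.
Convert the result to a dict that maps each fruit name to its color.
Create the new column based on a condition, using that dict.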
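The steps above could be sketched roughly as follows. This is only an illustration under a few assumptions, not necessarily the exact code originally posted: the 'fruits' column is taken to hold plain strings with alternatives separated by '|' (rather than the nested lists in the question), a name counts as a match when it occurs as a substring of the text, and the names exploded, fruit_to_color and colors_in are made up for this example.

```python
import pandas as pd

df1 = pd.DataFrame(
    ['apple and banana are sweet fruits',
     'how fresh is the banana',
     'cherry from japan'],
    columns=['fruits_names'])

# Assumption: 'fruits' holds plain strings, alternatives separated by '|'.
df2 = pd.DataFrame(
    [['green apple|opal apple', 'red'],
     ['banana|cavendish banana', 'yellow'],
     ['cherry', 'black']],
    columns=['fruits', 'colors'])

# Step 1: split the alternatives and explode them into one row per name.
exploded = df2.assign(fruits=df2['fruits'].str.split('|')).explode('fruits')

# Step 2: build a name -> color lookup dict.
fruit_to_color = dict(zip(exploded['fruits'], exploded['colors']))

# Step 3: keep the colors of every name that occurs in the text.
# Substring matching is an assumption; it lets multi-word names match.
def colors_in(text):
    found = [color for name, color in fruit_to_color.items() if name in text]
    return list(dict.fromkeys(found))  # drop duplicate colors, keep order

df1['color'] = df1['fruits_names'].apply(colors_in)
print(df1)
```

With the sample text from the question, the first row now yields only yellow, because neither 'green apple' nor 'opal apple' occurs in it. Substring matching can also produce false positives (a short name contained in a longer word), so matching on word boundaries may be preferable for real data.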