Copy a column's values into another column based on a list of related items

import pandas as pd

dct = {
    'Store': ('A','A','A','A','A','A','B','B','B','C','C','C'),
    'code_num': ('INC101','INC102','INC103','INC104','INC105','INC106','INC201','INC202','INC203','INC301','INC302','INC303'),
    'days': ('4','18','9','15','3','6','10','5','3','1','8','5'),
    'products': ('remote','antenna','remote, antenna','TV','display','TV','display, touchpad','speaker','Cell','display','speaker','antenna')
}
df = pd.DataFrame(dct)

pts = {
    'Primary': ('TV','TV','TV','Cell','Cell'),
    'Related': ('remote','antenna','speaker','display','touchpad')
}
parts = pd.DataFrame(pts)

print(df)

    Store  code_num  days           products
0       A    INC101     4             remote
1       A    INC102    18            antenna
2       A    INC103     9    remote, antenna
3       A    INC104    15                 TV
4       A    INC105     3            display
5       A    INC106     6                 TV
6       B    INC201    10  display, touchpad
7       B    INC202     5            speaker
8       B    INC203     3               Cell
9       C    INC301     1            display
10      C    INC302     8            speaker
11      C    INC303     5            antenna

Desired output:

    Store  code_num  days           products  refer
0       A    INC101     4             remote  INC106
1       A    INC102    18            antenna          -> omitted in 1st pass; because >10 days
2       A    INC103     9    remote, antenna  INC106
3       A    INC104    15                 TV          -> omitted in 1st pass; because >10 days
4       A    INC105     3            display
5       A    INC106     6                 TV  INC106
6       B    INC201    10  display, touchpad  INC203
7       B    INC202     5            speaker
8       B    INC203     3               Cell  INC203
9       C    INC301     1            display          -> blank because no primary present
10      C    INC302     8            speaker          -> blank because no primary present
11      C    INC303     5            antenna          -> blank because no primary present

1 Answer

User

#1 · Posted 2024-04-27 13:41:38

I reproduced the scenario.

Your input:

import numpy as np
import pandas as pd

# Note: the products here are written without a space after the comma
dct = {
    'Store': ('A','A','A','A','A','A','B','B','B','C','C','C'),
    'code_num': ('INC101','INC102','INC103','INC104','INC105','INC106','INC201','INC202','INC203','INC301','INC302','INC303'),
    'days': ('4','18','9','15','3','6','10','5','3','1','8','5'),
    'products': ('remote','antenna','remote,antenna','TV','display','TV','display,touchpad','speaker','Cell','display','speaker','antenna')
}
df = pd.DataFrame(dct)

pts = {
    'Primary': ('TV','TV','TV','Cell','Cell'),
    'Related': ('remote','antenna','speaker','display','touchpad')
}
parts = pd.DataFrame(pts)

# Primary product of each store, written by hand; store C has no primary
store = {'A': 'TV', 'B': 'Cell'}

Solution:

Convert the parts DataFrame into a dictionary (related product -> primary product):

parts_df_dict = dict(zip(parts['Related'], parts['Primary']))
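
As a small illustration of what this lookup holds and how `Series.map` uses it, here is a minimal sketch. The name `related_to_primary` and the three-item sample Series are made up for the example; they are not part of the answer.

```python
import pandas as pd

parts = pd.DataFrame({
    'Primary': ('TV', 'TV', 'TV', 'Cell', 'Cell'),
    'Related': ('remote', 'antenna', 'speaker', 'display', 'touchpad'),
})

# Related product -> the primary product it belongs to
related_to_primary = dict(zip(parts['Related'], parts['Primary']))

# Series.map yields NaN for values that are not keys of the dict,
# which is what happens for a primary product such as 'TV'
mapped = pd.Series(['remote', 'display', 'TV']).map(related_to_primary)
print(mapped)
```

The NaN for primaries matters later: a primary row never matches the "related" condition and is handled separately.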

Split the comma-separated products so that each product gets its own row:

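# One row per (code_num, product); dropna() removes the padding left by shorter lists
new_df = pd.DataFrame(df['products'].str.split(',').tolist(), index=df['code_num']).stack().dropna()
# Move code_num out of the index and name the column of split products
new_df = new_df.reset_index(level='code_num')
new_df.columns = ['code_num', 'Prod_seperated']
# Bring back Store, days and products for every split row
new_df = new_df.merge(df, on='code_num', how='left')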
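
The same reshaping can probably be written more compactly with `DataFrame.explode`. The sketch below is an alternative, not the answer's method; it assumes pandas 1.4 or later (for the `regex=` argument of `str.split`) and uses a three-row sample frame invented for the example. Splitting on a comma plus optional whitespace also covers the question's original data, where products are written as 'remote, antenna'.

```python
import pandas as pd

df = pd.DataFrame({
    'code_num': ['INC101', 'INC103', 'INC201'],
    'products': ['remote', 'remote, antenna', 'display, touchpad'],
})

# Split on a comma with optional surrounding whitespace,
# then emit one row per product while repeating the other columns
exploded = (
    df.assign(Prod_seperated=df['products'].str.split(r'\s*,\s*', regex=True))
      .explode('Prod_seperated')
      .reset_index(drop=True)
)
print(exploded)
```

Because `explode` keeps the other columns, no merge back onto `df` is needed in this variant.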

Logic for building the refer column:

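# Tag each store with its primary product, e.g. 'A' -> 'A_TV'
store_prod = {k: k + '_' + v for k, v in store.items()}
new_df['prod_store'] = new_df['Store'].map(store_prod)
new_df['p_store'] = new_df['Store'].map(store)
days = new_df['days'].astype(int)

# main_ind: the row's own code_num when it is the store's primary product and recent
# (strict < 10 here but <= 10 for related rows below; use <= if a 10-day primary should count)
new_df['main_ind'] = ' '
is_primary = new_df['prod_store'] == new_df['Store'] + '_' + new_df['Prod_seperated']
new_df.loc[is_primary & (days < 10), 'main_ind'] = new_df['code_num']
# ' ' sorts before any code, so max() picks the primary's code_num per store
refer_dic = new_df.groupby('Store')['main_ind'].max().to_dict()

# Related rows within 10 days inherit the code_num of their store's primary
new_df['prod_subproducts'] = new_df['Prod_seperated'].map(parts_df_dict)
is_related = new_df['p_store'] == new_df['prod_subproducts']
new_df['refer'] = np.where(is_related & (days <= 10), new_df['Store'].map(refer_dic), np.nan)
# Assign the result back; fillna(inplace=True) on a selected column is unreliable in newer pandas
new_df['refer'] = new_df['refer'].fillna(new_df['main_ind'])

# Drop the helper columns and collapse the split rows back to one per code_num
new_df.drop(columns=['Prod_seperated', 'prod_store', 'p_store', 'main_ind', 'prod_subproducts'], inplace=True)
new_df.drop_duplicates(inplace=True)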
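
For comparison, the rules can also be expressed without the hand-written `store` dictionary, by looking up which primaries are actually present in each store. This is a rough sketch under stated assumptions, not the answer's method: `add_refer`, `max_days` and the `item` column are hypothetical names, a qualifying primary row is assumed to reference its own `code_num`, and both primary and related rows are assumed to use the same `<= max_days` cutoff. It also assumes pandas 1.4 or later.

```python
import pandas as pd

def add_refer(df, parts, max_days=10):
    """Hypothetical helper: return df with a 'refer' column added."""
    related_to_primary = dict(zip(parts['Related'], parts['Primary']))
    primaries = set(parts['Primary'])

    # One row per product; explode keeps the original row label in the index
    rows = df.assign(
        days=df['days'].astype(int),
        item=df['products'].str.split(r'\s*,\s*', regex=True),
    ).explode('item')

    # (Store, primary product) -> code_num of a primary within max_days
    recent_primary = rows[(rows['days'] <= max_days) & rows['item'].isin(primaries)]
    primary_code = recent_primary.groupby(['Store', 'item'])['code_num'].max().to_dict()

    def lookup(row):
        if row['days'] > max_days:
            return ''                      # too old: left blank
        if row['item'] in primaries:
            return row['code_num']         # a primary refers to itself
        key = (row['Store'], related_to_primary.get(row['item']))
        return primary_code.get(key, '')   # blank when the store has no such primary

    refer = rows.apply(lookup, axis=1)
    # Collapse split rows: '' sorts before any code, so max() keeps a match if there is one
    return df.assign(refer=refer.groupby(level=0).max())

df = pd.DataFrame({
    'Store': list('AAAAAABBBCCC'),
    'code_num': ['INC101', 'INC102', 'INC103', 'INC104', 'INC105', 'INC106',
                 'INC201', 'INC202', 'INC203', 'INC301', 'INC302', 'INC303'],
    'days': ['4', '18', '9', '15', '3', '6', '10', '5', '3', '1', '8', '5'],
    'products': ['remote', 'antenna', 'remote, antenna', 'TV', 'display', 'TV',
                 'display, touchpad', 'speaker', 'Cell', 'display', 'speaker', 'antenna'],
})
parts = pd.DataFrame({
    'Primary': ('TV', 'TV', 'TV', 'Cell', 'Cell'),
    'Related': ('remote', 'antenna', 'speaker', 'display', 'touchpad'),
})

result = add_refer(df, parts)
print(result)
```

Row-wise `apply` is slow on large frames; it is used here only because it keeps the three rules readable in one place.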

New (desired) output:

Let me know if you have any questions.
