Excel-like nested IF statements in Pandas

=IF(COUNTIFS(advisor!$C:$C,$A3)=0,"0 disclosed", IF(COUNTIFS(advisor!$C:$C,$A3,advisor!$E:$E,2)>0,"Dependent", IF(IF(COUNTIFS(advisor!$C:$C,$A3,advisor!$B:$B,"auditor")>0,1,0)+IF(COUNTIFS(advisor!$C:$C,$A3,advisor!$B:$B,"compensation")>0,1,0)=2,"Independent","1 disclosed")))

df['auditor_compensation'] = np.where(df['id'].isin(df_advisor['company_id']).count() == 0, '0 disclosed', np.where(df_advisor['dependent'] == 2, 'dependent', np.where((np.where(df_advisor['type']=='auditor', 1, 0)+np.where(df_advisor['type']=='compensation', 1, 0)) == 2, 'independent', '1 disclosed')))

df:

      id        ticker      iq_id              company auditor_compensation
   48299   ENXTAM:AALB   IQ881736  Aalberts Industries                    ?
   48752    ENXTAM:ABN  IQ1090191       ABN AMRO Group                    ?
   48865  ENXTAM:ACCEL  IQ4492981         Accell Group                    ?
   49226    ENXTAM:AGN   IQ247906                AEGON                    ?
   49503     ENXTAM:AD   IQ373545          Koninklijke                    ?

df_advisor:

    id          type  company_id  advisor_company_id  dependent
     1       auditor        4829                6091          1
    17       auditor        4875               16512          1
  6359       auditor        4886                7360          1
    37       auditor        4922                8187          1
  4415  compensation        4922                9025          1
    53       auditor        4950                8187          1
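To experiment with the answer below, the two sample tables can be rebuilt as dataframes. This is a sketch: the names df and df_advisor are taken from the attempted code, the values are copied from the tables above, and the placeholder auditor_compensation column (?) is left out. Note that, as printed, the id values of the main table (e.g. 48299) do not equal the company_id values of the advisor table (e.g. 4829), so joining these exact rows finds no matches.

```python
import pandas as pd

# Main table, values copied from the sample above
# (the placeholder auditor_compensation column is omitted)
df = pd.DataFrame({
    'id': [48299, 48752, 48865, 49226, 49503],
    'ticker': ['ENXTAM:AALB', 'ENXTAM:ABN', 'ENXTAM:ACCEL', 'ENXTAM:AGN', 'ENXTAM:AD'],
    'iq_id': ['IQ881736', 'IQ1090191', 'IQ4492981', 'IQ247906', 'IQ373545'],
    'company': ['Aalberts Industries', 'ABN AMRO Group', 'Accell Group', 'AEGON', 'Koninklijke'],
})

# Advisor table: one row per (company, advisor) disclosure
df_advisor = pd.DataFrame({
    'id': [1, 17, 6359, 37, 4415, 53],
    'type': ['auditor', 'auditor', 'auditor', 'auditor', 'compensation', 'auditor'],
    'company_id': [4829, 4875, 4886, 4922, 4922, 4950],
    'advisor_company_id': [6091, 16512, 7360, 8187, 9025, 8187],
    'dependent': [1, 1, 1, 1, 1, 1],
})

print(df)
print(df_advisor)
```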

1 answer

User

#1 · Posted on 2024-05-16 16:04:14

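The numpy.where calls do not produce an array or series with the same length as the original dataframe, because they combine conditions of inconsistent lengths: for example, df['id'] and df_advisor['dependent'] have different lengths. In addition, .count() collapses the isin mask into a single number (the count of non-null values), so the first condition is one scalar, always False for a non-empty df, rather than a per-row test.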
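A stripped-down reproduction of the problem, using hypothetical five-row and six-row frames in place of the real data: the outer condition is a scalar, the inner np.where runs over the advisor rows, so the result has the advisor table's length and cannot be assigned back to df.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: 5 companies, 6 advisor rows
df = pd.DataFrame({'id': [1, 2, 3, 4, 5]})
df_advisor = pd.DataFrame({'company_id': [1, 2, 3, 4, 4, 5],
                           'dependent': [1, 1, 1, 1, 1, 1]})

# .count() returns the number of non-null booleans (5 here),
# so this "condition" is a single scalar, not one value per row
mask = df['id'].isin(df_advisor['company_id']).count() == 0
print(mask)  # False

# The inner condition has one element per advisor row (6),
# so the whole expression broadcasts to a length-6 array
result = np.where(mask, '0 disclosed',
                  np.where(df_advisor['dependent'] == 2, 'dependent', 'other'))
print(result.shape)  # (6,)

# Assigning 6 values to a 5-row frame raises ValueError
try:
    df['auditor_compensation'] = result
except ValueError as exc:
    print('ValueError:', exc)
```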

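Although it is tempting to translate Excel formulas directly into Pandas/NumPy, using groupby, merge and np.select is likely to be more efficient and more readable.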
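For comparison, the formula can also be translated clause by clause without losing row alignment: each COUNTIFS(...) > 0 test becomes one boolean per company computed with groupby(...).any(), and the nested IF becomes np.select. This sketch uses hypothetical data (ids picked to line up with company_id) and, following the Excel formula literally, checks whether any advisor row has dependent == 2.

```python
import numpy as np
import pandas as pd

# Hypothetical data (ids picked to line up with company_id);
# company 5000 has no advisor rows at all
df = pd.DataFrame({'id': [4829, 4922, 5000, 6000]})
df_advisor = pd.DataFrame({
    'type': ['auditor', 'auditor', 'compensation', 'auditor'],
    'company_id': [4829, 4922, 4922, 6000],
    'dependent': [1, 1, 1, 2],
})

# One boolean per company for each COUNTIFS(...) > 0 clause
flags = (
    df_advisor.assign(
        dep2=df_advisor['dependent'].eq(2),              # COUNTIFS(C:C,A3,E:E,2)>0
        has_auditor=df_advisor['type'].eq('auditor'),    # COUNTIFS(C:C,A3,B:B,"auditor")>0
        has_comp=df_advisor['type'].eq('compensation'),  # COUNTIFS(C:C,A3,B:B,"compensation")>0
    )
    .groupby('company_id')[['dep2', 'has_auditor', 'has_comp']]
    .any()
    # Align with df's ids; companies without advisor rows get False
    .reindex(df['id'], fill_value=False)
)

# Conditions are checked in order, like the nested IF
conds = [
    (~df['id'].isin(df_advisor['company_id'])).to_numpy(),  # COUNTIFS(C:C,A3)=0
    flags['dep2'].to_numpy(),
    (flags['has_auditor'] & flags['has_comp']).to_numpy(),
]
choices = ['0 disclosed', 'Dependent', 'Independent']
df['auditor_compensation'] = np.select(conds, choices, default='1 disclosed')
print(df)
```

Here company 4922 (one auditor row and one compensation row, both with dependent = 1) is labelled Independent. The answer's approach below sums dependent per company, so the same two rows would total 2 and take the 'dependent' branch; which reading is intended depends on what the dependent column encodes.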

Step 1: Group the mapping dataframe

df_advisor_grouped = df_advisor.groupby('company_id')\
                               .agg({'type': '|'.join, 'dependent': 'sum'})\
                               .reset_index()

print(df_advisor_grouped)

   company_id                  type  dependent
0        4829               auditor          1
1        4875               auditor          1
2        4886               auditor          1
3        4922  auditor|compensation          2
4        4950               auditor          1
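The same grouping can be written with named aggregation (available since pandas 0.25), where as_index=False keeps company_id as a column instead of calling reset_index(). A sketch on the advisor rows from the question, keeping only the columns used above:

```python
import pandas as pd

df_advisor = pd.DataFrame({
    'type': ['auditor', 'auditor', 'auditor', 'auditor', 'compensation', 'auditor'],
    'company_id': [4829, 4875, 4886, 4922, 4922, 4950],
    'dependent': [1, 1, 1, 1, 1, 1],
})

# Each keyword is output_column=(input_column, aggregation);
# '|'.join concatenates the advisor types within each company
df_advisor_grouped = df_advisor.groupby('company_id', as_index=False).agg(
    type=('type', '|'.join),
    dependent=('dependent', 'sum'),
)
print(df_advisor_grouped)
```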

Step 2: Merge with the main dataframe

# merge dataframes based on key column
res = df.merge(df_advisor_grouped, left_on='id', right_on='company_id', how='left')
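With how='left', companies that have no advisor rows end up with a NaN company_id, which is what the first condition in the next step tests. merge can also record this explicitly through indicator=True, which adds a _merge column; a small sketch with hypothetical ids:

```python
import pandas as pd

# Hypothetical frames: company 5000 has no advisor rows
df = pd.DataFrame({'id': [4829, 4922, 5000]})
df_advisor_grouped = pd.DataFrame({
    'company_id': [4829, 4922],
    'type': ['auditor', 'auditor|compensation'],
    'dependent': [1, 2],
})

# indicator=True adds '_merge' with values 'both' or 'left_only' here
res = df.merge(df_advisor_grouped, left_on='id', right_on='company_id',
               how='left', indicator=True)
no_advisor = res['_merge'].eq('left_only')
print(res[['id', 'company_id', '_merge']])
```

Either no_advisor or res['company_id'].isnull() identifies the same rows; the indicator column just makes the intent explicit.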

Step 3: Apply conditional logic

# define 3 conditions
conds = [res['company_id'].isnull(), res['dependent'].eq(2),
         res['type'].str.contains('auditor') & res['type'].str.contains('compensation')]

# define 3 choices
choices = ['0 disclosed', 'dependent', 'independent'] 

# apply np.select logic, including default argument if 3 conditions are not met
res['auditor_compensation'] = np.select(conds, choices, '1 disclosed')
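Putting the three steps together on hypothetical data (ids chosen to match company_id, with company 5000 having no advisor rows). In this sketch na=False is passed to str.contains so that companies without advisors explicitly yield False instead of NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data; company 5000 has no advisor rows
df = pd.DataFrame({'id': [4829, 4922, 5000]})
df_advisor = pd.DataFrame({
    'type': ['auditor', 'auditor', 'compensation'],
    'company_id': [4829, 4922, 4922],
    'dependent': [1, 1, 1],
})

# Step 1: one row per company
grouped = (df_advisor.groupby('company_id')
           .agg({'type': '|'.join, 'dependent': 'sum'})
           .reset_index())

# Step 2: left join keeps companies that have no advisors
res = df.merge(grouped, left_on='id', right_on='company_id', how='left')

# Step 3: the first matching condition wins, otherwise the default
conds = [
    res['company_id'].isnull(),
    res['dependent'].eq(2),
    res['type'].str.contains('auditor', na=False)
    & res['type'].str.contains('compensation', na=False),
]
choices = ['0 disclosed', 'dependent', 'independent']
res['auditor_compensation'] = np.select(conds, choices, default='1 disclosed')
print(res[['id', 'auditor_compensation']])
```

Because np.select takes the first true condition, the 'dependent' check overrides 'independent' for company 4922 here, mirroring the order of the nested IF.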
