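My dataset has two columns, "Account Name" and "Institutional Sector Code". Some account names contain words like the elements of list1, and the corresponding institutional sector codes are like the elements of list2:

list1 = ['limited', 'Hotel', 'Company', 'Restaurant', 'Sacco']
list2 = ['NFCPVT', 'OLENT', 'MENT', 'SENT', 'MFI']

I am currently hard-coding this (see my code sample on GitHub, reproduced below), but I would like to simplify the function. My idea is to loop over list1 and check each keyword against its allowed codes in list2. Each of the first four elements of list1 may be paired with any of the first four elements of list2, while the last element of list1 (Sacco) should only correspond to 'MFI'. The function should then output a list, or value counts, of the institutional sector codes that are not among the allowed codes in list2.

Any idea how I can simplify this? Thanks for your help.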
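The pairing rule described above can be expressed as a dictionary from keyword to allowed codes, which a single loop then checks. This is a minimal sketch, assuming the column names `ACCOUNT_NAME` and `INSTITUTIONAL_SECTOR_CODE` from the code sample and plain case-insensitive substring matching; `rules`, `inconsistent_code_counts`, the toy DataFrame and the code `'IND'` are all hypothetical.

```python
import pandas as pd

list1 = ['limited', 'Hotel', 'Company', 'Restaurant', 'Sacco']
list2 = ['NFCPVT', 'OLENT', 'MENT', 'SENT', 'MFI']

# Pairing rule from the question: each of the first four keywords may go with
# any of the first four codes; the last keyword ('Sacco') only with 'MFI'.
rules = {keyword: set(list2[:4]) for keyword in list1[:4]}
rules[list1[-1]] = {list2[-1]}


def inconsistent_code_counts(df, rules):
    """For each keyword, value counts of INSTITUTIONAL_SECTOR_CODE values that are
    not allowed for account names containing that keyword (hypothetical helper)."""
    result = {}
    for keyword, allowed in rules.items():
        # Case-insensitive literal substring match; missing names count as no match
        has_kw = df['ACCOUNT_NAME'].str.contains(keyword, case=False, regex=False, na=False)
        bad = df.loc[has_kw & ~df['INSTITUTIONAL_SECTOR_CODE'].isin(allowed),
                     'INSTITUTIONAL_SECTOR_CODE']
        result[keyword] = bad.value_counts()
    return result


# Hypothetical toy data; 'IND' is a made-up code used only for illustration
df = pd.DataFrame({
    'ACCOUNT_NAME': ['Acme Limited', 'Sunrise Hotel', 'Umoja Sacco', 'Jane Doe', None],
    'INSTITUTIONAL_SECTOR_CODE': ['NFCPVT', 'IND', 'NFCPVT', 'IND', 'MFI'],
})
counts = inconsistent_code_counts(df, rules)
for keyword, vc in counts.items():
    if not vc.empty:
        print(keyword, vc.to_dict())
```

An account name that contains several keywords is checked against each keyword's allowed codes separately; whether that is the intended behaviour for such names is an assumption of this sketch.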
Code sample
```python
import pandas as pd
import numpy as np


def acc_name_and_inst_sec_codes_consistency(df):
    # Keywords that identify company accounts and SACCO accounts by account name
    Comp_acc_name = ['ltdt', 'limited', 'Hotel', 'Company', 'Restaurant', 'Consultant',
                     'technologies', 'Service', 'atelier', 'quincaillerie']
    Sacco_acc_name = ['SACCO']
    # Institutional sector codes that are allowed for each group of account names
    inst_sector_code_comp_acc_name = ['NFCPVT', 'OLENT', 'MENT', 'SENT', 'NFCPUB']
    inst_sector_code_Sacco_acc_name = ['MFI']

    # Rows whose account name contains any keyword (case-insensitive; missing names don't match)
    Filtered_comp_acc_name = df[df['ACCOUNT_NAME'].str.contains(
        '|'.join(Comp_acc_name), case=False, na=False)]
    Filtered_sacco_acc_name = df[df['ACCOUNT_NAME'].str.contains(
        '|'.join(Sacco_acc_name), case=False, na=False)]

    # For each group, count the descriptions of sector codes that are NOT in the allowed list
    print('Below are institutional sector codes that should not be included for the Account name with',
          Comp_acc_name, 'in it\n\n',
          Filtered_comp_acc_name[~Filtered_comp_acc_name['INSTITUTIONAL_SECTOR_CODE']
                                 .isin(inst_sector_code_comp_acc_name)]['INSTITUTIONAL_SECTOR_CODE_DESC']
          .value_counts()
          .rename_axis('INSTITUTIONAL_SECTOR_CODE_DESC').reset_index(name='counts')
          .sort_values('counts', ascending=False)
          .to_string(index=False),
          '\n\nBelow are institutional sector codes that should not be included for the Account name with',
          Sacco_acc_name, 'in it\n\n',
          Filtered_sacco_acc_name[~Filtered_sacco_acc_name['INSTITUTIONAL_SECTOR_CODE']
                                  .isin(inst_sector_code_Sacco_acc_name)]['INSTITUTIONAL_SECTOR_CODE_DESC']
          .value_counts()
          .rename_axis('INSTITUTIONAL_SECTOR_CODE_DESC').reset_index(name='counts')
          .sort_values('counts', ascending=False)
          .to_string(index=False))
```
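The function above repeats the same filter-and-count chain once per group. One way to remove that duplication is to drive it from a dictionary of groups and loop over it. This is a hedged sketch rather than a drop-in replacement: `GROUPS`, `inconsistent_sector_codes`, `print_reports` and the toy DataFrame are hypothetical, while the keywords, codes and column names are copied from the code sample. Keywords are passed through `re.escape` before being joined with `|`, so any regex metacharacters in a keyword are matched literally.

```python
import re

import pandas as pd

# Hypothetical configuration: group label -> (keywords in ACCOUNT_NAME, allowed codes)
GROUPS = {
    'company': (['ltdt', 'limited', 'Hotel', 'Company', 'Restaurant', 'Consultant',
                 'technologies', 'Service', 'atelier', 'quincaillerie'],
                ['NFCPVT', 'OLENT', 'MENT', 'SENT', 'NFCPUB']),
    'sacco': (['SACCO'], ['MFI']),
}


def inconsistent_sector_codes(df, groups=GROUPS):
    """Return {group: DataFrame of INSTITUTIONAL_SECTOR_CODE_DESC counts} for rows
    whose account name matches a group's keywords but whose code is not allowed."""
    reports = {}
    for name, (keywords, allowed) in groups.items():
        pattern = '|'.join(map(re.escape, keywords))
        matches = df['ACCOUNT_NAME'].str.contains(pattern, case=False, na=False)
        bad = df.loc[matches & ~df['INSTITUTIONAL_SECTOR_CODE'].isin(allowed),
                     'INSTITUTIONAL_SECTOR_CODE_DESC']
        # value_counts already sorts by count, descending
        reports[name] = (bad.value_counts()
                         .rename_axis('INSTITUTIONAL_SECTOR_CODE_DESC')
                         .reset_index(name='counts'))
    return reports


def print_reports(reports, groups=GROUPS):
    for name, table in reports.items():
        print('Below are institutional sector codes that should not be included '
              'for the Account name with', groups[name][0], 'in it\n')
        print(table.to_string(index=False), '\n')


# Hypothetical toy data; 'IND' is a made-up code used only for illustration
df = pd.DataFrame({
    'ACCOUNT_NAME': ['Acme Limited', 'Blue Hotel', 'Umoja SACCO', 'Kilimo Sacco'],
    'INSTITUTIONAL_SECTOR_CODE': ['NFCPVT', 'IND', 'MFI', 'NFCPVT'],
    'INSTITUTIONAL_SECTOR_CODE_DESC': ['Non Financial Companies - Private', 'Individuals',
                                       'MFI/SACCO (Deposit Taking)',
                                       'Non Financial Companies - Private'],
})
reports = inconsistent_sector_codes(df)
print_reports(reports)
```

Because `value_counts` already returns counts in descending order, the original `sort_values('counts', ascending=False)` step is not needed here. Adding another group of account names then only requires a new entry in the dictionary.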
Sample output

```text
Below are institutional sector codes that should not be included for the Account name with ['ltdt', 'limited', 'Hotel', 'Company', 'Restaurant', 'Consultant', 'technologies', 'Service', 'atelier', 'quincaillerie'] in it

INSTITUTIONAL_SECTOR_CODE_DESC counts
Not Applicable 11012
Individuals 1537
Insurance 9
MFI/SACCO (Deposit Taking) 3

Below are institutional sector codes that should not be included for the Account name with ['SACCO'] in it

INSTITUTIONAL_SECTOR_CODE_DESC counts
Not Applicable 1001
Individuals 28
Non Financial Companies - Private 1
```