Comparing multiple lists containing multiple elements

Posted 2024-05-14 23:52:29


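In my dataset I have two columns, "Account name" (ACCOUNT_NAME) and "Institutional sector code" (INSTITUTIONAL_SECTOR_CODE). Some account names contain words like the elements of list1, and the corresponding institutional sector codes are like the elements of list2: list1 = ['limited','Hotel','Company','Restaurant','Sacco'] and list2 = ['NFCPVT','OLENT','MENT','SENT','MFI']. I am currently hard-coding this (see the code sample I put on GitHub), but I would like to simplify my function. My idea is to loop over the first four elements of list1 and compare them with the corresponding elements of list2: each of the first four elements of list1 may be paired with any of the first four elements of list2, while the last element of list1 (Sacco) should only correspond to 'MFI' in list2. The function should then output a list, or the value counts, of the institutional sector codes that are not among the allowed codes in list2.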
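The pairing rule described above (any of the first four keywords with any of the first four codes, 'Sacco' only with 'MFI') can be written down as data instead of being hard-coded. A minimal pure-Python sketch using list1 and list2 from the question; the dictionary RULES, the function check_row and its convention of returning None when no keyword matches are illustrative assumptions, not part of the original code:

```python
list1 = ['limited', 'Hotel', 'Company', 'Restaurant', 'Sacco']
list2 = ['NFCPVT', 'OLENT', 'MENT', 'SENT', 'MFI']

# Hypothetical rule table: each of the first four keywords may go with
# any of the first four codes, while 'Sacco' may only go with 'MFI'.
RULES = {keyword: set(list2[:4]) for keyword in list1[:4]}
RULES[list1[4]] = {list2[4]}


def check_row(account_name, sector_code):
    """Return True/False for the first keyword found in the name, None if none match."""
    name = account_name.lower()
    # Dicts keep insertion order, so the keywords are tried in list1 order
    for keyword, allowed_codes in RULES.items():
        if keyword.lower() in name:
            return sector_code in allowed_codes
    return None


print(check_row('Grand Hotel', 'OLENT'))   # True
print(check_row('Umoja Sacco', 'NFCPVT'))  # False
print(check_row('Jane Doe', 'MFI'))        # None
```

Because the first matching keyword wins, a name containing both a company keyword and 'Sacco' would be checked against the company codes; reorder RULES if 'Sacco' should take precedence.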

Any idea how I can simplify this? Thanks for your help.

Code sample

import pandas as pd
import numpy as np


def acc_name_and_inst_sec_codes_consistency(df):

    # Keywords to look for in the account name
    Comp_acc_name = ['ltdt','limited','Hotel','Company','Restaurant','Consultant','technologies','Service','atelier','quincaillerie']

    Sacco_acc_name = ['SACCO']

    # Institutional sector codes that are valid for each group of account names
    inst_sector_code_comp_acc_name = ['NFCPVT','OLENT','MENT','SENT','NFCPUB']

    inst_sector_code_Sacco_acc_name = ['MFI']

    # Rows whose account name contains any of the keywords (case-insensitive);
    # na=False treats missing account names as non-matching
    Filtered_comp_acc_name = df[df['ACCOUNT_NAME'].str.contains(
        '|'.join(Comp_acc_name), case=False, na=False)]

    Filtered_sacco_acc_name = df[df['ACCOUNT_NAME'].str.contains(
        '|'.join(Sacco_acc_name), case=False, na=False)]

    # For each group, count the code descriptions whose code is NOT in the valid list
    print('Below are institutional sector codes that should not be '
          'included for the Account name with',
          Comp_acc_name, 'in it\n\n',
          Filtered_comp_acc_name[~Filtered_comp_acc_name['INSTITUTIONAL_SECTOR_CODE']
          .isin(inst_sector_code_comp_acc_name)]['INSTITUTIONAL_SECTOR_CODE_DESC'].value_counts()
          .rename_axis('INSTITUTIONAL_SECTOR_CODE_DESC').reset_index(name='counts')
          .sort_values('counts', ascending=False).to_string(index=False),
          '\n\n' 'Below are institutional sector codes that should not be '
          'included for the Account name with',
          Sacco_acc_name, 'in it\n\n',
          Filtered_sacco_acc_name[~Filtered_sacco_acc_name['INSTITUTIONAL_SECTOR_CODE']
          .isin(inst_sector_code_Sacco_acc_name)]['INSTITUTIONAL_SECTOR_CODE_DESC'].value_counts()
          .rename_axis('INSTITUTIONAL_SECTOR_CODE_DESC').reset_index(name='counts')
          .sort_values('counts', ascending=False).to_string(index=False))

Sample output

Below are institutional sector codes that should not be included for the Account name with ['ltdt', 'limited', 'Hotel', 'Company', 'Restaurant', 'Consultant', 'technologies', 'Service', 'atelier', 'quincaillerie'] in it

 INSTITUTIONAL_SECTOR_CODE_DESC  counts
                Not Applicable   11012
                   Individuals    1537
                     Insurance       9
    MFI/SACCO (Deposit Taking)       3 

Below are institutional sector codes that should not be included for the Account name with ['SACCO'] in it

     INSTITUTIONAL_SECTOR_CODE_DESC  counts
                    Not Applicable    1001
                       Individuals      28
 Non Financial Companies - Private       1 
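The two filter-and-print blocks in the function above differ only in their keyword list and their list of valid codes, so one way to simplify is to keep the (keywords, allowed codes) pairs in a single list and loop over it. A minimal sketch, assuming the column names used in the question's code; the names RULES, invalid_sector_codes and example_df, and the made-up example rows (including the placeholder codes 'IND' and 'NA'), are hypothetical:

```python
import re

import pandas as pd

# Hypothetical rule list built from the question's lists: each entry pairs
# account-name keywords with the institutional sector codes allowed for them.
RULES = [
    (['ltdt', 'limited', 'Hotel', 'Company', 'Restaurant', 'Consultant',
      'technologies', 'Service', 'atelier', 'quincaillerie'],
     ['NFCPVT', 'OLENT', 'MENT', 'SENT', 'NFCPUB']),
    (['SACCO'], ['MFI']),
]


def invalid_sector_codes(df, rules=RULES):
    """Map each keyword tuple to value counts of disallowed code descriptions."""
    results = {}
    for keywords, allowed_codes in rules:
        # re.escape keeps keywords literal; na=False treats missing names as no match
        pattern = '|'.join(map(re.escape, keywords))
        matches = df['ACCOUNT_NAME'].str.contains(pattern, case=False, na=False)
        # Keep matching rows whose code is NOT one of the allowed codes
        invalid = df[matches & ~df['INSTITUTIONAL_SECTOR_CODE'].isin(allowed_codes)]
        # value_counts() already sorts by count, descending
        results[tuple(keywords)] = invalid['INSTITUTIONAL_SECTOR_CODE_DESC'].value_counts()
    return results


# Made-up example rows ('IND' and 'NA' are placeholder codes, not real ones)
example_df = pd.DataFrame({
    'ACCOUNT_NAME': ['Sunrise Hotel', 'Acme Company', 'Umoja Sacco',
                     'Kilimo SACCO', 'Jane Doe', None],
    'INSTITUTIONAL_SECTOR_CODE': ['NFCPVT', 'IND', 'NFCPVT', 'MFI', 'IND', 'NA'],
    'INSTITUTIONAL_SECTOR_CODE_DESC': ['Non Financial Companies - Private', 'Individuals',
                                       'Non Financial Companies - Private',
                                       'MFI/SACCO (Deposit Taking)', 'Individuals',
                                       'Not Applicable'],
})

if __name__ == '__main__':
    for keywords, counts in invalid_sector_codes(example_df).items():
        print('Institutional sector codes that should not be used for account names containing',
              list(keywords))
        print(counts.rename_axis('INSTITUTIONAL_SECTOR_CODE_DESC')
                    .reset_index(name='counts').to_string(index=False), '\n')
```

Adding another account-name group then means adding one entry to RULES rather than another copy of the filter and print code. As in the original function, a name that matches several groups is counted in each of them. Unlike the original, the keywords are passed through re.escape so they are matched literally; drop that if regex patterns in the keyword lists are intended.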

