Function to merge DataFrames based on different keywords

Posted 2024-04-23 18:24:44

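I am trying to write a function that builds a DataFrame based on different lists of words that appear in a particular column of another DataFrame.

In my example, I want to build a DataFrame from the rows of the "uncategorised" DataFrame whose "description" column contains the words "chandos" and "electronics".

The point of the function is that I want to be able to run it on different word lists, so that I get different DataFrames, each containing only the words I am interested in.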

words_Telephone = ["tfl", "electronics"] 
df_Telephone = pd.DataFrame(columns=['date','description','paid out'])



def categorise(word_list, df_name):
    """ takes the denoted terms from the "uncategorised" df and puts it into new df"""
    for word in word_list:
        df_name = uncategorised[uncategorised['description'].str.contains(word)]
        return(df_name)

#apply the function    
categorise(words_Telephone, df_Telephone)

I want the DataFrame to contain:

d = {'date': {0: '05/04/2017', 1: '06/04/2017', 2: '08/04/2017', 3: '08/04/2017',
              4: '08/04/2017', 5: '10/04/2017', 6: '10/04/2017', 7: '10/04/2017'},
     'description': {0: 'tfl', 1: 'tfl', 2: 'tfl', 3: 'tfl',
                     4: 'ac electronics', 5: 'ac electronics'},
     'index': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10},
     'paid out': {0: 3.0, 1: 4.3, 2: 6.1, 3: 1.5, 4: 16.39, 5: 20.4}}

Reproducible df:

d = {'date': {0: '05/04/2017',
  1: '06/04/2017',
  2: '06/04/2017',
  3: '08/04/2017',
  4: '08/04/2017',
  5: '08/04/2017',
  6: '10/04/2017',
  7: '10/04/2017',
  8: '10/04/2017'},
 'description': {0: 'tfl',
  1: 'mu subscription',
  2: 'tfl',
  3: 'tfl',
  4: 'tfl',
  5: 'ac electronics ',
  6: 'itunes',
  7: 'ac electronics ',
  8: 'google adwords'},
 'index': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10},
 'paid out': {0: 3.0,
  1: 16.9,
  2: 4.3,
  3: 6.1,
  4: 1.5,
  5: 16.39,
  6: 12.99,
  7: 20.4,
  8: 39.68}}

Solution:

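def categorise(word_list):
    """ takes the denoted terms from the "uncategorised" df and puts it into new df then deletes from the uncategorised df"""
    global uncategorised
    new_dfs = []
    for word in word_list:
        # collect the rows whose description contains this word...
        new_dfs.append(uncategorised[uncategorised['description'].str.contains(word)])
        # ...then drop those rows from the global uncategorised df
        uncategorised = uncategorised[~uncategorised['description'].str.contains(word)]

    # uncategorised is already updated through `global`, so only the new df is returned;
    # drop=True avoids a clash with the existing 'index' column
    return pd.concat(new_dfs).reset_index(drop=True)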

#apply the function    
df_Telephone = categorise(words_Telephone)

df_Telephone
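
The function above depends on a `global` variable. The same split can also be written so that it returns both the matched rows and the remainder. This is only a sketch under assumptions: the helper name `split_by_words` and the shortened sample frame are illustrative and not part of the original post, and words are matched as literal substrings (`regex=False`).

```python
import pandas as pd


def split_by_words(df, word_list, column="description"):
    """Return (matched, remaining): rows whose `column` contains any word, and the rest.

    `split_by_words` is a hypothetical helper name, not from the original post.
    """
    # Start with an all-False mask and OR in one literal substring test per word.
    mask = pd.Series(False, index=df.index)
    for word in word_list:
        mask |= df[column].str.contains(word, regex=False, na=False)
    matched = df[mask].reset_index(drop=True)
    remaining = df[~mask].reset_index(drop=True)
    return matched, remaining


# Illustrative sample data (a shortened version of the frame in the question).
uncategorised = pd.DataFrame({
    "date": ["05/04/2017", "06/04/2017", "08/04/2017", "10/04/2017"],
    "description": ["tfl", "mu subscription", "ac electronics ", "itunes"],
    "paid out": [3.0, 16.9, 16.39, 12.99],
})

# Rebind the name explicitly instead of mutating a global inside the function.
df_Telephone, uncategorised = split_by_words(uncategorised, ["tfl", "electronics"])
```

Returning the remainder makes it possible to chain several word lists, passing what is left over from one call into the next.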

1 answer
User
#1 · Posted 2024-04-23 18:24:44
import pandas as pd

words_Telephone = ["tfl", "electronics"]
original_df = pd.DataFrame({
    'date': ['05/04/2017', '06/04/2017', '06/04/2017', '08/04/2017', '08/04/2017',
             '08/04/2017', '10/04/2017', '10/04/2017', '10/04/2017'],
    'description': ['tfl', 'mu subscription', 'tfl', 'tfl', 'tfl',
                    'ac electronics', 'itunes', 'ac electronics', 'google adwords'],
    'paid out': [3.0, 16.9, 4.3, 6.1, 1.5, 16.39, 12.99, 20.4, 39.68],
})

def categorise(word_list, original_df):
    """ takes the denoted terms from the original df and puts them into a new df"""
    new_dfs = []
    for word in word_list:
        # keep the rows whose description contains this word
        new_dfs.append(original_df[original_df['description'].str.contains(word)])

    # drop=True discards the old row labels instead of adding them as an extra column
    return pd.concat(new_dfs).reset_index(drop=True)

#apply the function
df_Telephone = categorise(words_Telephone, original_df)
print(df_Telephone)


         date     description  paid out
0  05/04/2017             tfl      3.00
1  06/04/2017             tfl      4.30
2  08/04/2017             tfl      6.10
3  08/04/2017             tfl      1.50
4  08/04/2017  ac electronics     16.39
5  10/04/2017  ac electronics     20.40
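
Because `str.contains` treats its pattern as a regular expression by default, the loop can also be collapsed into a single pattern joined with `|`. The sketch below is an alternative, not the answerer's method: `categorise_once` and the category names are hypothetical, and `re.escape` is used on the assumption that the words should be matched literally. Unlike the concatenating version, it keeps rows in their original order and returns a row only once even if it matches several words.

```python
import re

import pandas as pd


def categorise_once(word_list, original_df, column="description"):
    """Select rows whose `column` contains any of the words, in one pass.

    `categorise_once` is a hypothetical name used for illustration only.
    """
    if not word_list:
        # An empty pattern would match every row, so return an empty frame instead.
        return original_df.iloc[0:0]
    # Escape each word so regex metacharacters are matched literally,
    # then join with "|" to build one alternation pattern.
    pattern = "|".join(re.escape(word) for word in word_list)
    mask = original_df[column].str.contains(pattern, na=False)
    return original_df[mask].reset_index(drop=True)


original_df = pd.DataFrame({
    "date": ["05/04/2017", "06/04/2017", "06/04/2017", "08/04/2017", "08/04/2017",
             "08/04/2017", "10/04/2017", "10/04/2017", "10/04/2017"],
    "description": ["tfl", "mu subscription", "tfl", "tfl", "tfl",
                    "ac electronics", "itunes", "ac electronics", "google adwords"],
    "paid out": [3.0, 16.9, 4.3, 6.1, 1.5, 16.39, 12.99, 20.4, 39.68],
})

# One word list per category; the category names are illustrative assumptions.
word_lists = {"Telephone": ["tfl", "electronics"], "Media": ["itunes", "google"]}
frames = {name: categorise_once(words, original_df) for name, words in word_lists.items()}
print(frames["Telephone"])
```

Keeping the word lists in a dictionary lets the same function produce one DataFrame per category, which is the use case described in the question.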
