Groupby创建lis

import numpy as np import pandas as pd pd.set_option('display.max_columns', 500) df_fofs = pd.read_excel('SampleDF.xlsx') df_extract = pd.read_excel('SampleID_ex.xlsx') df_extract original_id 0 A1 1 B2 2 C3 3 D4 df_fofs brand country id holding 0 AAA UK A1 2000000 1 AAA CH B2 4000000 2 BBB UK A1 7000000 3 CCC DE C3 5000000 4 BBB IT A1 10000000 5 EEE UK D4 11000000 6 EEE CH D4 3000000 7 EEE CH D4 6000000 8 AAA CH B2 8000000 fund_ids = list(df_extract['original_id']) result = {} for fund in fund_ids: temp = [] df_funds = df_fofs[(df_fofs['id'] == fund )][['country', 'brand', 'holding']] domicile_fof = df_fofs[df_fofs['id'] == fund ][['country', 'holding']] df_funds = df_funds.groupby(['country', 'brand'])["holding"].sum() domicile_fof = domicile_fof.groupby('country')["holding"].sum() s = '' for i in range(len(df_funds)): row = df_funds.reset_index().iloc[i] if row['holding'] >= 5000000: s += row['country'] + ' (' + str(row['brand']) + ', ' + str(round(((row['holding'])/1000000), 2)) + '), ' result[fund] = [s] df_result = pd.DataFrame.from_dict(result, orient = 'index') df_result.reset_index(inplace = True) df_result.columns = ['FundID', 'Domicile (brand, AUM)'] df_result FundID Domicile (brand, AUM) 0 A1 IT (BBB, 10.0), UK (BBB, 7.0), 1 B2 CH (AAA, 12.0), 2 C3 DE (CCC, 5.0), 3 D4 CH (EEE, 9.0), UK (EEE, 11.0),

3条回答

网友

1楼 · 编辑于 2024-05-15 10:37:45

您可以组合id、groupbyid和country上的表来生成内部项，然后只保存groupbyid来创建外部级别

def f(x):
    n = x.apply(lambda r: '{} ({})'.format(r['brand'],int(r['holding']/1000000)), axis=1)
    return '{} [{}]'.format(x.iloc[0]['country'],', '.join(n))


df_extract.merge(df_fofs, left_on='original_id', right_on='id') 
   .groupby(['original_id','country']).apply(f) \ 
   .groupby(level=0).apply(', '.join)

original_id
A1    IT [BBB (10)], UK [AAA (2), BBB (7)]
B2                   CH [AAA (4), AAA (8)]
C3                            DE [CCC (5)]
D4    CH [EEE (3), EEE (6)], UK [EEE (11)]
dtype: object

网友

2楼 · 编辑于 2024-05-15 10:37:45

试试这个

>>> df   
  brand country  id  holding
0   AAA      UK  A1        2
1   AAA      CH  B2        4
2   BBB      UK  A1        7
3   CCC      DE  C3        5
4   BBB      IT  A1       10
5   EEE      UK  D4       11
6   EEE      CH  D4        3
7   EEE      CH  D4        6
8   AAA      CH  B2        8

>>> final_df = df.groupby(by='id').apply(lambda x: x.groupby(by='country')
                                                    .apply(lambda y: y.groupby(by='brand')
                                                  .agg(sum))).reset_index()

>>> final_df.groupby(by='id')\
        .apply(lambda x: ", ".join([f"{row['country']} [{row['brand']}({row['holding']})]"
                                      for _, row in x.iterrows()]))

id
A1    IT [BBB(10)], UK [AAA(2)], UK [BBB(7)]
B2                              CH [AAA(12)]
C3                               DE [CCC(5)]
D4                 CH [EEE(9)], UK [EEE(11)]
dtype: object

网友

3楼 · 编辑于 2024-05-15 10:37:45

您需要的是函数pandas.DataFrame.pivot_table。文档在这里pandas/pivot_table。你知道吗

这段代码解决了您的示例（而不是我使用的多索引列表）

import pandas as pd
df = pd.DataFrame([
    ('AAA','UK','A1',2000000),
    ('AAA','CH','B2',4000000),
    ('BBB','UK','A1',7000000),
    ('CCC','DE','C3',5000000),
    ('BBB','IT','A1',10000000),
    ('EEE','UK','D4',11000000),
    ('EEE','CH','D4',3000000),
    ('EEE','CH','D4',6000000),
    ('AAA','CH','B2',8000000)],
    columns=['brand',   'country', 'id',  'holding'])
df.pivot_table(values='holding',index=['id','country','brand'])

结果是 resulting dataframe

相关问题更多 >

编程相关推荐

热门问题

热门文章