在python中，如何合并两个数据帧，同时使用另一个数据帧的权重在其中传播值？

import pandas as pd df1 = pd.DataFrame({'animal': {0: 'bird', 1: 'bird', 2: 'bird', 3: 'bird', 4: 'bird', 5: 'bird', 6: 'dog', 7: 'dog', 8: 'dog', 9: 'dog', 10: 'dog', 11: 'dog'}, 'cat1': {0: 'a', 1: 'a', 2: 'a', 3: 'b', 4: 'b', 5: 'b', 6: 'a', 7: 'a', 8: 'a', 9: 'b', 10: 'b', 11: 'b'}, 'cat2': {0: 'x', 1: 'y', 2: 'z', 3: 'x', 4: 'y', 5: 'z', 6: 'x', 7: 'y', 8: 'z', 9: 'x', 10: 'y', 11: 'z'}, 'val1': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 12}}) df2 = pd.DataFrame({'cat3': {0: 'i', 1: 'i', 2: 'j', 3: 'j', 4: 'i', 5: 'i', 6: 'j', 7: 'j'}, 'animal': {0: 'bird', 1: 'bird', 2: 'bird', 3: 'bird', 4: 'dog', 5: 'dog', 6: 'dog', 7: 'dog'}, 'cat1': {0: 'a', 1: 'b', 2: 'a', 3: 'b', 4: 'a', 5: 'b', 6: 'a', 7: 'b'}, 'val2': {0: 13, 1: 14, 2: 15, 3: 16, 4: 17, 5: 18, 6: 19, 7: 20}}) m = pd.merge(df1, df2[df2['cat3']=='i'], how='left', on=['animal', 'cat1']) print('df1, a few records\n', df1.head(3)) print('df2, a few records\n', df2.head()) print('merged and grouped\n', m.groupby(['animal', 'cat1', 'cat2']).sum()) -------- **OUTPUT:** df1, a few records animal cat1 cat2 val1 0 bird a x 1 1 bird a y 2 2 bird a z 3 df2, a few records cat3 animal cat1 val2 0 i bird a 13 1 i bird b 14 2 j bird a 15 3 j bird b 16 4 i dog a 17 merged and grouped val1 val2 animal cat1 cat2 bird a x 1 13 y 2 13 z 3 13 b x 4 14 y 5 14 z 6 14 dog a x 7 17 y 8 17 z 9 17 b x 10 18 y 11 18 z 12 18

1条回答

网友

1楼 · 发布于 2024-05-15 00:25:10

在这种情况下，可以使用groupby().transform()：

out = m.groupby(['animal', 'cat1', 'cat2']).sum()
out['val3'] = out['val2']*out['val1'] / out.groupby(['animal','cat1'])['val1'].transform('sum')

输出：

                  val1  val2      val3
animal cat1 cat2                      
bird   a    x        1    13  2.166667
            y        2    13  4.333333
            z        3    13  6.500000
       b    x        4    14  3.733333
            y        5    14  4.666667
            z        6    14  5.600000
dog    a    x        7    17  4.958333
            y        8    17  5.666667
            z        9    17  6.375000
       b    x       10    18  5.454545
            y       11    18  6.000000
            z       12    18  6.545455

更新：如果您也希望为所有cat3执行此操作，则只需将cat3放入groupby的：

# merge the whole df2
m = pd.merge(df1, df2, how='left', on=['animal', 'cat1'])

# cat3 here
out = m.groupby(['cat3','animal', 'cat1', 'cat2']).sum()

# sum across `cat2`:
# also throw `cat3` in groupby
sums = out.groupby(['cat3', 'animal','cat1'])['val1'].transform('sum')

out['val3'] = out['val2']*out['val1'] / sums

输出：

                       val1  val2      val3
cat3 animal cat1 cat2                      
i    bird   a    x        1    13  2.166667
                 y        2    13  4.333333
                 z        3    13  6.500000
            b    x        4    14  3.733333
                 y        5    14  4.666667
                 z        6    14  5.600000
     dog    a    x        7    17  4.958333
                 y        8    17  5.666667
                 z        9    17  6.375000
            b    x       10    18  5.454545
                 y       11    18  6.000000
                 z       12    18  6.545455
j    bird   a    x        1    15  2.500000
                 y        2    15  5.000000
                 z        3    15  7.500000
            b    x        4    16  4.266667
                 y        5    16  5.333333
                 z        6    16  6.400000
     dog    a    x        7    19  5.541667
                 y        8    19  6.333333
                 z        9    19  7.125000
            b    x       10    20  6.060606
                 y       11    20  6.666667
                 z       12    20  7.272727

相关问题更多 >

编程相关推荐

热门问题

热门文章