在python中,如何合并两个数据帧,同时使用另一个数据帧的权重在其中传播值?

2024-05-15 00:25:10 发布

您现在位置:Python中文网/ 问答频道 /正文

import pandas as pd

df1 = pd.DataFrame({'animal': {0: 'bird', 1: 'bird', 2: 'bird', 3: 'bird', 4: 'bird', 5: 'bird', 6: 'dog', 7: 'dog',
                               8: 'dog', 9: 'dog',
                               10: 'dog', 11: 'dog'},
                    'cat1': {0: 'a', 1: 'a', 2: 'a', 3: 'b', 4: 'b', 5: 'b', 6: 'a', 7: 'a', 8: 'a', 9: 'b', 10: 'b',
                             11: 'b'},
                    'cat2': {0: 'x', 1: 'y', 2: 'z', 3: 'x', 4: 'y', 5: 'z', 6: 'x', 7: 'y', 8: 'z', 9: 'x', 10: 'y',
                             11: 'z'},
                    'val1': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 12}})

df2 = pd.DataFrame({'cat3': {0: 'i', 1: 'i', 2: 'j', 3: 'j', 4: 'i', 5: 'i', 6: 'j', 7: 'j'},
                    'animal': {0: 'bird', 1: 'bird', 2: 'bird', 3: 'bird', 4: 'dog', 5: 'dog', 6: 'dog', 7: 'dog'},
                    'cat1': {0: 'a', 1: 'b', 2: 'a', 3: 'b', 4: 'a', 5: 'b', 6: 'a', 7: 'b'},
                    'val2': {0: 13, 1: 14, 2: 15, 3: 16, 4: 17, 5: 18, 6: 19, 7: 20}})
m = pd.merge(df1, df2[df2['cat3']=='i'], how='left', on=['animal', 'cat1'])

print('df1, a few records\n', df1.head(3))
print('df2, a few records\n', df2.head())
print('merged and grouped\n', m.groupby(['animal', 'cat1', 'cat2']).sum())
--------

**OUTPUT:**

df1, a few records
   animal cat1 cat2  val1
0   bird    a    x     1
1   bird    a    y     2
2   bird    a    z     3
df2, a few records
   cat3 animal cat1  val2
0    i   bird    a    13
1    i   bird    b    14
2    j   bird    a    15
3    j   bird    b    16
4    i    dog    a    17
merged and grouped
                   val1  val2
animal cat1 cat2            
bird   a    x        1    13
            y        2    13
            z        3    13
       b    x        4    14
            y        5    14
            z        6    14
dog    a    x        7    17
            y        8    17
            z        9    17
       b    x       10    18
            y       11    18
            z       12    18

我想创建一个包含val2扩展的新列val3 在所有叶记录中,以val1加权

val3列的前4个值为:

2.166667 = 13 * (1/(1+2+3))
4.333333 = 13 * (2/(1+2+3))
6.5 = 13 * (3/(1+2+3))
3.7333 = 14 * (4/(4+5+6)))...

我该怎么做?我不知道从哪里开始

虽然在本例中我是针对单个cat3元素进行此操作的,但实际上cat3对于dataframes和每个元素都是通用的,并且每个元素都会像本例中一样扩展


Tags: 元素pddf1df2printdogrecordscat1
1条回答
网友
1楼 · 发布于 2024-05-15 00:25:10

在这种情况下,可以使用groupby().transform()

out = m.groupby(['animal', 'cat1', 'cat2']).sum()
out['val3'] = out['val2']*out['val1'] / out.groupby(['animal','cat1'])['val1'].transform('sum')

输出:

                  val1  val2      val3
animal cat1 cat2                      
bird   a    x        1    13  2.166667
            y        2    13  4.333333
            z        3    13  6.500000
       b    x        4    14  3.733333
            y        5    14  4.666667
            z        6    14  5.600000
dog    a    x        7    17  4.958333
            y        8    17  5.666667
            z        9    17  6.375000
       b    x       10    18  5.454545
            y       11    18  6.000000
            z       12    18  6.545455

更新:如果您也希望为所有cat3执行此操作,则只需将cat3放入groupby的:

# merge the whole df2
m = pd.merge(df1, df2, how='left', on=['animal', 'cat1'])

# cat3 here
out = m.groupby(['cat3','animal', 'cat1', 'cat2']).sum()

# sum across `cat2`:
# also throw `cat3` in groupby
sums = out.groupby(['cat3', 'animal','cat1'])['val1'].transform('sum')

out['val3'] = out['val2']*out['val1'] / sums

输出:

                       val1  val2      val3
cat3 animal cat1 cat2                      
i    bird   a    x        1    13  2.166667
                 y        2    13  4.333333
                 z        3    13  6.500000
            b    x        4    14  3.733333
                 y        5    14  4.666667
                 z        6    14  5.600000
     dog    a    x        7    17  4.958333
                 y        8    17  5.666667
                 z        9    17  6.375000
            b    x       10    18  5.454545
                 y       11    18  6.000000
                 z       12    18  6.545455
j    bird   a    x        1    15  2.500000
                 y        2    15  5.000000
                 z        3    15  7.500000
            b    x        4    16  4.266667
                 y        5    16  5.333333
                 z        6    16  6.400000
     dog    a    x        7    19  5.541667
                 y        8    19  6.333333
                 z        9    19  7.125000
            b    x       10    20  6.060606
                 y       11    20  6.666667
                 z       12    20  7.272727

相关问题 更多 >

    热门问题