How to create a new column based on the difference between maximum values per index?

Input:

index_1  index_2  cum_value
0        2020-01     100.00
0        2020-02      50.00
0        2020-03     -50.00
0        2020-04     150.00
0        2020-05     200.00
1        2020-01      25.00
1        2020-02      50.00
1        2020-03    -100.00
1        2020-04      50.00
1        2020-05     200.00

Expected output:

index_1  index_2  cum_value  new_col
0        2020-01     100.00   100.00  --> first positive value on index_1 [0]
0        2020-02      50.00     0.00
0        2020-03     -50.00     0.00
0        2020-04     150.00    50.00  --> (150 - 100)
0        2020-05     200.00    50.00  --> (200 - 150)
1        2020-01      25.00    25.00  --> first positive value on index_1 [1]
1        2020-02      50.00    25.00  --> (50 - 25)
1        2020-03    -100.00     0.00
1        2020-04      50.00     0.00
1        2020-05     200.00   150.00  --> (200 - 50)
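A minimal sketch for reproducing the sample data, assuming index_1 holds integers and index_2 holds the month labels as plain strings (the question does not say whether they are strings or Periods):

```python
import pandas as pd

# Build the two-level index (index_1, index_2) from the question's table.
# Assumption: index_2 is stored as strings such as '2020-01'.
months = ['2020-01', '2020-02', '2020-03', '2020-04', '2020-05']
idx = pd.MultiIndex.from_product([[0, 1], months], names=['index_1', 'index_2'])

df = pd.DataFrame(
    {'cum_value': [100.0, 50.0, -50.0, 150.0, 200.0,
                   25.0, 50.0, -100.0, 50.0, 200.0]},
    index=idx,
)
print(df)
```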


Anonymous user

Answer #1 · posted 2024-05-15 09:11:04

Code

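import pandas as pd

# df has a MultiIndex (index_1, index_2) and a 'cum_value' column
c = df.groupby(level=0)['cum_value'].cummax()                   # running max per index_1
m = df['cum_value'].ge(c) & df['cum_value'].ge(0)               # non-negative new max
df['new_col'] = df.loc[m, 'cum_value'].groupby(level=0).diff()  # rise over previous max
df['new_col'] = df['new_col'].fillna(df['cum_value']).mask(~m, 0)  # NaN -> value, non-m -> 0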
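To cross-check the vectorized version, here is a plain-Python loop that applies the same rule to one index_1 group at a time. `new_col_for_group` is a hypothetical helper for illustration, not part of the answer, and it assumes the rows of a group are already in index_2 order. Because a non-negative value is always larger than any negative one, the loop only needs to remember the largest qualifying value so far, which plays the role of the cummax comparison in the pandas code:

```python
def new_col_for_group(values):
    """Hypothetical helper: apply the rule to one index_1 group, in row order.

    A row gets a non-zero entry only if it is non-negative and at least as
    large as every earlier value in the group; the entry is the increase over
    the previous such maximum (or the value itself for the first one).
    """
    out = []
    best = None  # largest qualifying value seen so far in this group
    for v in values:
        if v >= 0 and (best is None or v >= best):
            out.append(v if best is None else v - best)
            best = v
        else:
            out.append(0.0)
    return out


print(new_col_for_group([100.0, 50.0, -50.0, 150.0, 200.0]))  # index_1 = 0
print(new_col_for_group([25.0, 50.0, -100.0, 50.0, 200.0]))   # index_1 = 1
```

Both calls reproduce the new_col values from the expected output in the question.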

Explanation

First, group the DataFrame on level=0 (i.e. index_1) and apply cummax to the cum_value column to get the running maximum within each index_1 group:

>>> c

index_1  index_2
0        2020-01    100.0
         2020-02    100.0
         2020-03    100.0
         2020-04    150.0
         2020-05    200.0
1        2020-01     25.0
         2020-02     50.0
         2020-03     50.0
         2020-04     50.0
         2020-05    200.0
Name: cum_value, dtype: float64
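The groupby(level=0) step matters because a plain cummax runs down the whole column. A small sketch with a trimmed-down two-group Series (the first two months of each group from the question's data) shows the running maximum of index_1 = 0 leaking into index_1 = 1 when the grouping is left out:

```python
import pandas as pd

months = ['2020-01', '2020-02']
idx = pd.MultiIndex.from_product([[0, 1], months], names=['index_1', 'index_2'])
s = pd.Series([100.0, 50.0, 25.0, 50.0], index=idx, name='cum_value')

# Per-group running max: restarts at every new index_1 value.
per_group = s.groupby(level=0).cummax()
# Ungrouped running max: group 0's 100 carries over into group 1.
ungrouped = s.cummax()

print(pd.DataFrame({'per_group': per_group, 'ungrouped': ungrouped}))
```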

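Next, compare cum_value with the running maximum computed above to build a boolean mask. Note that only non-negative values of cum_value are considered (`ge(0)`). The idea behind the mask: it is True when the current month's value is non-negative and greater than or equal to the maximum of all earlier months in the same group, and False otherwise: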

>>> m

index_1  index_2
0        2020-01     True
         2020-02    False
         2020-03    False
         2020-04     True
         2020-05     True
1        2020-01     True
         2020-02     True
         2020-03    False
         2020-04     True
         2020-05     True
Name: cum_value, dtype: bool
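The `ge(0)` part of the mask only changes the result when a group starts with negative values, which the question's data does not. A sketch with a hypothetical group (not from the question) shows the difference: without the sign check, the leading -50 counts as the first maximum, so the first positive value 30 gets an increase of 80 instead of being treated as the starting value:

```python
import pandas as pd

# Hypothetical group that starts with a negative value (not in the question).
idx = pd.MultiIndex.from_product([[0], ['2020-01', '2020-02', '2020-03']],
                                 names=['index_1', 'index_2'])
s = pd.Series([-50.0, 30.0, 40.0], index=idx, name='cum_value')

c = s.groupby(level=0).cummax()
with_sign_check = s.ge(c) & s.ge(0)
without_sign_check = s.ge(c)  # every row here is >= its own running max

# Increase over the previous maximum, computed on the rows each mask keeps.
inc_with = s[with_sign_check].groupby(level=0).diff()
inc_without = s[without_sign_check].groupby(level=0).diff()

print(pd.DataFrame({'with': with_sign_check, 'without': without_sign_check}))
print(inc_with)
print(inc_without)
```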

Since we only care about the cum_value entries that meet this condition, we can use the boolean mask to select them:

>>> df.loc[m, 'cum_value']

index_1  index_2
0        2020-01    100.0
         2020-04    150.0
         2020-05    200.0
1        2020-01     25.0
         2020-02     50.0
         2020-04     50.0
         2020-05    200.0
Name: cum_value, dtype: float64

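Now group these filtered values on level=0 (index_1) again and apply diff, which gives the difference between each value and the previous maximum in its group (the first row of each group has no predecessor, hence NaN):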

>>> df.loc[m, 'cum_value'].groupby(level=0).diff()

index_1  index_2
0        2020-01      NaN
         2020-04     50.0
         2020-05     50.0
1        2020-01      NaN
         2020-02     25.0
         2020-04      0.0
         2020-05    150.0
Name: cum_value, dtype: float64
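Why does a row-to-row diff give the increase over the previous maximum? Every kept row is at least as large as everything before it in its group, so the kept values never decrease and the kept row just above is always the previous maximum. A quick sketch checking this on the question's data (it rebuilds the sample Series so it runs on its own):

```python
import pandas as pd

months = ['2020-01', '2020-02', '2020-03', '2020-04', '2020-05']
idx = pd.MultiIndex.from_product([[0, 1], months], names=['index_1', 'index_2'])
s = pd.Series([100.0, 50.0, -50.0, 150.0, 200.0,
               25.0, 50.0, -100.0, 50.0, 200.0], index=idx, name='cum_value')

c = s.groupby(level=0).cummax()
m = s.ge(c) & s.ge(0)
kept = s[m]

# Within each index_1 group the kept values are non-decreasing, so the
# previous kept row always holds the previous maximum.
monotonic = kept.groupby(level=0).apply(lambda g: g.is_monotonic_increasing)
print(monotonic)
```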

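Finally, fill the NaN values in new_col with the corresponding cum_value (this covers both the first qualifying row of each group and the rows excluded by m), then use mask to set every row that does not satisfy m to 0: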

>>> df
                 cum_value  new_col
index_1 index_2                    
0       2020-01      100.0    100.0
        2020-02       50.0      0.0
        2020-03      -50.0      0.0
        2020-04      150.0     50.0
        2020-05      200.0     50.0
1       2020-01       25.0     25.0
        2020-02       50.0     25.0
        2020-03     -100.0      0.0
        2020-04       50.0      0.0
        2020-05      200.0    150.0
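For reuse, the answer's four lines can be wrapped in a function. This is a sketch only: `add_new_col` and its `col`/`out` parameters are hypothetical names, and it assumes level 0 of the MultiIndex is the group key and that rows are already in index_2 order within each group:

```python
import pandas as pd


def add_new_col(df, col='cum_value', out='new_col'):
    """Hypothetical wrapper around the answer's four lines.

    Assumes df has a MultiIndex whose level 0 is the group key (index_1)
    and whose rows are sorted by the second level within each group.
    """
    df = df.copy()  # leave the caller's frame untouched
    c = df.groupby(level=0)[col].cummax()
    m = df[col].ge(c) & df[col].ge(0)
    df[out] = df.loc[m, col].groupby(level=0).diff()
    df[out] = df[out].fillna(df[col]).mask(~m, 0)
    return df


months = ['2020-01', '2020-02', '2020-03', '2020-04', '2020-05']
idx = pd.MultiIndex.from_product([[0, 1], months], names=['index_1', 'index_2'])
sample = pd.DataFrame({'cum_value': [100.0, 50.0, -50.0, 150.0, 200.0,
                                     25.0, 50.0, -100.0, 50.0, 200.0]}, index=idx)

result = add_new_col(sample)
print(result)
```

Both cummax and diff follow row order, so if index_2 is not already chronological within each group, sorting first with `sort_index()` is a reasonable precaution (with 'YYYY-MM' strings, lexical order matches chronological order).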
