Computing an aggregate value relative to the current row

Posted 2024-03-29 14:42:58


Suppose we have this data:

import pandas as pd

df = pd.DataFrame({
    'group_id': [100,100,100,101,101,101,101],
    'amount': [30,40,10,20,25,80,40]
})
df.index.name = 'id'
df.set_index(['group_id', df.index], inplace=True)

It looks like this:

             amount
group_id id        
100      0       30
         1       40
         2       10
101      3       20
         4       25
         5       80
         6       40

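Our goal is to compute a new column that holds, for each row, the sum of all amounts in the same group that are smaller than that row's amount. That is, we want this result: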

             amount  sum_of_smaller_amounts
group_id id                                
100      0       30                      10  
         1       40                      40  # 30 + 10
         2       10                       0  # smallest amount
101      3       20                       0  # smallest
         4       25                      20  
         5       80                      85  # 20 + 25 + 40
         6       40                      45  # 20 + 25

Ideally this should be (very) efficient, because the real DataFrame can have millions of rows.
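For checking any faster solution, a brute-force reference can be handy. The sketch below is not from the original thread: the name `expected` is illustrative, and it assumes "smaller" means strictly smaller within the same `group_id`. It is quadratic in the number of rows, so it is only meant for small test data.

```python
import pandas as pd

df = pd.DataFrame({
    'group_id': [100, 100, 100, 101, 101, 101, 101],
    'amount': [30, 40, 10, 20, 25, 80, 40],
})
df.index.name = 'id'
df = df.set_index(['group_id', df.index])

# Plain NumPy arrays keep the comparison logic explicit.
group = df.index.get_level_values('group_id').to_numpy()
amount = df['amount'].to_numpy()

# For each row, sum the strictly smaller amounts that share its group_id.
expected = pd.Series(
    [int(amount[(group == g) & (amount < a)].sum()) for g, a in zip(group, amount)],
    index=df.index,
    name='sum_of_smaller_amounts',
)
print(expected)
```

Comparing a vectorized answer against `expected` on a small sample is a cheap way to catch alignment or tie-handling mistakes before running on millions of rows.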


3 answers

I'm at an intermediate level with pandas and not sure about the efficiency, but here is a solution:

temp_df = df.sort_values(['group_id','amount'])
temp_df = temp_df.mask(temp_df['amount'] == temp_df['amount'].shift(), other=0).groupby(level='group_id').cumsum()

df['sum'] = temp_df.sort_index(level='id')['amount'] - df['amount']

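Result (the row with id 7 is not in the question's data; this output appears to come from a run with an extra duplicate amount of 40 added to group 101):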

             amount  sum
group_id id             
100      0       30   10
         1       40   40
         2       10    0
101      3       20    0
         4       25   20
         5       80   85
         6       40   45
         7       40   45

If it helps efficiency, you can replace the last line with either of these:

df['sum'] = df.subtract(temp_df).multiply(-1)

# or

df['sum'] = (~df).add(temp_df + 1)
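The second variant relies on the integer identity `~x == -x - 1` (bitwise NOT in two's complement), so `(~x) + (t + 1)` equals `t - x`. A minimal check, using NumPy arrays as stand-ins for the `amount` column and the per-group cumulative sums (the array names are illustrative, not from the answer):

```python
import numpy as np

amount = np.array([30, 40, 10, 20, 25, 80, 40])      # stands in for df['amount']
running = np.array([40, 80, 10, 20, 45, 165, 85])    # stands in for the grouped cumsum

via_not = (~amount) + (running + 1)   # the bitwise-NOT formulation
direct = running - amount             # the plain subtraction

print(via_not.tolist())
print(np.array_equal(via_not, direct))
```

The trick only applies to integer columns, since `~` is not defined for floating-point data.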

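A better solution (I think):

# sort by amount so the cumulative sum runs from smallest to largest
df_sort = df.sort_values('amount')

# zero out repeated amounts within a group, then take the running total
df['sum_smaller_amount'] = (df_sort.groupby('group_id')['amount']
                                   .transform(lambda x: x.mask(x.duplicated(),0).cumsum()) - 
                                   df['amount'])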

Output:

             amount  sum_smaller_amount
group_id id                            
100      0       30                10.0
         1       40                40.0
         2       10                 0.0
101      3       20                 0.0
         4       25                20.0
         5       80                85.0
         6       40                45.0

Another approach is to use a Cartesian product and a filter:

df.merge(df.reset_index(), on='group_id', suffixes=('_sum_smaller',''))\
  .query('amount_sum_smaller < amount')\
  .groupby(['group_id','id'])[['amount_sum_smaller']].sum()\
  .join(df, how='right').fillna(0)

Output:

             amount_sum_smaller  amount
group_id id                            
100      0                 10.0      30
         1                 40.0      40
         2                  0.0      10
101      3                  0.0      20
         4                 20.0      25
         5                 85.0      80
         6                 45.0      40
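Since the question asks about efficiency, it may help to estimate how large the intermediate merge result gets: merging a group with itself produces one row per ordered pair, so the row count is the sum of squared group sizes. A small sketch on the question's sample data (the names `group_sizes` and `pair_count` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'group_id': [100, 100, 100, 101, 101, 101, 101],
    'amount': [30, 40, 10, 20, 25, 80, 40],
})
df.index.name = 'id'
df = df.set_index(['group_id', df.index])

# Rows per group, then the number of within-group pairs the self-merge creates.
group_sizes = df.groupby(level='group_id').size()
pair_count = int((group_sizes ** 2).sum())

print(group_sizes.to_dict())
print(pair_count)
```

Here the merge creates 3*3 + 4*4 = 25 rows before filtering. A single group of 1,000,000 rows would create 10^12 pairs, so this approach only suits data with small groups.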

You need sort_values and cumsum:

df['new_amount']= (df.sort_values('amount')
                    .groupby(level='group_id')
                    ['amount'].cumsum() - df['amount'])

Output:

             amount  new_amount
group_id id                    
100      0       30          10
         1       40          40
         2       10           0
101      3       20           0
         4       25          20
         5       80          85
         6       40          45
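To see why this works, it can help to keep the intermediate running total as its own column: after sorting by amount, the cumulative sum for a row covers every amount up to and including its own, so subtracting the row's amount leaves the sum of the smaller ones. The sketch below repeats the same steps with an extra `running_total` column (that column name is illustrative) and assumes no duplicate amounts within a group:

```python
import pandas as pd

df = pd.DataFrame({
    'group_id': [100, 100, 100, 101, 101, 101, 101],
    'amount': [30, 40, 10, 20, 25, 80, 40],
})
df.index.name = 'id'
df = df.set_index(['group_id', df.index])

# Running total within each group, taken in ascending order of amount.
running_total = (df.sort_values('amount')
                   .groupby(level='group_id')['amount']
                   .cumsum())

# Column assignment aligns on the (group_id, id) index,
# so the values land back on their original rows.
df['running_total'] = running_total
df['new_amount'] = df['running_total'] - df['amount']
print(df)
```

Because alignment happens on the index, no explicit re-sort is needed before the subtraction.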

Update: handling duplicate values:

# the data
df = pd.DataFrame({
    'group_id': [100,100,100,100,101,101,101,101],
    'amount': [30,40,10,30,20,25,80,40]
})
df.index.name = 'id'
df.set_index(['group_id', df.index], inplace=True)

# sort values:
df_sorted = df.sort_values('amount')

# cumsum
s1 = df_sorted.groupby('group_id')['amount'].cumsum()

# running count of each amount within its group (1 for the first occurrence, 2 for the second, ...)
s2 = df_sorted.groupby(['group_id', 'amount']).cumcount() + 1

# instead of just subtracting df['amount'], we subtract amount * counts
df['new_amount'] = s1 - df['amount'].mul(s2)

Output (note the two values of 30 in group 100):

             amount  new_amount
group_id id                    
100      0       30          10
         1       40          70
         2       10           0
         3       30          10
101      4       20           0
         5       25          20
         6       80          85
         7       40          45
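With duplicates, the task admits two readings, and the answers here differ: this update counts every smaller row (so 40 in group 100 gets 10 + 30 + 30 = 70), while the mask-based answers zero out repeated amounts and therefore count each distinct smaller value once (giving 10 + 30 = 40). A brute-force sketch that makes the difference visible (the names `with_repeats` and `distinct_only` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group_id': [100, 100, 100, 100, 101, 101, 101, 101],
    'amount': [30, 40, 10, 30, 20, 25, 80, 40],
})
df.index.name = 'id'
df = df.set_index(['group_id', df.index])

group = df.index.get_level_values('group_id').to_numpy()
amount = df['amount'].to_numpy()

with_repeats = []    # every smaller row contributes
distinct_only = []   # each distinct smaller value contributes once
for g, a in zip(group, amount):
    smaller = amount[(group == g) & (amount < a)]
    with_repeats.append(int(smaller.sum()))
    distinct_only.append(int(np.unique(smaller).sum()))

df['with_repeats'] = with_repeats
df['distinct_only'] = distinct_only
print(df)
```

Which reading is wanted depends on the use case; the question's sample data has no duplicates within a group, so it does not settle the matter.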
