How do I update the original DataFrame after computing on a subset (slice)?

Posted 2024-04-25 19:14:44


Consider the following example:

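import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'a': ['one', 'one', 'one', 'one', 'two', 'two', 'two', 'three', 'four'],
     'b': ['x', 'y', 'x', 'y', 'x', 'y', 'x', 'x', 'x'],
     'c': np.random.randn(9)}
)

# Placeholder value, to be overwritten by the per-group sums computed below
df['sum_c_3'] = 99.99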

Output:

>>> df
       a  b         c  sum_c_3
0    one  x  1.296379    99.99
1    one  y  0.201266    99.99
2    one  x  0.953963    99.99
3    one  y  0.322922    99.99
4    two  x  0.887728    99.99
5    two  y -0.154389    99.99
6    two  x -2.390790    99.99
7  three  x -1.218706    99.99
8   four  x -0.043964    99.99

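Now I need to perform many operations on this data. As one example, for each record I compute the sum of column c over that record and the next two records in the same group (three records in total), and store the result in a new column, like this: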

for w in ['one','two','three','four']:
    x = df.loc[df['a']==w]
    size = x.iloc[:]['a'].count()
    print("Records %s: %s" %(w,size))
    target_column = x.columns.get_loc('c')
    for i in range(0,size):
        idx = x.index
        acum = x.iloc[i:i+3,target_column].sum()
        x.loc[x.loc[idx,'sum_c_3'].index[i],'sum_c_3'] = acum
    print(x)

Output:

Records one: 4
     a  b         c   sum_c_3
0  one  x  1.296379  2.451607
1  one  y  0.201266  1.478151
2  one  x  0.953963  1.276885
3  one  y  0.322922  0.322922
Records two: 3
     a  b         c   sum_c_3
4  two  x  0.887728 -1.657452
5  two  y -0.154389 -2.545180
6  two  x -2.390790 -2.390790
Records three: 1
       a  b         c   sum_c_3
7  three  x -1.218706 -1.218706
Records four: 1
      a  b         c   sum_c_3
8  four  x -0.043964 -0.043964

Finally, my question: how do I update the original DataFrame?

Can I slice and save the sums automatically? Or should I update the original by index from the sliced Series?

The original remains unchanged, with no updates applied; see:

>>> df
       a  b         c  sum_c_3
0    one  x  1.296379    99.99
1    one  y  0.201266    99.99
2    one  x  0.953963    99.99
3    one  y  0.322922    99.99
4    two  x  0.887728    99.99
5    two  y -0.154389    99.99
6    two  x -2.390790    99.99
7  three  x -1.218706    99.99
8   four  x -0.043964    99.99
>>> 
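
The unchanged output above is what one would expect when x is a separate object from df: assignments to x never reach the original. A minimal sketch on toy data (the small frame and its 'val' column are hypothetical stand-ins, not the question's data):

```python
import pandas as pd

# Toy frame; 'val' is a hypothetical stand-in for 'sum_c_3'
df = pd.DataFrame({'a': ['one', 'one', 'two'], 'val': [99.99, 99.99, 99.99]})

# Boolean-mask selection returns a new object; .copy() makes that explicit
x = df.loc[df['a'] == 'one'].copy()
x['val'] = 0.0  # modifies only the copy

print(x['val'].tolist())   # values held by the copy
print(df['val'].tolist())  # the original still holds the placeholder values
```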

1 answer
Anonymous user
Answer 1 · posted 2024-04-25 19:14:44

Add a call to df.update(x) at the end of the outer for loop:

for w in ['one', 'two', 'three', 'four']:
    # .copy() makes it explicit that x is a separate object from df
    x = df.loc[df['a'] == w].copy()
    size = len(x)
    print("Records %s: %s" % (w, size))
    target_column = x.columns.get_loc('c')
    idx = x.index
    for i in range(size):
        # Sum of the current row and up to two following rows in the group
        acum = x.iloc[i:i+3, target_column].sum()
        x.loc[idx[i], 'sum_c_3'] = acum
    print(x)
    df.update(x)  # the added line: write x back into df, aligned on index

df
Out[979]: 
       a  b         c   sum_c_3
0    one  x  0.127171  0.210872
1    one  y -0.576157  1.212010
2    one  x  0.659859  1.788168
3    one  y  1.128309  1.128309
4    two  x  0.333521 -0.846657
5    two  y  0.753613 -1.180178
6    two  x -1.933791 -1.933791
7  three  x  0.549009  0.549009
8   four  x  0.895742  0.895742
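
As a cross-check, here is a minimal sketch that compares the update-based loop with a vectorized alternative built on groupby and shift. The alternative is a swapped-in technique, not part of the answer above, and the fixed values in 'c' are an assumption made so the result is reproducible (the question uses random data).

```python
import pandas as pd

# Fixed values for 'c' (instead of np.random.randn) so results are reproducible
df = pd.DataFrame({
    'a': ['one', 'one', 'one', 'one', 'two', 'two', 'two', 'three', 'four'],
    'c': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
})
df['sum_c_3'] = 99.99

# Loop version: compute on a copy of each group, then write back with update()
looped = df.copy()
for key in looped['a'].unique():
    x = looped.loc[looped['a'] == key].copy()
    c = x['c'].to_numpy()
    x['sum_c_3'] = [c[i:i + 3].sum() for i in range(len(c))]
    looped.update(x[['sum_c_3']])  # aligned on index; only this column is passed

# Vectorized version: current row plus the next two rows of the same group.
# Rows near the end of a group have no successor, so the gaps are filled with 0.
g = df.groupby('a')['c']
vectorized = df['c'] + g.shift(-1).fillna(0) + g.shift(-2).fillna(0)

print(looped['sum_c_3'].tolist())
print(vectorized.tolist())
```

Both lists should match. If the loop is kept, df.loc[x.index, 'sum_c_3'] = x['sum_c_3'] is an index-based way to write the values back. Note that update() skips NaN values in the frame passed to it, so a NaN result would leave the old value in place.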
