大熊猫的累计和

2021-02-25 04:34:04 发布

您现在位置:Python中文网/ 问答频道 /正文

这是我的数据帧表:

col1 col2 col3 col4 col5 col6 col7 
1      1    1    1   137  500  11
1      1    1    1   120  500  11
1      1    2    1   101  500  11
1      1    3    1   55   500  11
1      2    2    1   133  340  12
1      2    2    1   125  340  12
1      2    1    1   63   340  12

我必须使用col6值和col5的累计和之间的差值来更新数据帧,并将该值存储在名为“updated”的单独列中。例如,累积和应达到Col7中的值: col8值为:

^{pr2}$

你能提出一些解决办法吗? 我得用Python熊猫。在

3条回答
网友
1楼 ·

它代表SQL Server

样本表

CREATE TABLE #TEMP(col1 INT, col2 INT, col3 INT, col4 INT, col5 INT, col6 INT, col7 INT)

INSERT INTO #TEMP
SELECT 1,      1,    1,    1,   137,  500,  11
UNION ALL
SELECT 1,      1,    1,    1,   120,  500,  11
UNION ALL
SELECT 1,      1,    2,    1,   101,  500,  11
UNION ALL
SELECT 1,      1,    3,    1,   55,   500,  11
UNION ALL
SELECT 1,      2,    2,    1,   133,  340,  12
UNION ALL
SELECT 1,      2,    2,    1,   125,  340,  12
UNION ALL
SELECT 1,      2,    1,    1,   63,   340,  12

查询

^{pr2}$

编辑:

如果要运行total并将其存储在COL8中,可以尝试下面的查询

;WITH CTE AS
(
     Retrieve row number for each type of COL6 in default order
   SELECT ROW_NUMBER() OVER(PARTITION BY COL6 ORDER BY (SELECT 0))rNO,*
   FROM #TEMP
)
SELECT col1 , col2, col3 , col4 , col5 , col6 , col7,
(
     SELECT SUM(COL5)+col6 
     FROM CTE   
     WHERE RNO<=C2.RNO AND COL6=C2.COL6
     GROUP BY col6
)COL8
FROM CTE C2

编辑2:使用表的更新查询更新

;WITH CTE AS
(
     Retrieve row number for each type of COL6 in default order
   SELECT ROW_NUMBER() OVER(PARTITION BY COL6 ORDER BY (SELECT 0))rNO,*
   FROM #TEMP
)
UPDATE #TEMP SET COL8 = TAB.COL8 
FROM
(
    SELECT col1 , col2, col3 , col4 , col5 , col6 , col7,
    (
       SELECT SUM(COL5)+col6 
       FROM CTE   
       WHERE RNO<=C2.RNO AND COL6=C2.COL6
       GROUP BY col6
    )COL8
    FROM CTE C2
)TAB
WHERE TAB.COL5=#TEMP.COL5 AND TAB.COL6=#TEMP.COL6
网友
2楼 ·

我认为您使用cumsum的尝试可能行不通,因为您没有按col7分组-从示例计算中可以明显看出,您只计算col7每个值内的累计和,因此我认为您希望:

df['cumsums'] = df.groupby('col7')['col5'].cumsum()

df['updated'] = df['col6'] - df['cumsums']

df
Out[8]: 
   col1  col2  col3  col4  col5  col6  col7  updated  cumsums
0     1     1     1     1   137   500    11      363      137
1     1     1     1     1   120   500    11      243      257
2     1     1     2     1   101   500    11      142      358
3     1     1     3     1    55   500    11       87      413
4     1     2     2     1   133   340    12      207      133
5     1     2     2     1   125   340    12       82      258
6     1     2     1     1    63   340    12       19      321
网友
3楼 ·

试试这个。找到running total有不同的方法。在

;WITH cte
     AS (SELECT *,
                Row_number()OVER(partition BY col6 ORDER BY col6)rn
         FROM   Yourtable) As Col8
SELECT *,
       col6 - (SELECT Sum(b.col5)
               FROM   cte b
               WHERE  a.col6 = b.col6
                      AND b.rn <= a.rn)
FROM   cte a;

如果您正在使用Sql Server 2012+,请尝试此操作。在

^{pr2}$

SqlFiddle Demo

相关问题