在SQL Update循环中动态设置具有多个版本的记录字段

2024-04-26 03:53:18 发布

您现在位置:Python中文网/ 问答频道 /正文

我在Postgresql中计算SQL函数时遇到问题。我已经成功地用python实现了这一点,但是在一个有数百万条记录的表上需要很长时间。你知道吗

我拥有的是一个“示例表”,其结构如下,数据类似于下面的示例:

示例表

id  | version  | valid_from          | valid_to             | time_valid
1   | 1        | 2010-03-21 19:00:00 | 2010-03-21 19:00:00  | NULL
1   | 2        | 2011-02-02 09:00:00 | 2011-02-02 09:00:00  | NULL
1   | 3        | 2012-04-20 15:00:00 | 2012-04-20 15:00:00  | NULL
2   | 1        | 2012-07-02 04:00:00 | 2012-07-02 04:00:00  | NULL
3   | 1        | 2011-05-05 05:00:00 | 2011-05-05 05:00:00  | NULL`

如您所见,我有3个id为“1”的记录,每个记录都是相应的版本(即,在本例中为1:3)

我想更新版本2和版本1,在以后的版本中将'valid\u to'设置为'valid\u from'值。你知道吗

更新表

id | version  | valid_from             | valid_to                | time_valid
1  | 1        | 2010-03-21 19:00:00    | **2011-02-02 09:00:00** | **Some Time**
1  | 2        |**2011-02-02 09:00:00** | **2012-04-20 15:00:00** | **Some Time**
1  | 3        |**2012-04-20 15:00:00** | 2012-04-20 15:00:00     | NULL
2  | 1        |2012-07-02 04:00:00     | 2012-07-02 04:00:00     | NULL
3  | 1        |2011-05-05 05:00:00     | 2011-05-05 05:00:00     | NULL

有些记录有许多版本,而有些记录可能没有(只有一个)。同时计算一个time\u valid字段也很方便,我假设这是通过从时间戳中减去valid\u to和valid\u来完成的。再一次,我有数以百万计的记录和多个表格,我需要这样做-所以更快确实是更好的。你知道吗

非常感谢任何工作代码的例子!你知道吗

根据请求,这里是我目前拥有的python代码。我尝试过对它的一些修改,比如使用limits、executemany、fetchmany、iterators……但在所有情况下,它要么消耗了所有的本地内存,要么崩溃了,要么速度非常慢:

cur.execute('''SELECT id, valid_from,valid_to, version FROM hist_line where valid_to = valid_from limit 10000;''')
for rec in cur.fetchall():
        r = 'SELECT id, valid_from, valid_to, version FROM hist_line WHERE id = %s and version = %s;' % (rec['id'],rec['version']+1)
        cur1.execute(r)
        r = cur1.fetchone()
        if r:
            out = {'id': rec['id'], 'valid_from':rec['valid_from'],'valid_to':r['valid_from'],'version':rec['version'],'time_valid':r[1]-rec[1]}
            cur1.execute('''UPDATE hist_line SET valid_to = %(valid_to)s
        WHERE id = %(id)s and version = %(version)s and valid_from = %(valid_from)s and valid_from = valid_to''', out)

Tags: andtofrom版本id示例executetime
1条回答
网友
1楼 · 发布于 2024-04-26 03:53:18

Lead()lag()window-functions中的两个。它们允许您(在本例中)访问“previous”或“next”记录,给定一定的顺序,您必须在OVER( ...)WINDOW( ...)子句中指定。你知道吗

  the data
CREATE TABLE ExampleTable
        (id INTEGER NOT NULL
        , version INTEGER NOT NULL
        , valid_from          TIMESTAMP NOT NULL
        , valid_to              TIMESTAMP NOT NULL
        , time_valid    text
        );

INSERT INTO ExampleTable(id, version, valid_from, valid_to, time_valid) VALUES
 (1  , 1        , '2010-03-21 19:00:00' , '2010-03-21 19:00:00'  ,NULL)
,(1   , 2        , '2011-02-02 09:00:00' , '2011-02-02 09:00:00' ,NULL)
,(1   , 3        , '2012-04-20 15:00:00' , '2012-04-20 15:00:00' ,NULL)
,(2   , 1        , '2012-07-02 04:00:00' , '2012-07-02 04:00:00' ,NULL)
,(3   , 1        , '2011-05-05 05:00:00' , '2011-05-05 05:00:00' ,NULL)
        ;

  Check what the update will do
SELECT dst.* , src.lll AS newvalue
FROM ExampleTable dst
JOIN    (
        SELECT id,version
                , lead(valid_from) OVER (partition by id ORDER BY version) lll
        FROM ExampleTable
        ) src
ON src.id = dst.id AND src.version = dst.version
WHERE src.lll IS NOT NULL
ORDER BY id,version
        ;


  Do the update (remove the explain if it looks okay)
EXPLAIN
UPDATE ExampleTable dst
        SET valid_to = src.lll
FROM    (
        SELECT id,version
                , lead(valid_from) OVER (partition by id ORDER BY version) lll
        FROM ExampleTable
        ) src
WHERE src.id = dst.id
AND src.version = dst.version
AND src.lll IS NOT NULL
        ;


SELECT * FROM ExampleTable
ORDER BY id,version
        ;

相关问题 更多 >