为什么写psycopg2要花这么长时间?

2024-03-28 21:11:00 发布

您现在位置:Python中文网/ 问答频道 /正文

我写了一个更新表的脚本。因为找不到“批处理”更新的方法,所以我的脚本一次更新一行表。我假设对于一组100000行,更新需要几秒钟。你知道吗

不需要。每次写入操作大约需要100毫秒。整个写操作需要(((100000*100)/1000)/60)/60)=2.77小时。为什么写这么久?你知道吗

下面是我使用的代码:

import psycopg2
...
entries = get_all_entries()
conn = psycopg2.connect(params)
try:
    for entry in entries:
        cursor = conn.cursor()
        cursor.execute(UPDATE_QUERY.format(entry.field1, entry.field2))
        cursor.close()
finally:
    conn.close()

我做错什么了?你知道吗


Tags: 方法代码import脚本closegetconnectall
2条回答

你试过:

cursor = conn.cursor()
for entry in entries:
     cursor.execute(UPDATE_QUERY.format(entry.field1, entry.field2))

cursor.close()

你可以用https://docs.python.org/3/library/profile.html来分析这段代码

您可以使用^{}方法将数据上载到服务器端临时表中,然后使用单个SQL更新表,而不是从客户端逐行更新表。你知道吗

下面是一个人工的例子:

#!/usr/bin/env python

import time, psycopg2
from random import random
from cStringIO import StringIO

CRowCount = 100000

conn = psycopg2.connect('')
conn.autocommit = False

print('Prepare playground...')
cur = conn.cursor()
cur.execute("""
    drop table if exists foo;
    create table foo(i int primary key, x float);
    insert into foo select i, 0 from generate_series(1,%s) as i;
""", (CRowCount,))
print('Done.')
cur.close();
conn.commit();

print('\nTest update row by row...')
tstart = time.time()
cur = conn.cursor()
for i in xrange(1,CRowCount+1):
    cur.execute('update foo set x = %s where i = %s', (random(), i));
conn.commit()
cur.close()
print('Done in %s s.' % (time.time() - tstart))

print('\nTest batch update...')
tstart = time.time()
cur = conn.cursor()
# Create temporary table to hold our data
cur.execute('create temp table t(i int, x float) on commit drop')
# Create and fill the buffer from which data will be uploaded
buf = StringIO()
for i in xrange(1,CRowCount+1):
    buf.write('%s\t%s\n' % (i, random()))
buf.seek(0)
# Upload data from the buffer to the temporary table
cur.copy_from(buf, 't')
# Update test table using data previously uploaded
cur.execute('update foo set x = t.x from t where foo.i = t.i')
cur.close();
conn.commit();
print('Done in %s s.' % (time.time() - tstart))

输出:

Prepare playground...
Done.

Test update row by row...
Done in 62.1189928055 s.

Test batch update...
Done in 3.95668387413 s.

如你所见,第二条路的速度大约快20倍。你知道吗

相关问题 更多 >