优化 Netcdf 到 Python 代码

Question

我有一个Python脚本，它可以读取netcdf文件，并把气候数据一行一行地插入到PostgreSQL数据库的表里。这样做当然非常慢，所以我想找办法来优化这个代码。

我在考虑做一个很大的列表，然后使用复制命令来一次性插入数据。不过，我不太确定具体该怎么操作。还有一种方法是先把数据写入一个CSV文件，然后再用PostgreSQL的COPY命令把这个CSV文件导入到数据库里。我觉得这样可能比一行一行插入要快。

如果你有任何优化的建议，我会非常感激。这个netcdf文件可以在这里找到（不过需要注册）： http://badc.nerc.ac.uk/browse/badc/cru/data/cru_ts/cru_ts_3.21/data/pre

# NetCDF to PostGreSQL database
# CRU-TS 3.21 precipitation and temperature data. From NetCDF to database table
# Requires Python2.6, Postgresql, Psycopg2, Scipy
# Tested using Vista 64bit.

# Import modules
import psycopg2, time, datetime
from scipy.io import netcdf

# Establish connection
db1 = psycopg2.connect("host=192.168.1.162 dbname=dbname user=username password=password")
cur = db1.cursor()
### Create Table
print str(time.ctime())+ " Creating precip table."
cur.execute("DROP TABLE IF EXISTS precip;")
cur.execute("CREATE TABLE precip (gid serial PRIMARY KEY not null, year int, month int, lon decimal, lat decimal, pre decimal);")

### Read netcdf file
f = netcdf.netcdf_file('/home/username/output/project_v2/inputdata/precipitation/cru_ts3.21.1901.2012.pre.dat.nc', 'r')
##
### Create lathash
print str(time.ctime())+ " Looping through lat coords."
temp = f.variables['lat'].data.tolist()
lathash = {}
for entry in temp:
    print str(entry)
    lathash[temp.index(entry)] = entry
##
### Create lonhash
print str(time.ctime())+ " Looping through long coords."
temp = f.variables['lon'].data.tolist()
lonhash = {}
for entry in temp:
    print str(entry)
    lonhash[temp.index(entry)] = entry
##
### Loop through every observation. Set timedimension and lat and long observations.
for _month in xrange(1344):

    if _month < 528:
        print(str(_month))
        print("Not yet")
    else:
        thisyear = int((_month)/12+1901)
        thismonth = ((_month) % 12)+1
        thisdate = datetime.date(thisyear,thismonth, 1)
        print(str(thisdate))
        _time = int(_month)
        for _lon in xrange(720):
            for _lat in xrange(360):
                data = [int(thisyear), int(thismonth), lonhash[_lon], lathash[_lat], f.variables[('pre')].data[_time, _lat, _lon]]
                cur.execute("INSERT INTO precip (year, month, lon, lat, pre) VALUES "+str(tuple(data))+";")


db1.commit()
cur.execute("CREATE INDEX idx_precip ON precip USING btree(year, month, lon, lat, pre);")
cur.execute("ALTER TABLE precip ADD COLUMN geom geometry;")
cur.execute("UPDATE precip SET geom = ST_SetSRID(ST_Point(lon,lat), 4326);")
cur.execute("CREATE INDEX idx_precip_geom ON precip USING gist(geom);")


db1.commit()
cur.close()
db1.close()            
print str(time.ctime())+ " Done!"

数据处理 postgresql 数据插入数据库优化 csv文件 netcdf 气候数据复制命令

优化 Netcdf 到 Python 代码

2 个回答

撰写回答