NumPy arrays with SQLite

6 votes

4 answers

8552 views

Asked 2025-04-17 05:02

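The most common SQLite interface I've seen in Python is sqlite3, but is there anything that works well with NumPy arrays or record arrays (recarrays)? By that I mean something that recognizes data types, does not require inserting data row by row, and extracts into a NumPy (rec)array. Something like R's SQL functions in the RDB or sqldf libraries, if anyone is familiar with those (they can import, export, or append whole tables or subsets of tables to and from R data tables).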

Tags: numpy, sqlite, database interface, record arrays, data import/export, data type recognition

4 Answers

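This looks a bit old, but is there any reason you can't just use fetchall()? That way you can initialize the NumPy array directly from the result when you declare it.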
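A minimal sketch of that idea, assuming a hypothetical table `samples` with one integer and two float columns in an in-memory database. The table name, column names, and dtype are made up for illustration. `executemany` avoids a hand-written row-by-row loop, and passing a structured dtype to `np.array` turns the tuples returned by `fetchall()` into a record-style array.

```python
import sqlite3
import numpy as np

# Hypothetical schema and data, for illustration only.
dtype = np.dtype([("id", "i8"), ("x", "f8"), ("y", "f8")])
data = np.array([(1, 0.5, 1.5), (2, 2.5, 3.5), (3, 4.5, 5.5)], dtype=dtype)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER, x REAL, y REAL)")

# tolist() yields tuples of plain Python ints/floats, which sqlite3 accepts.
conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", data.tolist())
conn.commit()

# fetchall() returns a list of tuples; the structured dtype supplies
# the column names and types for the resulting array.
rows = conn.execute("SELECT id, x, y FROM samples ORDER BY id").fetchall()
result = np.array(rows, dtype=dtype)
conn.close()
```

Columns are then available by name, for example `result["x"]`. SQLite itself does not carry NumPy dtypes, so in this sketch the dtype has to be supplied by the caller.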

Answered 2025-04-17 by Python大师


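Doug's suggestion of redis is quite good, but I think his code is a bit complicated and, as a result, rather slow. For my purposes, I had to serialize and write, then read back and deserialize, a square matrix of about a million floats in less than a tenth of a second, so I did this: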

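For writing:

import numpy as np
import redis

rs = redis.Redis()  # assumes a redis server on localhost:6379
snapshot = np.random.randn(1024, 1024)  # float64 by default
serialized = snapshot.tobytes()
rs.set('snapshot_key', serialized)

Then for reading:

s = rs.get('snapshot_key')
# frombuffer must be given the dtype the bytes were written with (float64 here);
# astype then downcasts a copy to float32
deserialized = np.frombuffer(s, dtype=np.float64).astype(np.float32)
rank = int(np.sqrt(deserialized.size))
snap = deserialized.reshape(rank, rank)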

You can do some basic performance profiling in ipython with %time, but neither tobytes nor frombuffer takes more than a few milliseconds.
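Outside IPython, the same measurement can be sketched with `time.perf_counter`. This times only the NumPy side (there is no redis round trip here), and the numbers depend on the machine, so none are quoted.

```python
import time
import numpy as np

snapshot = np.random.randn(1024, 1024)  # float64, about 8 MB of data

t0 = time.perf_counter()
serialized = snapshot.tobytes()  # copies the array data into a bytes object
t1 = time.perf_counter()
# frombuffer builds a read-only view over the bytes without copying them
restored = np.frombuffer(serialized, dtype=snapshot.dtype).reshape(snapshot.shape)
t2 = time.perf_counter()

print("tobytes:    {:.3f} ms".format((t1 - t0) * 1e3))
print("frombuffer: {:.3f} ms".format((t2 - t1) * 1e3))
```

Because `frombuffer` returns a read-only view, call `.copy()` on the result if the restored array needs to be modified.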

Answered 2025-04-17 by Python大师


Why not give redis a try?

Drivers are available for both of the platforms you are interested in: Python (redis, available from the Python Package Index) and R (rredis, available on CRAN).

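The genius of redis is not that it magically recognizes NumPy data types and lets you insert and extract multi-dimensional NumPy arrays as if they were native redis datatypes. Rather, its genius is how easily you can build such an interface yourself in just a few lines of code.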
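As a rough illustration of how small such an interface can be, here is a sketch that stores an array's raw bytes together with a short JSON header recording its dtype and shape. The helper names `put_array` and `get_array` and the `:meta`/`:data` key suffixes are invented for this example, and `DictStore` is a hypothetical in-memory stand-in so the code runs without a server. It is assumed that a redis-py client created with default settings (so that `get` returns bytes) could be passed in its place, since it offers `set` and `get` methods of the same shape.

```python
import json
import numpy as np


class DictStore:
    """Hypothetical in-memory stand-in exposing set/get like a redis client."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def put_array(store, key, arr):
    # Store the raw bytes plus a small JSON header with dtype and shape.
    arr = np.ascontiguousarray(arr)
    meta = json.dumps({"dtype": arr.dtype.str, "shape": arr.shape})
    store.set(key + ":meta", meta.encode("utf-8"))
    store.set(key + ":data", arr.tobytes())


def get_array(store, key):
    # Rebuild the array from the stored bytes using the saved dtype and shape.
    meta = json.loads(store.get(key + ":meta").decode("utf-8"))
    flat = np.frombuffer(store.get(key + ":data"), dtype=np.dtype(meta["dtype"]))
    return flat.reshape(meta["shape"])


store = DictStore()
A = np.random.randint(0, 10, 40).reshape(8, 5)
put_array(store, "A", A)
B = get_array(store, "A")
```

This sketch only handles simple numeric dtypes; structured (record) dtypes would need a richer header than `dtype.str`.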

There are several tutorials on using redis from Python; the one on the DeGizmo blog is particularly good.

import numpy as NP

# create some data
A = NP.random.randint(0, 10, 40).reshape(8, 5)

# a couple of utility functions to (i) manipulate NumPy arrays prior to insertion
# into redis db for more compact storage &
# (ii) to restore the original NumPy data types upon retrieval from redis db
# (both assume every element is a single digit, 0-9, as in A above)
fnx2 = lambda v: [int(ch) for ch in v]
fnx = lambda v: ''.join(map(str, v))

# start the redis server (e.g. from a bash prompt)
$> cd /usr/local/bin      # default install directory for 'nix
$> redis-server           # starts the redis server

# start the redis client; decode_responses=True makes 'get' return str, not bytes
from redis import Redis
r0 = Redis(db=0, port=6379, host='localhost', decode_responses=True)

# to insert items using redis 'string' datatype, call 'set' on the database, r0, and
# just pass in a key, and the item to insert (encoded as a string by fnx)
r0.set('k0', fnx(A[0,:]))

# row-wise insertion of the 2D array into redis, iterate over the array:
for c in range(A.shape[0]):
    r0.set("k{0}".format(c), fnx(A[c,:]))

# or to insert all rows at once
# use 'mset' ('multi set') and pass in a key-value mapping:
x = dict(("k{0}".format(i), fnx(row)) for i, row in enumerate(A))
r0.mset(x)

# to retrieve a row, pass its key to 'get'
# (sample output; A is random, so the values differ from run to run)
>>> r0.get('k0')
  '63295'

# retrieve the entire array from redis:
# 'keys' returns all keys in redis database, r0, in no guaranteed order, so sort
# by row number (assumes the db holds only the 'k<row>' keys stored above)
kx = sorted(r0.keys('*'), key=lambda k: int(k[1:]))

for key in kx :
    r0.get(key)

# to retrieve it in original form:
A = []
for key in kx:
    A.append(fnx2(r0.get(key)))

>>> A = NP.array(A)
>>> A
  array([[6, 2, 3, 3, 9],
         [4, 9, 6, 2, 3],
         [3, 7, 9, 5, 0],
         [5, 2, 6, 3, 4],
         [7, 1, 5, 0, 2],
         [8, 6, 1, 5, 8],
         [1, 7, 6, 4, 9],
         [6, 4, 1, 3, 6]])

Answered 2025-04-17 by Python大师
