Python: an in-memory object database that supports indexing?


Asked 2025-04-16 12:47

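I'm working with some data, and things would be much simpler if I could put a bunch of dictionaries into an in-memory database and then run simple queries against it.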

For example, something like this:

people = db([
    {"name": "Joe", "age": 16},
    {"name": "Jane", "favourite_color": "red"},
])
over_16 = people.filter(age__gt=16)
with_favourite_colors = people.filter(favourite_color__exists=True)
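The snippet above describes a wished-for API, not an existing library. Purely for illustration, here is a minimal sketch of how such `field__op` lookups might be parsed over plain dicts; the `Collection` class and the lookup names (`gt`, `exists`, and so on) are hypothetical, and this version does a linear scan rather than using an index.

```python
import operator

# Hypothetical lookup suffixes modelled on the question's "age__gt" syntax.
_COMPARISONS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
}


class Collection:
    """A tiny, unindexed in-memory collection of dicts (illustrative only)."""

    def __init__(self, records):
        # Keep references to the original dicts; nothing is copied or serialized.
        self.records = list(records)

    def filter(self, **lookups):
        """Return the records that satisfy every keyword lookup."""
        return [
            record for record in self.records
            if all(self._matches(record, lookup, value)
                   for lookup, value in lookups.items())
        ]

    @staticmethod
    def _matches(record, lookup, value):
        # "age__gt" -> key "age", operator "gt"; a bare "age" means equality.
        key, _, op_name = lookup.partition("__")
        if op_name == "exists":
            return (key in record) == value
        if key not in record:
            # A record without the key never satisfies a comparison.
            return False
        return _COMPARISONS[op_name or "eq"](record[key], value)


people = Collection([
    {"name": "Joe", "age": 16},
    {"name": "Jane", "favourite_color": "red"},
])
over_15 = people.filter(age__gt=15)
with_favourite_colors = people.filter(favourite_color__exists=True)
```

Note that `age__gt=16` would match nobody here, since Joe is exactly 16.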

However, three factors complicate things:

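Some of the values will be Python objects, and converting them to some other format is out of the question (it's too slow, and it would destroy their identity). Of course, I could work around this (for example, by putting all the items in one big list and storing only their indices into that list), but that could take a fair amount of fiddling.
There will be thousands of records, and I'll be doing lots of lookups (for example, graph traversals), so queries have to be efficient, which means indexed.
As the example shows, the data is unstructured, so systems that require me to define a schema up front would be awkward.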
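To illustrate the identity point: a pickle round trip (standing in here for any serialization step) gives back an equal but distinct object, whereas a plain dict-of-lists index only stores references to the very same object. `SimpleNamespace` is just a stand-in for the arbitrary objects in the question.

```python
import pickle
from collections import defaultdict
from types import SimpleNamespace

# Stand-in for an arbitrary Python object that several records share.
shared = SimpleNamespace(label="a")
records = [
    {"name": "Joe", "node": shared},
    {"name": "Jane", "node": shared},
]

# A serialize/deserialize round trip produces a new object with equal contents...
round_tripped = pickle.loads(pickle.dumps(records[0]["node"]))
print(round_tripped == shared, round_tripped is shared)  # True False

# ...whereas an index built from plain dicts and lists only stores references.
by_name = defaultdict(list)
for record in records:
    by_name[record["name"]].append(record)
print(by_name["Jane"][0]["node"] is shared)  # True
```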

So, does something like this exist, or will I have to cobble something together myself?


10 answers

The only solution I know of is a package I stumbled across on PyPI a few years ago, called PyDbLite. It's okay, but it has a few issues:

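It still wants to persist everything to disk, as a pickle file. That was easy enough for me to rip out, though. (It's also unnecessary: if the inserted objects can be pickled, then so can the collection as a whole.)
The basic record type is a dictionary, into which it inserts its own metadata: two integers, under the keys __id__ and __version__.
Indexing is very basic and works only on the values in the record dictionary. If you want something more complex, such as indexing on an attribute of an object stored in a record, you'll have to write the code yourself. (I've meant to do this myself, but have never found the time.)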
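The attribute-based indexing mentioned in the last point can be sketched in plain Python. This is not PyDbLite's API; `build_attribute_index` is a hypothetical helper, and the sketch assumes the attribute values are hashable and that the index is rebuilt (or updated) whenever the records change.

```python
from collections import defaultdict
from types import SimpleNamespace

_MISSING = object()  # sentinel for "record or object has no such field"


def build_attribute_index(records, key, attr):
    """Map getattr(record[key], attr) -> list of records (hypothetical helper)."""
    index = defaultdict(list)
    for record in records:
        obj = record.get(key, _MISSING)
        value = getattr(obj, attr, _MISSING)
        if value is not _MISSING:
            # Store a reference to the record itself, not a copy.
            index[value].append(record)
    return index


records = [
    {"name": "Joe", "pet": SimpleNamespace(species="cat")},
    {"name": "Jane", "pet": SimpleNamespace(species="dog")},
    {"name": "Ann"},  # no pet: simply left out of the index
]
by_species = build_attribute_index(records, "pet", "species")
cats = by_species["cat"]
```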

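The author seems to update the package now and then. Since I started using it, several new features have appeared, including some nice syntax for complex queries.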

Assuming you rip out the pickling (I can tell you how I did it), your example would look something like this (untested code):

from PyDbLite import Base

db = Base()
db.create("name", "age", "favourite_color")

# You can insert records as either named parameters
# or in the order of the fields
db.insert(name="Joe", age=16, favourite_color=None)
db.insert("Jane", None, "red")

# These should return an object you can iterate over
# to get the matching records.  These are unindexed queries.
#
# The first might throw because of the None in the second record
over_16 = db("age") > 16
with_favourite_colors = db("favourite_color") != None

# Or you can make an index for faster queries
db.create_index("favourite_color")
with_favourite_color_red = db._favourite_color["red"]

Hopefully this gets you started.

Answered 2025-04-16 by Python大师


If an in-memory database turns out to be too much hassle, here is a custom filtering approach you might find useful.

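The get_filter function takes arguments describing how you want to filter a dictionary, and returns a function that you can pass to the built-in filter function to filter a list of dictionaries.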

import operator

def get_filter(key, op=None, comp=None, inverse=False):
    # This will invert the boolean returned by the function 'op' if 'inverse == True'
    result = lambda x: not x if inverse else x
    if op is None:
        # Without any function, just see if the key is in the dictionary
        return lambda d: result(key in d)

    if comp is None:
        # If 'comp' is None, assume the function takes one argument
        return lambda d: result(op(d[key])) if key in d else False

    # Use 'comp' as the second argument to the function provided
    return lambda d: result(op(d[key], comp)) if key in d else False

people = [{'age': 16, 'name': 'Joe'}, {'name': 'Jane', 'favourite_color': 'red'}]

print(list(filter(get_filter("age", operator.gt, 15), people)))
# [{'age': 16, 'name': 'Joe'}]
print(list(filter(get_filter("name", operator.eq, "Jane"), people)))
# [{'name': 'Jane', 'favourite_color': 'red'}]
print(list(filter(get_filter("favourite_color", inverse=True), people)))
# [{'age': 16, 'name': 'Joe'}]

This approach extends easily to more complex filters, for example filtering on whether a value matches a regular expression:

import re

p = re.compile("[aeiou]{2}")  # matches two lowercase vowels in a row
print(list(filter(get_filter("name", p.search), people)))
# [{'age': 16, 'name': 'Joe'}]
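Because the predicates that get_filter returns are ordinary callables, they can also be combined. The `all_of` and `any_of` helpers below are hypothetical additions, and the block uses small stand-in predicates so that it runs on its own; in practice you would pass the results of get_filter(...) instead. Like the rest of this approach, it is still a linear scan over the list.

```python
def all_of(*predicates):
    """Return a predicate that is true only if every given predicate is true."""
    return lambda d: all(p(d) for p in predicates)


def any_of(*predicates):
    """Return a predicate that is true if at least one given predicate is true."""
    return lambda d: any(p(d) for p in predicates)


def age_over_15(d):
    # Stand-in for get_filter("age", operator.gt, 15)
    return "age" in d and d["age"] > 15


def has_colour(d):
    # Stand-in for get_filter("favourite_color")
    return "favourite_color" in d


people = [
    {"age": 16, "name": "Joe"},
    {"name": "Jane", "favourite_color": "red"},
    {"age": 30, "name": "Ann", "favourite_color": "blue"},
]

print(list(filter(all_of(age_over_15, has_colour), people)))
# [{'age': 30, 'name': 'Ann', 'favourite_color': 'blue'}]
```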

Answered 2025-04-16 by Python大师


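You could consider an in-memory SQLite database, which the sqlite3 standard library module supports: just pass the special value :memory: when connecting. If you'd rather not write SQL yourself, you can use an ORM (object-relational mapper) such as SQLAlchemy to access the in-memory SQLite database.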
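A minimal sketch using only the sqlite3 module. The table and column names are assumptions chosen to match the question's example; note that this does require deciding on columns up front, and keys missing from a record simply become NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be accessed by column name
conn.execute(
    "CREATE TABLE people (name TEXT, age INTEGER, favourite_color TEXT)"
)
# An index the query planner can use for equality and range queries on age.
conn.execute("CREATE INDEX idx_people_age ON people (age)")
conn.executemany(
    "INSERT INTO people (name, age, favourite_color) VALUES (?, ?, ?)",
    [("Joe", 16, None), ("Jane", None, "red")],
)

# NULL > 15 is not true, so Jane (no age) is excluded automatically.
over_15 = conn.execute("SELECT name FROM people WHERE age > ?", (15,)).fetchall()
with_colour = conn.execute(
    "SELECT name FROM people WHERE favourite_color IS NOT NULL"
).fetchall()
print([row["name"] for row in over_15])      # ['Joe']
print([row["name"] for row in with_colour])  # ['Jane']
```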

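Edit: I notice you mentioned that values may be Python objects and that you want to avoid serialization. Storing arbitrary Python objects in a database like this does require serializing them.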
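One way to keep SQLite's indexing without serializing anything is the workaround the question itself mentions: keep the real objects in a Python list and store only their searchable fields plus their list position in SQLite. This is a sketch under that assumption; the `idx` table, `obj_id` column, and `query` helper are hypothetical names.

```python
import sqlite3
from types import SimpleNamespace

# The real objects live in an ordinary Python list and are never serialized.
objects = [
    SimpleNamespace(name="Joe", age=16),
    SimpleNamespace(name="Jane", favourite_color="red"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE idx (obj_id INTEGER PRIMARY KEY, age INTEGER)")
conn.execute("CREATE INDEX idx_age ON idx (age)")
# Store only the searchable field plus the object's position in the list.
conn.executemany(
    "INSERT INTO idx (obj_id, age) VALUES (?, ?)",
    [(i, getattr(obj, "age", None)) for i, obj in enumerate(objects)],
)


def query(sql, params=()):
    """Run a query that selects obj_id and map the ids back to the live objects."""
    return [objects[row[0]] for row in conn.execute(sql, params)]


over_15 = query("SELECT obj_id FROM idx WHERE age > ?", (15,))
print(over_15[0] is objects[0])  # True: the original object, not a copy
```

The trade-off is that the list and the table must be kept in sync by hand whenever objects are added or changed.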

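If you need to satisfy both requirements, here is a practical suggestion: why not just use Python dictionaries as indices into your collection of dictionaries? It sounds like each index will have its own particular requirements, so first work out which values you'll be querying on, then write a function that builds those indices. The possible values of a given key across your list of dictionaries become the index's keys, and each index value is the list of dictionaries having that value. To query an index, look up the value you're searching for as a key.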

import collections
import itertools

def make_indices(dicts):
    color_index = collections.defaultdict(list)
    age_index = collections.defaultdict(list)
    for d in dicts:
        if 'favorite_color' in d:
            color_index[d['favorite_color']].append(d)
        if 'age' in d:
            age_index[d['age']].append(d)
    return color_index, age_index


def make_data_dicts():
    ...


data_dicts = make_data_dicts()
color_index, age_index = make_indices(data_dicts)
# Querying for those with a favorite color just means collecting every list in the index
with_color_dicts = list(
        itertools.chain.from_iterable(color_index.values()))
# Query for people over 16 (k is the age that serves as the index key)
over_16 = list(
        itertools.chain.from_iterable(
            v for k, v in age_index.items() if k > 16)
)
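The over_16 query above still checks every distinct age in the index. For range queries, one alternative (a swapped-in technique, not part of the answer above) is a sorted index searched with the standard bisect module. `make_sorted_index` and `greater_than` are hypothetical helpers; the sketch assumes the indexed values are mutually comparable and that the index is rebuilt after the data changes.

```python
import bisect


def make_sorted_index(dicts, key):
    """Return (values, entries) sorted by dict[key], skipping dicts without key."""
    entries = sorted((d for d in dicts if key in d), key=lambda d: d[key])
    values = [d[key] for d in entries]
    return values, entries


def greater_than(index, threshold):
    """Return the entries whose value is strictly greater than threshold."""
    values, entries = index
    # bisect_right skips every entry whose value is <= threshold.
    start = bisect.bisect_right(values, threshold)
    return entries[start:]


people = [
    {"name": "Joe", "age": 16},
    {"name": "Jane", "favorite_color": "red"},
    {"name": "Ann", "age": 30},
    {"name": "Bob", "age": 17},
]
age_index = make_sorted_index(people, "age")
over_16 = greater_than(age_index, 16)
print([d["name"] for d in over_16])  # ['Bob', 'Ann']
```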

Answered 2025-04-16 by Python大师

