列表切片中的快速成员身份

2024-04-18 18:27:56 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个非常大的列表叫做“数据”,我需要回答相当于

if (x in data[a:b]):

对于a、b和x的不同值

有没有可能对数据进行预处理以使这些查询更快


Tags: 数据in列表dataif
3条回答

想法

您可以创建dict。对于每个元素,存储发生位置的排序列表。你知道吗

回答查询:二进制搜索第一个大于或等于a的元素,检查它是否存在并且小于b

伪码

预处理:

from collections import defaultdict

byvalue = defaultdict(list)

for i, x in enumerate(data):
    byvalue[x].append(i)

查询:

def has_index_in_slice(indices, a, b):
   r = bisect.bisect_left(indices, a)

   return r < len(indices) and indices[r] < b

def check(byvalue, x, a, b):
    indices = byvalue.get(x, None)
    if not indices: return False

    return has_index_in_slice(indices, a, b)

如果我们假设listdict具有O(1)“按索引获取”复杂性,那么这里的复杂性是每个查询的O(log N)。你知道吗

将列表放入数据库,并利用内置的索引、优化和缓存。例如,在PostgreSQL手册中:

Once an index is created, no further intervention is required: the system will update the index when the table is modified, and it will use the index in queries when it thinks doing so would be more efficient than a sequential table scan.

但为了简单起见(以及Python标准库中的可用性),也可以使用sqlite。从Python's documentation, regarding indexing

A Row instance serves as a highly optimized row_factory for Connection objects. It tries to mimic a tuple in most of its features.

It supports mapping access by column name and index, iteration, representation, equality testing and len().

在那一页的其他地方:

Row provides both index-based and case-insensitive name-based access to columns with almost no memory overhead. It will probably be better than your own custom dictionary-based approach or even a db_row based solution.

是的,您可以将这些切片预处理为集合,从而使成员资格查找O(1)而不是O(n)

check = set(data[a:b])
if x in check:
    # do something
if y in check:
    # do something else

相关问题 更多 >