查找Python字典键

10 投票
6 回答
13609 浏览
提问于 2025-04-16 12:52

我想知道如何对一个Python字典里的键进行某种索引。这个字典大约有40万个项目,所以我想避免线性搜索。

基本上,我是想查找userinput是否在字典的任何键里面。

for keys in dict:
    if userinput in keys:
        DoSomething()
        break

这就是我想做的一个例子。有没有办法可以更直接地搜索,而不需要用循环?或者有什么更有效的方法。

说明:这个userinput并不一定和键完全相同,比如userinput可能是log,而键是logfile

编辑:在搜索之前,任何列表/缓存的创建、预处理或组织都是可以的。唯一需要快速的就是查找键的过程。

6 个回答

3

不行。要在字典的键中查找字符串,唯一的方法就是一个个去看每个键。你提到的方法是用字典查找的唯一方式。

不过,如果你有40万条记录,想加快搜索速度,我建议你使用SQLite数据库。这样你可以直接用一句话来查询,比如 SELECT * FROM TABLE_NAME WHERE COLUMN_NAME LIKE '%userinput%';。你可以查看Python的sqlite3模块的文档,在这里

另一个选择是使用生成器表达式,因为它们通常比普通的for循环要快。

filteredKeys = (key for key in myDict.keys() if userInput in key)
for key in filteredKeys:
    doSomething()

编辑:如果你说的没错,不在乎一次性的成本,那就用数据库。SQLite几乎能完美地满足你的需求。

我做了一些基准测试,结果让我惊讶,简单的算法实际上比使用列表推导的版本快两倍,比使用SQLite的版本快六倍。根据这些结果,我得支持@Mark Byers,推荐使用Trie(字典树)。我在下面贴了基准测试的结果,供有兴趣的人尝试。

import random, string, os
import time
import sqlite3

def buildDict(numElements):
    aDict = {}
    for i in xrange(numElements-10):
        aDict[''.join(random.sample(string.letters, 6))] = 0

    for i in xrange(10):
        aDict['log'+''.join(random.sample(string.letters, 3))] = 0

    return aDict

def naiveLCSearch(aDict, searchString):
    filteredKeys = [key for key in aDict.keys() if searchString in key]
    return filteredKeys

def naiveSearch(aDict, searchString):
    filteredKeys = []
    for key in aDict:
        if searchString in key: 
            filteredKeys.append(key)
    return filteredKeys

def insertIntoDB(aDict):
    conn = sqlite3.connect('/tmp/dictdb')
    c = conn.cursor()
    c.execute('DROP TABLE IF EXISTS BLAH')
    c.execute('CREATE TABLE BLAH (KEY TEXT PRIMARY KEY, VALUE TEXT)')
    for key in aDict:
        c.execute('INSERT INTO BLAH VALUES(?,?)',(key, aDict[key]))
    return conn

def dbSearch(conn):
    cursor = conn.cursor()
    cursor.execute("SELECT KEY FROM BLAH WHERE KEY GLOB '*log*'")
    return [record[0] for record in cursor]

if __name__ == '__main__':
    aDict = buildDict(400000)
    conn = insertIntoDB(aDict)
    startTimeNaive = time.time()
    for i in xrange(3):
        naiveResults = naiveSearch(aDict, 'log')
    endTimeNaive = time.time()
    print 'Time taken for 3 iterations of naive search was', (endTimeNaive-startTimeNaive), 'and the average time per run was', (endTimeNaive-startTimeNaive)/3.0

    startTimeNaiveLC = time.time()
    for i in xrange(3):
        naiveLCResults = naiveLCSearch(aDict, 'log')
    endTimeNaiveLC = time.time()
    print 'Time taken for 3 iterations of naive search with list comprehensions was', (endTimeNaiveLC-startTimeNaiveLC), 'and the average time per run was', (endTimeNaiveLC-startTimeNaiveLC)/3.0

    startTimeDB = time.time()
    for i in xrange(3):
        dbResults = dbSearch(conn)
    endTimeDB = time.time()
    print 'Time taken for 3 iterations of DB search was', (endTimeDB-startTimeDB), 'and the average time per run was', (endTimeDB-startTimeDB)/3.0


    os.remove('/tmp/dictdb')

顺便说一下,我的测试结果是:

Time taken for 3 iterations of naive search was 0.264658927917 and the average time per run was 0.0882196426392
Time taken for 3 iterations of naive search with list comprehensions was 0.403481960297 and the average time per run was 0.134493986766
Time taken for 3 iterations of DB search was 1.19464492798 and the average time per run was 0.398214975993

所有时间单位都是秒。

3

如果你只需要找到以某个前缀开头的键,那么你可以使用二分查找。像这样就可以完成这个任务:

import bisect
words = sorted("""
a b c stack stacey stackoverflow stacked star stare x y z
""".split())
n = len(words)
print n, "words"
print words
print
tests = sorted("""
r s ss st sta stack star stare stop su t
""".split())
for test in tests:
    i = bisect.bisect_left(words, test)
    if words[i] < test: i += 1
    print test, i
    while i < n and words[i].startswith(test):
        print i, words[i]
        i += 1

输出结果:

12 words
['a', 'b', 'c', 'stacey', 'stack', 'stacked', 'stackoverflow', 'star', 'stare',
'x', 'y', 'z']

r 3
s 3
3 stacey
4 stack
5 stacked
6 stackoverflow
7 star
8 stare
ss 3
st 3
3 stacey
4 stack
5 stacked
6 stackoverflow
7 star
8 stare
sta 3
3 stacey
4 stack
5 stacked
6 stackoverflow
7 star
8 stare
stack 4
4 stack
5 stacked
6 stackoverflow
star 7
7 star
8 stare
stare 8
8 stare
stop 9
su 9
t 9
6

如果你只需要找那些以某个前缀开头的键,那么你可以使用一种叫做字典树的结构。还有一些更复杂的数据结构可以用来查找包含某个子字符串的键,不管这个子字符串出现在键的哪个位置,但这些结构需要占用更多的存储空间,所以在存储空间和速度之间需要做个权衡。

撰写回答