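Efficiency of using a large data structure in a Python function

9 votes

2 answers

2709 views

Asked 2025-04-16 09:49

I need to use a large data structure, more specifically a large dictionary, to do lookups.

At first my code looked like this: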

# build the dictionary
blablabla
# look up some information in the dictionary
blablabla

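Because I need to do many lookups, I started to think that wrapping this in a function, say lookup(info), would be a good idea.

Then the question is: how should I handle the large dictionary?

Should I pass it as an argument, as in lookup(info, dictionary), or should I initialize the dictionary in main() and use it as a global variable?

The first approach seems more elegant, because I think maintaining global variables is troublesome. On the other hand, I am not sure how efficient it is to pass a large dictionary to a function. The function will be called many times, and if argument passing is inefficient, that would be a nightmare.

Thanks.

Edit 1:

I just ran an experiment with the two approaches above.

Here is the code snippet. lookup1 implements the lookup with parameter passing, while lookup2 uses the global data structure "big_dict".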

import random
import time

class CityDict():
    def __init__(self):
        self.code_dict = get_code_dict()
    def get_city(self, city):
        try:
            return self.code_dict[city]
        except Exception:
            return None

def get_code_dict():
    # initialize the code dictionary from a file
    return code_dict

def lookup1(city, city_code_dict):
    try:
        return city_code_dict[city]
    except Exception:
        return None

def lookup2(city):
    try:
        return big_dict[city]
    except Exception:
        return None


# time the lookup with the dictionary passed as an argument
t = time.time()
d = get_code_dict()
for i in range(0, 1000000):
    lookup1(random.randint(0, 10000), d)

print("lookup1 is %f" % (time.time() - t))


# time the lookup against the global dictionary
t = time.time()
big_dict = get_code_dict()
for i in range(0, 1000000):
    lookup2(random.randint(0, 1000))
print("lookup2 is %f" % (time.time() - t))


# time the class-based lookup
# note: unlike the two loops above, this one does not call random.randint
t = time.time()
cd = CityDict()
for i in range(0, 1000000):
    cd.get_city(str(i))
print("class is %f" % (time.time() - t))

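Here is the output:

lookup1 is 8.410885
lookup2 is 8.157661
class is 4.525721

So the two approaches look almost the same, and yes, the global variable approach is slightly more efficient.

Edit 2:

I added the class version suggested by Amber and tested the efficiency again. Judging by the results, Amber is right: we should use the class version.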
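
A more controlled comparison could look like the sketch below. It is only an illustration under stated assumptions: the dictionary contents are made up (a stand-in for get_code_dict()), the names lookup_param, lookup_global, and run_* are hypothetical, dict.get replaces the try/except of the original snippet, and all three variants receive the same pre-generated keys so that key generation is not part of what is timed. No timings are quoted here because they depend on the machine.

```python
import random
import timeit

# Hypothetical stand-in for get_code_dict(): 10,000 string keys.
big_dict = {str(i): i for i in range(10000)}

# Pre-generate the keys once so every variant does identical work.
# Roughly half of the keys are missing from the dictionary.
random.seed(0)
keys = [str(random.randint(0, 20000)) for _ in range(100000)]

def lookup_param(city, city_code_dict):
    # Dictionary passed as an argument.
    return city_code_dict.get(city)

def lookup_global(city):
    # Dictionary read from the module-level name.
    return big_dict.get(city)

class CityDict(object):
    def __init__(self, code_dict):
        self.code_dict = code_dict

    def get_city(self, city):
        # Dictionary read from an instance attribute.
        return self.code_dict.get(city)

cd = CityDict(big_dict)

def run_param():
    for k in keys:
        lookup_param(k, big_dict)

def run_global():
    for k in keys:
        lookup_global(k)

def run_class():
    for k in keys:
        cd.get_city(k)

for name, fn in [("parameter", run_param),
                 ("global", run_global),
                 ("class", run_class)]:
    # timeit.timeit calls fn `number` times and returns the total seconds.
    print("%s: %f" % (name, timeit.timeit(fn, number=5)))
```

With identical keys and no random-number generation inside the timed loops, any remaining difference between the three variants reflects the lookup style itself rather than the surrounding loop body.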


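2 Answers

To answer the core question: passing a parameter is not inefficient. The values are not copied around, as you seem to fear. Python passes references, although that does not mean its parameter passing fits neatly into the familiar "pass by value" or "pass by reference" schemes.

It is best pictured as initializing a local variable of the called function with a reference supplied by the caller; that reference is itself passed by value.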
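
A minimal sketch of this behavior, using made-up data and hypothetical function names (lookup, same_object, add_entry, rebind): the function receives a reference to the very object the caller holds, so the cost of the call does not depend on the size of the dictionary.

```python
def lookup(key, table):
    # `table` is a new local name bound to the caller's dict; no copy is made.
    return table.get(key)

def same_object(table, expected_id):
    # Compare the identity of the received object with the caller's object.
    return id(table) == expected_id

def add_entry(table):
    # Mutating through the local name changes the caller's dict as well.
    table["new"] = 1

def rebind(table):
    # Rebinding the local name does not affect the caller's variable.
    table = {}
    return table

big_dict = {str(i): i for i in range(100000)}  # hypothetical data

print(same_object(big_dict, id(big_dict)))  # True
add_entry(big_dict)
print("new" in big_dict)                    # True
rebind(big_dict)
print(len(big_dict))                        # 100001
```

The last two calls show why neither classic label fits perfectly: a mutation inside the function is visible to the caller, but rebinding the parameter is not.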

Still, the suggestion to use a class is probably a good idea.

Answered 2025-04-16 by Python大师


Neither. Use a class, which is designed for grouping functions (methods) together with data (members):

class BigDictLookup(object):
    def __init__(self):
        self.bigdict = build_big_dict() # or some other means of generating it
    def lookup(self):
        # do something with self.bigdict
        pass

def main():
    my_bigdict = BigDictLookup()
    # ...
    my_bigdict.lookup()
    # ...
    my_bigdict.lookup()
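
The outline above leaves build_big_dict and the body of lookup unspecified. A minimal runnable variant might look like the following; the city codes, the `city` parameter, and the choice to return None for unknown keys are assumptions made for illustration, not part of the answer.

```python
def build_big_dict():
    # Hypothetical builder; a real one would load the data from a file.
    return {"Paris": "PAR", "Tokyo": "TYO", "Lima": "LIM"}

class BigDictLookup(object):
    def __init__(self):
        # Built once; every lookup reuses the same dictionary.
        self.bigdict = build_big_dict()

    def lookup(self, city):
        # Return None for unknown cities instead of raising KeyError.
        return self.bigdict.get(city)

def main():
    my_bigdict = BigDictLookup()
    print(my_bigdict.lookup("Tokyo"))  # TYO
    print(my_bigdict.lookup("Oslo"))   # None

main()
```

The dictionary is built exactly once in __init__, and callers never need to know where it is stored, which avoids both the global variable and the extra argument on every call.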

Answered 2025-04-16 by Python大师
