Efficiently merging a large number of dictionaries

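import multiprocessing as mp

# parse_xml and locate are defined elsewhere in the asker's script (not shown here).
if __name__ == '__main__':
    numthreads = 2
    pool = mp.Pool(processes=numthreads)
    dword_list = pool.map(parse_xml, locate("*.xml"))
    final_dword = {}
    print("The final Word Count dictionary is ")
    # An explicit loop runs in Python 2 and 3; a bare map() is lazy in Python 3.
    for dword in dword_list:
        final_dword.update(dword)
    print(final_dword)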

Iteration 1259

                       types |   # objects |   total size
============================ | =========== | ============
                        dict |         660 |    511.03 KB
                         str |        6899 |    469.10 KB
                        code |        1979 |    139.15 KB
                        type |         176 |     77.00 KB
          wrapper_descriptor |        1037 |     36.46 KB
                        list |         307 |     23.41 KB
  builtin_function_or_method |         738 |     23.06 KB
           method_descriptor |         681 |     21.28 KB
                     weakref |         434 |     16.95 KB
                       tuple |         476 |     15.76 KB
                         set |         122 |     15.34 KB
         <class 'abc.ABCMeta |          18 |      7.88 KB
         function (__init__) |         130 |      7.11 KB
           member_descriptor |         226 |      7.06 KB
           getset_descriptor |         213 |      6.66 KB

2 answers

User

Answer 1 · edited 2024-05-14 09:15:34

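If you are on Python 3.3 or later, collections.ChainMap may be a solution for you. I have not used it myself, but it should be a fast way to link multiple dictionaries together, since it groups the existing dicts into a single view rather than copying them. See the discussion here.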
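To make the ChainMap idea concrete, here is a minimal sketch. The sample dicts are hypothetical stand-ins for dword_list, not data from the question. One thing worth knowing before swapping it in: a ChainMap lookup returns the value from the first dict that holds the key, whereas a chain of update() calls keeps the value from the last one, so the list may need reversing to get the same result.

```python
from collections import ChainMap

# Hypothetical per-file word counts standing in for dword_list.
dword_list = [{"apple": 1, "pear": 2}, {"pear": 5, "plum": 3}]

# ChainMap wraps the existing dicts; nothing is copied at this point.
chained = ChainMap(*dword_list)

# Lookups search the dicts left to right, so the first dict with the key wins.
first_wins = chained["pear"]

# Reversing the list mimics dict.update(), where the last dict wins.
chained_last_wins = ChainMap(*reversed(dword_list))
last_wins = chained_last_wins["pear"]

# dict() flattens the view into a real dict if one is needed afterwards.
flattened = dict(chained_last_wins)

print(first_wins, last_wins, flattened)
```

Each lookup walks the underlying dicts in turn, so with thousands of dicts a ChainMap trades cheap construction for slower reads. Whether that is a win here depends on how often final_dword is queried.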

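You could also try pickling dword_list to a file and using a generator instead of keeping the list in memory. That way you stream the data rather than store it, which should free up some memory and make the program faster. Something like: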

import pickle

def xml_dict():
    with open("path/to/file.pickle", "rb") as f:  # pickle.load needs a file object, not a path
        for d in pickle.load(f):
            yield d
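One caveat with the generator above: if the whole list was pickled as a single object, pickle.load still rebuilds the entire list in memory before the first yield. A sketch that actually streams, assuming you control how the file is written and dump each dict as its own record, could look like this. The helper names and sample data are hypothetical.

```python
import os
import pickle
import tempfile


def dump_dicts(dicts, path):
    """Write each dict as its own pickle record, back to back, in one file."""
    with open(path, "wb") as f:
        for d in dicts:
            pickle.dump(d, f)


def iter_dicts(path):
    """Yield the dicts one at a time; only one is loaded per step."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:  # raised once every record has been read
                return


def merge_streamed(path):
    """Fold the streamed dicts into one result; later values overwrite earlier ones."""
    final_dword = {}
    for d in iter_dicts(path):
        final_dword.update(d)
    return final_dword


# Hypothetical sample data written to a temporary file for demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), "dword_list.pickle")
dump_dicts([{"apple": 1, "pear": 2}, {"pear": 5, "plum": 3}], demo_path)
print(merge_streamed(demo_path))
```

This keeps at most one per-file dict alive alongside the merged result. The merged dict itself still has to fit in memory.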

User

Answer 2 · edited 2024-05-14 09:15:34

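You can chain containers with itertools:

import itertools

# Lists rather than sets, so the iteration order is guaranteed.
listA = [1, 2, 3]
listB = [4, 5, 6]
listC = [7, 8, 9]

for key in itertools.chain(listA, listB, listC):
    print(key, end=" ")

Output: 1 2 3 4 5 6 7 8 9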

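This way you do not need to create a new container; chain simply runs over the iterables until they are exhausted. It is the same thing user @roippi suggested in a comment, just written differently: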

# items() works in Python 2 and 3; on Python 2, iteritems() avoids building intermediate lists.
dict(itertools.chain.from_iterable(x.items() for x in dword_list))
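As a quick check that the one-liner behaves like the update() loop from the question, here is a small self-contained sketch. The sample dicts and function names are hypothetical. Both versions let later dicts overwrite earlier ones on duplicate keys.

```python
import itertools

# Hypothetical stand-in for the dword_list produced by pool.map.
dword_list = [{"apple": 1, "pear": 2}, {"pear": 5, "plum": 3}, {"apple": 7}]


def merge_with_chain(dicts):
    """Stream every (key, value) pair into a single dict() call."""
    return dict(itertools.chain.from_iterable(d.items() for d in dicts))


def merge_with_update(dicts):
    """Reference version: repeated update(), as in the question."""
    merged = {}
    for d in dicts:
        merged.update(d)
    return merged


print(merge_with_chain(dword_list))
print(merge_with_chain(dword_list) == merge_with_update(dword_list))
```

The chained version avoids creating any intermediate container, but the resulting dict is the same size either way, so the memory saving is limited to the temporaries.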
