Speed test shows strange behavior: ten times as many runs takes 100x as long for one function, but only 10x for the other

def readFile(filepath):
    tempDict = {}
    file = open(filepath,'rb')
    for line in file:
        split = line.split('\t')
        tempDict[split[1]] = split[2]
    return tempDict

def prepDict(tempDict):
    for key in tempDict.keys():
        tempDict[key+'a'] = tempDict[key].upper()
        del tempDict[key]
    return tempDict

def test():
    prepDict(readFile('two.txt'))

if __name__=='__main__':
    from timeit import Timer

    t = Timer(lambda: readFile('two.txt'))
    print 'readFile(10000): ' + str(t.timeit(number=10000))

    tempDict = readFile('two.txt')
    t = Timer(lambda: prepDict(tempDict))
    print 'prepDict (10000): ' + str(t.timeit(number=10000))

    t = Timer(lambda: test())
    print 'prepDict(readFile) (10000): ' + str(t.timeit(number=10000))

    t = Timer(lambda: readFile('two.txt'))
    print 'readFile(100000): ' + str(t.timeit(number=100000))

    tempDict = readFile('two.txt')
    t = Timer(lambda: prepDict(tempDict))
    print 'prepDict (100000): ' + str(t.timeit(number=100000))

    t = Timer(lambda: test())
    print 'prepDict(readFile) (100000): ' + str(t.timeit(number=100000))

readFile(10000): 0.61602914474
prepDict (10000): 0.200615847469
prepDict(readFile) (10000): 0.609288647286
readFile(100000): 5.91858320729
prepDict (100000): 18.8842101717
prepDict(readFile) (100000): 6.45040039665

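3 Answers

User

#1 · edited 2024-04-25 02:19:12

The calls to prepDict do not happen in isolation. Each call modifies the keys of tempDict, making every key one character longer. So after 10**5 calls to prepDict, the keys in tempDict are very long strings. You can see this (copiously) if you put a print statement in prepDict: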

def prepDict(tempDict):
    for key in tempDict.keys():
        tempDict[key+'a'] = tempDict[key].upper()
        del tempDict[key]
    print(tempDict)
    return tempDict

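The fix is to make sure that each call to prepDict (or, more generally, each execution of the statement you are timing) does not affect the next call (or statement) being timed. abarnert has already shown the solution: prepDict(tempDict.copy()).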
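The effect can be checked without any timing at all. The sketch below is a Python 3 rendering of prepDict (named prep_dict here, iterating over a snapshot of the keys because Python 3 does not allow safely mutating a dict while iterating its key view), and it uses a small hypothetical two-entry dict in place of the contents of two.txt:

```python
def prep_dict(d):
    """Python 3 rendering of prepDict: append 'a' to each key, upper-case each value."""
    for key in list(d):              # snapshot the keys; the dict is mutated in the loop
        d[key + 'a'] = d[key].upper()
        del d[key]
    return d

def max_key_length(d):
    return max(len(k) for k in d)

# Hypothetical stand-in for readFile('two.txt'); the real file is not shown in the question.
shared = {'k1': 'v1', 'k2': 'v2'}
for _ in range(1000):
    prep_dict(shared)                # same dict every call, as in the original timing
grown = max_key_length(shared)       # 2 original characters + 1000 appended 'a's

source = {'k1': 'v1', 'k2': 'v2'}
for _ in range(1000):
    prep_dict(source.copy())         # each call works on a fresh shallow copy
unchanged = max_key_length(source)   # the source dict is never touched

print(grown, unchanged)
```

With the shared dict every key gains one character per call, while the copied version leaves the source dict at its original key length.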

By the way, you can use a for-loop to reduce the code repetition:

import timeit
import collections    

if __name__=='__main__':
    Ns = [10**4, 10**5]
    timing = collections.defaultdict(list)
    for N in Ns:
        timing['readFile'].append(timeit.timeit(
            "readFile('two.txt')",
            "from __main__ import readFile",
            number = N))
        timing['prepDict'].append(timeit.timeit(
            "prepDict(tempDict.copy())",
            "from __main__ import readFile, prepDict; tempDict = readFile('two.txt')",
            number = N))
        timing['test'].append(timeit.timeit(
            "test()",
            "from __main__ import test",
            number = N))

    print('{k:10}: {N[0]:7} {N[1]:7} {r}'.format(k='key', N=Ns, r='ratio'))
    for key, t in timing.iteritems():
        print('{k:10}: {t[0]:0.5f} {t[1]:0.5f} {r:>5.2f}'.format(k=key, t=t, r=t[1]/t[0]))

which yields timings such as

key       :   10000  100000 ratio
test      : 0.11320 1.12601  9.95
prepDict  : 0.01604 0.16167 10.08
readFile  : 0.08977 0.91053 10.14

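User

#2 · edited 2024-04-25 02:19:12

This happens because, when you test prepDict on its own, you reuse the same tempDict for every call to prepDict. Since prepDict loops over all the items in the dictionary and essentially just makes each string key one character longer, you end up with a bunch of really long keys. As the run goes on, this slows your function down, because each string concatenation has to build an ever larger string.

This is not a problem in test, because the dictionary is re-initialized on every call.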

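User

#3 · edited 2024-04-25 02:19:12

The problem here is that your prepDict function expands its input. Each time you call it in sequence, it has more data to deal with. That data grows linearly, so the 10000th run takes about 10000 times as long as the first.*

When you call test, it creates a new dict each time, so the time per call is constant.

You can see this pretty easily by changing the prepDict test to run on a fresh copy of the dict each time: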

t = Timer(lambda: prepDict(tempDict.copy()))
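One caveat with this form is that the timed statement now includes the cost of copy() itself. A rough way to account for that, sketched below in Python 3, is to time the copy alone and subtract it. The names prep_dict, source, and n are illustrative, and the 100-entry dict is a hypothetical stand-in for the contents of two.txt:

```python
import timeit

def prep_dict(d):
    """Python 3 rendering of prepDict: append 'a' to each key, upper-case each value."""
    for key in list(d):              # snapshot the keys; the dict is mutated in the loop
        d[key + 'a'] = d[key].upper()
        del d[key]
    return d

# Hypothetical stand-in data; the real two.txt is not shown in the question.
source = {'key%d' % i: 'value%d' % i for i in range(100)}
n = 2000

# Time prep_dict on a fresh copy, then time the copy alone for comparison.
with_copy = timeit.timeit(lambda: prep_dict(source.copy()), number=n)
copy_only = timeit.timeit(lambda: source.copy(), number=n)
prep_estimate = with_copy - copy_only   # rough estimate; timer noise applies to both terms

print('prep_dict + copy: %.5f s' % with_copy)
print('copy alone:       %.5f s' % copy_only)
print('estimated prep:   %.5f s' % prep_estimate)
```

The subtraction is only an approximation, but it indicates how much of the measured time belongs to the copy rather than to the function under test.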

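By the way, the total time for your prepDict test does not actually grow exponentially with number**, just quadratically. In general, when something grows super-linearly and you want to estimate the algorithmic cost, you really need more than two data points.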
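The quadratic claim can be illustrated with a simple cost model, assuming the per-call cost is proportional to the current key length and that each call adds one character to the key. The starting length of 5 is an arbitrary assumption, not a value from the question:

```python
def total_key_chars(initial_len, calls):
    """Key characters handled over `calls` runs when the key gains one character per run."""
    # Sum over i in [0, calls) of (initial_len + i), in closed form.
    return calls * initial_len + calls * (calls - 1) // 2

initial_len = 5                               # assumed starting key length
small = total_key_chars(initial_len, 10**4)   # modelled cost of 10,000 calls
large = total_key_chars(initial_len, 10**5)   # modelled cost of 100,000 calls
ratio = large / small

print(small, large, round(ratio, 2))
```

Under this model, ten times as many calls costs close to 100 times as much, which is in the same range as the measured jump from about 0.20 s to 18.88 s (roughly 94x) in the question. The model ignores the constant per-call overhead, so it only holds once the string work dominates.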

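* This is not quite true. The per-call time only starts growing linearly once the time taken by the string and hash operations (which grow linearly) begins to swamp the time taken by all the other operations (which are constant).

** You did not mention exponential growth here, but you did in your previous question, so you may have made the same unwarranted assumption in your real problem.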
