Pandas: Why does Series indexing with .loc take 100x longer on the first run when timing it?

    import pandas as pd
    import numpy as np
    import timeit
    import time, gc

    def get_data():
        ids = np.arange(size_bigseries)
        big_series = pd.Series(index=ids, data=np.random.rand(len(ids)), name='{} elements series'.format(len(ids)))
        small_slice = np.arange(size_slice)
        return big_series, small_slice

    # Method to test: a simple pandas slicing with .loc
    def basic_loc_indexing(pd_series, slice_ids):
        return pd_series.loc[slice_ids].dropna()

    # method to time it
    def timing_it(func, n, *args):
        gcold = gc.isenabled()
        gc.disable()
        times = []
        for i in range(n):
            s = time.time()
            func(*args)
            times.append((time.time()-s)*1000)
        if gcold:
            gc.enable()
        return times

    if __name__ == '__main__':
        import sys
        n_tries = int(sys.argv[1]) if len(sys.argv)>1 and sys.argv[1] is not None else 1000
        size_bigseries = int(sys.argv[2]) if len(sys.argv)>2 and sys.argv[2] is not None else 5000000 #5M
        size_slice = int(sys.argv[3]) if len(sys.argv)>3 and sys.argv[3] is not None else 100 #5M

        #1: timeit()
        big_series, small_slice = get_data()
        time_with_timeit = timeit.timeit('basic_loc_indexing(big_series, small_slice)',"gc.disable(); from __main__ import basic_loc_indexing, big_series, small_slice",number=n_tries)
        print("using timeit: {:.6f}ms".format(time_with_timeit/n_tries*1000))
        del big_series, small_slice

        #2: time()
        big_series, small_slice = get_data()
        time_with_time = timing_it(basic_loc_indexing, n_tries, big_series, small_slice)
        print("using time: {:.6f}ms".format(np.mean(time_with_time)))
        print('head detail: {}\n'.format(time_with_time[:5]))

1 Answer

Answer #1 · posted 2024-04-24 21:02:45

This code is probably not idempotent: it has side effects that change how later executions behave.

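timeit first runs the code once to measure how long it takes and to infer how many loops and runs it should use. If your code is not idempotent (it has side effects, such as caching), that first, unrecorded run will be slower, and only the subsequent, faster runs will be measured and reported.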
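To make the caching effect concrete, here is a minimal sketch using only the standard library. `LazyLookup` and `time_calls` are hypothetical names invented for this illustration, not pandas code; the class merely assumes the general pattern of an object that builds a lookup table on first use and reuses it afterwards. Timing each call separately, as the question's `timing_it` does, is what exposes the expensive first call.

```python
import time


class LazyLookup:
    """Hypothetical stand-in for an object that builds a lookup table lazily."""

    def __init__(self, keys):
        self.keys = list(keys)
        self._table = None   # built on first use, then reused
        self.builds = 0      # counts how often the table was built

    def get(self, wanted):
        if self._table is None:
            # Side effect: the first call pays for building the table.
            self._table = {key: pos for pos, key in enumerate(self.keys)}
            self.builds += 1
        return [self._table[key] for key in wanted]


def time_calls(func, n, *args):
    """Return the duration of each of n calls, in milliseconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        func(*args)
        durations.append((time.perf_counter() - start) * 1000)
    return durations


lookup = LazyLookup(range(200_000))
durations_ms = time_calls(lookup.get, 5, range(100))
print("per-call durations (ms):", durations_ms)
print("table builds:", lookup.builds)
```

The first entry in `durations_ms` would normally be much larger than the rest, because only that call builds the table. An average over many calls hides this, which is why printing the first few individual timings is useful.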

You can look at the arguments you can pass to timeit (see the doc) to specify the number of loops and to discard the initial run.
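As a rough sketch of controlling the loop count yourself, the example below assumes the standard-library `timeit` module rather than the IPython `%timeit` magic. It uses `Timer.repeat` with `number=1` so that every run is reported individually, which lets you separate the cold first run from the warm ones. `cached_square` is a hypothetical function written only for this illustration.

```python
import timeit

cache = {}


def cached_square(x):
    """Hypothetical function with a caching side effect."""
    if x not in cache:
        # Deliberately slow way to compute x * x, done only on the first call.
        cache[x] = sum(x for _ in range(x))
    return cache[x]


timer = timeit.Timer(lambda: cached_square(50_000))

# repeat=5, number=1: five measurements of exactly one call each.
per_run = timer.repeat(repeat=5, number=1)

cold, warm = per_run[0], per_run[1:]
print("cold run: {:.6f} ms".format(cold * 1000))
print("warm runs, best: {:.6f} ms".format(min(warm) * 1000))
```

Dropping `per_run[0]` before summarizing is one way to discard the initial run. Reporting it separately is usually more informative than averaging it in.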

Also note (quoted from the doc linked above):

The times reported by %timeit will be slightly higher than those reported by the timeit.py script when variables are accessed. This is due to the fact that %timeit executes the statement in the namespace of the shell, compared with timeit.py, which uses a single setup statement to import function or create variables. Generally, the bias does not matter as long as results from timeit.py are not mixed with those from %timeit.

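Edit: I overlooked the fact that you pass the number of runs to timeit. In that case only the latter part of my answer applies, but the numbers you are seeing seem to point to a different problem...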
