Do Python lists perform better than NumPy arrays?

import random as rd
import statistics as st


def collectStickers(experiments, collectible):
    obtained = []
    attempts = 0
    while len(obtained) < collectible:
        new_sticker = rd.randint(1, collectible)
        if new_sticker not in obtained:
            obtained.append(new_sticker)
        attempts += 1
    experiments.append(attempts)


experiments = []
collectible = 20
rep_experiment = 100000
for i in range(1, rep_experiment):
    collectStickers(experiments, collectible)
print(st.mean(experiments))

Result

For such a simple experiment the processing time seems acceptable, but for more complex purposes 13.8 seconds is too long.

72.06983069830699 [Finished in 13.8s]

Numpy

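I could not use any functions, because when I followed the same logic as above I got the following warnings:

RuntimeWarning: Mean of empty slice

RuntimeWarning: invalid value encountered in double_scalars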
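For reference, a RuntimeWarning of this kind can be reproduced by taking the mean of an empty array. The failing code is not shown in the question, so whether this is what happened there is an assumption. A minimal sketch; the exact wording and number of warnings depend on the NumPy version:

```python
import warnings

import numpy as np

# Record warnings instead of printing them so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The mean of an empty array is 0/0, which NumPy reports as nan.
    result = np.mean(np.array([]))

print(result)
for w in caught:
    print(w.category.__name__, w.message)
```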

So I went with the naive approach:

import random as rd
import numpy as np

experiments = np.array([])
rep_experiment = 100000
for i in range(1, rep_experiment):
    obtained = np.array([])
    attempts = 0
    while len(obtained) < 20:
        new_sticker = rd.randint(1, 20)
        if new_sticker not in obtained:
            obtained = np.append(obtained, new_sticker)
        attempts += 1
    experiments = np.append(experiments, attempts)
print(np.mean(experiments))

2 answers

User

Answer 1 · edited 2024-05-20 08:35:51

np.append() copies the array before appending to it.

Your program probably spends most of its time on these unnecessary copies in:

experiments = np.append(experiments, attempts)
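A small sketch of why this line is expensive: np.append() returns a new array rather than growing the existing one, so calling it inside a loop copies all previous elements on every iteration. The variable names and values below are illustrative only.

```python
import numpy as np

experiments = np.array([1.0, 2.0, 3.0])
grown = np.append(experiments, 4.0)

# The original array is untouched; the result is a fresh copy.
print(experiments.size, grown.size)
print(np.shares_memory(experiments, grown))

# Preallocating once and assigning by index avoids the repeated copies.
n = 5
preallocated = np.zeros(n)
for i in range(n):
    preallocated[i] = i * 2
print(preallocated)
```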

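Edit

As expected, replacing the quadratic np.append() with a preallocated array makes the two wrapper functions roughly the same speed.

Replacing the obtained list of stickers with a set makes things a little faster.

However, the bottleneck is the slow random number generator. Running cProfile shows that 75% of the execution time is spent in randint().

See the code below for the results (on my machine).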

import random
import statistics
import timeit

import numpy as np

collectible = 20
rep_experiment = 10000


def original_collect_stickers():
    obtained = []
    attempts = 0

    while len(obtained) < collectible:
        new_sticker = random.randint(1, collectible)
        if new_sticker not in obtained:
            obtained.append(new_sticker)
        attempts += 1
    return attempts


def set_collect_stickers():
    obtained = set()
    attempts = 0
    n = 0

    while n < collectible:
        new_sticker = random.randint(1, collectible)
        if new_sticker not in obtained:
            obtained.add(new_sticker)
            n += 1
        attempts += 1
    return attempts


def repeat_with_list(fn):
    experiments = []
    for i in range(rep_experiment):
        experiments.append(fn())
    return statistics.mean(experiments)


def repeat_with_numpy(fn):
    experiments = np.zeros(rep_experiment)
    for i in range(rep_experiment):
        experiments[i] = fn()
    return np.mean(experiments)


def time_fn(name, fn, n=3):
    time_taken = timeit.timeit(fn, number=n) / n
    result = fn()  # once more to get the result too
    print(f"{name:15}: {time_taken:.6f}, result {result}")


for wrapper in (repeat_with_list, repeat_with_numpy):
    for fn in (original_collect_stickers, set_collect_stickers):
        time_fn(f"{wrapper.__name__} {fn.__name__}", lambda: wrapper(fn))

The results are:

repeat_with_list original_collect_stickers: 0.747183, result 71.7912
repeat_with_list set_collect_stickers: 0.688952, result 72.1002
repeat_with_numpy original_collect_stickers: 0.752644, result 72.0978
repeat_with_numpy set_collect_stickers: 0.685355, result 71.7515
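The profiling claim above can be checked with a sketch like the following, which profiles a smaller run with cProfile and prints the most expensive calls by cumulative time. The function name collect_stickers and the run size of 2000 are choices made for this example; the share attributed to randint() will vary by machine and Python version.

```python
import cProfile
import io
import pstats
import random


def collect_stickers(collectible=20):
    obtained = set()
    attempts = 0
    while len(obtained) < collectible:
        obtained.add(random.randint(1, collectible))
        attempts += 1
    return attempts


profiler = cProfile.Profile()
profiler.enable()
for _ in range(2000):
    collect_stickers()
profiler.disable()

# Sort by cumulative time and show the ten most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)
report = stream.getvalue()
print(report)
```

In the report, the cumtime column for randint() can be compared against the cumtime of collect_stickers() to estimate how much of the run is spent generating random numbers.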

Edit 2

Using the fastrand library's pcg32bounded generator, i.e. new_sticker = fastrand.pcg32bounded(collectible), makes things much faster:

repeat_with_list original_collect_stickers: 0.761186, result 72.0185
repeat_with_list set_collect_stickers: 0.690244, result 71.878
repeat_with_list set_collect_stickers_fastrand: 0.116410, result 71.9323
repeat_with_numpy original_collect_stickers: 0.759154, result 71.8604
repeat_with_numpy set_collect_stickers: 0.696563, result 71.5482
repeat_with_numpy set_collect_stickers_fastrand: 0.114212, result 71.6369
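The fastrand variant of the function is not shown above. A minimal sketch of what set_collect_stickers_fastrand might look like, assuming the third-party fastrand package is installed and that pcg32bounded(n) returns an integer from 0 to n - 1. The draw() helper and the fallback to random.randrange() are additions made here only so the sketch runs without fastrand:

```python
import random

try:
    import fastrand  # third-party package, assumed installed (pip install fastrand)

    def draw(n):
        return fastrand.pcg32bounded(n)
except ImportError:
    # Fallback so the sketch still runs; this path is not faster.
    def draw(n):
        return random.randrange(n)


collectible = 20


def set_collect_stickers_fastrand():
    obtained = set()
    attempts = 0
    n = 0
    while n < collectible:
        new_sticker = draw(collectible)  # labels 0..19 instead of 1..20
        if new_sticker not in obtained:
            obtained.add(new_sticker)
            n += 1
        attempts += 1
    return attempts


print(set_collect_stickers_fastrand())
```

Shifting the sticker labels from 1..20 to 0..19 does not change the number of attempts, since only the count of distinct labels matters.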

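User

Answer 2 · edited 2024-05-20 08:35:51

To really take advantage of the power of NumPy arrays, you need to program in the NumPy way. For example, try vectorizing the experiment as follows: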

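import numpy as np


def vectorized():
    rep_experiment = 100000
    collectible = 20
    # per-experiment flag: True once every sticker has been collected
    obtained = np.zeros(rep_experiment, dtype=bool)
    attempts = np.zeros(rep_experiment, dtype=int)
    # targets[i, j] is True once experiment i holds sticker j
    targets = np.zeros((rep_experiment, collectible), dtype=bool)
    # row index of each experiment, used for fancy indexing below
    x = np.arange(rep_experiment)

    while not obtained.all():
        # one draw per experiment, all experiments at once
        r = np.random.randint(0, collectible, size=rep_experiment)
        # count this draw for every experiment still in progress;
        # doing this before updating obtained also counts the final draw
        attempts[~obtained] += 1
        # add the new stickers to the collected targets
        targets[x, r] = True
        # mark experiments that have collected everything
        obtained = targets.sum(axis=1) == collectible

    print(attempts.mean(), attempts.std())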

Check the speed; for me it is about 50 times faster.
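One way to check both the speed and the result is to time a plain-Python loop against a vectorized version and compare each mean with the exact expectation for this problem, n(1 + 1/2 + ... + 1/n), which is about 71.95 for n = 20. The sketch below is self-contained; the function names, the seed, and the reduced repetition count are choices made for this example, and the measured speedup depends on the machine.

```python
import random
import timeit

import numpy as np

COLLECTIBLE = 20
REPS = 5000  # kept small so the pure-Python loop finishes quickly


def loop_mean(reps=REPS, collectible=COLLECTIBLE):
    total = 0
    for _ in range(reps):
        obtained = set()
        attempts = 0
        while len(obtained) < collectible:
            obtained.add(random.randrange(collectible))
            attempts += 1
        total += attempts
    return total / reps


def vectorized_mean(reps=REPS, collectible=COLLECTIBLE, seed=0):
    rng = np.random.default_rng(seed)
    targets = np.zeros((reps, collectible), dtype=bool)
    attempts = np.zeros(reps, dtype=int)
    done = np.zeros(reps, dtype=bool)
    rows = np.arange(reps)
    while not done.all():
        draws = rng.integers(0, collectible, size=reps)
        attempts[~done] += 1         # count the draw for unfinished runs
        targets[rows, draws] = True  # mark the drawn sticker as collected
        done = targets.all(axis=1)
    return attempts.mean()


# Exact expectation of the coupon collector problem: n * H_n.
expected = COLLECTIBLE * sum(1 / k for k in range(1, COLLECTIBLE + 1))

loop_time = timeit.timeit(loop_mean, number=1)
vec_time = timeit.timeit(vectorized_mean, number=1)
print(f"expected   : {expected:.4f}")
print(f"loop       : {loop_mean():.4f} ({loop_time:.3f}s)")
print(f"vectorized : {vectorized_mean():.4f} ({vec_time:.3f}s)")
```

Both simulated means should land close to the exact value, which is a quick sanity check that a faster version still computes the same quantity.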
