Python: transferring data from storage to RAM while the GPU is working

1 answer

User

#1 · Posted 2024-04-19 21:03:11

OK, suppose you have two tasks:

import time


def cpu_operation(n):
    # Stand-in for CPU-side work (e.g. loading data): about 1 second in total.
    print('Start CPU', n)
    for _ in range(100):
        time.sleep(0.01)
    print('End CPU', n)
    return n


def expensive_gpu_operation(n):
    # Stand-in for the GPU call: about 0.3 seconds.
    print('Start GPU', n)
    time.sleep(0.3)
    print('Stop GPU', n)
    return n

Here is how you are running them right now:

def slow():
    results = []
    for task in range(5):
        cpu_result = cpu_operation(task)
        gpu_result = expensive_gpu_operation(cpu_result)
        results.append(gpu_result)
    return results

We run these sequentially: CPU, GPU, CPU, GPU... The output looks like this:

Start CPU 0
End CPU 0
Start GPU 0
Stop GPU 0
Start CPU 1
End CPU 1
Start GPU 1
Stop GPU 1
Start CPU 2
End CPU 2
Start GPU 2
Stop GPU 2
Start CPU 3
End CPU 3
Start GPU 3
Stop GPU 3
Start CPU 4
End CPU 4
Start GPU 4
Stop GPU 4

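Suppose we could save some time by starting CPU task X+1 before GPU task X has finished, so that CPU X+1 and GPU X run in parallel, right?

(We can't run CPU X and GPU X in parallel, because GPU X needs the output of CPU X as its input; hence the +1.)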

Let's use threads! Basically, what we want to do is:

Start CPU N and wait for it to finish
Wait for GPU N-1 to finish, then start GPU N in the background
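
The two steps above can be sketched with a plain threading.Thread before reaching for a pool. This is only an illustration: the sleep durations are shortened stand-ins for the real work, and the names pipelined and run_gpu are hypothetical helpers, not part of any library.

```python
import threading
import time


def cpu_operation(n):
    time.sleep(0.02)  # stand-in for CPU-side preparation (assumed duration)
    return n


def expensive_gpu_operation(n):
    time.sleep(0.01)  # stand-in for the GPU call (assumed duration)
    return n


def pipelined(num_tasks):
    results = [None] * num_tasks
    gpu_thread = None  # thread running GPU N-1, if there is one

    def run_gpu(index, value):
        # Runs in the background thread and stores the result by position.
        results[index] = expensive_gpu_operation(value)

    for task in range(num_tasks):
        # Step 1: run CPU N and wait for it to finish.
        cpu_result = cpu_operation(task)
        # Step 2: wait for GPU N-1, then start GPU N in the background.
        if gpu_thread is not None:
            gpu_thread.join()
        gpu_thread = threading.Thread(target=run_gpu, args=(task, cpu_result))
        gpu_thread.start()

    if gpu_thread is not None:
        gpu_thread.join()  # wait for the last GPU task
    return results
```

Managing the thread by hand works, but the join/start bookkeeping is exactly what a single-worker pool does for you.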

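So we get some parallelism. The simplest way to implement this is a thread pool with a single thread; it then behaves like a queue. On each loop iteration we just schedule one task and store the async_result. Once the loop is done, we can retrieve all the results.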

Incidentally, Python has a thread pool implementation in the multiprocessing module (multiprocessing.pool.ThreadPool).

from multiprocessing.pool import ThreadPool

def quick():
    # A single worker thread makes the pool behave like a FIFO queue of GPU tasks.
    pool = ThreadPool(processes=1)
    results = []
    for task in range(5):
        cpu_result = cpu_operation(task)
        # Schedule the next GPU operation in the background and
        # store the AsyncResult instance for this operation.
        async_result = pool.apply_async(expensive_gpu_operation, (cpu_result,))
        results.append(async_result)
    pool.close()  # no more tasks will be submitted

    # The results are ready! (Well, the last one probably isn't yet,
    # but get() will wait for it.)
    return [x.get() for x in results]
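
The same single-worker idea can also be written with concurrent.futures from the standard library, which some people find easier to read. This is a sketch of an alternative, not what the answer above uses; the sleep durations are shortened stand-ins and quick_with_executor is a hypothetical name.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_operation(n):
    time.sleep(0.02)  # stand-in for CPU-side work (assumed duration)
    return n


def expensive_gpu_operation(n):
    time.sleep(0.01)  # stand-in for the GPU call (assumed duration)
    return n


def quick_with_executor(num_tasks=5):
    # max_workers=1 keeps GPU tasks strictly in submission order.
    with ThreadPoolExecutor(max_workers=1) as executor:
        futures = []
        for task in range(num_tasks):
            cpu_result = cpu_operation(task)
            # submit() returns immediately; the GPU task runs in the worker.
            futures.append(executor.submit(expensive_gpu_operation, cpu_result))
        # result() blocks until the corresponding task has finished.
        return [f.result() for f in futures]
```

Leaving the with block shuts the executor down after all submitted tasks have completed, so no explicit cleanup is needed.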

Now the output becomes:

Start CPU 0
End CPU 0
Start CPU 1
Start GPU 0
Stop GPU 0
End CPU 1
Start CPU 2
Start GPU 1
Stop GPU 1
End CPU 2
Start CPU 3
Start GPU 2
Stop GPU 2
End CPU 3
Start CPU 4
Start GPU 3
Stop GPU 3
End CPU 4
Start GPU 4
Stop GPU 4

We can observe the parallelism!
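
To put a number on it: with the durations used above (about 1.0 s per CPU task and 0.3 s per GPU task), the sequential version needs roughly 5 × 1.3 = 6.5 s, while the pipelined version needs roughly 5 × 1.0 + 0.3 = 5.3 s, ignoring overhead. The sketch below measures the same effect with scaled-down durations so it finishes quickly; CPU_SECONDS, GPU_SECONDS and timed are assumptions made for this illustration.

```python
import time
from multiprocessing.pool import ThreadPool

CPU_SECONDS = 0.05  # assumed, scaled-down CPU task duration
GPU_SECONDS = 0.04  # assumed, scaled-down GPU task duration


def cpu_operation(n):
    time.sleep(CPU_SECONDS)
    return n


def expensive_gpu_operation(n):
    time.sleep(GPU_SECONDS)
    return n


def slow(num_tasks):
    # Strictly sequential: CPU, GPU, CPU, GPU, ...
    return [expensive_gpu_operation(cpu_operation(t)) for t in range(num_tasks)]


def quick(num_tasks):
    # GPU N runs in the single worker thread while the main thread does CPU N+1.
    pool = ThreadPool(processes=1)
    pending = []
    for t in range(num_tasks):
        pending.append(pool.apply_async(expensive_gpu_operation, (cpu_operation(t),)))
    results = [p.get() for p in pending]
    pool.close()
    pool.join()
    return results


def timed(func, num_tasks=5):
    # Returns the function's result together with its wall-clock duration.
    start = time.perf_counter()
    result = func(num_tasks)
    return result, time.perf_counter() - start


slow_result, slow_seconds = timed(slow)
quick_result, quick_seconds = timed(quick)
print(f"sequential: {slow_seconds:.2f}s, pipelined: {quick_seconds:.2f}s")
```

The gain is bounded by the shorter of the two stages: at best, the pipelined version hides all GPU time except that of the last task.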

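Note that when expensive_gpu_operation is scheduled, it does not actually start running until the next CPU operation reaches time.sleep. This is caused by the global interpreter lock (GIL): the main thread has to give up the GIL before the worker thread gets a chance to do anything. Here that happens in time.sleep(); in your case, I would expect it to happen when you do some I/O, that is, when you start reading the next batch of images.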
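
As a rough sketch of that last point, the CPU step can be an actual file read, which gives the worker thread room to run while the main thread waits on the disk. Everything here is an assumption made for illustration: load_batch, fake_gpu_step and process_files are hypothetical names, the "batches" are temporary files of dummy bytes, and the GPU call is simulated with a short sleep.

```python
import os
import tempfile
import time
from multiprocessing.pool import ThreadPool


def load_batch(path):
    # Storage -> RAM. Blocking file reads release the GIL while waiting.
    with open(path, "rb") as f:
        return f.read()


def fake_gpu_step(batch):
    time.sleep(0.01)  # stand-in for the real GPU call (assumed duration)
    return len(batch)


def process_files(paths):
    pool = ThreadPool(processes=1)
    pending = []
    for path in paths:
        batch = load_batch(path)  # main thread reads the next batch
        # The worker processes this batch while the loop loads the next one.
        pending.append(pool.apply_async(fake_gpu_step, (batch,)))
    sizes = [p.get() for p in pending]
    pool.close()
    pool.join()
    return sizes


# Create three dummy batch files of 1000, 2000 and 3000 bytes.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(3):
        path = os.path.join(tmp, f"batch_{i}.bin")
        with open(path, "wb") as f:
            f.write(b"x" * (i + 1) * 1000)
        paths.append(path)
    sizes = process_files(paths)
```

If loading is much faster than the GPU step, loaded batches pile up in memory waiting for the worker, so a real pipeline would probably want to limit how many batches are queued at once.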
