Why do I get 'cuMemAlloc failed: not initialized' even though I initialized correctly?

Question

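I'm having some trouble using Django, Celery, and PyCuda together.
I'm using PyCuda for some image processing on an Amazon EC2 G2 instance.
Here is the information for the CUDA-capable GRID K520 card I'm using:
Detected 1 CUDA Capable device(s)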

Device 0: "GRID K520"
CUDA Driver Version / Runtime Version          6.0 / 6.0
CUDA Capability Major/Minor version number:    3.0
Total amount of global memory:                 4096 MBytes (4294770688 bytes)
( 8) Multiprocessors, (192) CUDA Cores/MP:     1536 CUDA Cores
GPU Clock rate:                                797 MHz (0.80 GHz)
Memory Clock rate:                             2500 Mhz
Memory Bus Width:                              256-bit
L2 Cache Size:                                 524288 bytes
Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
Total amount of constant memory:               65536 bytes
Total amount of shared memory per block:       49152 bytes
Total number of registers available per block: 65536
Warp size:                                     32
Maximum number of threads per multiprocessor:  2048
Maximum number of threads per block:           1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch:                          2147483647 bytes
Texture alignment:                             512 bytes
Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
Run time limit on kernels:                     No
Integrated GPU sharing Host Memory:            No
Support host page-locked memory mapping:       Yes
Alignment requirement for Surfaces:            Yes
Device has ECC support:                        Disabled
Device supports Unified Addressing (UVA):      Yes
Device PCI Bus ID / PCI location ID:           0 / 3
Compute Mode:
 < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 6.0,   NumDevs = 1, Device0 = GRID K520
Result = PASS

I'm using a fairly standard Celery setup.
I have some tasks defined in utils/tasks.py that were tested and worked fine before I started using PyCuda. I installed PyCuda via pip.

At the top of the file I'm having trouble with, I do the standard imports:

from celery import task
# other imports
import os
try:
    import Image
except Exception:
    from PIL import Image
import time

#Cuda imports
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda.compiler import SourceModule
import numpy

A remote server kicks off a task; the basic workflow looks like this:

@task()
def photo_function(photo_id,...):
    print 'Got photo...'
    ... Do some stuff ...
    result = do_photo_manipulation(photo_id)
    return result

def do_photo_manipulation(photo_id):
    im = Image.open(inPath)
    px = numpy.array(im)
    px = px.astype(numpy.float32)
    d_px = cuda.mem_alloc(px.nbytes)
    ... (Do stuff with the pixel array) ...
    return new_image
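For reference, the size passed to `cuda.mem_alloc` in `do_photo_manipulation` is just the `nbytes` of the converted array, so casting an 8-bit image to `float32` quadruples the buffer. The sketch below needs only NumPy (no GPU) and uses a hypothetical 640x480 RGB image in place of `Image.open(inPath)`; the helper name `float32_buffer_size` is illustrative and not part of the original code. A request of this size is far below the card's 4096 MB, which is consistent with the error being "not initialized" rather than "out of memory".

```python
import numpy


def float32_buffer_size(height, width, channels=3):
    """Bytes a GPU buffer needs for an image after conversion to float32.

    Mirrors the numpy.array(im).astype(numpy.float32) step in the question;
    the zero-filled uint8 array is a stand-in for a decoded 8-bit image.
    """
    px = numpy.zeros((height, width, channels), dtype=numpy.uint8)  # like numpy.array(im)
    px = px.astype(numpy.float32)  # 1 byte per value -> 4 bytes per value
    return px.nbytes  # the value that would be passed to cuda.mem_alloc(px.nbytes)


# Hypothetical 640x480 RGB photo: 480 * 640 * 3 values * 4 bytes each.
print(float32_buffer_size(480, 640))  # → 3686400
```

The same arithmetic covers the 4x4 test array in the edit below: `numpy.random.randn(4, 4)` is float64 (128 bytes), and after `.astype(numpy.float32)` it is 64 bytes.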

If I run it in shell_plus (i.e., with ./manage.py shell_plus), or as a standalone process outside of Django and Celery, it works fine.
But in this particular setup it fails with the error:
cuMemAlloc failed: not initialized

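I've looked at other solutions for a while and tried moving the initialization imports inside the function. I also added a wait() call to make sure the problem wasn't that the GPU simply wasn't ready yet.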
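Since the code works in shell_plus and standalone but not under Celery, one hypothesis worth testing (an assumption, not something confirmed in this question) is that the module-level `import pycuda.autoinit` in utils/tasks.py initializes CUDA in the Celery parent process, while Celery's default prefork pool runs tasks in forked child processes, and CUDA driver state inherited across `fork()` is generally not usable in the child. Moving imports into the function would not change this while any module-level `import pycuda.autoinit` remains, because Python caches the already-imported module and does not re-run it. The sketch below defers initialization to the task and caches the context per process ID; `get_process_context` and `make_cuda_context` are hypothetical helper names, and `cuda.init()`, `cuda.Device(0)` and `Device.make_context()` are the PyCuda driver calls it assumes.

```python
import os

# Per-process cache: the PID that created the context, and the context itself.
_cuda_state = {"pid": None, "ctx": None}


def get_process_context(factory, getpid=os.getpid):
    """Return a context built by `factory`, creating it at most once per process.

    If the cached context was created under a different PID (e.g. in a parent
    process before a prefork pool forked this worker), a new one is built here
    instead of reusing the inherited one.
    """
    pid = getpid()
    if _cuda_state["pid"] != pid:
        _cuda_state["ctx"] = factory()
        _cuda_state["pid"] = pid
    return _cuda_state["ctx"]


def make_cuda_context():
    """Hypothetical factory for a real worker: initialize the driver in THIS process."""
    import pycuda.driver as cuda  # imported lazily, only when the task runs

    cuda.init()  # cuInit for the current process
    return cuda.Device(0).make_context()  # creates and pushes a context on this thread
```

In the task you would drop the module-level `import pycuda.autoinit` and call `get_process_context(make_cuda_context)` at the start of `do_photo_manipulation`. Celery's `worker_process_init` signal, which fires in each pool child process when it starts, is another place the same factory could be run once per worker.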

One answer suggests the error comes from not importing pycuda.autoinit, which I am already doing: http://comments.gmane.org/gmane.comp.python.cuda/1975

I'd really appreciate any help!

Let me know if I need to provide more information!

Edit:
Here is the test code:
from pycuda.tools import make_default_context  # provides make_default_context() used below

def CudaImageShift(imageIn, mode = "luminosity" , log = 0):

    if log == 1 :
        print ("----------> CUDA CONVERSION")

#    print "ENVIRON: "
#    import os
#    print os.environ

    print 'AUTOINIT'
    print pycuda.autoinit

    print 'Making context...'
    context = make_default_context()
    print 'Context created.'
    totalT0 = time.time()

    print 'Doing test run...'
    a = numpy.random.randn(4,4)
    a = a.astype(numpy.float32)
    print 'Test mem alloc'
    a_gpu = cuda.mem_alloc(a.nbytes)
    print 'MemAlloc complete, test mem copy'
    cuda.memcpy_htod(a_gpu, a)
    print 'memcopy complete'


[2014-07-15 14:52:20,469: WARNING/Worker-1] cuDeviceGetCount failed: not initialized
