Getting started with shared memory in PyCUDA

import pycuda.driver as drv
import pycuda.tools
import pycuda.autoinit
import numpy
from pycuda.compiler import SourceModule

src = '''
__global__ void reduce0(float *g_idata, float *g_odata) {
    extern __shared__ float sdata[];

    // each thread loads one element from global to shared mem
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x*blockDim.x + threadIdx.x;
    sdata[tid] = g_idata[i];
    __syncthreads();

    // do reduction in shared mem
    for(unsigned int s=1; s < blockDim.x; s *= 2) {
        if (tid % (2*s) == 0) {
            sdata[tid] += sdata[tid + s];
        }
        __syncthreads();
    }

    // write result for this block to global mem
    if (tid == 0) g_odata[blockIdx.x] = sdata[0];
}
'''

mod = SourceModule(src)
reduce0 = mod.get_function('reduce0')

a = numpy.random.randn(400).astype(numpy.float32)
dest = numpy.zeros_like(a)
reduce0(drv.In(a), drv.Out(dest), block=(400,1,1))

1 Answer

User

#1 · Posted 2024-05-16 20:26:38

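When you declare

extern __shared__ float sdata[];

you are telling the compiler that the caller will supply the shared memory. In PyCUDA, you do this by passing shared=nnnn, where nnnn is the size in bytes, in the call that launches the CUDA function. In your case: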

reduce0(drv.In(a), drv.Out(dest), block=(400,1,1), shared=4*400)
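The byte count passed as shared can be derived from the block size instead of being hard-coded. The following is a minimal sketch, assuming one float per thread as in the kernel from the question. The helper name shared_bytes is hypothetical and not part of PyCUDA, and the launch line is left as a comment because it needs a GPU and the names defined in the question.

```python
import ctypes

def shared_bytes(threads_per_block, ctype=ctypes.c_float):
    """Bytes of dynamic shared memory needed for one element per thread.

    shared_bytes is a hypothetical helper for illustration only.
    """
    return threads_per_block * ctypes.sizeof(ctype)

block = (400, 1, 1)
nbytes = shared_bytes(block[0])
print(nbytes)

# Launch as in the question, with the shared size added (requires a GPU):
# reduce0(drv.In(a), drv.Out(dest), block=block, shared=nbytes)
```

Computing the size from the block tuple keeps the two values in step if the block size is changed later.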

Alternatively, you can drop the extern keyword and specify the shared memory size directly:

__shared__ float sdata[400];
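Whichever way the array is sized, it has to cover every index the reduction loop touches. The sketch below is a plain-Python emulation of the index arithmetic in reduce0 for a single block, written as an assumption-laden check rather than a statement about what the GPU will do. The function name max_shared_index is hypothetical.

```python
def max_shared_index(block_dim):
    """Highest sdata index touched by the reduce0 loop for one block.

    Pure-Python emulation of the kernel's index arithmetic; hypothetical helper.
    """
    # The initial load writes sdata[tid] for every thread in the block.
    highest = block_dim - 1
    s = 1
    while s < block_dim:
        for tid in range(block_dim):
            if tid % (2 * s) == 0:
                # The kernel reads sdata[tid + s] at this step.
                highest = max(highest, tid + s)
        s *= 2
    return highest

for n in (400, 512):
    print(n, max_shared_index(n))
```

Under this emulation a 400-thread block reads up to index 448, which lies beyond a 400-element array, while a 512-thread block stays within indices 0 to 511. This suggests either launching with a power-of-two block size or adding a bounds check such as tid + s < blockDim.x inside the loop.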
