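Java: cuMemcpyDtoH yields CUDA_ERROR_INVALID_VALUE
I have a very simple Scala JCuda program that adds two very large arrays. Everything compiles and runs fine until I try to copy more than 4 bytes of data from the device to the host. As soon as I try to copy more than 4 bytes, I get CUDA_ERROR_INVALID_VALUE.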
// This pukes and gives CUDA_ERROR_INVALID_VALUE
var hostOutput = new Array[Int](numElements)
cuMemcpyDtoH(
  Pointer.to(hostOutput),
  deviceOutput,
  8
)
// This runs just fine
var hostOutput = new Array[Int](numElements)
cuMemcpyDtoH(
  Pointer.to(hostOutput),
  deviceOutput,
  4
)
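One thing worth checking here: the last argument of `cuMemcpyDtoH` is a byte count, and a copy that reaches past the end of the device allocation is a plausible source of CUDA_ERROR_INVALID_VALUE. In the full program the output buffer is allocated with `cuMemAlloc(deviceOutput, SI)`, which is 4 bytes, so a 4-byte copy fits and an 8-byte copy does not. The sketch below is plain Scala with no JCuda dependency; `CopySize`, `bytesFor`, and `copyFits` are hypothetical names used only for illustration, and `IntBytes` assumes a 4-byte Int, as in `Sizeof.INT`.

```scala
object CopySize {
  // Size of a 32-bit Int in bytes (assumed to match JCuda's Sizeof.INT).
  val IntBytes: Long = 4L

  // Number of bytes needed to hold `numElements` Ints.
  def bytesFor(numElements: Int): Long = numElements.toLong * IntBytes

  // True if a copy of `byteCount` bytes stays inside an allocation
  // of `allocatedBytes` bytes.
  def copyFits(byteCount: Long, allocatedBytes: Long): Boolean =
    byteCount >= 0 && byteCount <= allocatedBytes
}
```

Under that assumption, passing `CopySize.bytesFor(numElements)` as the size to both `cuMemAlloc` and `cuMemcpyDtoH` for `deviceOutput` would keep the allocation and the copy in agreement.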
To give better context for the actual program, here is my kernel code, which compiles and runs fine:
extern "C"
__global__ void add(int n, int *a, int *b, int *sum) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        sum[i] = a[i] + b[i];
    }
}
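Because of the `i < n` guard, the grid can safely be rounded up so that every element is covered by some thread. A minimal sketch of that rounding in Scala, assuming a one-dimensional launch; `GridSize.blocksFor` is a hypothetical helper, not part of JCuda. Dividing two `Int` values truncates before any `Math.ceil` is applied, so the rounding is done in integer arithmetic here.

```scala
object GridSize {
  // Number of blocks needed so that blocks * blockSize >= numElements.
  // Uses Long internally to avoid overflow for very large element counts.
  def blocksFor(numElements: Int, blockSize: Int): Int = {
    require(numElements >= 0, "numElements must be non-negative")
    require(blockSize > 0, "blockSize must be positive")
    ((numElements.toLong + blockSize - 1) / blockSize).toInt
  }
}
```

For 100000 elements and a block size of 256, plain integer division gives 390 blocks (99840 threads), while the rounded-up value is 391.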
I then translated some Java sample code into Scala. Anyway, here is the whole program being run:
package dev
import jcuda.driver.JCudaDriver._
import jcuda._
import jcuda.driver._
import jcuda.runtime._
/**
 * Created by dev on 6/7/15.
 */
object TestCuda {
  def init = {
    JCudaDriver.setExceptionsEnabled(true)
    // Input vector
    // Output vector
    // Load module
    // Load the ptx file.
    val kernelPath = "/home/dev/IdeaProjects/jniopencl/src/main/resources/kernels/JCudaVectorAddKernel30.cubin"
    cuInit(0)
    val device = new CUdevice
    cuDeviceGet(device, 0)
    val context = new CUcontext
    cuCtxCreate(context, 0, device)
    // Create and load module
    val module = new CUmodule()
    cuModuleLoad(module, kernelPath)
    // Obtain a function pointer to the kernel function.
    var add = new CUfunction()
    cuModuleGetFunction(add, module, "add")
    val numElements = 100000
    val hostInputA = 1 to numElements toArray
    val hostInputB = 1 to numElements toArray
    val SI: Int = Sizeof.INT.asInstanceOf[Int]
    // Allocate the device input data, and copy
    // the host input data to the device
    var deviceInputA = new CUdeviceptr
    cuMemAlloc(deviceInputA, numElements * SI)
    cuMemcpyHtoD(
      deviceInputA,
      Pointer.to(hostInputA),
      numElements * SI
    )
    var deviceInputB = new CUdeviceptr
    cuMemAlloc(deviceInputB, numElements * SI)
    cuMemcpyHtoD(
      deviceInputB,
      Pointer.to(hostInputB),
      numElements * SI
    )
    // Allocate device output memory
    val deviceOutput = new CUdeviceptr()
    cuMemAlloc(deviceOutput, SI)
    // Set up the kernel parameters: A pointer to an array
    // of pointers which point to the actual values.
    val kernelParameters = Pointer.to(
      Pointer.to(Array[Int](numElements)),
      Pointer.to(deviceInputA),
      Pointer.to(deviceInputB),
      Pointer.to(deviceOutput)
    )
    // Call the kernel function
    val blockSizeX = 256
    val gridSizeX = Math.ceil(numElements / blockSizeX).asInstanceOf[Int]
    cuLaunchKernel(
      add,
      gridSizeX, 1, 1,
      blockSizeX, 1, 1,
      0, null,
      kernelParameters, null
    )
    cuCtxSynchronize
    // **** Code pukes here with that error
    // If I comment this out the program runs fine
    var hostOutput = new Array[Int](numElements)
    cuMemcpyDtoH(
      Pointer.to(hostOutput),
      deviceOutput,
      numElements
    )
    hostOutput.foreach(print(_))
  }
}
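Printing 100000 integers back to back is hard to read, so a host-side check may be more useful once the copy succeeds. A small sketch in plain Scala, assuming the three arrays have the same length; `VerifySum.firstMismatch` is a hypothetical helper name, not something from JCuda.

```scala
object VerifySum {
  // Returns the first index i where sum(i) != a(i) + b(i),
  // or None if every element matches.
  def firstMismatch(a: Array[Int], b: Array[Int], sum: Array[Int]): Option[Int] = {
    require(a.length == b.length && b.length == sum.length,
      "arrays must have equal length")
    a.indices.find(i => sum(i) != a(i) + b(i))
  }
}
```

With the arrays from the program above, `VerifySum.firstMismatch(hostInputA, hostInputB, hostOutput)` would return `None` if every element was added correctly.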
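Finally, just so you know my machine's specs: I'm running Ubuntu 14.04 on an Optimus setup with a GTX 770M card, which supports compute capability 3.0. I'm also running NVCC version 5.5. Lastly, I'm using Scala 2.11.6 on Java 8. I'm a noob, so any help is much appreciated.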