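I am new to PyTorch and am trying to reproduce this project: https://github.com/eXascaleInfolab/ActiveLink
However, an error raised in feedforward() has been bothering me for days. Here is part of the code (for the full model code, see https://github.com/eXascaleInfolab/ActiveLink/blob/master/models.py):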
def forward(self, e1, rel, batch_size=None, weights=None):
......
e1_embedded = self.emb_e(e1).view(-1, 1, 10, 20)
rel_embedded = self.emb_rel(rel).view(-1, 1, 10, 20)
stacked_inputs = torch.cat([e1_embedded, rel_embedded], 2) # out: (128L, 1L, 20L, 20L)
This gives me the following error (I am using a GPU):
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorMath.cu line=196 error=710 : device-side assert triggered
Traceback (most recent call last):
File "main.py", line 147, in <module>
main()
File "main.py", line 136, in main
model = run_meta_incremental(config, model, train_batcher, test_rank_batcher)
File "/home/yonghui/yt/meta_incr_training.py", line 158, in run_meta_incremental
g = run_inner(config, model, task)
File "/home/yonghui/yt/meta_incr_training.py", line 120, in run_inner
pred = model.forward(e1, rel)
File "/home/yonghui/yt/models.py", line 136, in forward
stacked_inputs = torch.cat([e1_embedded, rel_embedded], 2)
RuntimeError: cuda runtime error (710) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorMath.cu:196
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
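I used the debugger to try to find out where things go wrong:
Before e1 and rel are embedded, they are both int64 tensors of shape torch.Size([128, 1]).
e1 is embedded without problems and becomes a torch.float32 tensor of shape torch.Size([128, 1, 10, 20]). However, after rel passes through the emb_rel embedding layer, the debugger shows every tensor as Unable to get repr for <class 'torch.Tensor'>.
What is happening, and how can I fix it? Thanks for your help.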
The error occurs somewhere before the point where this error message is printed, possibly in the reshaping.
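Calling view does not change the underlying data; it only changes the "view" onto that data. If the requested view is not possible (for example, because the tensor is not stored contiguously in memory, see the PyTorch forum), the call fails. Note also that CUDA kernels run asynchronously, so a device-side failure is often reported only when the tensor's contents are first used (in your case, when the debugger tries to print the tensor).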
When debugging, consider replacing view with reshape (see the StackOverflow thread on the difference between view and reshape).
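To make the view/reshape difference concrete, here is a minimal sketch that runs on the CPU. The tensor names are made up for illustration and have nothing to do with the ActiveLink code. In the PyTorch versions I have used, view raises as soon as it is called on an incompatible non-contiguous tensor, while reshape falls back to copying the data.

```python
import torch

# A small 2x3 tensor; transposing it yields a non-contiguous 3x2 tensor.
x = torch.arange(6).reshape(2, 3)
t = x.t()

# view needs a memory layout compatible with the requested shape,
# so flattening the transposed tensor with view is expected to fail.
try:
    t.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True

# reshape returns a view when it can and silently copies otherwise.
flat = t.reshape(6)

print("contiguous:", t.is_contiguous())
print("view failed:", view_failed)
print("reshape result:", flat.tolist())
```

If reshape succeeds where view fails, the layout was the problem. If both behave the same, the cause most likely lies elsewhere, for example in the indices passed to the embedding layer.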
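I solved the problem by using the debugger and inspecting the input tensors.
After checking the tensors before they were embedded, I found that some elements were out of range, which matters in particular because indices start at 0.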
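A check along these lines can be placed just before the embedding lookup. This is only a sketch: check_indices is a hypothetical helper, and the embedding sizes below are made-up values, not the ones used in ActiveLink. The indexSelectLargeIndex assertion `srcIndex < srcSelectDimSize` in the log above is consistent with an index that is greater than or equal to the number of rows in the embedding table.

```python
import torch
import torch.nn as nn


def check_indices(name, idx, emb):
    """Raise a readable error if idx has values outside [0, num_embeddings)."""
    lo, hi = int(idx.min()), int(idx.max())
    if lo < 0 or hi >= emb.num_embeddings:
        raise IndexError(
            f"{name}: indices span [{lo}, {hi}], but the embedding has "
            f"{emb.num_embeddings} rows (valid range 0..{emb.num_embeddings - 1})"
        )


# Hypothetical sizes: 5 relations, 200-dimensional embeddings (200 = 10 * 20).
emb_rel = nn.Embedding(num_embeddings=5, embedding_dim=200)

good = torch.tensor([[0], [4]])  # valid 0-based ids
bad = torch.tensor([[1], [5]])   # 5 is out of range, typical of 1-based ids

# Valid indices pass the check and can be embedded and reshaped as in forward().
check_indices("rel", good, emb_rel)
out = emb_rel(good).view(-1, 1, 10, 20)

# Out-of-range indices are reported before they ever reach the GPU kernel.
try:
    check_indices("rel", bad, emb_rel)
    caught = None
except IndexError as err:
    caught = str(err)

print(out.shape)
print(caught)
```

Running the check on the int64 index tensors (e1 and rel in the question) before calling the embedding layers turns the opaque device-side assert into an ordinary Python exception that names the offending input.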