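I recently did a large refactoring of my PyTorch LSTM code to support multi-task learning. I created an MTLWrapper that holds a BaseModel (which can be one of several variations on a regular LSTM network). The BaseModel is unchanged from before the refactoring, minus a linear hidden2tag layer (which takes the hidden sequence and converts it to tag space); that layer now sits in the wrapper. The reason is that for multi-task learning all parameters are shared except the final linear layer, of which I have one per task. These are stored in an nn.ModuleList, not just a regular Python list.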
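What happens now is that my forward pass returns a list of tag-score tensors (one per task) rather than a single tensor of tag scores for a single task. I compute the loss for each task and then try to backpropagate with the average of these losses (technically also averaged over all sentences in the batch, but that was true before the refactoring as well). I call model.zero_grad() before running the forward pass on each sentence in the batch.
I don't know exactly where it happened, but after this refactoring I started getting this error (on the second batch):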
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
Following the suggestion, I added the retain_graph=True flag, but now I get the following error instead (also on the second backward step):
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [100, 400]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
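The hint in the backtrace is not actually helpful, because I have no idea where a tensor of shape [100, 400] comes from; I don't have any parameters of size 400.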
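One plausible origin of the [100, 400] tensor, assuming hidden_dim is 100 (the question does not state it; embedding_dim below is also made up): nn.LSTM packs its four gates into each weight matrix, so weight_hh_l0 has shape (4 * hidden_dim, hidden_dim), and its transpose is [100, 400]. A transpose is what a TBackward node produces, and optimizer.step() updates the weight in place, which would explain the version mismatch on the second backward pass. A minimal sketch to inspect the shapes:

```python
import torch.nn as nn

# Assumed sizes for illustration only; the question does not give them.
embedding_dim, hidden_dim = 50, 100
lstm = nn.LSTM(embedding_dim, hidden_dim)

# Each LSTM weight stacks the four gate matrices along dim 0,
# so the first dimension is 4 * hidden_dim.
for name, param in lstm.named_parameters():
    print(name, tuple(param.shape))

# The transpose of the hidden-to-hidden weight has the shape from the error.
transposed_shape = tuple(lstm.weight_hh_l0.t().shape)
print(transposed_shape)
```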
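I have a sneaking suspicion that the real problem is that I shouldn't need retain_graph=True at all, but I have no way to confirm that versus hunting down the mystery variable that is being changed according to the second error. Either way, I'm at a complete loss as to how to solve this. Any help is appreciated!
Code snippets: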
import torch
import torch.nn as nn
import torch.nn.functional as F

class MTLWrapper(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, dropout,..., directions=1, device='cpu', model_type):
        super(MTLWrapper, self).__init__()
        self.base_model = model_type(embedding_dim, hidden_dim, dropout, ..., directions, device)
        self.linear_taggers = []
        for tagset_size in tagset_sizes:
            self.linear_taggers.append(nn.Linear(hidden_dim*directions, tagset_size))
        self.linear_taggers = nn.ModuleList(self.linear_taggers)

    def init_hidden(self, hidden_dim):
        return self.base_model.init_hidden(hidden_dim)

    def forward(self, sentence):
        lstm_out = self.base_model.forward(sentence)
        tag_scores = []
        for linear_tagger in self.linear_taggers:
            tag_space = linear_tagger(lstm_out.view(len(sentence), -1))
            tag_scores.append(F.log_softmax(tag_space))
        tag_scores = torch.stack(tag_scores)
        return tag_scores
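As a side note on why the per-task layers are wrapped in nn.ModuleList: layers kept in a plain Python list are not registered as submodules, so model.parameters() (and therefore the optimizer) would never see them. A small sketch with hypothetical class names and sizes:

```python
import torch.nn as nn

class PlainListHeads(nn.Module):
    """Hypothetical example: heads kept in a regular Python list."""
    def __init__(self, hidden_dim, tagset_sizes):
        super().__init__()
        # Not registered: these layers are invisible to .parameters().
        self.heads = [nn.Linear(hidden_dim, n) for n in tagset_sizes]

class ModuleListHeads(nn.Module):
    """Hypothetical example: heads kept in an nn.ModuleList."""
    def __init__(self, hidden_dim, tagset_sizes):
        super().__init__()
        # Registered: each layer's weight and bias show up in .parameters().
        self.heads = nn.ModuleList([nn.Linear(hidden_dim, n) for n in tagset_sizes])

plain = PlainListHeads(8, [3, 5])
registered = ModuleListHeads(8, [3, 5])

n_plain = len(list(plain.parameters()))
n_registered = len(list(registered.parameters()))
print(n_plain, n_registered)
```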
Inside the train function:
for i in range(math.ceil(len(train_sents)/batch_size)):
    batch = r[i*batch_size:(i+1)*batch_size]
    losses = []
    for j in batch:
        sentence = train_sents[j]
        tags = train_tags[j]

        # Step 1. Remember that Pytorch accumulates gradients.
        # We need to clear them out before each instance
        model.zero_grad()

        # Also, we need to clear out the hidden state of the LSTM,
        # detaching it from its history on the last instance.
        model.hidden = model.init_hidden(hidden_dim)

        sentence_in = sentence
        targets = tags

        # Step 3. Run our forward pass.
        tag_scores = model(sentence_in)

        loss = [loss_function(tag_scores[i], targets[i]) for i in range(len(tag_scores))]
        loss = torch.stack(loss)
        avg_loss = sum(loss)/len(loss)
        losses.append(avg_loss)
    losses = torch.stack(losses)
    total_loss = sum(losses)/len(losses)  # average over all sentences in batch
    total_loss.backward(retain_graph=True)
    running_loss += total_loss.item()
    optimizer.step()
    count += 1
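The averaging in the loop above (stack the losses, then sum(...)/len(...)) can also be written with .mean() on the stacked tensor. A minimal sketch, using made-up scalar tensors as stand-ins for the per-task outputs of loss_function:

```python
import torch

# Hypothetical per-task losses for one sentence.
task_losses = [
    torch.tensor(0.5, requires_grad=True),
    torch.tensor(1.5, requires_grad=True),
]

# The form used in the question.
manual = sum(task_losses) / len(task_losses)

# Equivalent form: stack into one tensor and take its mean.
stacked = torch.stack(task_losses).mean()

# Backpropagating the mean gives each task loss a gradient of 1 / num_tasks.
stacked.backward()
grads = [t.grad.item() for t in task_losses]
```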
And the code for one possible BaseModel (the others are nearly identical):
class LSTMTagger(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, dropout, vocab_size, alphabet_size,
                 directions=1, device='cpu'):
        super(LSTMTagger, self).__init__()
        self.device = device
        self.hidden_dim = hidden_dim
        self.directions = directions
        self.dropout = nn.Dropout(dropout)

        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)

        # The LSTM takes word embeddings as inputs, and outputs hidden states
        # with dimensionality hidden_dim.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, dropout=dropout, bidirectional=directions == 2)

        # The linear layer that maps from hidden state space to tag space
        self.hidden = self.init_hidden(hidden_dim)

    def init_hidden(self, dim):
        # Before we've done anything, we don't have any hidden state.
        # Refer to the PyTorch documentation to see exactly
        # why they have this dimensionality.
        # The axes semantics are (num_layers, minibatch_size, hidden_dim)
        return (torch.zeros(self.directions, 1, dim).to(device=self.device),
                torch.zeros(self.directions, 1, dim).to(device=self.device))

    def forward(self, sentence):
        word_idxs = []
        for word in sentence:
            word_idxs.append(word[0])

        embeds = self.word_embeddings(torch.LongTensor(word_idxs).to(device=self.device))
        lstm_out, self.hidden = self.lstm(
            embeds.view(len(sentence), 1, -1), self.hidden)
        lstm_out = self.dropout(lstm_out)
        return lstm_out
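Note that forward above overwrites self.hidden with the state returned by the LSTM. If that state is not reset before the next pass, the new graph stays attached to the previous one, and the next backward() tries to walk through buffers that were already freed. The following is a minimal reproduction on a bare nn.LSTM with made-up sizes, not the code from the question:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=3)

def fresh_state():
    # Zero state with shape (num_layers, batch, hidden_size).
    return (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))

def step(hidden):
    """One forward/backward pass; returns the state produced by the LSTM."""
    out, hidden = lstm(torch.randn(5, 1, 4), hidden)
    out.sum().backward()
    return hidden

stale = step(fresh_state())  # first pass works

try:
    step(stale)  # the stale state still points into the first graph
    second_pass_failed = False
except RuntimeError as err:
    second_pass_failed = True
    print(err)

# Starting from a fresh state cuts the link to the old graph.
step(fresh_state())
```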
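The problem was that when I reset the model's hidden state (model.hidden = model.init_hidden(hidden_dim)), I wasn't actually assigning the re-initialized state to the BaseModel, only to the MTLWrapper (which technically doesn't even use a hidden state). I modified my MTLWrapper so that the reset is applied to the BaseModel. This solved the first error, and my code now runs without the retain_graph=True flag.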
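The exact change is not reproduced here, so the following is only a sketch of what such a fix could look like: the wrapper's init_hidden writes the fresh state onto base_model.hidden, which is the attribute the LSTM actually reads. TinyBase and TinyWrapper are hypothetical stand-ins for the BaseModel and MTLWrapper, with made-up sizes and a single head.

```python
import torch
import torch.nn as nn

class TinyBase(nn.Module):
    """Stand-in for the BaseModel: keeps its LSTM state in self.hidden."""
    def __init__(self, input_dim=4, hidden_dim=3):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim)
        self.hidden = self.init_hidden(hidden_dim)

    def init_hidden(self, dim):
        return (torch.zeros(1, 1, dim), torch.zeros(1, 1, dim))

    def forward(self, x):
        out, self.hidden = self.lstm(x, self.hidden)
        return out

class TinyWrapper(nn.Module):
    """Stand-in for the MTLWrapper with a single linear head."""
    def __init__(self, hidden_dim=3):
        super().__init__()
        self.base_model = TinyBase(hidden_dim=hidden_dim)
        self.head = nn.Linear(hidden_dim, 2)

    def init_hidden(self, hidden_dim):
        # Reset the state on the base model, where the LSTM reads it,
        # instead of returning it to be stored on the wrapper.
        self.base_model.hidden = self.base_model.init_hidden(hidden_dim)

    def forward(self, x):
        return self.head(self.base_model(x))

model = TinyWrapper()
for _ in range(3):
    model.zero_grad()
    model.init_hidden(3)
    model(torch.randn(5, 1, 4)).sum().backward()  # no retain_graph needed
completed = True
```

With this shape of fix, the training loop would call model.init_hidden(hidden_dim) directly rather than assigning its return value to model.hidden.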