PyTorch: one of the variables needed for gradient computation has been modified by an in-place operation (multi-task learning)

I recently did a big refactor of my PyTorch LSTM code to support multi-task learning. I created an MTLWrapper, which holds a BaseModel (one of several variations on a regular LSTM network). The BaseModel is the same as it was before the refactor, minus a linear hidden2tag layer (which takes the hidden sequence and converts it to tag space); that layer now lives in the wrapper. The reason is that for multi-task learning all parameters are shared except the final linear layer, of which I have one per task. These are stored in an nn.ModuleList, not just a regular Python list.

What happens now is that my forward pass returns a list of tag-score tensors (one per task) rather than a single tensor of tag scores for a single task. I compute the loss for each task and then try to backpropagate with the average of these losses (technically also averaged over all the sentences in a batch, but that was true before the refactor too). I call model.zero_grad() before running the forward pass on each sentence in the batch.

I don't know exactly where things went wrong, but after this refactor I started getting this error (on the second batch):

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

Following the usual advice, I added the retain_graph=True flag, but now I get the following error instead (also on the second backward step):

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [100, 400]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

The hint in the backtrace isn't actually helpful, because I have no idea where a tensor of shape [100, 400] even comes from — I don't have any parameters of size 400. I have a sneaking suspicion that the real problem is that I shouldn't need retain_graph=True at all, but I have no way to confirm that versus tracking down the mystery variable that is being changed according to the second error. Either way, I'm at a complete loss as to how to solve this. Any help is appreciated.
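For context, here is the general pattern that produces the first error, as far as I understand it (a self-contained sketch, not my actual model): the hidden state returned by an LSTM still references the graph of the iteration that produced it, so feeding it into the next forward pass and calling backward() again walks back into an already-freed graph.

import torch
import torch.nn as nn

lstm = nn.LSTM(4, 4)
hidden = (torch.zeros(1, 1, 4), torch.zeros(1, 1, 4))

for step in range(2):
    # 'hidden' still carries the previous iteration's graph, so the second
    # backward() raises the "backward through the graph a second time" error
    out, hidden = lstm(torch.randn(3, 1, 4), hidden)
    out.sum().backward()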

Code snippets:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MTLWrapper(nn.Module):
    
    def __init__(self, embedding_dim, hidden_dim, dropout,..., directions=1, device='cpu', model_type):
        super(MTLWrapper, self).__init__()
        self.base_model = model_type(embedding_dim, hidden_dim, dropout, ..., directions, device)
        self.linear_taggers = []
        for tagset_size in tagset_sizes:
            self.linear_taggers.append(nn.Linear(hidden_dim*directions, tagset_size))
        self.linear_taggers = nn.ModuleList(self.linear_taggers)

    def init_hidden(self, hidden_dim):
        return self.base_model.init_hidden(hidden_dim)

    def forward(self, sentence):
        lstm_out = self.base_model.forward(sentence)
        tag_scores = []
        for linear_tagger in self.linear_taggers:
            tag_space = linear_tagger(lstm_out.view(len(sentence), -1))
            tag_scores.append(F.log_softmax(tag_space))
        tag_scores = torch.stack(tag_scores)
        return tag_scores

Inside the train function:

for i in range(math.ceil(len(train_sents)/batch_size)):
    batch = r[i*batch_size:(i+1)*batch_size]
    losses = []
    for j in batch:

        sentence = train_sents[j]
        tags = train_tags[j]

        # Step 1. Remember that Pytorch accumulates gradients.
        # We need to clear them out before each instance
        model.zero_grad()

        # Also, we need to clear out the hidden state of the LSTM,
        # detaching it from its history on the last instance.
        model.hidden = model.init_hidden(hidden_dim)

        sentence_in = sentence
        targets = tags

        # Step 3. Run our forward pass.
        tag_scores = model(sentence_in)

        loss = [loss_function(tag_scores[i], targets[i]) for i in range(len(tag_scores))]
        loss = torch.stack(loss)
        avg_loss = sum(loss)/len(loss)
        losses.append(avg_loss)
    losses = torch.stack(losses)
    total_loss = sum(losses)/len(losses)  # average over all sentences in batch
    total_loss.backward(retain_graph=True)
    running_loss += total_loss.item()
    optimizer.step()
    count += 1
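(Side note: one pattern I've seen suggested for this kind of setup is to detach the hidden state between sentences instead of relying on retain_graph=True, so gradients never flow back into the previous sentence's graph. A minimal sketch, assuming the wrapper exposes the state as model.base_model.hidden; the helper name repackage_hidden is hypothetical:)

def repackage_hidden(hidden):
    # Detach the (h, c) tensors from their computation history so the next
    # backward() does not try to traverse the previous sentence's graph
    return tuple(h.detach() for h in hidden)

# inside the per-sentence loop, instead of zeroing the state:
# model.base_model.hidden = repackage_hidden(model.base_model.hidden)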

And the code of one possible BaseModel (the others are practically identical):

class LSTMTagger(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, dropout, vocab_size, alphabet_size,
                 directions=1, device='cpu'):

        super(LSTMTagger, self).__init__()
        self.device = device

        self.hidden_dim = hidden_dim
        self.directions = directions
        self.dropout = nn.Dropout(dropout)

        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)

        # The LSTM takes word embeddings as inputs, and outputs hidden states
        # with dimensionality hidden_dim.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, dropout=dropout, bidirectional=directions == 2)

        # Initialize the hidden state (the hidden2tag linear layer now lives in the MTLWrapper)
        self.hidden = self.init_hidden(hidden_dim)

    def init_hidden(self, dim):
        # Before we've done anything, we don't have any hidden state.
        # Refer to the PyTorch documentation to see exactly
        # why they have this dimensionality.
        # The axes semantics are (num_layers, minibatch_size, hidden_dim)
        return (torch.zeros(self.directions, 1, dim).to(device=self.device),
                torch.zeros(self.directions, 1, dim).to(device=self.device))

    def forward(self, sentence):
        word_idxs = []
        for word in sentence:
            word_idxs.append(word[0])

        embeds = self.word_embeddings(torch.LongTensor(word_idxs).to(device=self.device))

        lstm_out, self.hidden = self.lstm(
            embeds.view(len(sentence), 1, -1), self.hidden)
        lstm_out = self.dropout(lstm_out)
        return lstm_out

1 Answer

The problem was that when I reset the model's hidden state (model.hidden = model.init_hidden(hidden_dim)), I never actually reassigned the re-initialized hidden state to the BaseModel, only to the MTLWrapper (which technically doesn't even use the hidden state itself). I amended my MTLWrapper's init_hidden function as follows:

class MTLWrapper(nn.Module):

    def init_hidden(self, hidden_dim):
        self.base_model.hidden = self.base_model.init_hidden(hidden_dim)
        return self.base_model.init_hidden(hidden_dim)

This solved the first error, and my code now runs without the retain_graph=True flag.
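An equivalent, slightly tidier variant (just a sketch) re-initializes the BaseModel's hidden state once and returns that same tuple, so the wrapper and the base model always refer to the same state:

    def init_hidden(self, hidden_dim):
        # Re-initialize the BaseModel's hidden state and hand back the same tuple
        self.base_model.hidden = self.base_model.init_hidden(hidden_dim)
        return self.base_model.hidden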
