Intermediate results in my torch neural network's forward method have requires_grad=False

0 votes · 1 answer · 35 views
Asked 2025-04-13 01:44

I have defined a small neural network in PyTorch and am trying to train it. However, requires_grad is not set on the loss value, which seems strange to me. I set a breakpoint in the network's forward method and found that none of the intermediate variables produced there (embs, means, sim, out) have requires_grad set.

Here is my code:

import torch
from torch import nn, optim


class Cbow(nn.Module):
    def __init__(self, vocab_size, hctx_len, emb_size):
        super().__init__()
        self.proj = nn.Linear(in_features=vocab_size, out_features=emb_size, bias=False)
        self.hidden = nn.Linear(in_features=emb_size, out_features=vocab_size, bias=False)

    def forward(self, x):
        # x: (num_batches, 2*hctx_len, vocab_size)

        embs = self.proj(x)               # (num_batches, 2*hctx_len, emb_size)
        means = embs.mean(dim=1)          # (num_batches, emb_size)

        sim = self.hidden(means)            # (num_batches, vocab_size)
        out = torch.softmax(sim, dim=1)     # (num_batches, vocab_size)
        # breakpoint()

        return out


def fit_model_layers(model, train_loader, epochs, lr=0.01):
    model.train()

    loss_fn = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    optimizer.zero_grad()

    for e in range(epochs):
        running_loss, num_batches = 0, 0
        for x, y in train_loader:
            out = model(x)

            # calculate loss
            loss = loss_fn(out, y)                # loss.requires_grad = False !!!!!!
            running_loss += loss.item()
            num_batches += 1

            # backprop + optimization step
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

        print(f'Epoch {e+1} loss: {running_loss / num_batches}')

I don't understand what I'm doing wrong. In PyTorch's optimization tutorial (https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html), the intermediate results in the forward pass all have requires_grad set.
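
Concretely, the check at the breakpoint looks roughly like this (a sketch; the print calls stand in for what I inspect in the debugger):

# inside forward(), just before the return
print(embs.requires_grad, means.requires_grad, sim.requires_grad, out.requires_grad)
# -> False False False False
print(out.grad_fn)
# -> None, i.e. no autograd graph is being recorded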

So what could be going wrong?

1 Answer

-1

I cannot reproduce this. I ran the code you provided and the loss term comes out with requires_grad=True:

class Cbow(nn.Module):
    def __init__(self, vocab_size, hctx_len, emb_size):
        super().__init__()
        self.proj = nn.Linear(in_features=vocab_size, out_features=emb_size, bias=False)
        self.hidden = nn.Linear(in_features=emb_size, out_features=vocab_size, bias=False)

    def forward(self, x):
        # x: (num_batches, 2*hctx_len, vocab_size)

        embs = self.proj(x)               # (num_batches, 2*hctx_len, emb_size)
        means = embs.mean(dim=1)          # (num_batches, emb_size)

        sim = self.hidden(means)            # (num_batches, vocab_size)
        out = torch.softmax(sim, dim=1)     # (num_batches, vocab_size)
        # breakpoint()

        return out
    
vocab_size = 64
hctx_len = 128
emb_size = 128
bs = 12

model = Cbow(vocab_size, hctx_len, emb_size)

x = torch.randn(bs, 2*hctx_len, vocab_size)
y = torch.randint(0, vocab_size, (bs,))

out = model(x)
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(out, y)
print(loss.requires_grad)
> True
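
For what it's worth, the only situations I can think of that would make those intermediates come out with requires_grad=False are gradient tracking being disabled around the forward call, or every parameter being frozen. Neither appears in the code you posted, but as a sketch:

# would reproduce the symptom: grad tracking disabled for this block
with torch.no_grad():
    out = model(x)
print(out.requires_grad)   # False

# would also reproduce it: all parameters frozen
for p in model.parameters():
    p.requires_grad_(False)
out = model(x)
print(out.requires_grad)   # False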

Also, nn.CrossEntropyLoss already includes the softmax (it applies log-softmax to its input internally), so adding another softmax in the model before computing the loss is wrong; forward should return the raw logits instead.
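
A minimal sketch of the corresponding fix, assuming the rest of the class and the training loop stay as they are: have forward return the raw scores and let the loss handle the normalization.

    def forward(self, x):
        # x: (num_batches, 2*hctx_len, vocab_size)
        embs = self.proj(x)         # (num_batches, 2*hctx_len, emb_size)
        means = embs.mean(dim=1)    # (num_batches, emb_size)
        sim = self.hidden(means)    # (num_batches, vocab_size)
        return sim                  # raw logits; nn.CrossEntropyLoss applies log-softmax internally

If you need probabilities at inference time, apply torch.softmax(sim, dim=1) outside the loss computation.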
