Intermediate results in my torch neural network's forward method have requires_grad=False
I defined a small neural network in PyTorch and am trying to train it. However, the loss value does not have requires_grad set, which strikes me as odd. I put a breakpoint in the network's forward method and found that none of the intermediate variables it produces (embs, means, sim, out) have requires_grad set either.
Here is my code:
class Cbow(nn.Module):
    def __init__(self, vocab_size, hctx_len, emb_size):
        super().__init__()
        self.proj = nn.Linear(in_features=vocab_size, out_features=emb_size, bias=False)
        self.hidden = nn.Linear(in_features=emb_size, out_features=vocab_size, bias=False)

    def forward(self, x):
        # x: (num_batches, 2*hctx_len, vocab_size)
        embs = self.proj(x)              # (num_batches, 2*hctx_len, emb_size)
        means = embs.mean(dim=1)         # (num_batches, emb_size)
        sim = self.hidden(means)         # (num_batches, vocab_size)
        out = torch.softmax(sim, dim=1)  # (num_batches, vocab_size)
        # breakpoint()
        return out
def fit_model_layers(model, train_loader, epochs, lr=0.01):
    model.train()
    loss_fn = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    optimizer.zero_grad()
    for e in range(epochs):
        running_loss, num_batches = 0, 0
        for x, y in train_loader:
            out = model(x)
            # calculate loss
            loss = loss_fn(out, y)  # loss.requires_grad = False !!!!!!
            running_loss += loss.item()
            num_batches += 1
            # backprop + optimization step
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        print(f'Epoch {e+1} loss: {running_loss / num_batches}')
I don't see what I'm doing wrong. In the PyTorch optimization tutorial (https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html), the intermediate results in the forward pass do have requires_grad set.
What could be going wrong here?
1 Answer
I cannot reproduce this. I ran the code you provided and the loss term has requires_grad=True:
class Cbow(nn.Module):
    def __init__(self, vocab_size, hctx_len, emb_size):
        super().__init__()
        self.proj = nn.Linear(in_features=vocab_size, out_features=emb_size, bias=False)
        self.hidden = nn.Linear(in_features=emb_size, out_features=vocab_size, bias=False)

    def forward(self, x):
        # x: (num_batches, 2*hctx_len, vocab_size)
        embs = self.proj(x)              # (num_batches, 2*hctx_len, emb_size)
        means = embs.mean(dim=1)         # (num_batches, emb_size)
        sim = self.hidden(means)         # (num_batches, vocab_size)
        out = torch.softmax(sim, dim=1)  # (num_batches, vocab_size)
        # breakpoint()
        return out
vocab_size = 64
hctx_len = 128
emb_size = 128
bs = 12
model = Cbow(vocab_size, hctx_len, emb_size)
x = torch.randn(bs, 2*hctx_len, vocab_size)
y = torch.randint(0, vocab_size, (bs,))
out = model(x)
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(out, y)
print(loss.requires_grad)
> True
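Since your question is specifically about the intermediates inside forward, you can also check those directly. A minimal sketch, continuing from the repro snippet above (it simply replays the body of forward step by step using the already-constructed model and x, so each tensor can be inspected):

# Continuing from the snippet above: re-run the forward steps by hand
# and check requires_grad on each intermediate tensor.
embs = model.proj(x)
means = embs.mean(dim=1)
sim = model.hidden(means)
out = torch.softmax(sim, dim=1)
print(embs.requires_grad, means.requires_grad, sim.requires_grad, out.requires_grad)
# expected: True True True True (with grad mode enabled and the default
# Linear parameters, which have requires_grad=True)

If these come out False on your machine, check whether the forward pass is being run inside a torch.no_grad() / inference_mode() context or whether the model's parameters had requires_grad switched off.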
Also, nn.CrossEntropyLoss already applies log-softmax internally and expects raw logits, so adding an extra softmax in the model before computing the loss is incorrect.
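A minimal sketch of the fix, assuming you want to keep the same Cbow structure: have forward return the raw logits and apply softmax only when you actually need probabilities (e.g. at inference time), outside the loss computation:

    def forward(self, x):
        # x: (num_batches, 2*hctx_len, vocab_size)
        embs = self.proj(x)        # (num_batches, 2*hctx_len, emb_size)
        means = embs.mean(dim=1)   # (num_batches, emb_size)
        sim = self.hidden(means)   # (num_batches, vocab_size) -- raw logits
        return sim

# For inference, apply softmax outside the loss:
# probs = torch.softmax(model(x), dim=1)

This works because nn.CrossEntropyLoss on logits is equivalent to nn.NLLLoss applied to torch.log_softmax of those logits, so the softmax is already accounted for inside the loss.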