Autoencoder model either oscillates or fails to converge on MNIST data

Posted 2024-03-28 17:51:54


I ran this code 3 months ago and got the expected results. Nothing has changed since. I have tried troubleshooting with (several) earlier versions of the code, including the earliest one (which did work at the time), but the problem persists.

# 4 - Constructing the undercomplete architecture
import numpy as np
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

class autoenc(nn.Module):
    def __init__(self, nodes = 100):
        super(autoenc, self).__init__() # inheritance
        self.full_connection0 = nn.Linear(784, nodes) # encoding weights
        self.full_connection1 = nn.Linear(nodes, 784) # decoding weights
        self.activation = nn.Sigmoid()

    def forward(self, x):
        x = self.activation(self.full_connection0(x)) # input encoding
        x = self.full_connection1(x) # output decoding
        return x
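
As a quick sanity check of the architecture above, the forward pass can be run on a random batch to confirm the 784-in/784-out shapes; the random tensor below is only a stand-in for flattened 28x28 MNIST images and is not part of the original script:

import torch

check_model = autoenc(nodes = 100)
dummy_batch = torch.rand(32, 784)          # stand-in for 32 flattened 28x28 images
reconstruction = check_model(dummy_batch)
print(reconstruction.shape)                # expected: torch.Size([32, 784])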



# 5 - Initializing autoencoder, squared L2 norm, and optimization algorithm
model = autoenc().cuda()
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(),
                          lr = 1e-3, weight_decay = 1/2)



# 6 - Training the undercomplete autoencoder model
num_epochs = 500
batch_size = 32
length = int(len(trn_data) / batch_size)

loss_epoch1 = []

for epoch in range(num_epochs):
    train_loss = 0
    score = 0. 


    for num_data in range(length - 2):
        batch_ind = (batch_size * num_data)
        input = Variable(trn_data[batch_ind : batch_ind + batch_size]).cuda()

        # === forward propagation ===
        output = model(input)
        loss = criterion(output, trn_data[batch_ind : batch_ind + batch_size])

        # === backward propagation ===
        loss.backward()

        # === calculating epoch loss ===
        train_loss += np.sqrt(loss.item())
        score += 1. #<- add for average loss error instead of total
        optimizer.step()

    loss_calculated = train_loss/score
    print('epoch: ' + str(epoch + 1) + '   loss: ' + str(loss_calculated))
    loss_epoch1.append(loss_calculated)

Now when I plot the loss, it oscillates wildly (at lr = 1e-3), whereas 3 months ago it was converging steadily at the same lr = 1e-3.

Since my account was created recently, I can't upload images yet.

How it looks now.

That is already with the learning rate lowered to 1e-5; at 1e-3 it is all over the place.

How it should look, and used to look, at lr = 1e-3.


1 Answer

#1 · Posted 2024-03-28 17:51:54

You should call optimizer.zero_grad() before loss.backward(), because gradients are accumulated across backward passes. This is most likely the source of the problem.

The general order to follow in each training step:

optimizer.zero_grad()
output = model(input)
loss = criterion(output, label)
loss.backward()
optimizer.step()

Also, the weight decay value you are using (1/2) is large enough to cause problems on its own.
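
Applied to the loop in the question, a minimal corrected sketch could look like the following. It reuses the trn_data, model, criterion, length, batch_size, train_loss and score names from the question; the weight_decay of 1e-5 is only an illustrative small value, not a tuned recommendation, and Variable is dropped since recent PyTorch versions no longer need it:

optimizer = optim.Adam(model.parameters(), lr = 1e-3, weight_decay = 1e-5)

for num_data in range(length - 2):
    batch_ind = batch_size * num_data
    batch = trn_data[batch_ind : batch_ind + batch_size].cuda()

    optimizer.zero_grad()            # clear gradients left over from the previous batch
    output = model(batch)            # forward pass
    loss = criterion(output, batch)  # reconstruction error against the same GPU batch
    loss.backward()                  # backward pass
    optimizer.step()                 # parameter update

    train_loss += np.sqrt(loss.item())
    score += 1.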
