My LSTM model is not learning: the weights never update

Posted 2024-04-25 19:01:45


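My LSTM model in PyTorch is not learning, and its weights receive no updates during training. After every epoch I print the sum of the weights of each layer, and they never change.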

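The y array has shape (n, 3): the first column holds the actual sequence length, the second column is the label (0 or 1), and the last column is the weight used to penalize the loss function.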

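Apparently optimizer.step() is not working and is not applying the gradients to the weights. As a side note, I tried this model with different learning rates and mini-batch sizes, and the results did not change. The results shown come from randomly generated dummy variables, and the y labels form an imbalanced dataset (~3%). I tried different weights to compensate for the imbalance ratio, but I suspect something is wrong with the model itself.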

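In addition, if I run the model on my own dataset, the gradients of the LSTM layer (all four parameters) are zero, but the linear layer does have gradients.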

import torch
import torch.nn as nn
import torch.optim as optim
from torch.nn.utils.rnn import pack_padded_sequence

class LSTMClassifier(nn.Module):
    """
    This is the simple RNN model we will be using to perform Sentiment Analysis.
    """

    def __init__(self, feature_size, hidden_dim , layer_dim = 1):
        """
        Initialize the model by setting up the various layers.
        """
        super(LSTMClassifier, self).__init__()
        
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim
        self.lstm = nn.LSTM(feature_size, hidden_dim , layer_dim,  batch_first = True)
        self.dense = nn.Linear(in_features=hidden_dim, out_features=1)
        self.sig = nn.Sigmoid()
        
        
    def init_hidden(self, x):
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim)
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim)
        return [t for t in (h0, c0)]
        
        
    
    def forward(self, x , y):
        #import pdb; pdb.set_trace()
        """
        Perform a forward pass of our model on some input.
        """
        #h0, c0 = self.init_hidden(x)
        x_seq = y[:,0]
        x = pack_padded_sequence(x, x_seq, batch_first=True , enforce_sorted = False)
        lstm_out, _ = self.lstm(x)
        lstm_out, _ = torch.nn.utils.rnn.pad_packed_sequence(lstm_out, batch_first=True)
        lstm_out = lstm_out.contiguous()
        out = self.dense(lstm_out)[:,-1,:]
        #out = out[range(len(x_seq)), (x_seq - 1)]
        return self.sig(out.squeeze())
def _get_train_data_loader(batch_size, X , y):
    print("Get train/test data loader.")

   
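    # NOTE: .long() truncates any fractional values in y (including the weights column).
    train_y = torch.from_numpy(y).long()
    try:
        train_X = torch.from_numpy(X).float()
    except TypeError:  # X is not a NumPy array (e.g. it is already a tensor)
        train_X = X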

    train_ds = torch.utils.data.TensorDataset(train_X, train_y)

    return torch.utils.data.DataLoader(train_ds, batch_size=batch_size , shuffle=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


print ('train dist' , sum(y_train[:,1])/y_train.shape[0] , '\n ****\n' , 
      'test dist' , sum(y_test[:,1])/y_test.shape[0])
# NOTE: `model` is only created further down, after this optimizer.
optimizer = optim.SGD(model.parameters() , lr=.01 )

train_loader = _get_train_data_loader(batch_size = 128, X = X_ , y = y_weighted)
test_loader = _get_train_data_loader(batch_size = 512, X = X_test , y = y_test)

epochs = 2
model = LSTMClassifier (87 , 5)

loss_fn = torch.nn.BCELoss(reduction='none')
train_loss = []
test_loss = []
for epoch in range(1, epochs + 1):
    print ('epoch = ' , epoch)
    model.train()
    total_loss = 0
    for batch in train_loader:         
        batch_X, batch_y = batch

        #batch_X = batch_X.to(device)
        #batch_y = batch_y.to(device)

        # TODO: Complete this train method to train the model provided.
        optimizer.zero_grad()
        
        
         # Forward pass
        outputs = model(batch_X , batch_y)
        y_ = batch_y[:,1].float()
        loss = loss_fn(outputs, y_)
        weight=batch_y[:,2].float()
        loss = (loss * weight).mean()
        

        # Backward and optimize
        loss.backward()
        optimizer.step()
        


        total_loss += loss.item()
        
    for p in model.parameters():
        print(torch.sum(p.grad))
        
    print ('total loss' , total_loss)
    print ('lstm weight' , torch.sum(model.lstm.weight_hh_l0.data) , 'dense_weight' , torch.sum(model.dense.weight.data))
    
    train_loss.append(total_loss)
    with torch.no_grad():
        n_correct = 0
        n_samples = 0
        for test, labels in test_loader:
            #labels = labels.to(device)
            outputs = model(test , labels)
            # max returns (value ,index)
            predicted = torch.round(outputs)
            n_samples += labels.size(0)
            n_correct += (predicted == labels[:,1]).sum().item()

        acc = 100.0 * n_correct / n_samples
        print(f'Accuracy of the network on the 10000 test images: {acc} %') 
        test_loss.append(acc)
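
Regarding the zero LSTM gradients, one thing that may be worth checking (an assumption, not a confirmed cause) is which time step `forward` reads. After `pad_packed_sequence`, index `-1` on the time axis is zero padding for every sequence shorter than the longest one in the batch. The sketch below uses made-up shapes and lengths to show the difference between the last padded step and the last valid step; the variable names are illustrative only.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=5, batch_first=True)

x = torch.randn(3, 6, 4)            # (batch, max_len, features), made-up data
lengths = torch.tensor([6, 2, 4])   # true length of each sequence (CPU int64)

packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
packed_out, _ = lstm(packed)
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)

# Index -1 is padding (all zeros) for every sequence shorter than the longest one.
last_padded = out[:, -1, :]

# Gather the output at position length-1 for each sequence instead.
last_valid = out[torch.arange(out.size(0)), lengths - 1]
```

Here `last_padded` is all zeros for the second and third sequences, so no gradient flows back into the LSTM from those rows, whereas `last_valid` picks a real LSTM output for every row.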

******************************************
epoch =  1
tensor(6518.8760)
tensor(-236.9392)
tensor(149.6967)
tensor(149.6966)
tensor(-1551.1709)
tensor(5021.9199)
total loss 1871.447255373001
lstm weight tensor(3.0054) dense_weight tensor(-0.5395)
Accuracy of the network on the 10000 test images: 96.92037099752012 %
epoch =  2
tensor(7822.5503)
tensor(-284.3271)
tensor(179.6338)
tensor(179.6338)
tensor(-1861.4037)
tensor(6026.2920)
total loss 1871.465574145317
lstm weight tensor(3.0054) dense_weight tensor(-0.5395)
Accuracy of the network on the 10000 test images: 96.92037099752012 %
***********************************************
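
The log above shows non-zero gradients while the summed weights stay identical between epochs. One thing worth ruling out (an assumption, not a confirmed diagnosis) is that the optimizer was built from a different model object than the one being trained, for example when `model` is re-created after the optimizer. The sketch below uses a stand-in `nn.Linear`; the helpers `optimizer_tracks_model` and `weights_changed_after_step` are hypothetical names introduced here.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def optimizer_tracks_model(optimizer, model):
    """Return True if every model parameter is held by the optimizer (by identity)."""
    held = {id(p) for group in optimizer.param_groups for p in group["params"]}
    return all(id(p) in held for p in model.parameters())


def weights_changed_after_step(model, optimizer):
    """Run one step on a dummy loss (3 input features assumed) and report whether any weight moved."""
    before = [p.detach().clone() for p in model.parameters()]
    optimizer.zero_grad()
    loss = model(torch.ones(4, 3)).sum()
    loss.backward()
    optimizer.step()
    return any(not torch.equal(b, p.detach())
               for b, p in zip(before, model.parameters()))


old_model = nn.Linear(3, 1)
stale_optimizer = optim.SGD(old_model.parameters(), lr=0.01)

new_model = nn.Linear(3, 1)  # re-created after the optimizer above
fresh_optimizer = optim.SGD(new_model.parameters(), lr=0.01)

print(optimizer_tracks_model(stale_optimizer, new_model))
print(optimizer_tracks_model(fresh_optimizer, new_model))
```

With the stale optimizer, `backward()` still fills in gradients on `new_model`, but `step()` only touches the parameters of `old_model`, which matches the symptom of gradients without weight changes.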

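Another question: when the data is loaded through a DataLoader, how can the weights be used in BCELoss(weight)? The only option left seems to be instantiating a loss_fn in every loop iteration.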
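
On the weighting question, a sketch under the assumption that the weights arrive per sample in each batch, as in the training loop of the post: the functional form `torch.nn.functional.binary_cross_entropy` accepts a `weight` tensor at call time, so no loss object has to be re-created per iteration. The tensors below are made-up illustration values.

```python
import torch
import torch.nn.functional as F

probs = torch.tensor([0.9, 0.2, 0.6])      # model outputs after the sigmoid
targets = torch.tensor([1.0, 0.0, 1.0])    # labels (column 1 of y in the post)
weights = torch.tensor([10.0, 1.0, 10.0])  # per-sample weights (column 2 of y)

# Option 1: element-wise loss, then weight and average manually (as in the post).
manual = (F.binary_cross_entropy(probs, targets, reduction="none") * weights).mean()

# Option 2: pass the batch's weights directly to the functional form.
functional = F.binary_cross_entropy(probs, targets, weight=weights)

print(manual.item(), functional.item())
```

Both options compute the mean of the weighted element-wise losses, so they should agree up to floating-point error.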

Many thanks for your help.

