I want to train a model with PyTorch 1.6.0 on multiple GPUs. I set all the seeds and the cuDNN flags:
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
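The calls above can be collected into a single helper and sanity-checked by reseeding and regenerating the same random tensor. This is a sketch; `seed_everything` is my own name, not a PyTorch API:

```python
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    """Seed every RNG the training loop touches (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op on CPU-only machines
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True


# Sanity check: reseeding must reproduce the same random tensors.
seed_everything(42)
a = torch.randn(3)
seed_everything(42)
b = torch.randn(3)
print(torch.equal(a, b))  # → True
```

Note that these flags only make cuDNN deterministic; other CUDA kernels (e.g. atomic-add based backward passes) can still be nondeterministic in PyTorch 1.6.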
However, the losses differ between two runs. `l_sup` is a loss that uses a separate, fixed pre-trained model to supervise the middle feature-extraction layer; `l_pix` is the original model loss.
The code is shown below:
class Model:
    def __init__(self):
        self.basic_model = BasicModel()
        load_path = ...
        self.pretrained_network = PretrainNetwork()
        self.load_network(self.pretrained_network, load_path)
        self.pretrained_network.eval()
        for p in self.pretrained_network.parameters():
            p.requires_grad = False
        self.basic_model = DataParallel(self.basic_model)
        self.pretrained_network = DataParallel(self.pretrained_network)
    def load_network(self, net, load_path, strict=True, param_key='params'):
        if isinstance(net, (DataParallel, DistributedDataParallel)):
            net = net.module
        load_net = torch.load(load_path, map_location=lambda storage, loc: storage)
        if param_key is not None:
            load_net = load_net[param_key]
        # strip the 'module.' prefix left over from DataParallel checkpoints
        for k, v in deepcopy(load_net).items():
            if k.startswith('module.'):
                load_net[k[7:]] = v
                load_net.pop(k)
        net.load_state_dict(load_net, strict=strict)
    def optimize_parameters(self):
        self.optimizer.zero_grad()
        # left_feature and right_feature are the outputs of the middle
        # feature-extraction layer; cost_volume is the final output of the model
        left_feature, right_feature, cost_volume = self.basic_model(self.left_seq, self.right_seq)
        # outputs of the middle feature-extraction layer of the pre-trained model
        left_cnn_feature, right_cnn_feature = self.pretrained_network(self.left_seq, self.right_seq)
        l_total = 0
        # loss of the original model
        l_pix = self.basic_loss(cost_volume, self.gt)
        l_total += l_pix
        # supervision loss from the pre-trained model
        l_sup = 0.5 * self.feature_loss(left_feature, left_cnn_feature) + \
                0.5 * self.feature_loss(right_feature, right_cnn_feature)
        l_total += l_sup
        l_total.backward()
        self.optimizer.step()
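Since the pre-trained network's parameters are frozen, its forward pass could also be run under `torch.no_grad()`, so no autograd graph is built through it at all. A minimal sketch of the pattern (`TeacherNet` is a stand-in for the pre-trained network, not the actual architecture from this code):

```python
import torch
import torch.nn as nn


class TeacherNet(nn.Module):
    """Stand-in for a frozen pre-trained feature extractor (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)

    def forward(self, x):
        return self.conv(x)


teacher = TeacherNet().eval()
for p in teacher.parameters():
    p.requires_grad = False

x = torch.randn(1, 3, 16, 16)
with torch.no_grad():  # no autograd graph is built through the teacher
    feat = teacher(x)

print(feat.requires_grad)  # → False
```

The targets produced this way are plain constants to the loss, which is exactly what is wanted from a fixed supervision signal.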
To make sure the pre-trained model never updates its parameters, I load its weights and then write:
self.pretrained_network.eval()
for p in self.pretrained_network.parameters():
p.requires_grad = False
Strangely, if I use only l_pix, i.e. remove these two lines:
l_sup = 0.5 * self.feature_loss(left_feature, left_cnn_feature) + 0.5 * self.feature_loss(right_feature, right_cnn_feature)
l_total += l_sup
then the runs are fully reproducible. I would like to know why adding l_sup breaks reproducibility.
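One way to narrow down where two runs first diverge is to log a deterministic scalar fingerprint of each intermediate tensor (features, losses) every iteration and diff the logs of two runs; the first differing fingerprint points at the nondeterministic op. A sketch, where `fingerprint` is my own helper name:

```python
import torch


def fingerprint(t: torch.Tensor) -> float:
    """Deterministic scalar summary of a tensor (hypothetical helper).

    Accumulating in float64 avoids spurious diffs from float32 rounding
    when the underlying values are actually identical.
    """
    return t.detach().double().abs().sum().item()


# Same seed, same op => same fingerprint across "runs".
torch.manual_seed(0)
f1 = fingerprint(torch.randn(5))
torch.manual_seed(0)
f2 = fingerprint(torch.randn(5))
print(f1 == f2)  # → True
```

Logging fingerprints for `left_feature`, `left_cnn_feature`, `l_pix`, and `l_sup` separately would show whether the divergence starts in the basic model's features, the pre-trained model's features, or only in the backward pass of `feature_loss`.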