A simple ResNet model cannot tell whether two solid-color images have the same color
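I ran into a problem while trying to train an image-comparison model. To isolate it, I reduced the problem to the following.
I feed the model pairs of images (3x128x128) that are each either entirely black or entirely white. The model passes each image through its own resnet encoder, concatenates the two outputs, and sends the result through a fully connected layer. It should return 1.0 if the two images have the same color (both black or both white) and 0.0 otherwise. However, the output is always close to 0.5, even though the task should be easy.
The model: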
import torch
import torch.nn as nn
from torchvision.models import resnet18

class TemplateEvaluator(nn.Module):
    def __init__(self, q_encoder=None, t_encoder=None):
        super().__init__()
        # Build the encoders per instance; a resnet18() default in the
        # signature would be created once and shared by every instance.
        self.q_encoder = q_encoder if q_encoder is not None else resnet18()
        self.t_encoder = t_encoder if t_encoder is not None else resnet18()
        # Set requires_grad to True to train resnet
        for param in self.q_encoder.parameters():
            param.requires_grad = True
        for param in self.t_encoder.parameters():
            param.requires_grad = True
        # Each resnet18 outputs 1000 values, so the concatenation has 2000.
        self.fc = nn.Sequential(
            nn.Linear(2000, 1),
            nn.Sigmoid()
        )

    def forward(self, data):
        q = data[0]
        t = data[1]
        # If singular images:
        if q.ndim == 3: q = q.unsqueeze(0)
        if t.ndim == 3: t = t.unsqueeze(0)
        q = self.q_encoder(q)
        t = self.t_encoder(t)
        res = self.fc(torch.cat([q,t],-1)).flatten()
        return res
The data loader:
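import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset
from torch.utils.data.dataloader import default_collate
from torchvision import transforms

img_width = 128  # images are 3x128x128
# Assumed device selection; the post does not show how `device` is defined.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class BlackOrWhiteDataset(Dataset):
    def __init__(self):
        self.tf = transforms.ToTensor()

    def __getitem__(self, i):
        # RGB values
        black = (0,0,0)
        white = (255,255,255)
        x1_col = black if (np.random.random() > 0.5) else white
        x2_col = black if (np.random.random() > 0.5) else white
        # Label is 1.0 when both images have the same color, else 0.0
        y = torch.tensor(x1_col == x2_col, dtype=torch.float)
        x1 = Image.new('RGB', (img_width,img_width), x1_col)
        x2 = Image.new('RGB', (img_width,img_width), x2_col)
        return self.tf(x1), self.tf(x2), y

    def __len__(self):
        return 100

def create_data_loader(dataset, batch_size, verbose=True):
    # Collate each batch as usual, then move every tensor to the device
    dl = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True,
                                     collate_fn=lambda x: tuple(x_.to(device) for x_ in default_collate(x)))
    return dl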
The training code:
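import sys
import matplotlib.pyplot as plt
import torch
import torch.nn.functional as F
import torch.optim as optim
from tqdm import tqdm

# The post does not show how `dl` is created; batch_size=10 is an assumed value.
dl = create_data_loader(BlackOrWhiteDataset(), batch_size=10)
t_eval = TemplateEvaluator().to(device)
opt = optim.SGD(t_eval.parameters(), lr=0.001, momentum=0.01)
epochs = 10
losses = []
for epoch in tqdm(range(epochs)):
    t_eval.train()
    for X1, X2, Y in dl:
        Y_pred = t_eval(torch.stack([X1,X2]))
        loss = F.mse_loss(Y_pred,Y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Report and record the loss of the last batch in each epoch
    sys.stdout.write('\r')
    sys.stdout.write("loss: %f" % loss.item())
    sys.stdout.flush()
    losses.append(loss.item())
plt.plot(losses)
plt.ylim(0,1)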
And the output:
0%| | 0/10 [00:00<?, ?it/s]
loss: 0.259106
10%|█ | 1/10 [00:01<00:13, 1.54s/it]
loss: 0.241787
20%|██ | 2/10 [00:02<00:11, 1.40s/it]
loss: 0.258519
30%|███ | 3/10 [00:04<00:09, 1.36s/it]
loss: 0.250100
40%|████ | 4/10 [00:05<00:08, 1.35s/it]
loss: 0.257565
50%|█████ | 5/10 [00:06<00:06, 1.35s/it]
loss: 0.264662
60%|██████ | 6/10 [00:08<00:05, 1.35s/it]
loss: 0.246792
70%|███████ | 7/10 [00:09<00:04, 1.34s/it]
loss: 0.260988
80%|████████ | 8/10 [00:10<00:02, 1.34s/it]
loss: 0.241590
90%|█████████ | 9/10 [00:12<00:01, 1.34s/it]
loss: 0.250159
100%|██████████| 10/10 [00:13<00:00, 1.35s/it]
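As a sanity check on these numbers: a loss hovering around 0.25 is what a model produces when it ignores its inputs and always answers 0.5. The short sketch below is plain Python and not part of the training code; `labels` and `mse` are illustrative names, and it assumes the four color pairs are equally likely.

```python
# The four equally likely (image 1, image 2) color pairs give these labels:
# black/black -> 1, black/white -> 0, white/black -> 0, white/white -> 1.
labels = [1.0, 0.0, 0.0, 1.0]

def mse(pred):
    """Mean squared error of a constant prediction against the labels."""
    return sum((pred - y) ** 2 for y in labels) / len(labels)

# Scan constant predictions 0.00, 0.01, ..., 1.00 and keep the best one.
candidates = [i / 100 for i in range(101)]
best = min(candidates, key=mse)

print(best, mse(best))  # prints: 0.5 0.25
```

So the logged losses are consistent with a network that has learned only the label average.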
For example:
t_eval.eval()
for X1, X2, Y in dl:
    # `view` is an image-display helper that is not shown in the post
    view([X1[0],X2[0]])
    print(Y[0].item())
    print(t_eval(torch.stack([X1[0],X2[0]])).item())
    break
This gives: [image not preserved]
Or: [image not preserved]
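When I set 'Y' to all zeros, the model does converge and Y_pred approaches zero, so the optimizer works. When I set 'Y' to indicate whether the first image is black, the model also converges as expected, and the same holds for the second image. So the model can understand each image on its own.
It therefore seems that the model cannot combine the information from its two inputs, and I don't understand why.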
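Update
Thanks to user23818208, I found a solution.
A single-layer perceptron cannot compute equality. This is known as the XOR/XNOR problem. Instead of concatenating the image features, I now multiply them elementwise, like this: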
class TemplateEvaluator(nn.Module):
    def __init__(self, q_encoder=None, t_encoder=None):
        super().__init__()
        # Build the encoders per instance instead of sharing signature defaults
        self.q_encoder = q_encoder if q_encoder is not None else resnet18()
        self.t_encoder = t_encoder if t_encoder is not None else resnet18()
        # The elementwise product keeps the 1000-value width of one encoder
        self.fc = nn.Sequential(
            nn.Linear(1000, 1),
            nn.Sigmoid()
        )

    def forward(self, data):
        q = data[0]
        t = data[1]
        if q.ndim == 3: q = q.unsqueeze(0)
        if t.ndim == 3: t = t.unsqueeze(0)
        q_features = self.q_encoder(q)
        t_features = self.t_encoder(t)
        # Multiply the features instead of concatenating them
        combined_features = q_features * t_features
        res = self.fc(combined_features).flatten()
        return res
Now the model converges:
0%| | 0/10 [00:00<?, ?it/s]
loss: 0.065883
10%|█ | 1/10 [00:01<00:16, 1.89s/it]
loss: 0.002977
20%|██ | 2/10 [00:03<00:14, 1.76s/it]
loss: 0.000158
30%|███ | 3/10 [00:05<00:12, 1.74s/it]
loss: 0.000015
40%|████ | 4/10 [00:06<00:10, 1.71s/it]
loss: 0.000003
50%|█████ | 5/10 [00:08<00:08, 1.71s/it]
loss: 0.000002
60%|██████ | 6/10 [00:10<00:06, 1.70s/it]
loss: 0.000001
70%|███████ | 7/10 [00:12<00:05, 1.70s/it]
loss: 0.000001
80%|████████ | 8/10 [00:13<00:03, 1.69s/it]
loss: 0.000000
90%|█████████ | 9/10 [00:15<00:01, 1.70s/it]
loss: 0.000000
100%|██████████| 10/10 [00:17<00:00, 1.71s/it]
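Multiplying the features is one way out. In principle, keeping the concatenation and adding a hidden layer before the output should also work, although that variant was not tried here. The sketch below uses hand-picked weights on a toy encoding (-1 for black, +1 for white) rather than trained resnet features; `relu` and `same_color` are illustrative names.

```python
def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)

def same_color(a, b):
    """Two-layer network on the concatenated inputs (a, b), weights set by hand."""
    # Hidden layer: two ReLU units that are nonzero only when the inputs differ.
    h1 = relu(a - b)
    h2 = relu(b - a)
    # Output layer: a linear threshold on the hidden units.
    return 1 if 1.0 - (h1 + h2) > 0 else 0

# Print the prediction for each of the four color pairs.
for a in (-1, 1):
    for b in (-1, 1):
        print(a, b, same_color(a, b))
```

In a real model the hidden layer would be learned, for example by placing one more linear layer and a nonlinearity in front of the final one.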
1 Answer
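Your model has to check whether two things are equal, yet it has only a single dense layer. A single-layer perceptron cannot learn the XOR function, and therefore cannot learn XNOR (equality) either. This is a famous result from the early history of machine learning.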
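The claim can be checked by brute force on a toy version of the task. The sketch below assumes each image is reduced to a single feature, -1 for black and +1 for white, and searches a coarse grid of weights for a linear threshold unit on the concatenated pair. The names `linear_unit` and `product_unit` and the grid itself are illustrative choices, not part of the model in the question. A grid search is not a proof, but it agrees with the known result: the linear unit gets at most three of the four cases right, while a unit fed the product of the two features gets all four.

```python
from itertools import product

# Toy encoding: -1 = black, +1 = white; the target is 1 when the colors match.
CASES = [((-1, -1), 1), ((-1, 1), 0), ((1, -1), 0), ((1, 1), 1)]

def linear_unit(w1, w2, bias, a, b):
    """Threshold unit on the concatenated features (a, b)."""
    return 1 if w1 * a + w2 * b + bias > 0 else 0

def product_unit(w, bias, a, b):
    """Threshold unit on the elementwise product a * b."""
    return 1 if w * (a * b) + bias > 0 else 0

# Try every (w1, w2, bias) on a grid from -5 to 5 in steps of 0.25.
grid = [i / 4 for i in range(-20, 21)]
best_linear = 0
for w1, w2, bias in product(grid, repeat=3):
    correct = sum(linear_unit(w1, w2, bias, a, b) == y for (a, b), y in CASES)
    best_linear = max(best_linear, correct)

# A single fixed weight is enough once the features are multiplied.
product_correct = sum(product_unit(1.0, 0.0, a, b) == y for (a, b), y in CASES)

print(best_linear, product_correct)  # prints: 3 4
```

With this encoding the product a * b is already +1 for matching colors and -1 otherwise, which is why one weight suffices after the multiplication.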