Why does a template-matching model that converges quickly with in-plane rotations fail with full 3D orientations?

Question

Background

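I am experimenting with a model whose purpose is to match a query image of a known object against the corresponding template images, i.e. to decide whether the two may show the object in the same orientation. (I deal with symmetric objects and heavy occlusion, so the relationship is usually many-to-one.)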

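I feed the model a pair of images (query image + candidate template image). I want it to output 0.0 if it thinks the two objects have different orientations, and 1.0 if it thinks they have the same orientation. (I train with an L1 loss.)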

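I train the model in batches on synthetic data. For each query image I provide:

A positive: the query image paired with its correct associated template image (expected classification 1.0),
A "negative": the same query image paired with a random template image (expected classification 0.0).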
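The batch composition above can be sketched with plain index arrays. This is a hypothetical numpy helper (build_pair_indices is not part of the original code) that mirrors the layout used in template_eval_train_step: the first n pairs are positives, the last n are negatives whose template IDs are drawn uniformly from the codebook.

```python
import numpy as np


def build_pair_indices(cb_id, codebook_size, rng):
    """Return (query_idx, template_id, label) with one positive and one negative per query.

    Layout assumed to mirror template_eval_train_step: positives first, then negatives.
    """
    cb_id = np.asarray(cb_id)
    n = cb_id.shape[0]
    cb_id_rand = rng.choice(codebook_size, n)                 # uniform random negatives
    query_idx = np.concatenate([np.arange(n), np.arange(n)])  # every query appears twice
    template_id = np.concatenate([cb_id, cb_id_rand])
    labels = np.concatenate([np.ones(n), np.zeros(n)])        # 1.0 = same orientation, 0.0 = different
    return query_idx, template_id, labels


rng = np.random.default_rng(0)
q_idx, t_id, y = build_pair_indices([3, 7, 1], codebook_size=10, rng=rng)
```

The original code uses the legacy np.random.choice; the sketch uses a seeded Generator only to make it reproducible.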

Problem

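Strangely, when the negative template is an in-plane rotation of the positive template, the model trains and performs almost suspiciously well (mean classification of about 0.99 for positive cases and about 0.1 for negative cases). But when the negative template is a completely random template, showing the object in any 3D orientation, the model struggles (mean classification of about 0.75 for positive cases and about 0.5 for negative cases). This seems odd to me: the positive and negative cases should differ more in this setting, so they should be easier to tell apart.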
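Because every output of the sigmoid head lies in [0, 1], the loss used in the training step, (L1(p_cls, 1) + L1(n_cls, 0)) / 2, equals (1 - p_rate + n_rate) / 2 exactly. A small numpy check (helper names are illustrative, not from the original code) turns the reported averages into implied losses: roughly 0.055 for in-plane-rotation negatives versus roughly 0.375 for random-template negatives, where a model that always outputs 0.5 would score 0.5.

```python
import numpy as np


def l1_pair_loss(p_cls, n_cls):
    """(L1(p_cls, 1) + L1(n_cls, 0)) / 2, the loss computed in template_eval_train_step."""
    p_cls = np.asarray(p_cls, dtype=float)
    n_cls = np.asarray(n_cls, dtype=float)
    return (np.mean(np.abs(p_cls - 1.0)) + np.mean(np.abs(n_cls - 0.0))) / 2


def loss_from_rates(p_rate, n_rate):
    """Same loss from the mean scores: valid because outputs lie in [0, 1],
    so |1 - p| = 1 - p and |n - 0| = n."""
    return ((1.0 - p_rate) + n_rate) / 2


print(loss_from_rates(0.99, 0.1))  # in-plane rotation negatives
print(loss_from_rates(0.75, 0.5))  # random-template negatives
```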

Code

Model:

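import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

class TemplateEvaluator(nn.Module):
    def __init__(self, q_encoder=None, t_encoder=None):
        super().__init__()
        # Build the encoders here rather than in the default arguments: defaults are
        # evaluated once at definition time, so all instances would share the same networks.
        self.q_encoder = q_encoder if q_encoder is not None else resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
        self.t_encoder = t_encoder if t_encoder is not None else resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
        
        # Each resnet18 outputs 1000 values, so the concatenated feature has 2000
        self.fc = nn.Sequential(
            nn.Linear(2000, 1),
            nn.Sigmoid()
        )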
    
    def forward(self, data):
        q = data[0]
        t = data[1]
        q = self.q_encoder(q)
        t = self.t_encoder(t)
        res = self.fc(torch.cat([q,t],-1))
        return res
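resnet18 ends in a 1000-way fc layer, so each encoder emits 1000 values and torch.cat gives 2000, matching nn.Linear(2000, 1). Since the head is a single linear layer followed by a sigmoid, its pre-sigmoid score splits into a query-only term plus a template-only term. The numpy sketch below uses random stand-in weights and features (not the trained model) to check that decomposition; one consequence is that the score gap between two templates is the same for every query, and because the sigmoid is monotonic the head ranks templates in the same order regardless of the query.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for the weights and bias of nn.Linear(2000, 1).
w = rng.normal(size=2000)
b = 0.1
w_q, w_t = w[:1000], w[1000:]  # halves that see the query / template features


def head_logit(q_feat, t_feat):
    """Pre-sigmoid output of the head: a linear function of the concatenation."""
    return np.concatenate([q_feat, t_feat]) @ w + b


# Random stand-ins for encoder outputs (1000 values each).
q1, q2 = rng.normal(size=1000), rng.normal(size=1000)
t1, t2 = rng.normal(size=1000), rng.normal(size=1000)

# The score is a query-only term plus a template-only term ...
split = (q1 @ w_q) + (t1 @ w_t) + b

# ... so the gap between two templates does not depend on the query.
gap_for_q1 = head_logit(q1, t1) - head_logit(q1, t2)
gap_for_q2 = head_logit(q2, t1) - head_logit(q2, t2)
```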

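Training step:

cb_id contains the ID of the associated correct template (the one with the smallest angular difference)
t_img_rand is the negative template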
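How the codebook computes cb_id is not shown. One common way to pick "the template with the smallest angular difference" is the geodesic angle between rotation matrices, θ = arccos((trace(R1ᵀR2) - 1) / 2). This is a sketch assuming orientations are stored as 3×3 rotation matrices; rot_z, geodesic_angle and nearest_template_id are hypothetical helpers, and the sketch ignores object symmetry, which the question notes makes the mapping many-to-one.

```python
import numpy as np


def rot_z(deg):
    """Rotation about the z axis (stand-in for a template orientation)."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def geodesic_angle(r1, r2):
    """Angle in radians of the relative rotation r1^T r2."""
    cos_theta = (np.trace(r1.T @ r2) - 1.0) / 2.0
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))  # clip guards against rounding


def nearest_template_id(rot, template_rots):
    """Index of the template orientation closest to `rot`."""
    return int(np.argmin([geodesic_angle(rot, t) for t in template_rots]))


templates = [rot_z(0), rot_z(90), rot_z(180), rot_z(270)]
print(nearest_template_id(rot_z(100), templates))
```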

def template_eval_train_step(iteration, models, data, codebook, opts=None, show=False, metric_label=''):
    # Get query image, associated codebook template ID, and associated orientation
    q_img, cb_id, rot = data
    n = q_img.shape[0]
    t_eval = models[0]
    
    # Get random template IDs
    cb_id_rand = np.random.choice(codebook["size"],n)

    # Get associated and random template images
    t_img = torch.stack([cb_get_img(i,codebook) for i in cb_id]).to(device)

    # Uncomment to use random template as neg cases
    t_img_rand = torch.stack([cb_get_img(i,codebook) for i in cb_id_rand]).to(device)
    # Uncomment to use in-plane rotations of pos template as neg cases
    # t_img_rand = torch.stack([rotate_image_tensor(y,np.random.random()*360) for y in t_img])
    
    # Cases with similar template image ('Positive')
    p_cases = torch.stack([q_img.permute(0, 3, 1, 2),t_img.permute(0, 3, 1, 2)])

    # Cases with random template image ('Negative')
    n_cases = torch.stack([q_img.permute(0, 3, 1, 2),t_img_rand.permute(0, 3, 1, 2)])

    # Mix together for 50/50 distribution in batch
    mixed_cases = torch.concat([p_cases,n_cases], 1)

    # Run model
    c = t_eval(mixed_cases)

    # Get classification for pos and neg cases
    p_cls = c[:n]
    n_cls = c[n:]

    # Compute loss
    # Targets are constants, so they do not need requires_grad
    p_loss = F.l1_loss(p_cls, torch.ones_like(p_cls))
    n_loss = F.l1_loss(n_cls, torch.zeros_like(n_cls))
    loss = (p_loss + n_loss)/2
    
    # Visualise pos and neg case at i=0
    if show:
        i=0
        view([q_img[i].detach().cpu().numpy(), t_img[i].detach().cpu().numpy()])
        print("p_cls:",p_cls[i].detach().cpu().numpy())
        view([q_img[i].detach().cpu().numpy(), t_img_rand[i].detach().cpu().numpy()])
        print("n_cls:",n_cls[i].detach().cpu().numpy())

    # Run optimizer (if given)
    if opts is not None:
        opts[0].zero_grad()
        loss.backward()
        
        # Print gradient info
        if show:
            t_eval.cpu()
            plot_grad_flow(t_eval.named_parameters())
            t_eval.to(device)
        
        opts[0].step()

    # Compute eval metrics
    p_rate = p_cls.sum() / n
    n_rate = n_cls.sum() / n

    # Garbage collection 
    gc.collect()

    return [ {"label": metric_label, "name": "loss", "value":loss.cpu().item()},
             {"label": metric_label, "name": "p_rate", "value":p_rate.cpu().item()},
             {"label": metric_label, "name": "n_rate", "value":n_rate.cpu().item()}]
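The stacking in the training step packs queries and templates into one tensor of shape (2, 2n, C, H, W), which forward() unpacks as data[0] and data[1]. A numpy stand-in with tiny dummy arrays (the sizes are arbitrary; the channels-last input layout is inferred from permute(0, 3, 1, 2)) confirms that the first n rows hold the positive pairs and the last n the negative pairs, matching the c[:n] / c[n:] split.

```python
import numpy as np

n, H, W, C = 4, 8, 8, 3
q_img = np.zeros((n, H, W, C))            # queries, channels-last
t_img = np.ones((n, H, W, C))             # positive templates
t_img_rand = np.full((n, H, W, C), 2.0)   # negative templates


def to_nchw(x):
    """numpy equivalent of x.permute(0, 3, 1, 2)."""
    return x.transpose(0, 3, 1, 2)


p_cases = np.stack([to_nchw(q_img), to_nchw(t_img)])        # (2, n, C, H, W)
n_cases = np.stack([to_nchw(q_img), to_nchw(t_img_rand)])   # (2, n, C, H, W)
mixed_cases = np.concatenate([p_cases, n_cases], axis=1)    # (2, 2n, C, H, W)

queries, templates = mixed_cases[0], mixed_cases[1]  # what forward() sees as data[0], data[1]
```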

Training loop:

init_train and init_verify simply put the models into training or evaluation mode
train_step is the function above

def fit(epochs, models, init_train, init_verify, train_step, verify_step, opts, train_dl, verify_dl, eval_dl, codebook, vis_epoch_step=10):
    train_data = []
    verify_data = []
    eval_data = []

    for epoch in tqdm(range(epochs)):
        init_train(epoch, models)
        
        i = 0
        for data in train_dl:
            train_metrics = train_step(epoch, models, data, opts=opts, codebook=codebook, show=epoch % vis_epoch_step == 0 and i == 0)
            train_data.append(train_metrics)
            i = i + 1
            
            n = len(train_dl)
            p = round((i/n)*100)
            if p>0:
                sys.stdout.write('\r')
                bar_len = round(p/5)
                empty_len = round((100-p)/5)
                sys.stdout.write("Train batch %d/%d [%s%s] %d%%" % (i, n, '#'*bar_len, '_'*empty_len, p))
                sys.stdout.flush()
            
        # verification step
        init_verify(epoch, models)
        with torch.no_grad():
            
            i = 0
            for data in verify_dl:
                verify_metrics = verify_step(epoch, models, data, codebook=codebook, show=epoch % vis_epoch_step == 0 and i == 0)
                verify_data.append(verify_metrics)
                i = i + 1
            
                n = len(verify_dl)
                p = round((i/n)*100)
                if p>0:
                    sys.stdout.write('\r')
                    bar_len = round(p/5)
                    empty_len = round((100-p)/5)
                    sys.stdout.write("Verification batch %d/%d [%s%s] %d%%" % (i, n, '#'*bar_len, '_'*empty_len, p))
                    sys.stdout.flush()
...
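The progress-bar code is duplicated in the training and verification loops. A hypothetical helper (progress_line is not part of the original code) produces the same string; using width - bar_len for the empty part is equivalent to the original round((100-p)/5) for the default width of 20, since p is an integer.

```python
def progress_line(label, i, n, width=20):
    """Format the text progress bar printed in fit(),
    e.g. 'Train batch 5/10 [##########__________] 50%'."""
    p = round((i / n) * 100)
    bar_len = round(p / (100 / width))
    empty_len = width - bar_len
    return "%s batch %d/%d [%s%s] %d%%" % (label, i, n, "#" * bar_len, "_" * empty_len, p)
```

Inside either loop it would be used as `sys.stdout.write("\r" + progress_line("Train", i, n))` followed by `sys.stdout.flush()`, keeping the original `if p > 0` guard if desired.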

Results using in-plane rotations of the positive template as negative templates

t_img_rand = torch.stack([rotate_image_tensor(y,np.random.random()*360) for y in t_img])
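rotate_image_tensor is not shown, so its interpolation and fill behaviour are unknown. Below is a hypothetical numpy stand-in (rotate_nearest: nearest-neighbour sampling, zero fill) that illustrates what an arbitrary-angle rotation typically does to a square image: at 45° the corners map outside the source and are filled, while a 90° rotation of a square image leaves no filled region. If the real helper behaves similarly, such fill regions would appear only in the negative templates.

```python
import numpy as np


def rotate_nearest(img, deg, fill=0.0):
    """Rotate an (H, W, C) image about its centre by `deg` degrees.

    Nearest-neighbour sampling; output pixels whose source lies outside
    the image are set to `fill`. Hypothetical stand-in for rotate_image_tensor.
    """
    h, w = img.shape[:2]
    a = np.deg2rad(deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    y0, x0 = ys - cy, xs - cx
    # Inverse mapping: for each output pixel, find the source pixel it comes from.
    src_x = np.cos(a) * x0 + np.sin(a) * y0 + cx
    src_y = -np.sin(a) * x0 + np.cos(a) * y0 + cy
    sx = np.rint(src_x).astype(int)
    sy = np.rint(src_y).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.full_like(img, fill)
    out[valid] = img[sy[valid], sx[valid]]
    return out


ones = np.ones((32, 32, 1))
r45 = rotate_nearest(ones, 45)   # corners become fill values
r90 = rotate_nearest(ones, 90)   # no fill for a square image
```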

Training (p_rate/n_rate are the mean classification scores of the positive/negative cases):

Example case:

p_cls: [0.998]

n_cls: [0.000]

Results using random templates as negative templates (same model initialization, optimizer and hyperparameters):

cb_id_rand = np.random.choice(codebook["size"],n)
t_img_rand = torch.stack([cb_get_img(i,codebook) for i in cb_id_rand]).to(device)
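np.random.choice(codebook["size"], n) samples uniformly, so for each query there is a 1/size chance that the "negative" is the positive template itself, giving an identical image pair labelled 0.0; with symmetric objects, other IDs can also look identical. A hypothetical variant (sample_negative_ids is not in the original code) that redraws exact collisions is sketched below; it does not handle symmetry-equivalent templates.

```python
import numpy as np


def sample_negative_ids(cb_id, codebook_size, rng):
    """Draw one random template ID per query, redrawing any ID equal to the positive one."""
    assert codebook_size >= 2, "need at least two templates to avoid the positive"
    cb_id = np.asarray(cb_id)
    neg = rng.choice(codebook_size, cb_id.shape[0])
    clash = neg == cb_id
    while clash.any():
        neg[clash] = rng.choice(codebook_size, int(clash.sum()))
        clash = neg == cb_id
    return neg


rng = np.random.default_rng(1)
ids = np.array([0, 1, 0, 1] * 50)
neg = sample_negative_ids(ids, 2, rng)
```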

Training (p_rate/n_rate are the mean classification scores of the positive/negative cases):

Example case:

p_cls: [0.001]

n_cls: [0.998]
