CNN overfitting to the minority class

Posted 2024-04-20 00:46:05


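I am new to developing CNNs and am currently building a binary image classifier with PyTorch. My dataset is heavily imbalanced, so I manually augmented both my test and training portions to bring them closer to balance. I now have class 0 (6500 images in the training set) and class 1 (5200 images in the training set). When I use skorch's fit function, the validation accuracy I get equals the percentage of class 1 images in the validation set, and my predict function outputs 1 for every image.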
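As a quick sanity check on the balance described above, the class shares can be computed directly from the labels. This is a minimal sketch, not part of my actual pipeline: `class_shares` is a hypothetical helper, and the label list is synthetic, built only from the two counts quoted above. With torchvision, the `targets` attribute of an `ImageFolder` dataset could presumably be passed in instead.

```python
from collections import Counter


def class_shares(labels):
    """Return {class_label: fraction of samples} for an iterable of integer labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")
    return {label: count / total for label, count in sorted(counts.items())}


# Synthetic label list built from the counts quoted above
# (6500 images of class 0, 5200 images of class 1).
train_labels = [0] * 6500 + [1] * 5200

shares = class_shares(train_labels)
for label, share in shares.items():
    print(f"class {label}: {share:.3f}")
```

With these counts, class 1 is roughly 44% of the training set, so the training data is close to balanced after augmentation.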

This is the tutorial I adapted my CNN from: https://colab.research.google.com/github/dnouri/skorch/blob/master/notebooks/Transfer_Learning.ipynb#scrollTo=cane7VRWW3dO

Here is my CNN (adapted from the tutorial):

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
import numpy as np
from numpy import array
import pandas as pd
from skorch import NeuralNetClassifier
from skorch.helper import predefined_split
from skorch.callbacks import LRScheduler
from skorch.callbacks import Checkpoint
from skorch.callbacks import Freezer

from PIL import Image
import skorch
import os
import cv2
import glob

from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler, Normalizer
#Define transforms (will be same w/o online transforms)
#Manually augmented earlier
train_transforms = transforms.Compose([
    #transforms.RandomResizedCrop(224),
    #transforms.RandomHorizontalFlip(),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], 
                         [0.229, 0.224, 0.225])
])

val_transforms = transforms.Compose([
    #transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], 
                         [0.229, 0.224, 0.225])
])

train_ds = datasets.ImageFolder(train_split_path, train_transforms)
val_ds = datasets.ImageFolder(test_aug_split_path, val_transforms)

checkpoint = Checkpoint(
    f_params='best_model.pt', monitor='valid_acc_best')

#Using ResNet with some layers for now
class PretrainedModel(nn.Module):

    def __init__(self, output_features):
        super().__init__()
        model = models.resnet152(pretrained=True)
        #Don't want to change pretrained weights
        for param in model.parameters():
            param.requires_grad = False

        num_features = model.fc.in_features
        fc_layers = nn.Sequential(
                nn.Linear(num_features, 4096),
                nn.ReLU(inplace=True),
                nn.Dropout(p=0.3),
                nn.Linear(4096, output_features),
                nn.ReLU(inplace=True),
                nn.Dropout(p=0.3),
            )
        model.fc = fc_layers

        self.model = model

    def forward(self, x):
        return self.model(x)

use_cuda = torch.cuda.is_available()

net = NeuralNetClassifier(
    module=PretrainedModel,
    module__output_features = 2,
    criterion=nn.CrossEntropyLoss,
    batch_size = 16,
    lr=0.0001,
    max_epochs=3,
    optimizer=optim.Adam,
    train_split=predefined_split(val_ds),
    callbacks=[checkpoint],
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
)

net.fit(train_ds, y=None)

Here are the results of the fit function:

Epoch 1: train loss=.46, valid_acc=.4301, valid loss=.6931

Epoch 2: train loss=.6931, valid_acc=.4301, valid loss=.6931

Epoch 3: train loss=.6931, valid_acc=.4301, valid loss=.6931

For this particular dataset, 43% of my validation images are class 1 images.
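One observation that may help narrow this down: 0.6931 is approximately ln 2, which is the cross-entropy of a two-class prediction whose logits are equal (a 50/50 softmax). The sketch below reproduces that number with plain Python. It is only an illustration under stated assumptions: `cross_entropy` and `relu` are hand-written helpers, not PyTorch calls, and the negative logits are hypothetical values chosen to show how a ReLU placed after the last linear layer, as in the model posted here, could clamp both outputs to zero. It does not establish that this is what the network is actually doing.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample from raw logits: log-sum-exp minus the target logit."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


# Equal logits give a uniform softmax, so the loss is ln 2 for either target.
equal_logits_loss = cross_entropy([0.0, 0.0], target=1)
print(round(equal_logits_loss, 4))  # 0.6931
print(round(math.log(2), 4))        # 0.6931

# Hypothetical pre-activation logits: both negative, so ReLU maps them to zero.
clamped = relu([-1.3, -0.2])
clamped_loss = cross_entropy(clamped, target=0)
print(clamped, round(clamped_loss, 4))
```

If the logits are the same for every input, the loss cannot fall below ln 2 no matter which class the sample belongs to, which would be consistent with the flat values in epochs 2 and 3.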

y_pred = net.predict(val_ds) gives me the following:

array([1, 1, 1, ..., 1, 1, 1], dtype=int64)
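To confirm that the accuracy figure is just the score of a constant predictor, the predictions can be tallied and compared against the true labels. The following is a rough sketch with synthetic data: `summarize_predictions` is a hypothetical helper, and the labels are made up to mirror the roughly 43% class 1 share of the validation set, not taken from the real dataset. In practice the true labels and the output of `net.predict` would be passed in.

```python
from collections import Counter


def summarize_predictions(y_true, y_pred):
    """Return (counts of each predicted label, accuracy) for two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    counts = Counter(y_pred)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return counts, correct / len(y_true)


# Synthetic validation labels: 430 of 1000 samples belong to class 1.
y_true = [1] * 430 + [0] * 570
# A model that always answers 1, as in the output above.
y_pred = [1] * 1000

pred_counts, accuracy = summarize_predictions(y_true, y_pred)
print(dict(pred_counts))  # {1: 1000}
print(accuracy)           # 0.43
```

A single key in the prediction counts, together with an accuracy equal to that class's share, indicates that the model has collapsed to one output rather than learned a decision boundary.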

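I guess I have two questions:

1) Did I do anything wrong when initializing the CNN that would cause this?

2) What could cause this behavior, and what can I do to correct it?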

