How do I run PCA on an entire CNN training set?

Posted 2024-04-24 21:55:59


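Context

I have a CNN that reaches 98% classification accuracy. Training takes about 2 minutes. I want to reduce that time by running PCA on the training set before training the CNN.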

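Problem

I am doing this because I hope it will cut the training time from 2 minutes to 1 minute or less. The problem: I don't know how to run PCA on the 30,000 training images and then pass those images to the CNN.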

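  • I have already run PCA on a few hundred sample images from the training set, but I don't know how to run it on the entire set.
  • Also, even once I have run PCA on all the training images, how do I "connect" the output to the CNN's input? In other words, how do I feed the PCA-reconstructed, lower-dimensional images into the CNN?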
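
To make the two points above concrete, here is a minimal sketch of what "PCA on the whole set" could look like with scikit-learn. It is not taken from the notebook below: the random `images` array stands in for the real training images, and `n_components=50` is an arbitrary illustrative value. Each image is flattened into one row, PCA is fitted on all rows at once, and the reconstructions are reshaped back to 36x36 with a channel axis so a `Conv2D` layer could accept them.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the real training images: N grayscale 36x36 images in [0, 1].
# (Hypothetical random data; the real set would come from the training directory.)
rng = np.random.default_rng(0)
n_images, height, width = 200, 36, 36
images = rng.random((n_images, height, width)).astype("float32")

# One row per image: (N, 36, 36) -> (N, 1296).
X = images.reshape(n_images, height * width)

# n_components=50 is an arbitrary choice for illustration.
pca = PCA(n_components=50)
X_proj = pca.fit_transform(X)          # (N, 50) low-dimensional codes
X_rec = pca.inverse_transform(X_proj)  # (N, 1296) reconstructions

# Back to image shape with a channel axis, as a Conv2D layer expects.
cnn_input = X_rec.reshape(n_images, height, width, 1)
print(X_proj.shape, cnn_input.shape)
```

Note that the reconstructed images have the same 36x36 shape as the originals, so a CNN fed with `cnn_input` does the same amount of work per image. Only the 50-dimensional codes in `X_proj` are actually smaller, and those no longer have an image layout.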

I have searched online extensively for examples or similar questions, without success. Any help would be much appreciated.

Here is what a few sample images from my dataset look like after PCA:

[Image: sample images after PCA]

MWE (Google Colab Notebook):

!pip install tensorflow
!pip install numpy
!pip install matplotlib

"""# Import Libraries"""

# Import Libraries
import tensorflow as tf
from tensorflow import keras
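# Import from tensorflow.keras throughout rather than mixing in standalone keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, Dropout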
from tensorflow.keras import layers
from tensorflow.keras.utils import to_categorical
import numpy as np
import matplotlib.pyplot as plt

plt.style.use('fivethirtyeight')

"""# Load Dataset"""

import pathlib
dataset_url = "*/TrainingSet.tar.gz"
data_dir = tf.keras.utils.get_file(origin = dataset_url,
                                   fname = "TrainingSet",
                                   untar = True)
data_dir = pathlib.Path(data_dir)

"""# Display # Images to check"""

print(list(data_dir.glob('*/*.png')))
image_count = len(list(data_dir.glob('*/*.png')))
print(image_count)

"""# Display sample image"""

!pip install scikit-learn

# numpy and tensorflow are already imported above
import os
import PIL
import PIL.Image
from sklearn.decomposition import PCA

graphs = list(data_dir.glob('*/*.png'))
PIL.Image.open(str(graphs[6]))

"""# Define Image Dimensions & Batch Size"""

batch_size = 32
img_height = 36
img_width = 36

"""# Create Training & Validation Sets (80%, 20%)"""

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

val_ds = tf.keras.preprocessing.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="validation",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

"""# Define 3 Classes"""

class_names = ['Cubic Sinusoidal', 'Linear Sinusoidal', 'Quadratic Sinusoidal']
print(class_names)

"""# Supervised Learning (9 Samples from the Training Set)"""

!pip install scikit-image

from skimage import data
from skimage.color import rgb2gray

import matplotlib.pyplot as plt

subGraphs = []

plt.figure(figsize=(10, 10))
for images, labels in train_ds.take(1):
  for i in range(9):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(images[i].numpy().astype("uint8"))
    subGraphs.append(images[i].numpy().astype("uint8"))
    plt.title(class_names[labels[i]])
    plt.axis("off")

subGraphs = np.array(subGraphs)
print(subGraphs.shape)

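# Convert one sample image to grayscale: (36, 36, 3) -> (36, 36)
grayscale = rgb2gray(subGraphs[1])
print(grayscale.shape)

# Note: this fits PCA on a single image, treating its 36 rows as samples
# and its 36 columns as features
X = grayscale

pca_oliv = PCA(n_components=36)
X_proj = pca_oliv.fit_transform(X)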

print(np.cumsum(pca_oliv.explained_variance_ratio_))
plt.plot(np.cumsum(pca_oliv.explained_variance_ratio_))

plt.imshow(np.reshape(pca_oliv.components_, (36,36)), cmap=plt.cm.bone, interpolation='nearest')

X_inv_proj = pca_oliv.inverse_transform(X_proj)
X_proj_img = np.reshape(X_inv_proj,(1,36,36))

plt.imshow(X_proj_img[0], cmap=plt.cm.bone, interpolation='nearest')
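
For comparison with the single-image experiment above, the following is a rough sketch of how PCA might be fitted batch by batch using scikit-learn's `IncrementalPCA`. Everything here is an assumption made for illustration: `fake_batches` is a hypothetical generator that imitates the `(images, labels)` batches yielded by `train_ds`, `to_rows` uses a plain channel average instead of `rgb2gray`, and `n_components=16` is arbitrary.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

def fake_batches(n_batches=4, batch_size=32, seed=0):
    """Hypothetical stand-in for iterating over train_ds: yields (images, labels)."""
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        images = rng.integers(0, 256, size=(batch_size, 36, 36, 3)).astype("float32")
        labels = rng.integers(0, 3, size=batch_size)
        yield images, labels

def to_rows(images):
    """Average the RGB channels, scale to [0, 1], flatten each image to one row."""
    gray = images.mean(axis=-1) / 255.0
    return gray.reshape(len(gray), -1)

# First pass: fit PCA incrementally, one batch at a time.
ipca = IncrementalPCA(n_components=16)
for images, _ in fake_batches():
    ipca.partial_fit(to_rows(images))

# Second pass: project, reconstruct, and reshape to (N, 36, 36, 1).
rec_batches, label_batches = [], []
for images, labels in fake_batches():
    codes = ipca.transform(to_rows(images))
    rec = ipca.inverse_transform(codes)
    rec_batches.append(rec.reshape(-1, 36, 36, 1))
    label_batches.append(labels)

x_train = np.concatenate(rec_batches)
y_train = np.concatenate(label_batches)
print(x_train.shape, y_train.shape)
```

With the real dataset, the batches would presumably be converted with `images.numpy()` first. Keeping `n_components` at or below the batch size avoids a `ValueError` from `partial_fit`. A model whose first layer takes `input_shape=(36, 36, 1)` could then be trained on `x_train` and `y_train`.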
