How do I fix the OOM error in the following code?

Posted 2024-03-28 19:48:19

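I have implemented a VGG-19 variant, and it fails with an OOM (out of memory) error. How can I fix it?

The environment was created in Google Colab. How do I use the GPU resource? A GPU is already attached to the environment; please tell me how to write code that accesses it.

Python code: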

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
import numpy as np
import cv2
import os
from keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array
from PIL import Image
from keras.layers.normalization import BatchNormalization
import csv


src_dir = '/content/drive/My Drive/CASIA_B90PerfectCentrallyAlinged2_Optical_Image/'
train_imgs = []
train_labels = []
test_imgs = []
test_labels = []
subjects = os.listdir(src_dir)
numberOfSubject = len(subjects)
print('Number of Subjects: ', numberOfSubject)

batch_size = 4
num_classes = numberOfSubject
epochs = 40
#178, 256, 1
model = Sequential()
model.add(Conv2D(64, (3, 3), padding='same', activation='relu', input_shape=(48, 48, 1)))
model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=1, padding='valid'))

model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=1, padding='valid'))

model.add(Conv2D(256, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(256, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(256, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=1, padding='valid'))

model.add(Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=1, padding='valid'))

model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
model.summary()


opt = keras.optimizers.rmsprop(lr=0.001)

model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

Error:

Traceback (most recent call last):
  File "/content/drive/My Drive/GEINet_and_PEINet/VGG19_layer_1_less.py", line 48, in <module>
    model.add(Dense(4096, activation='relu'))
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/sequential.py", line 182, in add
    output_tensor = layer(self.outputs[0])
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 75, in symbolic_fn_wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/base_layer.py", line 463, in __call__
    self.build(unpack_singleton(input_shapes))
  File "/usr/local/lib/python3.6/dist-packages/keras/layers/core.py", line 895, in build
    constraint=self.kernel_constraint)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/base_layer.py", line 279, in add_weight
    weight = K.variable(initializer(shape, dtype=dtype),
  File "/usr/local/lib/python3.6/dist-packages/keras/initializers.py", line 227, in __call__
    dtype=dtype, seed=self.seed)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 4357, in random_uniform
    shape, minval=minval, maxval=maxval, dtype=dtype, seed=seed)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/backend.py", line 5686, in random_uniform
    shape, minval=minval, maxval=maxval, dtype=dtype, seed=seed)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/random_ops.py", line 296, in random_uniform
    shape, dtype, seed=seed1, seed2=seed2)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_random_ops.py", line 724, in random_uniform
    _ops.raise_from_not_ok_status(e, name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 6653, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[991232,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:RandomUniform]

1 answer
User
#1 · Posted 2024-03-28 19:48:19

The problem is not how you use the GPU; the GPU has run out of memory. This is most likely because the network is too large to fit.
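As a quick sanity check on that point, here is a minimal sketch that lists the GPUs TensorFlow can see. It assumes TensorFlow 2.1 or newer, where `tf.config.list_physical_devices` is available. On a Colab GPU runtime no extra code should be needed to use the GPU, since TensorFlow places ops on it automatically, and the traceback already shows the allocation happening on `device:GPU:0`.

```python
import tensorflow as tf

# Assumption: TensorFlow >= 2.1. Returns a list of PhysicalDevice objects,
# which is empty when no GPU is visible to TensorFlow.
gpus = tf.config.list_physical_devices('GPU')
print('GPUs visible to TensorFlow:', gpus)

if not gpus:
    print('No GPU found; in Colab, check the hardware accelerator in the runtime settings.')
```

If the list is non-empty, the GPU is being used and the remaining issue is purely the size of the model.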

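Note that your input size is 48x48x1. With stride-2 pooling, as in the standard VGG, the last Conv layer (after 4 pooling steps) would output 3x3x512. That output is connected to a Dense layer with 4096 units, which means 3x3x512x4096 (about 18.9M) parameters. The next layer has another 4096 neurons, which adds 4096x4096 (about 16.8M) parameters, so these two dense layers alone hold roughly 35.7M parameters. They are then connected to the final layer, whose size is the number of classes, for another 4096xnum_classes parameters (possibly a lot, depending on how many classes you have).

Your pooling layers, however, use strides=1, so each one shrinks the feature map by only one pixel (48 -> 47 -> 46 -> 45 -> 44) and Flatten produces 44x44x512 = 991232 values. That is the shape[991232,4096] tensor in the error message: about 4.06 billion weights, or roughly 16 GB in float32, for the first Dense layer alone. Setting strides=2 (or omitting strides, which defaults to pool_size in Keras) brings the output back down to 3x3x512.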
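The arithmetic can be checked without building the model. The sketch below is plain Python, and its helper names (`pooled_size`, `dense_head_params`) are made up for illustration. It compares the posted code, whose pooling layers use `strides=1`, with stride-2 pooling, and counts the weights and biases of the two 4096-unit Dense layers in each case.

```python
def pooled_size(size, pool=2, stride=1, n_pools=4):
    """Spatial size after n_pools 'valid' max-pooling layers.

    The 'same'-padded convolutions in between do not change the size.
    """
    for _ in range(n_pools):
        size = (size - pool) // stride + 1
    return size


def dense_head_params(side, channels=512, units=4096):
    """Return (flattened length, params of Dense #1, params of Dense #2)."""
    flat = side * side * channels
    first = flat * units + units      # weights + biases
    second = units * units + units
    return flat, first, second


for stride in (1, 2):
    side = pooled_size(48, stride=stride)
    flat, first, second = dense_head_params(side)
    gib = first * 4 / 2**30           # float32 uses 4 bytes per value
    print(f"stride={stride}: {side}x{side}x512 -> flatten {flat}, "
          f"Dense #1 {first:,} params (~{gib:.2f} GiB), Dense #2 {second:,} params")
```

With `stride=1` the flattened length comes out as 991232, the same number that appears in the `ResourceExhaustedError` message.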

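So, first of all, try to reduce the number of neurons in the dense layers. You can also use fewer filters in the last convolutional layers. A typical size for the "embedding" vector fed to the final linear classifier is 128-512, depending on the problem and the data.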
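Putting these suggestions together, a smaller variant might look like the sketch below. It is not a tuned architecture. It assumes `tensorflow.keras`, a hypothetical `num_classes = 10` (the question derives it from the number of subject folders), stride-2 pooling as in the standard VGG, and a single 256-unit dense layer picked from the 128-512 range mentioned above. The helper `add_conv_block` is made up for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 10  # hypothetical value; the question uses len(os.listdir(src_dir))


def add_conv_block(model, filters, n_convs):
    """Illustrative helper: n_convs 3x3 convolutions, then stride-2 max pooling."""
    for _ in range(n_convs):
        model.add(layers.Conv2D(filters, (3, 3), padding='same', activation='relu'))
    # strides=2 halves height and width; strides=1 would only trim one pixel
    model.add(layers.MaxPooling2D(pool_size=(2, 2), strides=2, padding='valid'))


model = models.Sequential()
model.add(tf.keras.Input(shape=(48, 48, 1)))
add_conv_block(model, 64, 2)
add_conv_block(model, 128, 2)
add_conv_block(model, 256, 3)
add_conv_block(model, 512, 3)   # 48 -> 24 -> 12 -> 6 -> 3, so the output is 3x3x512

model.add(layers.Flatten())                       # 4608 values instead of 991232
model.add(layers.Dense(256, activation='relu'))   # assumed size within 128-512
model.add(layers.Dropout(0.5))
model.add(layers.Dense(num_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
              metrics=['accuracy'])
model.summary()
```

The whole model stays under 10 million parameters, so the weights themselves should fit comfortably in GPU memory. Whether 256 units and these filter counts give good accuracy on your data is something you would have to test.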
