TF: OOM when allocating tensor, even with a batch size of only 32

Posted 2024-06-16 10:44:25

I'm trying to vectorize all of my images with EfficientNetB0 and save the vectors for further training. This is the script I'm running:

from data import DataGenerator, vectorize_batch
from model import create_core_model
from pathlib import Path
import imgaug.augmenters as iaa
import tensorflow as tf

augment_fn = iaa.Sequential([
    iaa.Fliplr(0.5),
    iaa.Flipud(0.25),
    iaa.AddToHue((-50, 50)),
    iaa.ElasticTransformation(alpha=(0.0, 70.0), sigma=(4.0, 6.0)), 
])


if __name__ == "__main__":
    #load data
    data_generator = DataGenerator(Path.cwd() / 'input' / 'train', batch_size=32, augment_fn=augment_fn)
    #create core model
    model = create_core_model()
    #create vectors (unbatched)
    data = [] 
    #we pass data_generator 5 times so we have more augmented data
    for _ in range(5):
        for batch in data_generator:
            vector_batch = vectorize_batch(batch, model)
            for el in vector_batch:
                data.append(el)

The error I get is:

2020-11-22 10:39:15.157471: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at cwise_ops_common.h:107 : Resource exhausted: OOM when allocating tensor with shape[32,160,160,96] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

I tried batch_size=1, but the problem persists:

2020-11-22 10:48:26.827276: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at cwise_ops_common.h:107 : Resource exhausted: OOM when allocating tensor with shape[1,160,160,96] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

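For context, I have 16 GB of RAM and a 1050 Ti GPU with 4 GB of VRAM. I'm using Spyder, and after the error the len of my data variable was 254, so some data was processed before the failure. At that point I still had more than 70% of my RAM free.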
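As a rough sanity check on the two log lines above, the size of the tensor that failed to allocate can be worked out from its shape. This is only a sketch: it assumes "type float" in the log means 32-bit floats (4 bytes per element), and tensor_bytes is a hypothetical helper name, not part of TensorFlow.

```python
from math import prod

MIB = 1024 ** 2  # bytes per mebibyte


def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor (hypothetical helper).

    Assumes 4 bytes per element, i.e. float32.
    """
    return prod(shape) * bytes_per_element


# Shapes copied from the two OOM messages in the logs.
batch_32 = tensor_bytes((32, 160, 160, 96))
batch_1 = tensor_bytes((1, 160, 160, 96))

print(f"batch_size=32: {batch_32 / MIB:.3f} MiB")
print(f"batch_size=1:  {batch_1 / MIB:.3f} MiB")
```

Under that assumption, the batch_size=32 tensor is 300 MiB and the batch_size=1 tensor is under 10 MiB, both far below 4 GB of VRAM. A single activation of that size would not fill the card by itself, which suggests the GPU memory is already almost full by the time the allocation is requested.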

