I am using a RoBERTa question-answering model for the tweet sentiment extraction problem on Google Colab,
but the model cannot train because I get a ResourceExhaustedError.
See the full error:
ResourceExhaustedError: OOM when allocating tensor with shape[32,16,128,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node model/tf_roberta_model/roberta/encoder/layer_._17/attention/self/transpose (defined at /usr/local/lib/python3.7/dist-packages/transformers/models/roberta/modeling_tf_roberta.py:218) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[Op:__inference_train_function_112984]...
Here is the model:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dropout, Conv1D, Flatten, Activation
from tensorflow.keras.models import Model
from transformers import TFRobertaModel

ids = Input((MAX_LEN,), dtype=tf.int32)
att = Input((MAX_LEN,), dtype=tf.int32)
bert_model = TFRobertaModel.from_pretrained('roberta-large')
x = bert_model(ids, attention_mask=att)

# Head for the start-token logits
x1 = Dropout(0.1)(x[0])
x1 = Conv1D(1, 1)(x1)
x1 = Flatten()(x1)
x1 = Activation('softmax')(x1)

# Head for the end-token logits
x2 = Dropout(0.1)(x[0])
x2 = Conv1D(1, 1)(x2)
x2 = Flatten()(x2)
x2 = Activation('softmax')(x2)

model = Model(inputs=[ids, att], outputs=[x1, x2])
Any help resolving this error would be greatly appreciated.
In my experience, you can use the Gradient Accumulation technique. Alternatively, if you can manage to get Google Colab Pro, that is a better option. According to the documentation, these transformer models are memory-hungry, so Colab Pro is very handy here. You can also use the TPU accelerator available in Colab, but note that it is much slower than Kaggle's TPU.
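A minimal sketch of gradient accumulation in TensorFlow, using a custom training loop: gradients from several small micro-batches are summed into accumulator variables and applied once, giving the effect of a larger batch without the larger memory footprint. The tiny `Dense` model and random data here are stand-ins for illustration only; in your case you would swap in the RoBERTa model, its two outputs, and your real dataset.

```python
import numpy as np
import tensorflow as tf

ACCUM_STEPS = 4   # number of micro-batches to accumulate per optimizer step
MICRO_BATCH = 8   # effective batch size becomes MICRO_BATCH * ACCUM_STEPS

# Tiny stand-in model; the real one would be the RoBERTa model above.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.Adam(1e-3)
loss_fn = tf.keras.losses.MeanSquaredError()

# Dummy data standing in for the tokenized tweets.
x = np.random.rand(64, 10).astype("float32")
y = np.random.rand(64, 1).astype("float32")
ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(MICRO_BATCH)

# One non-trainable accumulator per trainable weight.
accum = [tf.Variable(tf.zeros_like(v), trainable=False)
         for v in model.trainable_variables]

for step, (xb, yb) in enumerate(ds):
    with tf.GradientTape() as tape:
        # Scale the loss so the summed gradients average out correctly.
        loss = loss_fn(yb, model(xb, training=True)) / ACCUM_STEPS
    grads = tape.gradient(loss, model.trainable_variables)
    for a, g in zip(accum, grads):
        a.assign_add(g)
    if (step + 1) % ACCUM_STEPS == 0:
        # Apply the accumulated gradients, then reset the accumulators.
        optimizer.apply_gradients(zip(accum, model.trainable_variables))
        for a in accum:
            a.assign(tf.zeros_like(a))
```

With 64 samples, a micro-batch of 8 and `ACCUM_STEPS = 4`, the loop performs 8 forward/backward passes but only 2 optimizer updates, each equivalent to a batch of 32.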