How does multi-GPU training scale in terms of memory allocation?

1 answer

User

#1 · Posted 2024-05-14 09:06:53

I would first suggest checking memory usage while training on a single GPU; I suspect your dataset is being loaded into RAM rather than into GPU memory.

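You can try setting:

  import os
  # Make only one of the GPUs visible to this process.
  # Set this before importing TensorFlow so that it takes effect.
  os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # or "1" for the second GPU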

Check the exact mapping, i.e. which GPUs TensorFlow can see:

  import tensorflow as tf
  print(tf.config.list_physical_devices('GPU'))

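Now verify how much VRAM is used in this case. In a terminal, you can use nvidia-smi to check how much GPU memory is allocated; to keep the readout refreshing, use watch -n K nvidia-smi, where K is the refresh interval in seconds.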
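If you would rather log this from a script than watch a terminal, here is a minimal sketch that asks nvidia-smi for per-GPU memory in CSV form and parses it. The helper names (parse_gpu_memory, query_gpu_memory) and the sample string are illustrative and not part of the original answer; it assumes nvidia-smi is on the PATH.

```python
import subprocess

# Ask nvidia-smi for one CSV row per GPU: index, used MiB, total MiB.
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse rows like '0, 1500, 16384' into {gpu_index: (used_mib, total_mib)}."""
    usage = {}
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, used, total = (field.strip() for field in line.split(","))
        usage[int(index)] = (int(used), int(total))
    return usage


def query_gpu_memory():
    """Run nvidia-smi and return the parsed per-GPU memory usage."""
    result = subprocess.run(QUERY_CMD, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    # Hypothetical sample output, used only to demonstrate the parsing step.
    sample = "0, 1500, 16384\n1, 12, 16384\n"
    print(parse_gpu_memory(sample))
```

Calling query_gpu_memory() before and during training gives the same numbers as the terminal readout. Keep in mind that TensorFlow by default reserves most of each visible GPU's memory at startup, so the figure reflects what the process has reserved rather than what the model strictly needs.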

When using multiple GPUs, make sure you use tf.distribute.MirroredStrategy() and declare the model creation and fitting logic as follows:

 strategy = tf.distribute.MirroredStrategy()
 print('Number of devices: {}'.format(strategy.num_replicas_in_sync))

 # Open a strategy scope.
 with strategy.scope():
   # Everything that creates variables should be under the strategy scope.
   # In general this is only model construction & `compile()`.
   model = Model(...)
   model.compile(...)

 # Outside the strategy scope.
 model.fit(train_dataset, validation_data=val_dataset, ...)
 model.evaluate(test_dataset)
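As a rough way to reason about how memory scales under this setup: with MirroredStrategy every replica keeps a full copy of the model variables, and each global batch is split across the replicas. Below is a minimal sketch of that arithmetic. The function names and the numbers are illustrative assumptions, not from the original answer, and the even-division check is a simplification chosen for this sketch.

```python
def per_replica_batch_size(global_batch_size, num_replicas):
    """Return the batch size each replica sees under synchronous data parallelism."""
    if num_replicas < 1:
        raise ValueError("num_replicas must be at least 1")
    # This sketch insists on even division to keep the arithmetic simple.
    if global_batch_size % num_replicas != 0:
        raise ValueError("global batch size should divide evenly across replicas")
    return global_batch_size // num_replicas


def global_batch_size_for(per_replica_batch, num_replicas):
    """Scale the batch so each GPU keeps the same per-step workload as on one GPU."""
    return per_replica_batch * num_replicas


if __name__ == "__main__":
    # Illustrative numbers: 2 GPUs, 64 samples per GPU per step.
    print(global_batch_size_for(64, 2))
    print(per_replica_batch_size(128, 2))
```

In the snippet above, strategy.num_replicas_in_sync would supply num_replicas, and the resulting global batch size would typically be what you pass when batching train_dataset.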
