我刚加入谷歌云机器学习引擎,
我正在尝试在gcloud中训练一种基于Keras的图像分类DL算法。
为了在gcloud上配置GPU,我在setup.py install_requires
中包含了'tensorflow-gpu'
。
我的cloud-gpu.yaml
如下所示
trainingInput:
scaleTier: BASIC_GPU
runtimeVersion: "1.0"
在我添加的代码中:
^{pr2}$一开始
^{3}$在任何Keras代码之前。在
结果是gcloud识别了gpu,但没有使用它,正如您从中可以看到的
实际云训练截图:
INFO 2018-11-18 12:19:59 -0600 master-replica-0 Epoch 1/20
INFO 2018-11-18 12:20:56 -0600 master-replica-0 1/219 [..............................] - ETA: 4:17:12 - loss: 0.8846 - acc: 0.5053 - f1_measure: 0.1043
INFO 2018-11-18 12:21:57 -0600 master-replica-0 2/219 [..............................] - ETA: 3:51:32 - loss: 0.8767 - acc: 0.5018 - f1_measure: 0.1013
INFO 2018-11-18 12:22:59 -0600 master-replica-0 3/219 [..............................] - ETA: 3:46:49 - loss: 0.8634 - acc: 0.5039 - f1_measure: 0.1010
INFO 2018-11-18 12:23:58 -0600 master-replica-0 4/219 [..............................] - ETA: 3:44:59 - loss: 0.8525 - acc: 0.5045 - f1_measure: 0.0991
INFO 2018-11-18 12:24:48 -0600 master-replica-0 5/219 [..............................] - ETA: 3:41:17 - loss: 0.8434 - acc: 0.5031 - f1_measure: 0.0992Sun Nov 18 18:24:48 2018
INFO 2018-11-18 12:24:48 -0600 master-replica-0 +-----------------------------------------------------------------------------+
INFO 2018-11-18 12:24:48 -0600 master-replica-0 | NVIDIA-SMI 396.26 Driver Version: 396.26 |
INFO 2018-11-18 12:24:48 -0600 master-replica-0 |-------------------------------+----------------------+----------------------+
INFO 2018-11-18 12:24:48 -0600 master-replica-0 | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
INFO 2018-11-18 12:24:48 -0600 master-replica-0 | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
INFO 2018-11-18 12:24:48 -0600 master-replica-0 |===============================+======================+======================|
INFO 2018-11-18 12:24:48 -0600 master-replica-0 | 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
INFO 2018-11-18 12:24:48 -0600 master-replica-0 | N/A 32C P0 56W / 149W | 10955MiB / 11441MiB | 0% Default |
INFO 2018-11-18 12:24:48 -0600 master-replica-0 +-------------------------------+----------------------+----------------------+
INFO 2018-11-18 12:24:48 -0600 master-replica-0
INFO 2018-11-18 12:24:48 -0600 master-replica-0 +-----------------------------------------------------------------------------+
INFO 2018-11-18 12:24:48 -0600 master-replica-0 | Processes: GPU Memory |
INFO 2018-11-18 12:24:48 -0600 master-replica-0 | GPU PID Type Process name Usage |
INFO 2018-11-18 12:24:48 -0600 master-replica-0 |=============================================================================|
INFO 2018-11-18 12:24:48 -0600 master-replica-0 +-----------------------------------------------------------------------------+
基本上GPU使用率在培训期间保持在0%,这怎么可能?在
我建议使用^{} ,它与
cloud-gpu.yaml
中的一个k80 GPU具有相同的n1-standard-8:这个:
^{pr2}$should be:
我建议回顾一下这个cnn_with_keras.py以获得更好的示例。在
相关问题 更多 >
编程相关推荐