Tensorflow GPU在Python2.7的multiprocessing.process调用派生的新进程中不可用

2024-06-09 23:21:00 发布

您现在位置:Python中文网/ 问答频道 /正文

系统规格

  • Ubuntu 18.04服务器
  • 已安装GPU:Nvidia P1000
  • Cuda版本:Cuda版本10.1.243
  • Tensorflow:tensorflow-gpu==1.15.

我注意到一个非常奇怪的错误,GPU只对Python进程树的根进程中的Tensorflow可用。如果我使用multiprocessing.Process()分叉进程,GPU将不再可用

示例代码:

import tensorflow as tf

import multiprocessing
import os
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def main():
    logger.info("main(): tf.test.is_gpu_available(): %s", tf.test.is_gpu_available())
    process = multiprocessing.Process(target=run_tensorflow, args=())
    process.daemon = False
    process.start()


def run_tensorflow():
    logger.info("main(): tf.test.is_gpu_available(): %s", tf.test.is_gpu_available())

if __name__ == '__main__':
    main()

输出

2020-04-17 05:01:37.834131: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-17 05:01:37.855703: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000000000 Hz
2020-04-17 05:01:37.856170: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55bb442b0560 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-17 05:01:37.856184: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-04-17 05:01:37.857492: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-04-17 05:01:37.940480: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:01:37.940856: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55bb44337c50 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-04-17 05:01:37.940872: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1660 SUPER, Compute Capability 7.5
2020-04-17 05:01:37.940974: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:01:37.941214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1660 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.785
pciBusID: 0000:01:00.0
2020-04-17 05:01:37.941410: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-04-17 05:01:37.942234: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-04-17 05:01:37.942998: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-04-17 05:01:37.943193: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-04-17 05:01:37.944143: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-04-17 05:01:37.944915: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-04-17 05:01:37.947293: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-17 05:01:37.947399: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:01:37.947708: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:01:37.947945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-04-17 05:01:37.947970: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-04-17 05:01:37.948442: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-17 05:01:37.948452: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2020-04-17 05:01:37.948457: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2020-04-17 05:01:37.948548: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:01:37.948813: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:01:37.949069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 5450 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:__main__:main(): tf.test.is_gpu_available(): True
2020-04-17 05:01:37.954340: E tensorflow/stream_executor/cuda/cuda_driver.cc:1247] could not retrieve CUDA device count: CUDA_ERROR_NOT_INITIALIZED: initialization error
2020-04-17 05:01:37.954384: E tensorflow/stream_executor/cuda/cuda_driver.cc:1247] could not retrieve CUDA device count: CUDA_ERROR_NOT_INITIALIZED: initialization error
INFO:__main__:main(): tf.test.is_gpu_available(): False

重要的部分(我认为)是

INFO:__main__:main(): tf.test.is_gpu_available(): True

先是

INFO:__main__:run_tensorflow(): tf.test.is_gpu_available(): False

为什么我不能从子进程获得GPU的句柄


编辑:如果我等待导入tensorflow,直到我完成这个过程之后,我才能看到GPU,这可能会很有用

import multiprocessing
import os
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def main():
    #logger.info("main(): tf.test.is_gpu_available(): %s", tf.test.is_gpu_available())
    process = multiprocessing.Process(target=run_tensorflow, args=())
    process.daemon = False
    process.start()


def run_tensorflow():
    import tensorflow as tf
    logger.info("main(): tf.test.is_gpu_available(): %s", tf.test.is_gpu_available())

if __name__ == '__main__':
    main()
2020-04-17 05:08:25.256372: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-17 05:08:25.279630: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000000000 Hz
2020-04-17 05:08:25.280028: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5606fe0d0170 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-17 05:08:25.280047: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-04-17 05:08:25.281970: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-04-17 05:08:25.370354: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:08:25.370696: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5606fe157820 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-04-17 05:08:25.370713: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1660 SUPER, Compute Capability 7.5
2020-04-17 05:08:25.370815: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:08:25.371047: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1660 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.785
pciBusID: 0000:01:00.0
2020-04-17 05:08:25.371225: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-04-17 05:08:25.372088: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-04-17 05:08:25.372890: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-04-17 05:08:25.373070: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-04-17 05:08:25.374055: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-04-17 05:08:25.374872: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-04-17 05:08:25.377440: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-17 05:08:25.377538: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:08:25.377835: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:08:25.378052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-04-17 05:08:25.378082: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-04-17 05:08:25.378552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-17 05:08:25.378564: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2020-04-17 05:08:25.378569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2020-04-17 05:08:25.378638: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:08:25.378883: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:08:25.379117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 5450 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:__main__:main(): tf.test.is_gpu_available(): True

Tags: nodedefaultstreamgpumaindevicetensorflowservice
1条回答
网友
1楼 · 发布于 2024-06-09 23:21:00

默认情况下,Tensorflow贪婪地使用GPU内存分配。 Limiting GPU memory growth描述了几种限制GPU分配的方法。这应该允许多个Tensorflow程序共享一个GPU。 然而,我完全不知道Tensorflow如何处理fork()——特别是当GPU已经处于活动状态时——并且很难相信它能正常工作。在导入Tensorflow(或至少使用它)之前,可以使用fork()

相关问题 更多 >