CUDA driver initialization failed, you might not have a CUDA GPU

Asked 2025-04-12 08:06

I don't have sudo privileges, and getting help from the system administrator takes a long time.

This is the output of nvcc -V:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0

This is the output of nvidia-smi:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A6000               Off |   00000000:1C:00.0 Off |                  Off |
| 30%   32C    P8             19W /  300W |      23MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA RTX A6000               Off |   00000000:1E:00.0 Off |                  Off |
| 30%   33C    P8             20W /  300W |      11MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA RTX A6000               Off |   00000000:3D:00.0 Off |                  Off |
| 30%   32C    P8             27W /  300W |      11MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA RTX A6000               Off |   00000000:3E:00.0 Off |                  Off |
| 30%   34C    P8             25W /  300W |      11MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA RTX A6000               Off |   00000000:3F:00.0 Off |                 Off* |
|ERR!   49C    P5            ERR! /  300W |      11MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA RTX A6000               Off |   00000000:40:00.0 Off |                  Off |
| 30%   31C    P8              6W /  300W |      11MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA RTX A6000               Off |   00000000:41:00.0 Off |                  Off |
| 30%   31C    P8             16W /  300W |      11MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA RTX A6000               Off |   00000000:5E:00.0 Off |                  Off |
| 30%   29C    P8              6W /  300W |      11MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      4216      G   /usr/libexec/Xorg                               9MiB |
|    0   N/A  N/A      4466      G   /usr/bin/gnome-shell                            4MiB |
|    1   N/A  N/A      4216      G   /usr/libexec/Xorg                               4MiB |
|    2   N/A  N/A      4216      G   /usr/libexec/Xorg                               4MiB |
|    3   N/A  N/A      4216      G   /usr/libexec/Xorg                               4MiB |
|    4   N/A  N/A      4216      G   /usr/libexec/Xorg                               4MiB |
|    5   N/A  N/A      4216      G   /usr/libexec/Xorg                               4MiB |
|    6   N/A  N/A      4216      G   /usr/libexec/Xorg                               4MiB |
|    7   N/A  N/A      4216      G   /usr/libexec/Xorg                               4MiB |
+-----------------------------------------------------------------------------------------+

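When I try to run

import torch

# Report whether PyTorch can initialize CUDA, plus version details if it can.
cuda_available = torch.cuda.is_available()
print("CUDA Available:", cuda_available)
if cuda_available:
    print("CUDA version:", torch.version.cuda)
    print("cuDNN version:", torch.backends.cudnn.version())
else:
    print("CUDA not available")

I get the following warning, and CUDA is reported as unavailable: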

/home/user_name/anaconda3/envs/llm2/lib/python3.10/site-packages/torch/cuda/__init__.py:141: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
CUDA Available: False
CUDA not available
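
The warning does not show which CUDA version the installed PyTorch build targets, or whether CUDA_VISIBLE_DEVICES is set in the shell. Both can be read without sudo, and torch.version.cuda is readable even when torch.cuda.is_available() returns False. A minimal sketch follows; collect_cuda_info is a hypothetical helper name, not part of PyTorch.

```python
import importlib.util
import os


def collect_cuda_info():
    """Gather facts that do not require CUDA to initialize successfully."""
    info = {"CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES")}
    # Skip the torch fields gracefully if torch is not installed here.
    if importlib.util.find_spec("torch") is None:
        info["torch"] = None
        return info
    import torch

    info["torch"] = torch.__version__
    # CUDA version the binary was built against; None for CPU-only builds.
    info["torch_built_with_cuda"] = torch.version.cuda
    return info


for key, value in collect_cuda_info().items():
    print(f"{key}: {value}")
```

If torch_built_with_cuda prints None, the environment holds a CPU-only build, which would be a separate problem from the driver warning.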

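Is there a way to fix this error without sudo privileges? I can think of two possible solutions:

  1. Update the driver
  2. Build PyTorch from source for CUDA 12.4

If I remember correctly, both of these require sudo privileges.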
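
One more check needs no sudo. GPU 4 shows ERR! in the nvidia-smi output above, so it may be worth testing whether CUDA initializes when only one GPU at a time is exposed through CUDA_VISIBLE_DEVICES. This is a guess, not a confirmed cause. The sketch below assumes the 8 GPU indices from the table; build_env and probe_gpu are hypothetical helper names. Each probe runs in a fresh process because the variable must be set before CUDA is initialized.

```python
import os
import subprocess
import sys

# Child snippet: import torch and report whether CUDA initializes.
CHECK = "import torch; print(torch.cuda.is_available())"


def build_env(gpu_index, base_env=None):
    """Return a copy of the environment exposing only one GPU to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def probe_gpu(gpu_index, python=sys.executable):
    """Run the check in a new process so CUDA starts from scratch."""
    result = subprocess.run(
        [python, "-c", CHECK],
        env=build_env(gpu_index),
        capture_output=True,
        text=True,
    )
    # Any failure in the child (including a missing torch) counts as False.
    return result.stdout.strip() == "True"
```

For example, calling probe_gpu(i) for i in range(8) gives one result per GPU. If every index except 4 returns True, launching jobs with CUDA_VISIBLE_DEVICES=0,1,2,3,5,6,7 might serve as a workaround until the administrator can look at that card.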
