CUDA driver initialization failed, you might not have a CUDA GPU
I don't have sudo access, and getting hold of the system administrator takes a long time.
Here is the output of nvcc -V:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
Here is the output of nvidia-smi:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67 Driver Version: 550.67 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX A6000 Off | 00000000:1C:00.0 Off | Off |
| 30% 32C P8 19W / 300W | 23MiB / 49140MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA RTX A6000 Off | 00000000:1E:00.0 Off | Off |
| 30% 33C P8 20W / 300W | 11MiB / 49140MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA RTX A6000 Off | 00000000:3D:00.0 Off | Off |
| 30% 32C P8 27W / 300W | 11MiB / 49140MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA RTX A6000 Off | 00000000:3E:00.0 Off | Off |
| 30% 34C P8 25W / 300W | 11MiB / 49140MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA RTX A6000 Off | 00000000:3F:00.0 Off | Off* |
|ERR! 49C P5 ERR! / 300W | 11MiB / 49140MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA RTX A6000 Off | 00000000:40:00.0 Off | Off |
| 30% 31C P8 6W / 300W | 11MiB / 49140MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA RTX A6000 Off | 00000000:41:00.0 Off | Off |
| 30% 31C P8 16W / 300W | 11MiB / 49140MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA RTX A6000 Off | 00000000:5E:00.0 Off | Off |
| 30% 29C P8 6W / 300W | 11MiB / 49140MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 4216 G /usr/libexec/Xorg 9MiB |
| 0 N/A N/A 4466 G /usr/bin/gnome-shell 4MiB |
| 1 N/A N/A 4216 G /usr/libexec/Xorg 4MiB |
| 2 N/A N/A 4216 G /usr/libexec/Xorg 4MiB |
| 3 N/A N/A 4216 G /usr/libexec/Xorg 4MiB |
| 4 N/A N/A 4216 G /usr/libexec/Xorg 4MiB |
| 5 N/A N/A 4216 G /usr/libexec/Xorg 4MiB |
| 6 N/A N/A 4216 G /usr/libexec/Xorg 4MiB |
| 7 N/A N/A 4216 G /usr/libexec/Xorg 4MiB |
+-----------------------------------------------------------------------------------------+
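When I try to run
import torch

# Ask PyTorch whether it can initialize the CUDA driver and see a GPU
cuda_available = torch.cuda.is_available()
print("CUDA Available:", cuda_available)
if cuda_available:
    print("CUDA version:", torch.version.cuda)
    print("cuDNN version:", torch.backends.cudnn.version())
else:
    print("CUDA not available")
I get the following warning, and CUDA is reported as unavailable: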
/home/user_name/anaconda3/envs/llm2/lib/python3.10/site-packages/torch/cuda/__init__.py:141: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
CUDA Available: False
CUDA not available
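Some causes of this warning can be checked from a normal user account before touching the driver. The sketch below assumes a Linux machine where the NVIDIA driver exposes its device nodes under /dev/nvidia*; the helper names nvidia_device_access and cuda_related_env are hypothetical, chosen only for illustration. It lists whether the current user can open those nodes and prints two environment variables that commonly affect CUDA start-up.

```python
import glob
import os


def nvidia_device_access():
    """Map each /dev/nvidia* node to True if this user can read and write it."""
    return {
        path: os.access(path, os.R_OK | os.W_OK)
        for path in sorted(glob.glob("/dev/nvidia*"))
    }


def cuda_related_env():
    """Return environment variables that commonly influence CUDA start-up."""
    names = ("CUDA_VISIBLE_DEVICES", "LD_LIBRARY_PATH")
    return {name: os.environ.get(name) for name in names}


if __name__ == "__main__":
    for path, ok in nvidia_device_access().items():
        print(f"{path}: {'ok' if ok else 'NOT accessible'}")
    for name, value in cuda_related_env().items():
        print(f"{name}={value!r}")
```

If a node is missing or not accessible, that would point to a permissions or driver-setup problem that probably does need the administrator. If everything is accessible, the cause likely lies elsewhere. Neither outcome is conclusive on its own.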
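Is there a way to fix this error without sudo access? I can think of two possible solutions:
- Update the driver
- Build PyTorch from source for CUDA 12.4
If I remember correctly, both of these require sudo.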
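One experiment that needs no sudo: in the nvidia-smi output above, GPU 4 reports ERR! for fan speed and power draw. It is only an assumption, not something the output proves, that this device is related to the initialization failure. The sketch below hides it from CUDA through the CUDA_VISIBLE_DEVICES environment variable and then repeats the availability check. The helper names are hypothetical, and hiding the device may or may not be enough if the driver itself fails while enumerating it.

```python
import os


def visible_devices_excluding(total_gpus, bad_indices):
    """Build a CUDA_VISIBLE_DEVICES value listing every index except the bad ones."""
    return ",".join(str(i) for i in range(total_gpus) if i not in bad_indices)


def cuda_available_without(total_gpus, bad_indices):
    """Hide the given GPUs, then ask PyTorch whether CUDA is usable.

    Run this in a fresh Python process: the variable has to be set
    before CUDA is initialized, so torch is imported only afterwards.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = visible_devices_excluding(
        total_gpus, bad_indices
    )
    import torch  # imported late on purpose, after the variable is set

    return torch.cuda.is_available()
```

For the machine above the call would be cuda_available_without(8, {4}), which sets CUDA_VISIBLE_DEVICES to 0,1,2,3,5,6,7. The same value can also be exported in the shell before starting Python.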