Solved: nvidia docker cgroup issue with nvidia container runtime on Debian testing

1. Issue or feature description

Whenever I try to build or run an NVIDIA container, Docker fails with the error message:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.

2. Steps to reproduce the issue

$ docker run --rm --gpus all nvidia/cuda:11.0-base-ubuntu20.04 nvidia-smi

3. Information to attach (optional if deemed irrelevant)

  • Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
I0107 20:43:11.917241 36435 nvc.c:282] initializing library context (version=1.3.1, build=ac02636a318fe7dcc71eaeb3cc55d0c8541c1072)
I0107 20:43:11.917283 36435 nvc.c:256] using root /
I0107 20:43:11.917290 36435 nvc.c:257] using ldcache /etc/ld.so.cache
I0107 20:43:11.917300 36435 nvc.c:258] using unprivileged user 1000:1000
I0107 20:43:11.917316 36435 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0107 20:43:11.917404 36435 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
W0107 20:43:11.918351 36436 nvc.c:187] failed to set inheritable capabilities
W0107 20:43:11.918381 36436 nvc.c:188] skipping kernel modules load due to failure
I0107 20:43:11.918527 36437 driver.c:101] starting driver service
I0107 20:43:11.921734 36435 nvc_info.c:680] requesting driver information with ''
I0107 20:43:11.932012 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.450.80.02
I0107 20:43:11.932402 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.450.80.02
I0107 20:43:11.932976 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.450.80.02
I0107 20:43:11.933027 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.450.80.02
I0107 20:43:11.933435 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.450.80.02
I0107 20:43:11.933470 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.450.80.02
I0107 20:43:11.933501 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.450.80.02
I0107 20:43:11.933991 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.450.80.02
I0107 20:43:11.934024 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.450.80.02
I0107 20:43:11.934094 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.450.80.02
I0107 20:43:11.934545 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.450.80.02
I0107 20:43:11.934976 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.450.80.02
I0107 20:43:11.935258 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.450.80.02
I0107 20:43:11.935783 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLESv2_nvidia.so.450.80.02
I0107 20:43:11.936188 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLESv1_CM_nvidia.so.450.80.02
I0107 20:43:11.936243 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libEGL_nvidia.so.450.80.02
I0107 20:43:11.936622 36435 nvc_info.c:169] selecting /usr/lib/i386-linux-gnu/libnvidia-tls.so.450.80.02
I0107 20:43:11.937013 36435 nvc_info.c:169] selecting /usr/lib/i386-linux-gnu/libnvidia-glvkspirv.so.450.80.02
I0107 20:43:11.937296 36435 nvc_info.c:169] selecting /usr/lib/i386-linux-gnu/libnvidia-glsi.so.450.80.02
I0107 20:43:11.937573 36435 nvc_info.c:169] selecting /usr/lib/i386-linux-gnu/libnvidia-glcore.so.450.80.02
I0107 20:43:11.937881 36435 nvc_info.c:169] selecting /usr/lib/i386-linux-gnu/libnvidia-eglcore.so.450.80.02
I0107 20:43:11.938438 36435 nvc_info.c:169] selecting /usr/lib/i386-linux-gnu/nvidia/current/libGLX_nvidia.so.450.80.02
I0107 20:43:11.938920 36435 nvc_info.c:169] selecting /usr/lib/i386-linux-gnu/nvidia/current/libGLESv2_nvidia.so.450.80.02
I0107 20:43:11.939282 36435 nvc_info.c:169] selecting /usr/lib/i386-linux-gnu/nvidia/current/libGLESv1_CM_nvidia.so.450.80.02
I0107 20:43:11.939730 36435 nvc_info.c:169] selecting /usr/lib/i386-linux-gnu/nvidia/current/libEGL_nvidia.so.450.80.02
W0107 20:43:11.939751 36435 nvc_info.c:350] missing library libnvidia-opencl.so
W0107 20:43:11.939756 36435 nvc_info.c:350] missing library libnvidia-fatbinaryloader.so
W0107 20:43:11.939761 36435 nvc_info.c:350] missing library libnvidia-allocator.so
W0107 20:43:11.939767 36435 nvc_info.c:350] missing library libnvidia-compiler.so
W0107 20:43:11.939772 36435 nvc_info.c:350] missing library libnvidia-ngx.so
W0107 20:43:11.939776 36435 nvc_info.c:350] missing library libvdpau_nvidia.so
W0107 20:43:11.939780 36435 nvc_info.c:350] missing library libnvidia-opticalflow.so
W0107 20:43:11.939785 36435 nvc_info.c:350] missing library libnvidia-fbc.so
W0107 20:43:11.939790 36435 nvc_info.c:350] missing library libnvidia-ifr.so
W0107 20:43:11.939795 36435 nvc_info.c:350] missing library libnvoptix.so
W0107 20:43:11.939801 36435 nvc_info.c:350] missing library libnvidia-cbl.so
W0107 20:43:11.939805 36435 nvc_info.c:354] missing compat32 library libnvidia-ml.so
W0107 20:43:11.939810 36435 nvc_info.c:354] missing compat32 library libnvidia-cfg.so
W0107 20:43:11.939814 36435 nvc_info.c:354] missing compat32 library libcuda.so
W0107 20:43:11.939818 36435 nvc_info.c:354] missing compat32 library libnvidia-opencl.so
W0107 20:43:11.939823 36435 nvc_info.c:354] missing compat32 library libnvidia-ptxjitcompiler.so
W0107 20:43:11.939828 36435 nvc_info.c:354] missing compat32 library libnvidia-fatbinaryloader.so
W0107 20:43:11.939832 36435 nvc_info.c:354] missing compat32 library libnvidia-allocator.so
W0107 20:43:11.939837 36435 nvc_info.c:354] missing compat32 library libnvidia-compiler.so
W0107 20:43:11.939841 36435 nvc_info.c:354] missing compat32 library libnvidia-ngx.so
W0107 20:43:11.939846 36435 nvc_info.c:354] missing compat32 library libvdpau_nvidia.so
W0107 20:43:11.939851 36435 nvc_info.c:354] missing compat32 library libnvidia-encode.so
W0107 20:43:11.939856 36435 nvc_info.c:354] missing compat32 library libnvidia-opticalflow.so
W0107 20:43:11.939860 36435 nvc_info.c:354] missing compat32 library libnvcuvid.so
W0107 20:43:11.939865 36435 nvc_info.c:354] missing compat32 library libnvidia-fbc.so
W0107 20:43:11.939870 36435 nvc_info.c:354] missing compat32 library libnvidia-ifr.so
W0107 20:43:11.939874 36435 nvc_info.c:354] missing compat32 library libnvidia-rtcore.so
W0107 20:43:11.939879 36435 nvc_info.c:354] missing compat32 library libnvoptix.so
W0107 20:43:11.939884 36435 nvc_info.c:354] missing compat32 library libnvidia-cbl.so
I0107 20:43:11.940108 36435 nvc_info.c:276] selecting /usr/lib/nvidia/current/nvidia-smi
I0107 20:43:11.940153 36435 nvc_info.c:276] selecting /usr/lib/nvidia/current/nvidia-debugdump
I0107 20:43:11.940169 36435 nvc_info.c:276] selecting /usr/bin/nvidia-persistenced
W0107 20:43:11.941108 36435 nvc_info.c:376] missing binary nvidia-cuda-mps-control
W0107 20:43:11.941117 36435 nvc_info.c:376] missing binary nvidia-cuda-mps-server
I0107 20:43:11.941136 36435 nvc_info.c:438] listing device /dev/nvidiactl
I0107 20:43:11.941142 36435 nvc_info.c:438] listing device /dev/nvidia-uvm
I0107 20:43:11.941146 36435 nvc_info.c:438] listing device /dev/nvidia-uvm-tools
I0107 20:43:11.941151 36435 nvc_info.c:438] listing device /dev/nvidia-modeset
I0107 20:43:11.941175 36435 nvc_info.c:317] listing ipc /run/nvidia-persistenced/socket
W0107 20:43:11.941193 36435 nvc_info.c:321] missing ipc /tmp/nvidia-mps
I0107 20:43:11.941198 36435 nvc_info.c:745] requesting device information with ''
I0107 20:43:11.947879 36435 nvc_info.c:628] listing device /dev/nvidia0 (GPU-6518be5e-14ff-e277-21aa-73b482890bee at 00000000:07:00.0)
NVRM version:   450.80.02
CUDA version:   11.0

Device Index:   0
Device Minor:   0
Model:          GeForce GTX 980 Ti
Brand:          GeForce
GPU UUID:       GPU-6518be5e-14ff-e277-21aa-73b482890bee
Bus Location:   00000000:07:00.0
Architecture:   5.2
I0107 20:43:11.947903 36435 nvc.c:337] shutting down library context
I0107 20:43:11.948696 36437 driver.c:156] terminating driver service
I0107 20:43:11.949026 36435 driver.c:196] driver service terminated successfully
  • Kernel version from uname -a
 Linux lambda 5.8.0-3-amd64 #1 SMP Debian 5.8.14-1 (2020-10-10) x86_64 GNU/Linux
  • Any relevant kernel output lines from dmesg
  • Driver information from nvidia-smi -a
Thu Jan  7 15:45:08 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 980 Ti  On   | 00000000:07:00.0  On |                  N/A |
|  0%   45C    P5    29W / 250W |    403MiB /  6083MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                              
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3023      G   /usr/lib/xorg/Xorg                177MiB |
|    0   N/A  N/A      4833      G   /usr/bin/gnome-shell              166MiB |
|    0   N/A  N/A      7609      G   ...AAAAAAAAA= --shared-files       54MiB |
+-----------------------------------------------------------------------------+
  • Docker version from docker version
Server: Docker Engine - Community
Engine:
 Version:          20.10.2
 API version:      1.41 (minimum version 1.12)
 Go version:       go1.13.15
 Git commit:       8891c58
 Built:            Mon Dec 28 16:15:28 2020
 OS/Arch:          linux/amd64
 Experimental:     false
containerd:
 Version:          1.4.3
 GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
nvidia:
 Version:          1.0.0-rc92
 GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
 Version:          0.19.0
 GitCommit:        de40ad0
  • NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                   Version        Architecture Description
+++-======================================-==============-============-=================================================================
un  bumblebee-nvidia                       <none>         <none>       (no description available)
ii  glx-alternative-nvidia                 1.2.0          amd64        allows the selection of NVIDIA as GLX provider
un  libegl-nvidia-legacy-390xx0            <none>         <none>       (no description available)
un  libegl-nvidia-tesla-418-0              <none>         <none>       (no description available)
un  libegl-nvidia-tesla-440-0              <none>         <none>       (no description available)
un  libegl-nvidia-tesla-450-0              <none>         <none>       (no description available)
ii  libegl-nvidia0:amd64                   450.80.02-2    amd64        NVIDIA binary EGL library
ii  libegl-nvidia0:i386                    450.80.02-2    i386         NVIDIA binary EGL library
un  libegl1-glvnd-nvidia                   <none>         <none>       (no description available)
un  libegl1-nvidia                         <none>         <none>       (no description available)
un  libgl1-glvnd-nvidia-glx                <none>         <none>       (no description available)
ii  libgl1-nvidia-glvnd-glx:amd64          450.80.02-2    amd64        NVIDIA binary OpenGL/GLX library (GLVND variant)
ii  libgl1-nvidia-glvnd-glx:i386           450.80.02-2    i386         NVIDIA binary OpenGL/GLX library (GLVND variant)
un  libgl1-nvidia-glx                      <none>         <none>       (no description available)
un  libgl1-nvidia-glx-any                  <none>         <none>       (no description available)
un  libgl1-nvidia-glx-i386                 <none>         <none>       (no description available)
un  libgl1-nvidia-legacy-390xx-glx         <none>         <none>       (no description available)
un  libgl1-nvidia-tesla-418-glx            <none>         <none>       (no description available)
un  libgldispatch0-nvidia                  <none>         <none>       (no description available)
ii  libgles-nvidia1:amd64                  450.80.02-2    amd64        NVIDIA binary OpenGL|ES 1.x library
ii  libgles-nvidia1:i386                   450.80.02-2    i386         NVIDIA binary OpenGL|ES 1.x library
ii  libgles-nvidia2:amd64                  450.80.02-2    amd64        NVIDIA binary OpenGL|ES 2.x library
ii  libgles-nvidia2:i386                   450.80.02-2    i386         NVIDIA binary OpenGL|ES 2.x library
un  libgles1-glvnd-nvidia                  <none>         <none>       (no description available)
un  libgles2-glvnd-nvidia                  <none>         <none>       (no description available)
un  libglvnd0-nvidia                       <none>         <none>       (no description available)
ii  libglx-nvidia0:amd64                   450.80.02-2    amd64        NVIDIA binary GLX library
ii  libglx-nvidia0:i386                    450.80.02-2    i386         NVIDIA binary GLX library
un  libglx0-glvnd-nvidia                   <none>         <none>       (no description available)
un  libnvidia-cbl                          <none>         <none>       (no description available)
un  libnvidia-cfg.so.1                     <none>         <none>       (no description available)
ii  libnvidia-cfg1:amd64                   450.80.02-2    amd64        NVIDIA binary OpenGL/GLX configuration library
un  libnvidia-cfg1-any                     <none>         <none>       (no description available)
ii  libnvidia-container-tools              1.3.1-1        amd64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64             1.3.1-1        amd64        NVIDIA container runtime library
ii  libnvidia-eglcore:amd64                450.80.02-2    amd64        NVIDIA binary EGL core libraries
ii  libnvidia-eglcore:i386                 450.80.02-2    i386         NVIDIA binary EGL core libraries
un  libnvidia-eglcore-450.80.02            <none>         <none>       (no description available)
ii  libnvidia-encode1:amd64                450.80.02-2    amd64        NVENC Video Encoding runtime library
ii  libnvidia-glcore:amd64                 450.80.02-2    amd64        NVIDIA binary OpenGL/GLX core libraries
ii  libnvidia-glcore:i386                  450.80.02-2    i386         NVIDIA binary OpenGL/GLX core libraries
un  libnvidia-glcore-450.80.02             <none>         <none>       (no description available)
ii  libnvidia-glvkspirv:amd64              450.80.02-2    amd64        NVIDIA binary Vulkan Spir-V compiler library
ii  libnvidia-glvkspirv:i386               450.80.02-2    i386         NVIDIA binary Vulkan Spir-V compiler library
un  libnvidia-glvkspirv-450.80.02          <none>         <none>       (no description available)
un  libnvidia-legacy-340xx-cfg1            <none>         <none>       (no description available)
un  libnvidia-legacy-390xx-cfg1            <none>         <none>       (no description available)
ii  libnvidia-ml-dev:amd64                 11.1.1-3       amd64        NVIDIA Management Library (NVML) development files
un  libnvidia-ml.so.1                      <none>         <none>       (no description available)
ii  libnvidia-ml1:amd64                    450.80.02-2    amd64        NVIDIA Management Library (NVML) runtime library
ii  libnvidia-ptxjitcompiler1:amd64        450.80.02-2    amd64        NVIDIA PTX JIT Compiler
ii  libnvidia-rtcore:amd64                 450.80.02-2    amd64        NVIDIA binary Vulkan ray tracing (rtcore) library
un  libnvidia-rtcore-450.80.02             <none>         <none>       (no description available)
un  libnvidia-tesla-418-cfg1               <none>         <none>       (no description available)
un  libnvidia-tesla-440-cfg1               <none>         <none>       (no description available)
un  libnvidia-tesla-450-cfg1               <none>         <none>       (no description available)
un  libnvidia-tesla-450-cuda1              <none>         <none>       (no description available)
un  libnvidia-tesla-450-ml1                <none>         <none>       (no description available)
un  libopengl0-glvnd-nvidia                <none>         <none>       (no description available)
ii  nvidia-alternative                     450.80.02-2    amd64        allows the selection of NVIDIA as GLX provider
un  nvidia-alternative--kmod-alias         <none>         <none>       (no description available)
un  nvidia-alternative-legacy-173xx        <none>         <none>       (no description available)
un  nvidia-alternative-legacy-71xx         <none>         <none>       (no description available)
un  nvidia-alternative-legacy-96xx         <none>         <none>       (no description available)
ii  nvidia-container-runtime               3.4.0-1        amd64        NVIDIA container runtime
un  nvidia-container-runtime-hook          <none>         <none>       (no description available)
ii  nvidia-container-toolkit               1.4.0-1        amd64        NVIDIA container runtime hook
ii  nvidia-cuda-dev:amd64                  11.1.1-3       amd64        NVIDIA CUDA development files
un  nvidia-cuda-doc                        <none>         <none>       (no description available)
ii  nvidia-cuda-gdb                        11.1.1-3       amd64        NVIDIA CUDA Debugger (GDB)
un  nvidia-cuda-mps                        <none>         <none>       (no description available)
ii  nvidia-cuda-toolkit                    11.1.1-3       amd64        NVIDIA CUDA development toolkit
ii  nvidia-cuda-toolkit-doc                11.1.1-3       all          NVIDIA CUDA and OpenCL documentation
un  nvidia-current                         <none>         <none>       (no description available)
un  nvidia-current-updates                 <none>         <none>       (no description available)
un  nvidia-docker                          <none>         <none>       (no description available)
ii  nvidia-docker2                         2.5.0-1        all          nvidia-docker CLI wrapper
ii  nvidia-driver                          450.80.02-2    amd64        NVIDIA metapackage
un  nvidia-driver-any                      <none>         <none>       (no description available)
ii  nvidia-driver-bin                      450.80.02-2    amd64        NVIDIA driver support binaries
un  nvidia-driver-bin-450.80.02            <none>         <none>       (no description available)
un  nvidia-driver-binary                   <none>         <none>       (no description available)
ii  nvidia-driver-libs:amd64               450.80.02-2    amd64        NVIDIA metapackage (OpenGL/GLX/EGL/GLES libraries)
ii  nvidia-driver-libs:i386                450.80.02-2    i386         NVIDIA metapackage (OpenGL/GLX/EGL/GLES libraries)
un  nvidia-driver-libs-any                 <none>         <none>       (no description available)
un  nvidia-driver-libs-nonglvnd            <none>         <none>       (no description available)
ii  nvidia-egl-common                      450.80.02-2    amd64        NVIDIA binary EGL driver - common files
ii  nvidia-egl-icd:amd64                   450.80.02-2    amd64        NVIDIA EGL installable client driver (ICD)
ii  nvidia-egl-icd:i386                    450.80.02-2    i386         NVIDIA EGL installable client driver (ICD)
un  nvidia-glx-any                         <none>         <none>       (no description available)
ii  nvidia-installer-cleanup               20151021+12    amd64        cleanup after driver installation with the nvidia-installer
un  nvidia-kernel-450.80.02                <none>         <none>       (no description available)
ii  nvidia-kernel-common                   20151021+12    amd64        NVIDIA binary kernel module support files
ii  nvidia-kernel-dkms                     450.80.02-2    amd64        NVIDIA binary kernel module DKMS source
un  nvidia-kernel-source                   <none>         <none>       (no description available)
ii  nvidia-kernel-support                  450.80.02-2    amd64        NVIDIA binary kernel module support files
un  nvidia-kernel-support--v1              <none>         <none>       (no description available)
un  nvidia-kernel-support-any              <none>         <none>       (no description available)
un  nvidia-legacy-304xx-alternative        <none>         <none>       (no description available)
un  nvidia-legacy-304xx-driver             <none>         <none>       (no description available)
un  nvidia-legacy-340xx-alternative        <none>         <none>       (no description available)
un  nvidia-legacy-340xx-vdpau-driver       <none>         <none>       (no description available)
un  nvidia-legacy-390xx-vdpau-driver       <none>         <none>       (no description available)
un  nvidia-legacy-390xx-vulkan-icd         <none>         <none>       (no description available)
ii  nvidia-legacy-check                    450.80.02-2    amd64        check for NVIDIA GPUs requiring a legacy driver
un  nvidia-libopencl1                      <none>         <none>       (no description available)
un  nvidia-libopencl1-dev                  <none>         <none>       (no description available)
ii  nvidia-modprobe                        460.27.04-1    amd64        utility to load NVIDIA kernel modules and create device nodes
un  nvidia-nonglvnd-vulkan-common          <none>         <none>       (no description available)
un  nvidia-nonglvnd-vulkan-icd             <none>         <none>       (no description available)
un  nvidia-opencl-dev                      <none>         <none>       (no description available)
un  nvidia-opencl-icd                      <none>         <none>       (no description available)
un  nvidia-openjdk-8-jre                   <none>         <none>       (no description available)
ii  nvidia-persistenced                    450.57-1       amd64        daemon to maintain persistent software state in the NVIDIA driver
ii  nvidia-profiler                        11.1.1-3       amd64        NVIDIA Profiler for CUDA and OpenCL
ii  nvidia-settings                        450.80.02-1+b1 amd64        tool for configuring the NVIDIA graphics driver
un  nvidia-settings-gtk-450.80.02          <none>         <none>       (no description available)
ii  nvidia-smi                             450.80.02-2    amd64        NVIDIA System Management Interface
ii  nvidia-support                         20151021+12    amd64        NVIDIA binary graphics driver support files
un  nvidia-tesla-418-vdpau-driver          <none>         <none>       (no description available)
un  nvidia-tesla-418-vulkan-icd            <none>         <none>       (no description available)
un  nvidia-tesla-440-vdpau-driver          <none>         <none>       (no description available)
un  nvidia-tesla-440-vulkan-icd            <none>         <none>       (no description available)
un  nvidia-tesla-450-driver                <none>         <none>       (no description available)
un  nvidia-tesla-450-vulkan-icd            <none>         <none>       (no description available)
un  nvidia-tesla-alternative               <none>         <none>       (no description available)
ii  nvidia-vdpau-driver:amd64              450.80.02-2    amd64        Video Decode and Presentation API for Unix - NVIDIA driver
ii  nvidia-visual-profiler                 11.1.1-3       amd64        NVIDIA Visual Profiler for CUDA and OpenCL
ii  nvidia-vulkan-common                   450.80.02-2    amd64        NVIDIA Vulkan driver - common files
ii  nvidia-vulkan-icd:amd64                450.80.02-2    amd64        NVIDIA Vulkan installable client driver (ICD)
ii  nvidia-vulkan-icd:i386                 450.80.02-2    i386         NVIDIA Vulkan installable client driver (ICD)
un  nvidia-vulkan-icd-any                  <none>         <none>       (no description available)
ii  xserver-xorg-video-nvidia              450.80.02-2    amd64        NVIDIA binary Xorg driver
un  xserver-xorg-video-nvidia-any          <none>         <none>       (no description available)
un  xserver-xorg-video-nvidia-legacy-304xx <none>         <none>       (no description available)
  • NVIDIA container library version from nvidia-container-cli -V
version: 1.3.1
build date: 2020-12-14T14:18+00:00
build revision: ac02636a318fe7dcc71eaeb3cc55d0c8541c1072
build compiler: x86_64-linux-gnu-gcc-8 8.3.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
  • NVIDIA container library logs (see troubleshooting)
  • Docker command, image and tag used
docker run --rm --gpus all nvidia/cuda:11.0-base-ubuntu20.04 nvidia-smi

✔️Accepted Answer

Fix on Arch:

Edit /etc/nvidia-container-runtime/config.toml and change #no-cgroups=false to no-cgroups=true. After a restart of the docker.service everything worked as usual.
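
The edit described above can be scripted. The following is a minimal sketch, assuming GNU sed and a config line of the form #no-cgroups = false (with or without spaces around the equals sign). The helper name enable_no_cgroups and the scratch file are illustrative only, not part of any NVIDIA tooling. The demo runs against a temporary copy so nothing on the real system is modified.

```shell
#!/bin/sh
# Sketch: switch no-cgroups to true in an nvidia-container-runtime config file.
# enable_no_cgroups is a hypothetical helper name, not part of any NVIDIA tool.
enable_no_cgroups() {
    config_file="$1"
    # Match the setting whether or not it is commented out, whatever its current value.
    sed -i -E 's/^[[:space:]]*#?[[:space:]]*no-cgroups[[:space:]]*=.*/no-cgroups = true/' "$config_file"
}

# Demo on a scratch file that imitates the relevant part of config.toml.
demo_config="$(mktemp)"
printf '%s\n' '[nvidia-container-cli]' '#no-cgroups = false' 'ldconfig = "@/sbin/ldconfig"' > "$demo_config"
enable_no_cgroups "$demo_config"
cat "$demo_config"
```

On a real system, the same sed expression would be run as root against /etc/nvidia-container-runtime/config.toml, followed by systemctl restart docker.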

Other Answers:

Minimal working example on Arch with nvidia-container-toolkit (from AUR) installed:

docker run --rm --gpus all \
  --device /dev/nvidia0 --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools --device /dev/nvidiactl  \
  nvidia/cuda:11.0-base nvidia-smi

Without the --device flags, I get this unhelpful message: Failed to initialize NVML: Unknown Error.

Edit: also make sure you have no-cgroups = true in /etc/nvidia-container-runtime/config.toml (thanks @mpizenberg)

This seems to be related to the systemd upgrade to 247.2-2, which was uploaded to sid three weeks ago and has now made its way to testing. This commit highlights the change of cgroup hierarchy: https://salsa.debian.org/systemd-team/systemd/-/commit/170fb124a32884bd9975ee4ea9e1ffbbc2ee26b4

Indeed, the default setup no longer exposes /sys/fs/cgroup/devices, which libnvidia-container uses according to https://github.com/NVIDIA/libnvidia-container/blob/ac02636a318fe7dcc71eaeb3cc55d0c8541c1072/src/nvc_container.c#L379-L382

Using the documented systemd.unified_cgroup_hierarchy=false kernel command line parameter brings back the /sys/fs/cgroup/devices entry, and libnvidia-container is happier.
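
To check which hierarchy a machine is currently running, a rough sketch along these lines may help. It assumes GNU stat, which reports the filesystem type of /sys/fs/cgroup as cgroup2fs on the unified hierarchy and tmpfs on the legacy or hybrid layout. The helper name cgroup_mode is made up for this example.

```shell
#!/bin/sh
# cgroup_mode is a hypothetical helper: it maps the filesystem type mounted at
# /sys/fs/cgroup to the cgroup hierarchy in use.
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "unified (cgroup v2)" ;;
        tmpfs)     echo "legacy or hybrid (cgroup v1)" ;;
        *)         echo "unknown" ;;
    esac
}

# Ask for the filesystem type; fall back to "missing" if the path does not exist.
fstype="$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo missing)"
echo "cgroup hierarchy: $(cgroup_mode "$fstype")"

# libnvidia-container 1.3.1 looks for the devices controller, so report whether
# its directory is exposed.
if [ -d /sys/fs/cgroup/devices ]; then
    echo "devices controller directory: present"
else
    echo "devices controller directory: absent"
fi
```

If the script reports the unified hierarchy and an absent devices directory, the system matches the situation described in this thread.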

For Debian users, you can disable the unified cgroup hierarchy by editing
/etc/default/grub
and adding
systemd.unified_cgroup_hierarchy=0
to the end of the GRUB_CMDLINE_LINUX_DEFAULT options. Example:
...
GRUB_CMDLINE_LINUX_DEFAULT="quiet systemd.unified_cgroup_hierarchy=0"
...

Then run
update-grub
and reboot for changes to take effect.
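
The GRUB edit above could be automated roughly as follows. This is a sketch under a few assumptions: GNU sed, a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line, and a parameter containing no slashes or ampersands. The helper name add_grub_param is hypothetical. The demo works on a scratch file rather than the real /etc/default/grub.

```shell
#!/bin/sh
# add_grub_param is a hypothetical helper: it appends one kernel parameter to
# GRUB_CMDLINE_LINUX_DEFAULT unless the parameter is already present.
add_grub_param() {
    grub_file="$1"
    param="$2"
    # Skip the edit if the parameter already appears on that line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$grub_file" | grep -qF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i -E "s/^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"/\1 $param\"/" "$grub_file"
}

# Demo on a scratch file that imitates /etc/default/grub.
demo_grub="$(mktemp)"
printf '%s\n' 'GRUB_DEFAULT=0' 'GRUB_CMDLINE_LINUX_DEFAULT="quiet"' > "$demo_grub"
add_grub_param "$demo_grub" "systemd.unified_cgroup_hierarchy=0"
add_grub_param "$demo_grub" "systemd.unified_cgroup_hierarchy=0"   # second call changes nothing
cat "$demo_grub"
```

After running update-grub and rebooting, cat /proc/cmdline shows whether the parameter reached the kernel command line.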

It's worth noting that I also had to modify /etc/nvidia-container-runtime/config.toml to remove the '@' symbol and point ldconfig at the correct location for my system (Debian Unstable), e.g.:
ldconfig = "/usr/sbin/ldconfig"

This worked for me, I hope this saves someone else some time.
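
For the ldconfig setting mentioned above, the right path differs between systems. One possible way to find it is sketched below. The two candidate paths are assumptions about common locations, and first_executable is a hypothetical helper name. The script only prints a suggested config line and does not modify any file.

```shell
#!/bin/sh
# first_executable is a hypothetical helper: it prints the first argument that
# names an executable file and fails if none does.
first_executable() {
    for candidate in "$@"; do
        if [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Candidate locations are assumptions; extend the list for other layouts.
if ldconfig_path="$(first_executable /usr/sbin/ldconfig /sbin/ldconfig)"; then
    printf 'ldconfig = "%s"\n' "$ldconfig_path"
else
    echo "ldconfig not found in the usual locations" >&2
fi
```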

We now have an RC of libnvidia-container out that adds support for cgroupv2.

If you would like to try it out, make sure to add the experimental repo to your apt sources and install the latest packages:

For DEBs

sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/libnvidia-container.list
sudo apt-get update
sudo apt-get install -y libnvidia-container-tools libnvidia-container1

For RPMs

sudo yum-config-manager --enable libnvidia-container-experimental
sudo yum install -y libnvidia-container-tools libnvidia-container1
