Solved: nvidia-docker: Error response from daemon: OCI runtime create failed: unable to retrieve OCI runtime error

1. Issue or feature description

I'm trying to install nvidia-docker v2, following the steps in the README. At the last step:

`docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi`

it fails with this error message:

docker: Error response from daemon: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/56ca0b73c5720021671123b7f44c885bb1e7b42957c9b18e7b509be26760b993/log.json: no such file or directory): nvidia-container-runtime did not terminate sucessfully: unknown.
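From the message it looks like Docker did invoke nvidia-container-runtime but the runtime itself failed or is missing. A quick sanity check I ran, assuming the standard install locations (a sketch; paths may differ per distro):

```sh
# Is the runtime binary on PATH at all?
which nvidia-container-runtime

# nvidia-docker2 normally registers the runtime in /etc/docker/daemon.json,
# roughly: { "runtimes": { "nvidia": { "path": "nvidia-container-runtime", "runtimeArgs": [] } } }
cat /etc/docker/daemon.json

# Restart the daemon after any change to daemon.json
sudo systemctl restart docker
```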

2. Steps to reproduce the issue

  1. Follow the same steps as in the README here (a sketch of those steps is reproduced below)
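For reference, the install sequence from the nvidia-docker v2 README at the time, assuming Ubuntu with apt (a sketch from memory; check the project README for the authoritative commands):

```sh
# Add the nvidia-docker package repository (Ubuntu/Debian)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

# The step that fails for me:
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
```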

3. Information to attach (optional if deemed irrelevant)

  • Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info

```
NVRM version: 418.56
CUDA version: 10.1

Device Index: 0
Device Minor: 0
Model: GeForce GTX 1080 Ti
Brand: GeForce
GPU UUID: GPU-e17e3380-8b5c-cbf2-8bb2-1bcced59103d
Bus Location: 00000000:01:00.0
Architecture: 6.1
I0720 05:40:32.999897 19015 nvc.c:318] shutting down library context
I0720 05:40:33.000865 19017 driver.c:192] terminating driver service
I0720 05:40:33.010816 19015 driver.c:233] driver service terminated successfully
```

  • Kernel version from uname -a
    Linux cp 4.4.0-154-generic #181-Ubuntu SMP Tue Jun 25 05:29:03 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

  • Any relevant kernel output lines from dmesg

  • Driver information from nvidia-smi -a

```
Timestamp     : Sat Jul 20 13:42:03 2019
Driver Version : 418.56
CUDA Version  : 10.1

Attached GPUs : 1
GPU 00000000:01:00.0
Product Name  : GeForce GTX 1080 Ti
```

  • Docker version from docker version

```
Client:
 Version:       18.06.1-ce
 API version:   1.38
 Go version:    go1.10.3
 Git commit:    e68fc7a
 Built:         Tue Aug 21 17:24:56 2018
 OS/Arch:       linux/amd64
 Experimental:  false

Server:
 Engine:
  Version:      18.06.1-ce
  API version:  1.38 (minimum version 1.12)
  Go version:   go1.10.3
  Git commit:   e68fc7a
  Built:        Tue Aug 21 17:23:21 2018
  OS/Arch:      linux/amd64
  Experimental: false
```

  • NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'

```
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-=================================
ii  libnvidia-cont 1.0.2-1      amd64        NVIDIA container runtime library
ii  libnvidia-cont 1.0.2-1      amd64        NVIDIA container runtime library
un  nvidia-304     <none>       <none>       (no description available)
un  nvidia-340     <none>       <none>       (no description available)
un  nvidia-384     <none>       <none>       (no description available)
un  nvidia-common  <none>       <none>       (no description available)
ii  nvidia-contain 3.0.0-1      amd64        NVIDIA container runtime
ii  nvidia-contain 1.4.0-1      amd64        NVIDIA container runtime hook
un  nvidia-docker  <none>       <none>       (no description available)
ii  nvidia-docker2 2.1.0-1      all          nvidia-docker CLI wrapper
un  nvidia-libopen <none>       <none>       (no description available)
un  nvidia-prime   <none>       <none>       (no description available)
```

  • NVIDIA container library version from nvidia-container-cli -V
    version: 1.0.2

  • NVIDIA container library logs (see troubleshooting)

  • Docker command, image and tag used
    tensorflow/tensorflow:nightly-gpu-py3-jupyter

41 Answers

✔️Accepted Answer

`sudo apt install nvidia-container-runtime` worked for me.
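In other words, the nvidia-container-runtime binary that daemon.json points at was never installed, so Docker had nothing to exec. A minimal sketch of the fix plus verification, assuming Ubuntu with the nvidia apt repos already configured:

```sh
# Install the missing runtime binary
sudo apt install nvidia-container-runtime

# Restart Docker so it picks the runtime up, then re-run the failing test
sudo systemctl restart docker
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
```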

Other Answers:

`sudo ln -s /usr/bin/nvidia-container-toolkit /usr/bin/nvidia-container-runtime-hook` solved the problem for me.
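This applies when nvidia-container-toolkit is installed but the old hook name is missing. A quick check before linking (a sketch; paths assume a standard package install):

```sh
# Confirm the toolkit binary exists and only the old hook name is missing
ls -l /usr/bin/nvidia-container-toolkit /usr/bin/nvidia-container-runtime-hook

# If so, create the compatibility symlink and retest
sudo ln -s /usr/bin/nvidia-container-toolkit /usr/bin/nvidia-container-runtime-hook
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
```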

The "brett hack" to give it it's official name has fixed our problem, currently rolling it out to all servers before they have a chance to die and respawn ( we use spot instances ! ). The line you'll want for the moment in your scripts AFTER then nvidia-docker2 yum install is ;

if [ \! -f /usr/bin/runc -a -f /usr/bin/docker-runc ]; then ln -s /usr/bin/docker-runc /usr/bin/runc; else echo "DID NOT CREATE RUNC SYMLINK"; fi

Phew, panic over for the moment, but we could do with an explanation for this (even if it's "blame AWS"!). A commented long-form version of the one-liner is sketched below.
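For readability, here is the same check written out long-form (a sketch with the same paths and behaviour as the one-liner above; run as root):

```sh
#!/bin/sh
# Some Docker builds ship the runtime binary as docker-runc, but
# nvidia-container-runtime expects to find runc. If runc is absent
# and docker-runc exists, bridge the gap with a symlink.
if [ ! -f /usr/bin/runc ] && [ -f /usr/bin/docker-runc ]; then
    ln -s /usr/bin/docker-runc /usr/bin/runc
else
    echo "DID NOT CREATE RUNC SYMLINK"
fi
```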

I am getting the same runc error, but I am using Ubuntu. Can someone tell me what the "brett hack" equivalent is for Ubuntu?

Create a runc symlink to point to docker-runc.

As @mikecouk did it:

if [ \! -f /usr/bin/runc -a -f /usr/bin/docker-runc ]; then ln -s /usr/bin/docker-runc /usr/bin/runc; else echo "DID NOT CREATE RUNC SYMLINK"; fi
