[Solved] nvidia-docker: Tensorflow fails with cuDevicePrimaryCtxRetain: CUDA_ERROR_INVALID_DEVICE

I'm running TensorFlow with:
sudo nvidia-docker run -it -p 8888:8888

When opening a tf.Session() I get:

I tensorflow/stream_executor/] successfully opened CUDA library locally
I tensorflow/stream_executor/] successfully opened CUDA library locally
I tensorflow/stream_executor/] successfully opened CUDA library locally
I tensorflow/stream_executor/] successfully opened CUDA library locally
I tensorflow/stream_executor/] successfully opened CUDA library locally
modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/4.4.0-45-generic/modules.dep.bin'
E tensorflow/stream_executor/cuda/] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/] retrieving CUDA diagnostic information for host: b73324416bd2
I tensorflow/stream_executor/cuda/] hostname: b73324416bd2
I tensorflow/stream_executor/cuda/] libcuda reported version is: 367.48.0
I tensorflow/stream_executor/cuda/] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  367.48  Sat Sep  3 18:21:08 PDT 2016
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.2) 
I tensorflow/stream_executor/cuda/] kernel reported version is: 367.48.0
I tensorflow/stream_executor/cuda/] kernel version seems to match DSO: 367.48.0

I have a GTX 460 and a GTX 780 Ti in the machine, which runs Ubuntu 16.04 LTS.

How should I go about debugging this?


27 Answers

✔️Accepted Answer

Your driver is in a weird state, or nvidia-docker couldn't initialize the driver.
Executing sudo nvidia-modprobe -u -c=0 on the host should fix it.
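
If you want to see whether the device nodes are actually there before and after running that command, a small check along these lines may help. This is only a sketch: the function name check_nvidia_nodes is made up for illustration, and the node list (nvidiactl, nvidia-uvm, nvidia0) is an assumption about what the driver normally creates for the GPU at minor number 0. With two GPUs you would likely also expect nvidia1.

```shell
# Hypothetical helper (not part of nvidia-docker or the driver):
# prints every expected NVIDIA device node that is missing and
# returns non-zero if at least one is absent.
# The optional first argument overrides the directory to look in.
check_nvidia_nodes() {
  dev_dir="${1:-/dev}"
  missing=0
  # Assumed node names for the control device, unified memory, and GPU 0.
  for node in nvidiactl nvidia-uvm nvidia0; do
    if [ ! -e "$dev_dir/$node" ]; then
      echo "missing: $dev_dir/$node"
      missing=1
    fi
  done
  return "$missing"
}
```

Run check_nvidia_nodes on the host. If it reports missing nodes, run sudo nvidia-modprobe -u -c=0 and call it again before restarting the container.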
