[Solved] tacotron2 RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Hi!

I have tried to run train.py and I am getting this error:

Dynamic Loss Scaling: True
Distributed Run: False
cuDNN Enabled: True
cuDNN Benchmark: False
Traceback (most recent call last):
  File "train.py", line 284, in <module>
    args.warm_start, args.n_gpus, args.rank, args.group_name, hparams)
  File "train.py", line 169, in train
    model = load_model(hparams)
  File "train.py", line 81, in load_model
    model = Tacotron2(hparams).cuda()
  File "/home/papoadmin/anaconda3/envs/ai/lib/python3.6/site-packages/torch/nn/modules/module.py", line 260, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/papoadmin/anaconda3/envs/ai/lib/python3.6/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/home/papoadmin/anaconda3/envs/ai/lib/python3.6/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/home/papoadmin/anaconda3/envs/ai/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 117, in _apply
    self.flatten_parameters()
  File "/home/papoadmin/anaconda3/envs/ai/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 113, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Does anyone know what is causing this?

It is failing on this line:
model = Tacotron2(hparams).cuda()

My environment:
system: Ubuntu 18.04
GPU: NVIDIA RTX 2080 Ti
Python: 3.6.0
CUDA: 10.0
torch: 1.0
cuDNN: 7.4.1

Other models using torch and tensorflow are working normally.

I have also tried CUDA_VISIBLE_DEVICES=0, with no success.
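Since the traceback ends in flatten_parameters() inside torch/nn/modules/rnn.py, the error can usually be reproduced without Tacotron2 at all. A minimal sketch (assuming only that torch is installed) that moves a bare LSTM to the GPU:

```python
# Minimal repro sketch: LSTM.cuda() triggers the same flatten_parameters()
# call that fails in the traceback above, so Tacotron2 is not needed.
import torch

print(torch.__version__)               # torch build
print(torch.version.cuda)              # CUDA version the wheel was built for
print(torch.backends.cudnn.version())  # bundled cuDNN version

if torch.cuda.is_available():
    lstm = torch.nn.LSTM(10, 20).cuda()  # fails here on an affected install
    out, _ = lstm(torch.randn(5, 3, 10).cuda())
    print(out.shape)                     # torch.Size([5, 3, 20]) if healthy
```

If this snippet raises the same CUDNN_STATUS_EXECUTION_FAILED, the problem is the torch/cuDNN install rather than the tacotron2 code.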

13 Answers

✔️Accepted Answer

Go to the PyTorch website and choose the build that matches your CUDA version:
https://pytorch.org/

cu100 = CUDA 10.0

pip3 uninstall torch
pip3 install https://download.pytorch.org/whl/cu100/torch-1.0.1.post2-cp36-cp36m-linux_x86_64.whl
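If you're unsure which build you ended up with, a quick check (assuming torch is importable) is to print the wheel's CUDA version and the card's compute capability. The RTX 2080 Ti is a Turing card, compute capability 7.5, which the cu100 builds support but older cu90 wheels were not compiled for:

```python
# Print the CUDA version this torch wheel was built against and, when a
# GPU is visible, its compute capability -- a 2080 Ti reports (7, 5).
import torch

print(torch.__version__, torch.version.cuda)
if torch.cuda.is_available():
    print(torch.cuda.get_device_capability(0))
```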

Other Answers:

This is funny, I'm running into the same issue: RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
It happens on a 2080 Ti but not on a K40. Both GPUs are using the same environment, code, etc., and pytorch 1.0. For some reason only the 2080 Ti gets the error. Is this an indication that CUDA is not installed correctly?

After a few hours of grinding I removed the conda environment and installed everything again, this time installing torch with conda. It is working now.

So it was probably a bad torch build for this GPU.
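For reference, the conda equivalent from that era's install instructions (the official pytorch channel, with cudatoolkit=10.0 pinning the CUDA 10.0 build) looks like:

```shell
# Install a CUDA 10.0 build of pytorch from the official pytorch channel
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
```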