[Solved] pytorch lightning: How to log train and validation loss in the same figure?

Questions and Help

What is your question?

How can we log the train and validation loss in the same plot and view them in TensorBoard?
Having both curves in one plot makes it easy to spot overfitting visually.

Code

    def training_step(self, batch, batch_idx):
        images, labels = batch
        output = self.forward(images)
        loss = F.nll_loss(output, labels)
        return {"loss": loss, 'log': {'train_loss': loss}}
    
    def validation_step(self, batch, batch_idx):
        images, labels = batch
        output = self.forward(images)
        loss = F.nll_loss(output, labels)
        return {"loss": loss}

    def validation_end(self, outputs):
        avg_loss = torch.stack([x['loss'] for x in outputs]).mean()
        return {'val_loss': avg_loss, 'log': {'val_loss': avg_loss}}

What have you tried?

Using the tags Loss/train and Loss/valid groups them under the same section in TensorBoard, but they still end up in separate plots.

    def training_step(self, batch, batch_idx):
        images, labels = batch
        output = self.forward(images)
        loss = F.nll_loss(output, labels)
        return {"loss": loss, 'log': {'Loss/train': loss}}
    
    def validation_step(self, batch, batch_idx):
        images, labels = batch
        output = self.forward(images)
        loss = F.nll_loss(output, labels)
        return {"loss": loss}

    def validation_end(self, outputs):
        avg_loss = torch.stack([x['loss'] for x in outputs]).mean()
        return {'val_loss': avg_loss, 'log': {'Loss/valid': avg_loss}}

I tried to use self.logger.experiment.add_scalars(), but I am confused about how to access the train loss from inside the validation loop.
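
Something like the sketch below is what I had in mind (assuming the default TensorBoardLogger, so that self.logger.experiment is a torch.utils.tensorboard.SummaryWriter; the "losses" main tag and the use of self.global_step are just my guesses, and validation_step stays as above). Is writing each loss from its own hook under the same main tag the right way to avoid carrying the train loss into the validation loop?

    def training_step(self, batch, batch_idx):
        images, labels = batch
        output = self.forward(images)
        loss = F.nll_loss(output, labels)
        # self.logger.experiment is the underlying SummaryWriter when the
        # TensorBoard logger is used; "losses" is an arbitrary main tag.
        self.logger.experiment.add_scalars(
            "losses", {"train_loss": loss}, self.global_step)
        return {"loss": loss}

    def validation_end(self, outputs):
        avg_loss = torch.stack([x["loss"] for x in outputs]).mean()
        # Same main tag, different sub-tag: TensorBoard overlays both
        # curves in one chart.
        self.logger.experiment.add_scalars(
            "losses", {"val_loss": avg_loss}, self.global_step)
        return {"val_loss": avg_loss}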

What's your environment?

  • OS: macOS
  • Packaging: conda
  • Version: 0.5.3.2
17 Answers

✔️Accepted Answer

When trying to use a nested dict in the 'log' entry, I got NotImplementedError: Got <class 'dict'>, but numpy array, torch tensor, or caffe2 blob name are expected.

    def training_step(self, batch, batch_index):
        loss = self.model.loss(batch)
        # tensorboard_logs = {'train_loss': loss}      # flat dict: logs fine
        tensorboard_logs = {'loss': {'train': loss}}   # nested dict: crashes below

        return {'loss': loss, 'log': tensorboard_logs}
Traceback (most recent call last):
  File "bert_ner.py", line 252, in <module>
    trainer.fit(system)
  File "/Users/user/.pyenv/versions/env-mkwPXnF--py3.7/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 630, in fit
    self.run_pretrain_routine(model)
  File "/Users/user/.pyenv/versions/env-mkwPXnF--py3.7/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 830, in run_pretrain_routine
    self.train()
  File "/Users/user/.pyenv/versions/env-mkwPXnF--py3.7/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 343, in train
    self.run_training_epoch()
  File "/Users/user/.pyenv/versions/env-mkwPXnF--py3.7/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 444, in run_training_epoch
    self.log_metrics(batch_step_metrics, grad_norm_dic)
  File "/Users/user/.pyenv/versions/env-mkwPXnF--py3.7/lib/python3.7/site-packages/pytorch_lightning/trainer/logging.py", line 74, in log_metrics
    self.logger.log_metrics(scalar_metrics, step=step)
  File "/Users/user/.pyenv/versions/env-mkwPXnF--py3.7/lib/python3.7/site-packages/pytorch_lightning/loggers/base.py", line 122, in log_metrics
    [logger.log_metrics(metrics, step) for logger in self._logger_iterable]
  File "/Users/user/.pyenv/versions/env-mkwPXnF--py3.7/lib/python3.7/site-packages/pytorch_lightning/loggers/base.py", line 122, in <listcomp>
    [logger.log_metrics(metrics, step) for logger in self._logger_iterable]
  File "/Users/user/.pyenv/versions/env-mkwPXnF--py3.7/lib/python3.7/site-packages/pytorch_lightning/loggers/base.py", line 18, in wrapped_fn
    fn(self, *args, **kwargs)
  File "/Users/user/.pyenv/versions/env-mkwPXnF--py3.7/lib/python3.7/site-packages/pytorch_lightning/loggers/tensorboard.py", line 126, in log_metrics
    self.experiment.add_scalar(k, v, step)
  File "/Users/user/.pyenv/versions/env-mkwPXnF--py3.7/lib/python3.7/site-packages/torch/utils/tensorboard/writer.py", line 342, in add_scalar
    scalar(tag, scalar_value), global_step, walltime)
  File "/Users/user/.pyenv/versions/env-mkwPXnF--py3.7/lib/python3.7/site-packages/torch/utils/tensorboard/summary.py", line 196, in scalar
    scalar = make_np(scalar)
  File "/Users/user/.pyenv/versions/env-mkwPXnF--py3.7/lib/python3.7/site-packages/torch/utils/tensorboard/_convert_np.py", line 30, in make_np
    'Got {}, but numpy array, torch tensor, or caffe2 blob name are expected.'.format(type(x)))
NotImplementedError: Got <class 'dict'>, but numpy array, torch tensor, or caffe2 blob name are expected.
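
The crash is consistent with the traceback: at this version the TensorBoard logger calls self.experiment.add_scalar(k, v, step) for every entry in the 'log' dict, and add_scalar only accepts a number, array, or tensor, not a dict. As a sanity check, here is a small standalone sketch using plain torch.utils.tensorboard (nothing Lightning-specific; the log dir and dummy loss values are made up for illustration). It shows that add_scalar with separate tags gives separate charts, while add_scalars with one main tag overlays both curves in a single chart:

    # Standalone check with plain torch.utils.tensorboard (no Lightning).
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter(log_dir="runs/loss_demo")  # made-up log dir

    for step in range(100):
        train_loss = 1.0 / (step + 1)  # dummy values for illustration
        val_loss = 1.2 / (step + 1)

        # Separate tags: grouped under the "Loss" section in TensorBoard,
        # but rendered as two separate charts.
        writer.add_scalar("Loss/train", train_loss, step)
        writer.add_scalar("Loss/valid", val_loss, step)

        # One main tag with a dict of sub-tags: both curves overlaid in a
        # single chart. The dict values must be numbers or tensors; the
        # NotImplementedError above came from add_scalar being handed the
        # nested dict itself.
        writer.add_scalars("loss", {"train": train_loss, "valid": val_loss}, step)

    writer.close()

Note that add_scalars writes each sub-tag as its own run under the hood, so extra event files appear in the log directory.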

Other Answers:

I have the same issue with pytorch 1.5.0 and pytorch-lightning 0.7.6.

Has anyone solved this?

@Isolet I have the same issue; it must be due to bumping the pytorch-lightning version up to 0.7.1 (the original issue is on 0.5.3.2).
