[Solved] pytorch lightning: How to use multiple metric monitors in the ModelCheckpoint callback?

Questions and Help

What is your question?

How can I use multiple metric monitors in the ModelCheckpoint? Put another way, how can I use multiple ModelCheckpoint callbacks? It seems that the Trainer only accepts a single ModelCheckpoint in the checkpoint_callback argument.

Code

site-packages/pytorch_lightning/trainer/callback_config.py", line 46, in configure_checkpoint_callback
    checkpoint_callback.save_function = self.save_checkpoint
AttributeError: 'list' object has no attribute 'save_function'
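For reference, here is a minimal sketch (hypothetical, not from the original report) of the kind of call that produces this traceback, assuming a list is passed to the checkpoint_callback argument, which in 0.9.x expects a single ModelCheckpoint instance:

import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

# Two checkpoint callbacks tracking different metrics (hypothetical monitors).
ckpt_precision = ModelCheckpoint(monitor="precision", mode="max", save_top_k=3)
ckpt_recall = ModelCheckpoint(monitor="recall", mode="max", save_top_k=3)

# Passing a list here fails: 0.9.x treats checkpoint_callback as a single
# ModelCheckpoint and tries to set .save_function on it, hence
# AttributeError: 'list' object has no attribute 'save_function'.
trainer = pl.Trainer(checkpoint_callback=[ckpt_precision, ckpt_recall])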

What's your environment?

  • OS: Ubuntu 16.04
  • Packaging: pip
  • Version: pytorch-lightning==0.9.0rc12
21 Answers

✔️ Accepted Answer

Do you plan to support it? It would be nice to be able to do the following:

import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

filepath = "checkpoints/"  # base path for checkpoint files (example value)

# Save the top-3 models w.r.t. precision
on_best_precision = ModelCheckpoint(
    filepath=filepath + "{epoch}-{precision}",
    monitor="precision",
    save_top_k=3,
    mode="max",
)
# Save the top-3 models w.r.t. recall
on_best_recall = ModelCheckpoint(
    filepath=filepath + "{epoch}-{recall}",
    monitor="recall",
    save_top_k=3,
    mode="max",
)
# Save a checkpoint every 5 epochs, and always keep the last one
every_five_epochs = ModelCheckpoint(
    period=5,
    save_top_k=-1,
    save_last=True,
)
trainer = pl.Trainer(
    callbacks=[on_best_precision, on_best_recall, every_five_epochs],
)

and something similar for the early_stop_callback flag.
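For example, a hypothetical sketch of what that could look like, assuming EarlyStopping instances could also be passed through the callbacks list instead of the single early_stop_callback flag:

import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping

# Stop when precision has not improved for 10 epochs,
# or when recall has not improved for 20 epochs.
stop_on_precision = EarlyStopping(monitor="precision", mode="max", patience=10)
stop_on_recall = EarlyStopping(monitor="recall", mode="max", patience=20)

trainer = pl.Trainer(
    callbacks=[stop_on_precision, stop_on_recall],
)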

Other Answers:

I think yes, one day we will manage that. However, there are soooo many edge cases we need to consider, and we haven't even figured out all of them for the case of a single checkpoint callback, so we need to approach this with care. It should be done by someone who is very confident with how model checkpointing works in PL (solely my opinion of course, not saying you or anybody else can't do it). It's just generally a hard task in my opinion, because it connects to many parts of the trainer: validation, early stopping, resuming from checkpoints, formatting names for checkpoint files, top-k management, saving the last checkpoint, persisting state, and so on.

cc @PyTorchLightning/core-contributors
