Trainer Components

While the framework is flexible enough to work with any kind of trainer, we encourage using a dedicated library to manage your training loops. We found that Ignite provides everything we expect from a training management system.

Ignite defines six events that together describe a training loop:

  • STARTED: start the training loop

  • EPOCH_STARTED: start an epoch

  • ITERATION_STARTED: start processing of one batch

  • ITERATION_COMPLETED: complete processing of one batch

  • EPOCH_COMPLETED: complete a full epoch

  • COMPLETED: complete the training loop

Ignite lets you perform actions at each of these events by attaching handlers to them.
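To make the firing order concrete, here is a minimal sketch of an event-driven training loop in plain Python. ToyEngine, its add_event_handler method and record_events are toy stand-ins written for this illustration, not Ignite's implementation; the event names are the six listed above. In Ignite itself, handlers are attached to an Engine in a similar spirit.

```python
from collections import defaultdict
from enum import Enum


class Events(Enum):
    STARTED = "started"
    EPOCH_STARTED = "epoch_started"
    ITERATION_STARTED = "iteration_started"
    ITERATION_COMPLETED = "iteration_completed"
    EPOCH_COMPLETED = "epoch_completed"
    COMPLETED = "completed"


class ToyEngine:
    """Toy stand-in for a training engine (hypothetical, not Ignite's code)."""

    def __init__(self, process_function):
        self.process_function = process_function  # called once per batch
        self.handlers = defaultdict(list)
        self.epoch = 0
        self.iteration = 0
        self.output = None

    def add_event_handler(self, event, handler):
        self.handlers[event].append(handler)

    def _fire(self, event):
        # Call every handler registered for this event, in registration order.
        for handler in self.handlers[event]:
            handler(self)

    def run(self, data, max_epochs=1):
        self._fire(Events.STARTED)
        for _ in range(max_epochs):
            self.epoch += 1
            self._fire(Events.EPOCH_STARTED)
            for batch in data:
                self.iteration += 1
                self._fire(Events.ITERATION_STARTED)
                self.output = self.process_function(self, batch)
                self._fire(Events.ITERATION_COMPLETED)
            self._fire(Events.EPOCH_COMPLETED)
        self._fire(Events.COMPLETED)


def record_events(max_epochs=1, batches=(1, 2)):
    """Run the toy loop and return the names of the events in firing order."""
    trace = []
    engine = ToyEngine(lambda eng, batch: batch * 2)  # dummy "training step"
    for event in Events:
        engine.add_event_handler(event, lambda eng, e=event: trace.append(e.name))
    engine.run(list(batches), max_epochs=max_epochs)
    return trace
```

Calling record_events() with two batches and one epoch yields STARTED, EPOCH_STARTED, then one ITERATION_STARTED / ITERATION_COMPLETED pair per batch, followed by EPOCH_COMPLETED and COMPLETED.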

Here are some examples of what handlers can do:

  • Track metrics and log them on the terminal

  • Log metrics, parameter norms, histograms, distributions, etc. to Tensorboard (via TensorboardX)

  • Learning rate schedulers: adapt the learning rate at different points during training. A good example is cyclical learning rate scheduling, which has proven successful in models like ULMFiT

  • Model checkpointing: save your model periodically if it improves

  • Early stopping: stop training when the monitored metric has not improved for a given number of evaluations

  • Terminate on NaNs: terminate training when NaN or infinite values are encountered

  • Timers
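Two of the handlers above, early stopping and termination on NaNs, come down to small checks that run after an evaluation or an iteration. The sketch below shows that logic in plain Python; PatienceStopper and has_invalid_value are hypothetical names used for illustration, and the patience and min_delta parameters are assumptions about how such a check is typically configured. Ignite ships its own handlers for both purposes, which should be preferred in practice.

```python
import math


class PatienceStopper:
    """Signals a stop after `patience` consecutive evaluations without improvement."""

    def __init__(self, patience, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_score = None
        self.bad_evals = 0

    def should_stop(self, score):
        # Higher is better here (e.g. accuracy, or the negated validation loss).
        if self.best_score is None or score > self.best_score + self.min_delta:
            self.best_score = score
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


def has_invalid_value(values):
    """Return True if any value is NaN or infinite."""
    return any(math.isnan(v) or math.isinf(v) for v in values)
```

A handler attached to the end of each epoch could call should_stop with the latest validation score, and a handler attached to the end of each iteration could call has_invalid_value on the batch loss, ending training when either returns True.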

We provide a BasicTrainer class which should cover most cases in the supervised single-task setting. For more complex settings like multi-task learning, you might want to override the _update and _inference methods to handle several task objectives and loss functions.
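One common way to handle several objectives is to combine the per-task losses into a single weighted sum before the backward pass. The helper below is a minimal sketch of that idea; combine_task_losses, the task names and the weights are hypothetical and not part of BasicTrainer, whose _update signature is not shown here.

```python
def combine_task_losses(task_losses, task_weights=None):
    """Weighted sum of per-task losses (hypothetical helper for illustration).

    task_losses maps a task name to its loss; task_weights maps a task name
    to its weight and defaults to 1.0 for every task.
    """
    if task_weights is None:
        task_weights = {name: 1.0 for name in task_losses}
    missing = set(task_losses) - set(task_weights)
    if missing:
        raise KeyError(f"no weight given for tasks: {sorted(missing)}")
    return sum(task_weights[name] * loss for name, loss in task_losses.items())
```

An overridden _update could compute one loss per task, pass them to a helper like this, and backpropagate the combined value. The same arithmetic applies whether the losses are floats or tensors.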