Principles

Starting training on small images for a few epochs, then switching to bigger images, and continuing training is an amazingly effective way to avoid overfitting.
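
A minimal sketch of that small-to-big switch, using the old fastai (v0.7) API these notes refer to; the architecture, path, image sizes, and learning rates below are placeholder assumptions, not values from the notes:

    from fastai.transforms import *
    from fastai.conv_learner import *
    from fastai.dataset import *

    arch = resnet34                          # placeholder architecture
    PATH = 'data/mydataset/'                 # placeholder path with train/ and valid/ folders

    def get_data(sz):
        # transforms (with augmentation) and a data object for the given image size
        tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
        return ImageClassifierData.from_paths(PATH, tfms=tfms)

    learn = ConvLearner.pretrained(arch, get_data(224), precompute=True)
    learn.fit(1e-2, 2)                       # quick start: only the head, on cached activations
    learn.precompute = False
    learn.fit(1e-2, 3, cycle_len=1)          # a few epochs on the small images

    learn.set_data(get_data(299))            # same learner, bigger images
    learn.freeze()                           # retrain just the head at the new size first
    learn.fit(1e-2, 3, cycle_len=1)          # then unfreeze and continue as usual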

Minimal Steps

  1. Use lr_find() to find the highest learning rate where the loss is still clearly improving
  2. Train the last layer with data augmentation (i.e. precompute=False) for 2–3 epochs with cycle_len=1
  3. Unfreeze all layers
  4. Set earlier layers to a 3x-10x lower learning rate than the next (later) layer group
  5. Train the full network with cycle_mult=2 until over-fitting sets in (sketched below)
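
Those five steps translate roughly to the following fastai (v0.7) calls; the architecture, data path, and the 1e-2 base rate are placeholder assumptions (the rate is something you would read off the lr_find() plot):

    import numpy as np
    from fastai.transforms import *
    from fastai.conv_learner import *
    from fastai.dataset import *

    arch = resnet34
    tfms = tfms_from_model(arch, 224, aug_tfms=transforms_side_on, max_zoom=1.1)
    data = ImageClassifierData.from_paths('data/mydataset/', tfms=tfms)
    learn = ConvLearner.pretrained(arch, data, precompute=True)

    # 1. find the highest learning rate where the loss is still clearly improving
    learn.lr_find()
    learn.sched.plot()                       # read the rate off the plot, just before the loss flattens
    lr = 1e-2                                # assumed value for the rest of the sketch

    # 2. train the last layer with augmentation (precompute=False) for 2-3 epochs, cycle_len=1
    learn.precompute = False
    learn.fit(lr, 3, cycle_len=1)

    # 3. and 4. unfreeze all layers; earlier layer groups get 3x-10x lower rates
    learn.unfreeze()
    lrs = np.array([lr / 9, lr / 3, lr])

    # 5. train the full network with cycle_mult=2 until it starts over-fitting
    learn.fit(lrs, 3, cycle_len=1, cycle_mult=2)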

Precomputing and freezing

Precomputing means “take the data as-is”: the activations of the frozen layers are computed once from the original, unaugmented images and cached, so no further computation (i.e. data augmentation) is applied to the inputs.

  1. When the model is initialized, it is frozen. All but the last few layers have their weights frozen, i.e. they will not be updated during training.

  2. With precompute=True you save time: the frozen layers’ activations are computed once and cached, so each epoch only has to push those cached activations through the final layer. The trade-off is that data augmentation has no effect, because the cache was built from the original images. There is no reason to use it other than the time benefit (see the sketch after this list).

  3. Only the classifying layer (the final layer) is being trained.

  4. Setting precompute=False enables data augmentation, because every (augmented) image now flows through the network each epoch. Only the final layer is still being trained.

  5. unfreeze() permits the earlier layers to have their weights updated.

    N.B. Earlier layers recognize lower-level details; later layers derive higher-level, more task-specific features. Since we’re working with an architecture that already recognizes images, the earlier layers need only gentle tuning, which is why they get the lower learning rates.

  6. Fit the model again, but with differential learning rates spaced roughly logarithmically: each earlier layer group gets a rate 3–10× lower than the group after it.
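
Setting the fastai internals aside, here is a toy sketch in plain PyTorch of what precomputing buys: the frozen body runs exactly once, its activations are cached, and every epoch afterwards only trains the cheap final layer on that cache. The tiny body/head and the random tensors are stand-ins, not real data:

    import torch
    from torch import nn
    from torch.utils.data import TensorDataset, DataLoader

    # Stand-in for a pretrained "body" (frozen) and a new classifying "head" (trainable).
    body = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
    head = nn.Linear(8, 2)

    for p in body.parameters():
        p.requires_grad = False          # "frozen": these weights will not be updated

    images = torch.randn(64, 3, 64, 64)  # stand-in dataset
    labels = torch.randint(0, 2, (64,))

    # Precompute: push every image through the frozen body exactly once and cache the activations.
    with torch.no_grad():
        cached = body(images)

    # Each epoch now only runs the cheap head over the cache; that is where the time saving comes from.
    # The cache was built from the original images, so augmentation cannot have any effect here.
    opt = torch.optim.SGD(head.parameters(), lr=1e-2)
    for epoch in range(3):
        for xb, yb in DataLoader(TensorDataset(cached, labels), batch_size=16, shuffle=True):
            loss = nn.functional.cross_entropy(head(xb), yb)
            opt.zero_grad()
            loss.backward()
            opt.step()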

Data

Smaller data means faster training; larger data gives the model more to learn from, but training slows down considerably, which makes it a different ballgame.

Image Augmentation

tfms = tfms_from_model(arch, size, aug_tfms=transforms_side_on, max_zoom=1.1)
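
Plugging that line into a data object, in the same old fastai (v0.7) API; the architecture, image size, and path below are placeholder assumptions:

    from fastai.transforms import *
    from fastai.conv_learner import *
    from fastai.dataset import *

    arch, size = resnet34, 224               # placeholder architecture and image size
    # transforms_side_on suits ordinary photos; transforms_top_down is the usual pick for overhead imagery
    tfms = tfms_from_model(arch, size, aug_tfms=transforms_side_on, max_zoom=1.1)
    data = ImageClassifierData.from_paths('data/mydataset/', tfms=tfms)   # placeholder path
    # these augmentations only take effect once the learner runs with precompute=False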