Starting training on small images for a few epochs, then switching to bigger images, and continuing training is an amazingly effective way to avoid overfitting.
Use lr_find() to find the highest learning rate where the loss is still clearly improving.
Train the final layer with data augmentation on (precompute=False) for 2–3 epochs with cycle_len=1.
Train with cycle_mult=2 until over-fitting.
Precomputing means "take the data as-is, and don't apply any further computation (i.e. no data augmentation)."
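The cycle_len/cycle_mult schedule (SGDR: cosine annealing with warm restarts) can be sketched in plain Python. This is a toy illustration of the schedule's shape, not fastai's implementation; the function name and step counts are assumptions.

```python
import math

def sgdr_schedule(lr_max, cycle_len=1, cycle_mult=2, n_cycles=3, steps_per_epoch=4):
    """Toy SGDR: within each cycle the LR decays from lr_max toward 0 along a
    cosine, then restarts; each cycle is cycle_mult times longer than the last
    (cycle_len=1, cycle_mult=2 -> cycles of 1, 2, 4 epochs)."""
    lrs, epochs = [], cycle_len
    for _ in range(n_cycles):
        total = epochs * steps_per_epoch
        for step in range(total):
            lrs.append(0.5 * lr_max * (1 + math.cos(math.pi * step / total)))
        epochs *= cycle_mult
    return lrs

schedule = sgdr_schedule(0.01)
# 1 + 2 + 4 = 7 epochs of 4 steps each -> 28 LR values;
# the LR jumps back to lr_max at each restart (indices 0, 4, 12)
```

The restarts kick the weights out of sharp minima, while the lengthening cycles give the network progressively longer to settle.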
When the model is initialized, it is frozen: all but the last layers have their weights frozen, so backpropagation only updates the unfrozen classifier head.
With precompute=True, you save time: the activations of the frozen layers are computed once and cached, so data augmentation is not applied (augmented images would need fresh activations each epoch). There is no benefit other than speed.
Only the classifying layer (the final layer) is being trained.
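Why precompute saves time can be shown with a toy sketch (not fastai internals; the function names here are invented for illustration): the frozen body runs once per image, and every epoch of head-training reuses the cache.

```python
# Toy illustration of precomputed activations: the frozen "body" is run once
# per image and its outputs cached, so later epochs only pay for the small
# classifier head. All names here are hypothetical.

calls = {"body": 0}

def frozen_body(x):
    calls["body"] += 1
    return x * 2  # stand-in for the expensive convolutional stack

def head(act):
    return act + 1  # stand-in for the cheap final classifier layer

images = [1, 2, 3]
cache = {img: frozen_body(img) for img in images}  # computed exactly once

for epoch in range(5):  # 5 epochs of "training" the head
    preds = [head(cache[img]) for img in images]

# frozen_body ran 3 times in total, not 3 * 5
```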
Enable data augmentation by setting precompute=False. This still trains only the final layer, but each epoch now sees freshly augmented images.
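A toy random flip (not the fastai transforms) shows why augmentation and precomputing are incompatible: each epoch produces a different version of the image, so activations cached from one fixed version would be stale.

```python
import random

def random_flip(image, rng):
    """Flip each row left-to-right with probability 0.5 (toy augmentation)."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return image

rng = random.Random(0)  # fixed seed for reproducibility
image = [[1, 2, 3],
         [4, 5, 6]]
# drawing 20 augmented copies yields both the original and the flipped image
variants = {tuple(map(tuple, random_flip(image, rng))) for _ in range(20)}
```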
unfreeze()
will permit the earlier layers' weights to be updated as well.
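The freeze/unfreeze idea can be mimicked in a few lines (this toy class only imitates, and does not use, the fastai API): frozen layers are simply skipped by the optimizer.

```python
# Hypothetical sketch: a learner that starts with only its head trainable,
# and whose unfreeze() marks every layer as updatable.

class ToyLearner:
    def __init__(self, n_layers):
        # all layers frozen except the final classifier head
        self.trainable = [False] * (n_layers - 1) + [True]

    def unfreeze(self):
        self.trainable = [True] * len(self.trainable)

    def updated_layers(self):
        """Indices of layers the optimizer would actually update."""
        return [i for i, t in enumerate(self.trainable) if t]

learner = ToyLearner(4)
before = learner.updated_layers()  # only the final layer
learner.unfreeze()
after = learner.updated_layers()   # every layer
```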
N.B. Earlier layers recognize lower-level details; later layers derive high-level insights. Since we're working with an architecture that already recognizes images, the earlier layers need less adjustment than the later ones.
Fit the model again, but set differential learning rates: each earlier layer group gets a rate 3–10x smaller than the next later group, i.e. logarithmically spaced.
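Those differential rates amount to dividing the head's learning rate by a constant factor per group. A minimal sketch, assuming three layer groups and a 3x factor (the grouping and factor are illustrative choices within the 3–10x range above):

```python
def differential_lrs(lr_head, n_groups=3, factor=3.0):
    """Per-group learning rates, earliest group first: each group's rate is
    `factor` times smaller than the next later group's (logarithmic spacing)."""
    return [lr_head / factor ** (n_groups - 1 - i) for i in range(n_groups)]

lrs = differential_lrs(0.01)
# earliest layers: 0.01/9, middle layers: 0.01/3, head: 0.01
```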
Smaller images mean faster training; larger images carry more detail but train slowly. To the network, the larger size is a different ballgame, which is why the switch counters overfitting.
transforms = tfms_from_model(arch, size, aug_tfms=transforms_side_on, max_zoom=1.1)