Adaptive Gradient Clipping
- clips a gradient when the ratio of its norm to the norm of the corresponding weights exceeds a threshold, rather than clipping on the gradient norm alone
- The clipping threshold needs less careful tuning than in traditional gradient clipping (GC)
- Used in place of Batch Normalisation to stabilise training in normalisation-free networks
- faster training for equally sized models
- Allows training with even larger batch sizes
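
The clipping rule in the first bullet can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes one layer whose weights are stored as a list of rows (one row per output unit), and the function name `agc_clip` and the default values for `clipping` and `eps` are illustrative choices, not taken from these notes.

```python
import math


def agc_clip(weights, grads, clipping=0.01, eps=1e-3):
    """Unit-wise adaptive gradient clipping for a single layer (sketch).

    weights, grads: lists of rows, one row per output unit.
    clipping: maximum allowed ratio of gradient norm to weight norm (assumed default).
    eps: floor on the weight norm so zero-initialised units can still update (assumed default).
    """
    clipped = []
    for w_row, g_row in zip(weights, grads):
        # Norm of this unit's weights, floored at eps.
        w_norm = max(math.sqrt(sum(w * w for w in w_row)), eps)
        # Norm of this unit's gradient.
        g_norm = math.sqrt(sum(g * g for g in g_row))
        # Largest gradient norm this unit is allowed to have.
        max_norm = clipping * w_norm
        if g_norm > max_norm:
            # Rescale so the gradient norm equals clipping * weight norm.
            scale = max_norm / g_norm
            clipped.append([g * scale for g in g_row])
        else:
            # Ratio is within the threshold: leave the gradient unchanged.
            clipped.append(list(g_row))
    return clipped


if __name__ == "__main__":
    weights = [[3.0, 4.0], [1.0, 0.0]]
    grads = [[0.6, 0.8], [0.001, 0.0]]
    # First unit has ratio 1.0 / 5.0 = 0.2 > 0.1, so it is rescaled;
    # second unit has ratio 0.001 < 0.1, so it is left alone.
    print(agc_clip(weights, grads, clipping=0.1))
```

Because the threshold is relative to the weight norm, the same `clipping` value applies the same proportional limit to units with large and small weights, which is one way to read the robustness claim above.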