Gradient Clipping
- Limit the value or the norm of a gradient to a fixed Hyperparameter λ.
- mitigates the Exploding Gradients part of the Vanishing & Exploding Gradients problem (does not help with vanishing ones)
- the idea is to CLIP the gradients during Backpropagation once they exceed a certain threshold (limit their value)
- most often used in RNNs or GANs, where Batch Normalisation is tricky to use
- methods
- CLIP by norm
- rescale the whole gradient if its L2 norm is greater than the threshold
- preserves the orientation of the gradient
- CLIP by value
- CLIP each component of the gradient to a fixed value
- problem: orientation of the gradient may change due to clipping
- example: [0.9, 100.0] → [0.9, 1.0] with threshold 1.0
- however, clipping by value works well in practice
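The two methods above can be sketched in NumPy as follows (function names are my own, not from a library):

```python
import numpy as np

def clip_by_norm(grad, threshold):
    # Rescale the whole gradient if its L2 norm exceeds the threshold;
    # the direction (orientation) of the gradient is preserved.
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

def clip_by_value(grad, threshold):
    # Clip each component independently to [-threshold, threshold];
    # the direction of the gradient may change.
    return np.clip(grad, -threshold, threshold)

g = np.array([0.9, 100.0])
clip_by_value(g, 1.0)  # [0.9, 1.0] -> direction changed
clip_by_norm(g, 1.0)   # same direction as g, L2 norm scaled down to 1.0
```

Note how clip-by-value turns a gradient pointing almost entirely along the second axis into one pointing diagonally, while clip-by-norm only shrinks its length.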
- pros:
- simple to implement, stabilises training against exploding gradients
- cons:
- sensitive to the tuning of the Hyperparameter λ
- Adaptive Gradient Clipping avoids a fixed λ by scaling the clipping threshold with the parameter norm
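A minimal sketch of the adaptive idea, assuming the simplified form where the gradient norm is capped at λ times the parameter norm (the published method of Brock et al. applies this unit-wise per row/column rather than to the whole tensor):

```python
import numpy as np

def adaptive_grad_clip(grad, param, lam=0.01, eps=1e-3):
    # Threshold scales with the parameter norm instead of being fixed:
    # the gradient may be at most lam * ||param|| in L2 norm.
    # eps guards against freshly initialised parameters near zero.
    p_norm = max(np.linalg.norm(param), eps)
    g_norm = np.linalg.norm(grad)
    max_norm = lam * p_norm
    if g_norm > max_norm:
        grad = grad * (max_norm / g_norm)
    return grad
```

Because the threshold tracks each parameter's scale, the same λ can be reused across layers of very different magnitudes, which is what removes most of the per-model tuning.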