Chapter 7 - Gradients and Initialization
- Backprop (see the backprop sketch after this list)
- ReLU
- Automatic differentiation (AD) - "algorithmic differentiation" is another name for the same thing
- Computational Graph
- Initialization (see the init sketch below)
- Exploding and vanishing gradients (see the depth sketch below)
- Gradient checkpointing (see the checkpointing sketch below)
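
Backprop, ReLU, the computational graph, and reverse-mode AD all fit in one tiny example. A minimal NumPy sketch (the 2-layer ReLU net, batch size, and squared-error loss are made up for illustration): the forward pass defines the graph, the backward pass walks it in reverse applying the chain rule.

```python
# Minimal backprop sketch: 2-layer ReLU net, squared-error loss (all sizes made up).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))           # batch of 4 inputs, 3 features
y = rng.normal(size=(4, 2))           # targets
W1 = rng.normal(size=(3, 5)) * 0.1
W2 = rng.normal(size=(5, 2)) * 0.1

# forward pass: x -> z1 -> a1 (ReLU) -> z2 -> loss (this chain IS the computational graph)
z1 = x @ W1
a1 = np.maximum(z1, 0.0)              # ReLU
z2 = a1 @ W2
loss = 0.5 * np.mean((z2 - y) ** 2)

# backward pass: walk the graph in reverse, multiplying local derivatives (chain rule)
dz2 = (z2 - y) / y.size               # d loss / d z2
dW2 = a1.T @ dz2                      # d loss / d W2
da1 = dz2 @ W2.T                      # d loss / d a1
dz1 = da1 * (z1 > 0)                  # ReLU derivative: 1 where input > 0, else 0
dW1 = x.T @ dz1                       # d loss / d W1

print(loss, dW1.shape, dW2.shape)
```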
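For initialization, a minimal PyTorch sketch using the built-in Glorot/He initializers (the layer sizes are arbitrary). Xavier/Glorot targets tanh-like units, Kaiming/He targets ReLU units; both try to keep activation and gradient variance roughly constant across layers.

```python
# Initialization sketch (layer sizes are made up).
import torch
import torch.nn as nn

def init_weights(module):
    if isinstance(module, nn.Linear):
        # He init for ReLU nets; swap in nn.init.xavier_uniform_ for tanh-style nets
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        nn.init.zeros_(module.bias)

net = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
net.apply(init_weights)
```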
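For vanishing/exploding gradients, a small experiment sketch (depth, width, and the weight scales are arbitrary): each layer contributes roughly one factor of the weight scale to the gradient, so its norm shrinks or grows geometrically with depth unless the scale is chosen carefully, which is what He init does for ReLU.

```python
# Sketch: gradient norm vs. depth for different weight scales (toy numbers).
import torch

def grad_norm_through_depth(scale, depth=50, width=64):
    # W entries have std = scale / sqrt(width); He init corresponds to scale = sqrt(2)
    x = torch.randn(1, width, requires_grad=True)
    h = x
    for _ in range(depth):
        W = torch.randn(width, width) * scale / width ** 0.5
        h = torch.relu(h @ W)
    h.sum().backward()
    return x.grad.norm().item()

print("scale 0.5    :", grad_norm_through_depth(0.5))        # tiny gradient (vanishing)
print("scale sqrt(2):", grad_norm_through_depth(2 ** 0.5))   # roughly stable (He scaling)
print("scale 4.0    :", grad_norm_through_depth(4.0))        # huge gradient (exploding)
```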
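For gradient checkpointing, a sketch using PyTorch's torch.utils.checkpoint (the toy block and sizes are made up): a checkpointed segment does not keep its intermediate activations for backward; it stores only its inputs and recomputes the activations when .backward() runs, trading extra compute for lower memory.

```python
# Gradient checkpointing sketch (toy block; sizes made up).
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512), nn.ReLU())

x = torch.randn(32, 512, requires_grad=True)
# activations inside `block` are not stored; they are recomputed during backward
out = checkpoint(block, x, use_reentrant=False)  # use_reentrant needs a recent PyTorch
out.sum().backward()
```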
Useful Links
- visualizing weights
- "All you need is a good init" (Mishkin & Matas)
- how to initialize a neural network
- "Understanding the difficulty of training deep feedforward neural networks" by Xavier Glorot and Yoshua Bengio
- Netron, TensorBoard, and pytorchviz for visualizing computational graphs
Qs
- If a network can learn things anyway, why does initialization make such a big difference?
- How can we make backprop more efficient? (i.e. how does autograd actually work in PyTorch/TF? see the sketch below)
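
Partial answer sketch for the second question, assuming PyTorch (the tensors are toy examples): ops on requires_grad tensors are recorded into a dynamic computational graph, and loss.backward() runs reverse-mode AD over that graph once, so all parameter gradients come from a single backward sweep that reuses intermediate results instead of re-deriving anything per parameter.

```python
# Autograd sketch (toy tensors, not from any real model).
import torch

W = torch.randn(3, 2, requires_grad=True)
x = torch.randn(5, 3)
y = torch.randn(5, 2)

loss = ((torch.relu(x @ W) - y) ** 2).mean()
print(loss.grad_fn)    # the graph node that produced `loss`
loss.backward()        # one reverse-mode pass fills W.grad
print(W.grad.shape)
```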