Chapter 7 - Gradients and Initialization

Backprop
ReLU
Automatic differentiation (AD), also called algorithmic differentiation
Computational Graph
Initialization
Exploding Gradient, Vanishing Gradient
Gradient Checkpointing
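
To make the first few topics concrete, here is a minimal sketch of reverse-mode automatic differentiation on a scalar computational graph, including a ReLU node. The `Value` class and its fields are hypothetical names invented for this illustration. It is a toy in the spirit of what autograd engines do, not how PyTorch or TensorFlow are actually implemented.

```python
class Value:
    """A scalar node in a computational graph (toy class for illustration)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data                # result of the forward computation
        self.grad = 0.0                 # d(output)/d(this node), filled in by backward()
        self.parents = parents          # nodes this value was computed from
        self.local_grads = local_grads  # d(this node)/d(parent), one per parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def relu(self):
        slope = 1.0 if self.data > 0 else 0.0
        return Value(max(self.data, 0.0), (self,), (slope,))

    def backward(self):
        # Build a topological order so each node is processed after all of its consumers.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Chain rule, applied from the output back toward the inputs.
        for node in reversed(order):
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += local * node.grad


w, x, b = Value(2.0), Value(3.0), Value(-1.0)
y = (w * x + b).relu()   # forward pass builds the graph
y.backward()             # one backward pass fills in every gradient
print(y.data, w.grad, x.grad, b.grad)
```

The forward pass records each operation along with its local derivative. A single backward sweep then gives the gradient with respect to every input, which is why reverse mode suits networks with many parameters and one scalar loss.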

Useful Links

visualizing weights
All You Need Is a Good Init (Mishkin and Matas)
how to initialize a neural network
Understanding the difficulty of training deep feedforward neural networks (Xavier Glorot and Yoshua Bengio)
Netron, TensorBoard, and PyTorchViz for visualizing computational graphs

Qs

If a network can learn things anyway, why does initialization make such a big difference?
How can we make backprop more efficient? (i.e., how is it implemented in PyTorch/TensorFlow?)
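
As a starting point for the first question, the sketch below pushes a random input through a deep stack of randomly initialized ReLU layers and measures the size of the signal that comes out. It uses only the standard library. The helper name `signal_rms_after`, the width of 64, and the depth of 20 are assumptions chosen for illustration. It tracks forward activations only, as a proxy: backpropagated gradients pass through the same weight matrices, so they tend to shrink or grow in a similar way.

```python
import math
import random


def rms(v):
    """Root-mean-square magnitude of a vector."""
    return math.sqrt(sum(x * x for x in v) / len(v))


def signal_rms_after(depth, width, weight_std, seed=0):
    """Propagate a random input through `depth` random ReLU layers (no biases)."""
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        # Each output unit is a weighted sum of the inputs with fresh Gaussian weights.
        pre = [sum(rng.gauss(0.0, weight_std) * x for x in h) for _ in range(width)]
        h = [p if p > 0.0 else 0.0 for p in pre]  # ReLU
    return rms(h)


width, depth = 64, 20
he_std = math.sqrt(2.0 / width)  # He initialization scale for ReLU layers

results = {
    "too small (0.5x He)": signal_rms_after(depth, width, 0.5 * he_std),
    "He": signal_rms_after(depth, width, he_std),
    "too large (2x He)": signal_rms_after(depth, width, 2.0 * he_std),
}
for name, value in results.items():
    print(f"{name:>20}: {value:.3e}")
```

Halving or doubling the weight scale changes the signal by roughly that factor at every layer, so after 20 layers it has shrunk or grown by several orders of magnitude. With the He scale it stays near its starting size. A network that starts in the shrinking or growing regime can in principle still learn, but early gradient steps are either negligible or unstable.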