Chapter 7 - Gradients and Initialization


Qs

  • If a network learns its weights during training anyway, why does initialization make such a big difference?
  • How can we make backprop more efficient? (i.e., how do PyTorch and TensorFlow compute gradients?)
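One way into the first question: every layer multiplies the signal by its weights, so in a deep network a weight scale that is only slightly too small or too large compounds exponentially before any learning happens. For a fully connected ReLU layer with weights drawn from N(0, sigma^2), the mean-squared activation is scaled by roughly sigma^2 * fan_in / 2 per layer. That is the reasoning behind He initialization, sigma = sqrt(2 / fan_in), which keeps this factor near 1. The sketch below is only an illustration under assumed settings (width 64, depth 20, ReLU, no biases, one Gaussian input). The helper names `rms` and `forward_rms` are hypothetical, and it uses plain Python so it runs without NumPy. It measures only the forward pass. The backward pass multiplies gradients by the transposed weights and the ReLU masks, so gradients are affected in a similar way.

```python
import math
import random


def rms(v):
    """Root-mean-square of a list of floats."""
    return math.sqrt(sum(x * x for x in v) / len(v))


def forward_rms(sigma, width=64, depth=20, seed=0):
    """Push one random input through `depth` fully connected ReLU layers
    (no biases) whose weights are drawn i.i.d. from N(0, sigma**2), and
    return the RMS of the final activations. Hypothetical helper."""
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) for _ in range(width)]  # input with RMS of about 1
    for _ in range(depth):
        W = [[rng.gauss(0.0, sigma) for _ in range(width)] for _ in range(width)]
        z = [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in W]  # z = W h
        h = [max(0.0, z_i) for z_i in z]  # ReLU
    return rms(h)


if __name__ == "__main__":
    width = 64
    for name, sigma in [
        ("too small (0.01)", 0.01),
        ("He, sqrt(2/fan_in)", math.sqrt(2.0 / width)),
        ("too large (1.0)", 1.0),
    ]:
        print(f"{name:>22}: output RMS = {forward_rms(sigma, width=width):.3e}")
```

With these settings, the small scale shrinks the output by many orders of magnitude and the large scale inflates it by many orders of magnitude. The He scale keeps the output within a small factor of the input's RMS. Exact numbers depend on the seed.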
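On the second question: PyTorch (autograd) and TensorFlow (`tf.GradientTape`) both use reverse-mode automatic differentiation. During the forward pass, each operation is recorded together with what its local derivative needs. A single backward sweep over that record, in reverse topological order, then applies the chain rule once per edge. Because of this, the gradient with respect to every parameter costs only a small constant multiple of one forward pass. The toy scalar version below is a sketch of that idea only. The class `Value` and its methods are hypothetical names and not either library's API, and the real engines operate on tensors and are far more elaborate.

```python
import math


class Value:
    """A scalar node in a computation graph (toy reverse-mode autodiff)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._local_grads = local_grads  # d(self)/d(parent), cached in the forward pass

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    __radd__ = __add__
    __rmul__ = __mul__

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (1.0 - t * t,))

    def backward(self):
        # Post-order DFS: every node is appended after all of its inputs.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for p in node._parents:
                    visit(p)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # One reverse sweep: a node's grad is complete before it is pushed to
        # its inputs, and each edge applies the chain rule exactly once.
        # Gradients accumulate with += when a node feeds several consumers.
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


if __name__ == "__main__":
    x, w, b = Value(0.5), Value(-2.0), Value(1.0)
    y = (w * x + b).tanh()  # y = tanh(w*x + b)
    y.backward()
    print(y.data, x.grad, w.grad, b.grad)  # prints: 0.0 -2.0 0.5 1.0
```

The key design choice is caching the local derivatives during the forward pass, so the backward pass is just multiplications and additions along recorded edges. The cost is memory: intermediate values must be kept until the backward sweep has finished.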