Chapter 6 - Fitting Models
Minimizing loss (see Chapter 5 - Loss functions)
Basic idea: compute the gradients of the loss wrt the params, then adjust the params in the direction opposite the gradients so that the loss decreases
Goal is to minimize the loss
Gradient Descent
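A minimal sketch of plain gradient descent on a toy least-squares loss; the data, learning rate, and step count below are illustrative assumptions, not values from the chapter.

```python
import numpy as np

# Toy problem: fit w in L(w) = mean((X w - y)^2); any differentiable loss works the same way
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

def loss(w):
    r = X @ w - y
    return (r * r).mean()

def grad(w):
    # gradient of the mean squared error wrt the parameters w
    return 2.0 * X.T @ (X @ w - y) / len(y)

w = np.zeros(3)           # initial parameters
lr = 0.1                  # learning rate (step size), assumed value
for step in range(200):
    w = w - lr * grad(w)  # step opposite the gradient to decrease the loss

print(loss(w), w)
```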
Gabor Model
- Local Minima and Saddle Points
- SGD
- SGD Momentum (update-rule sketches for momentum, Adam and LR scheduling follow this list)
- Nesterov Momentum
- Adam
- AdaGrad
- RMSProp
- AdaDelta
- AMSGrad
- Learning Rate Warmup
- Learning Rate Scheduling
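Hedged sketches of the update rules behind several of the items above: SGD with momentum, Adam, and a linear-warmup + cosine-decay learning-rate schedule. The function names and hyperparameter defaults are my own illustrative choices, not prescriptions from these notes.

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    """SGD with (heavy-ball) momentum: keep a decaying sum of past gradients
    and step along it instead of along the raw gradient."""
    velocity = beta * velocity + grad
    return w - lr * velocity, velocity

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: moving averages of the gradient (m) and its elementwise square (v),
    bias-corrected, giving a per-parameter adaptive step size.
    t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)   # bias-corrected first moment
    v_hat = v / (1 - beta2**t)   # bias-corrected second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def lr_schedule(step, base_lr=1e-3, warmup_steps=500, total_steps=10_000):
    """Linear warmup followed by cosine decay: one common way to combine
    warmup and scheduling. The step counts here are arbitrary examples."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1 + np.cos(np.pi * progress))
```

Nesterov momentum differs from the momentum step above only in evaluating the gradient at the look-ahead point (the parameters after the momentum part of the step); AdaGrad, RMSProp, AdaDelta and AMSGrad differ from Adam mainly in how the squared-gradient statistic v is accumulated and used.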
Useful Links
- bag of tricks for computer vision
- visualizing optimisers 1
- optimizers but easier and less scary math
- gradient descent video
- backprop video
- very advanced blog on whether neural networks are overfitted
- 1cycle scheduling and warmup
- Automatic differentiation
Qs
- Why do we need to modify the learning rate while training?
- How would you choose the learning rate depending on the batch size?
- What optimizer would you use for inference?