SGD

  • Instead of using the whole dataset for each iteration, we randomly select a batch of data (in the extreme case, a single example).
  • The procedure: first choose the initial parameters w and a learning rate η, then randomly shuffle the data at each iteration and update the parameters until an approximate minimum is reached.
  • The gradient estimates are noisy, since each update is based on only a small sample of the data.
  • Because more iterations are needed to converge, the overall computation time can increase, even though each individual iteration is cheap.
    • An update is performed for each example x_i and label y_i: w ← w − η ∇L(w; x_i, y_i) (see the sketch after this list).
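
A minimal sketch of this loop, assuming a linear model with squared loss and a fixed learning rate η (the function name, data, and hyperparameters here are illustrative, not from the notes):

```python
import numpy as np

def sgd(X, y, lr=0.01, epochs=10, seed=0):
    """Plain SGD on a linear model with per-example squared loss 0.5 * (w @ x_i - y_i)**2."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)                 # initial parameters w
    for _ in range(epochs):
        order = rng.permutation(n_samples)   # randomly shuffle the data each pass
        for i in order:                      # for each example x_i and label y_i
            grad = (X[i] @ w - y[i]) * X[i]  # gradient of the per-example loss
            w -= lr * grad                   # w <- w - eta * grad (noisy update)
    return w

# Illustrative usage: recover a known weight vector from noisy synthetic data
X = np.random.default_rng(1).normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * np.random.default_rng(2).normal(size=200)
print(sgd(X, y, lr=0.05, epochs=20))
```

Each inner-loop step uses only one example, which is why the updates are noisy but individually very cheap.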

Implicit Regularization