Gradient Accumulation
- PyTorch
- helps when the model cannot be trained with a large enough batch size
	- usually because of GPU memory limitations
- Accumulate the gradients (for each trainable parameter) over several forward/backward passes, and only after a set number of steps use the accumulated gradients to update the weights
- Is then approximately equivalent to training with a correspondingly larger batch size
- example: accumulate over N micro-batches, then call optimizer.step() once
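A minimal sketch of the pattern above in PyTorch. The tiny linear model, the synthetic data, and the choice of 4 accumulation steps are all illustrative assumptions, not from the source:

```python
import torch
from torch import nn

# Hypothetical tiny model and synthetic data, just to illustrate the pattern.
torch.manual_seed(0)
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

accumulation_steps = 4  # effective batch size = micro-batch size * 4
micro_batches = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(8)]

optimizer.zero_grad()
for step, (x, y) in enumerate(micro_batches, start=1):
    loss = loss_fn(model(x), y)
    # Scale the loss so the accumulated gradient equals the mean
    # gradient over the full (large) batch.
    (loss / accumulation_steps).backward()  # gradients add up across calls
    if step % accumulation_steps == 0:
        optimizer.step()       # update weights with the accumulated gradients
        optimizer.zero_grad()  # reset gradients for the next window
```

Key detail: `backward()` accumulates into `.grad` by default, so the only changes versus a normal loop are scaling the loss and calling `optimizer.step()` / `zero_grad()` every `accumulation_steps` iterations instead of every iteration.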