Large Batch Training
- With the number of epochs fixed, increasing the batch size alone generally slows down convergence and hurts final accuracy
- For convex problems, the convergence rate decreases as the batch size increases
- Learning rate scheduling: scale the learning rate linearly with the batch size, and warm it up from a small value over the first few epochs to avoid instability early in training
- Zero γ initialization: initialize γ = 0 for the last batch normalization layer of each residual block, so the block initially passes through only the identity branch; this mimics a network with fewer layers and is easier to train at the start
- No bias decay: apply weight decay only to weight tensors; leave biases and the batch-norm γ and β parameters unregularized
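The linear scaling rule and warmup from the scheduling bullet can be sketched as a single schedule function. This is a minimal illustration, not a framework API; the function name, the base learning rate of 0.1, and the reference batch size of 256 are assumptions (0.1 at batch 256 is a common ResNet default).

```python
def lr_at_epoch(epoch, batch_size, base_lr=0.1, base_batch=256, warmup_epochs=5):
    """Learning rate under linear scaling plus linear warmup.

    Linear scaling rule: scale base_lr by batch_size / base_batch.
    Warmup: ramp the rate linearly up to the scaled value over the
    first `warmup_epochs` epochs, then hold it (a decay schedule such
    as cosine annealing would take over from there).
    """
    target_lr = base_lr * batch_size / base_batch
    if epoch < warmup_epochs:
        # Epochs are 0-indexed; epoch warmup_epochs - 1 reaches target_lr.
        return target_lr * (epoch + 1) / warmup_epochs
    return target_lr
```

For example, with batch size 1024 the target rate becomes 0.4, and the first five epochs ramp through 0.08, 0.16, 0.24, 0.32, 0.4.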
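The "no bias decay" bullet amounts to splitting the model's parameters into two optimizer groups. A minimal sketch, assuming parameters arrive as (name, tensor) pairs in the style of PyTorch's `model.named_parameters()` and that batch-norm layers are identifiable by a ".bn" substring in the name; both the function name and that naming convention are assumptions for illustration.

```python
def split_decay_groups(named_params):
    """Partition parameters for the 'no bias decay' heuristic.

    Weight decay should apply only to weight tensors; biases and the
    batch-norm scale/shift parameters (gamma/beta, typically stored as
    the batch-norm layer's .weight and .bias) stay unregularized.
    """
    decay, no_decay = [], []
    for name, param in named_params:
        if name.endswith(".bias") or ".bn" in name:
            no_decay.append(param)
        else:
            decay.append(param)
    return decay, no_decay
```

The two lists would then be passed to the optimizer as separate parameter groups, e.g. one group with the chosen weight decay and one with weight decay set to 0.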