Learning Rate Scheduling

Learning rate decay is one of the standard tricks for gradient descent: start training with a relatively large learning rate and reduce it over time as the loss flattens out.

Increasing the batch size reduces noise in the gradient estimate, so a larger learning rate is okay. This is the rationale behind linear learning rate scaling: when the batch size is multiplied by k, multiply the learning rate by k as well. Because a large learning rate can be unstable in the first few updates, it is typically paired with learning rate warmup: ramp the learning rate up from a small value over the first stretch of training before the decay schedule takes over.
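To make the shape of such a schedule concrete, here is a minimal sketch in plain Python combining linear warmup with step decay. The function name and all default values (base_lr, warmup_steps, decay_every, decay_factor) are illustrative assumptions, not taken from the source:

```python
def lr_at_step(step, base_lr=0.1, warmup_steps=500,
               decay_every=10_000, decay_factor=0.1):
    """Illustrative schedule: linear warmup, then step decay.

    All names and default values are hypothetical, chosen only to
    show the shape of a warmup-plus-decay schedule.
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly from ~0 up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Decay phase: shrink by decay_factor every decay_every steps.
    num_decays = (step - warmup_steps) // decay_every
    return base_lr * (decay_factor ** num_decays)


# Sample the schedule at a few points to see the ramp and the drops.
for s in [0, 250, 500, 5_000, 10_500, 20_500]:
    print(s, lr_at_step(s))
```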
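The linear scaling rule itself is a one-liner. Again a sketch under stated assumptions: the reference batch size of 256 is a common convention for this rule, not a value given in the source:

```python
def scaled_lr(batch_size, base_lr=0.1, base_batch_size=256):
    # Linear scaling rule: if the batch size grows by a factor k,
    # grow the learning rate by the same factor k.
    # (base_lr and base_batch_size here are illustrative values.)
    return base_lr * batch_size / base_batch_size


print(scaled_lr(1024))  # 4x the batch size -> 4x the learning rate: 0.4
```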