Subhaditya's KB

Linear Learning Rate Scaling

Sep 18, 2024

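If [He Initialization] is used, 0.1 is a good initial learning rate for a batch size of 256. For a larger batch size $b$, scale the learning rate linearly to $0.1 \times \frac{b}{256}$; for example, $b = 1024$ gives a learning rate of 0.4.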
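The rule can be sketched as a small helper. This is only an illustration: the function name `linear_scaled_lr` and its parameter names are hypothetical, and the defaults (0.1 and 256) are the reference values quoted in this note.

```python
def linear_scaled_lr(batch_size: int,
                     base_lr: float = 0.1,
                     base_batch_size: int = 256) -> float:
    """Scale a reference learning rate linearly with the batch size.

    Implements base_lr * (batch_size / base_batch_size). The defaults are
    the reference values from the note; the names are illustrative only.
    """
    if batch_size <= 0 or base_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * batch_size / base_batch_size


if __name__ == "__main__":
    # Print the scaled learning rate for a few batch sizes.
    for b in (256, 512, 1024):
        print(b, linear_scaled_lr(b))
```

The returned value would typically be passed as the initial learning rate of the optimizer, with any schedule applied on top of it.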
