Subhaditya's KB


FP16 training

Sep 18, 2024

temp

FP16 Training

@micikeviciusMixedPrecisionTraining2018
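Reduced precision (FP16) has a narrower dynamic range than FP32. Small values can underflow to zero and large values can overflow, which can degrade training.
Weights, activations, and gradients can all be stored in FP16, and the forward and backward passes can be computed in FP16.
An FP32 master copy of the weights is also kept for parameter updates. The FP16 gradients update this FP32 copy, which is then rounded to FP16 for the next iteration.
Loss scaling: multiply the loss by a scalar before backpropagation so that small gradient values shift into FP16's representable range, then divide the gradients by the same scalar before the weight update.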
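A minimal sketch of the range problem and of how loss scaling helps. It emulates FP16 rounding with Python's standard `struct` module, whose `'e'` format packs IEEE 754 half precision. The gradient value `1e-8`, the scale `1024`, and the helper names are hypothetical illustration choices, not taken from the paper.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value.

    struct's 'e' format packs binary16. FP16 hardware would produce inf on
    overflow, but struct raises OverflowError for values beyond the FP16 range.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]

FP16_MAX = 65504.0               # largest finite FP16 value
FP16_MIN_SUBNORMAL = 2.0 ** -24  # smallest positive FP16 value, about 5.96e-8

def survives_fp16(grad: float, loss_scale: float = 1.0) -> float:
    """Scale a (hypothetical) gradient, round it to FP16, then unscale in FP64."""
    return to_fp16(grad * loss_scale) / loss_scale

if __name__ == "__main__":
    tiny_grad = 1e-8                        # hypothetical gradient below FP16's range
    print(survives_fp16(tiny_grad))         # 0.0: underflows without scaling
    print(survives_fp16(tiny_grad, 1024))   # close to 1e-8 after scaling and unscaling
    try:
        to_fp16(70000.0)                    # above FP16_MAX
    except OverflowError as exc:
        print("overflow:", exc)
```

Scaling moves the gradient from below the smallest FP16 subnormal into a representable range. Unscaling then happens in higher precision, so the value survives with only a small rounding error.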
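A toy sketch of one training step that follows the notes above. The forward and backward passes run in emulated FP16, the loss is scaled before the backward pass, and the update goes to an FP32 master weight. Several things here are hypothetical choices for illustration: the one-parameter model `w * x`, the loss `0.5 * (w*x - y)**2`, the helper names, and all hyperparameters. The arithmetic itself runs in FP64 and each result is rounded with `struct`'s `'e'` (FP16) and `'f'` (FP32) formats, so this is not real low-precision hardware arithmetic.

```python
import struct

def to_fp16(x: float) -> float:
    # Round to IEEE 754 binary16 (raises OverflowError beyond +/-65504).
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    # Round to IEEE 754 binary32.
    return struct.unpack("<f", struct.pack("<f", x))[0]

def mixed_precision_step(master_w: float, x: float, y: float,
                         lr: float, loss_scale: float) -> float:
    """One SGD step on the hypothetical toy loss L = 0.5 * (w*x - y)**2."""
    w16 = to_fp16(master_w)                   # FP16 copy of the FP32 master weight
    x16, y16 = to_fp16(x), to_fp16(y)
    err = to_fp16(to_fp16(w16 * x16) - y16)   # forward pass, each op rounded to FP16
    # Backward on the scaled loss: d(loss_scale * L)/dw = loss_scale * err * x
    scaled_grad16 = to_fp16(to_fp16(loss_scale * err) * x16)
    grad32 = to_fp32(scaled_grad16 / loss_scale)  # unscale in FP32
    return to_fp32(master_w - lr * grad32)        # update the FP32 master copy

if __name__ == "__main__":
    # A small update is rounded away if applied directly to an FP16 weight...
    print(to_fp16(1.0 - 1e-4))   # 1.0
    # ...but is kept by the FP32 master copy.
    print(to_fp32(1.0 - 1e-4))   # slightly below 1.0

    w = 0.0
    for _ in range(50):
        w = mixed_precision_step(w, x=2.0, y=3.0, lr=0.1, loss_scale=128.0)
    print(w)  # close to 1.5, the minimizer of the toy loss
```

The FP32 copy matters because FP16 values near 1.0 are spaced about 5e-4 to 1e-3 apart. An update smaller than half that spacing vanishes when it is added in FP16.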
