Subhaditya's KB

Vanishing Gradient

Sep 18, 2024

architecture

Deltas become smaller toward the initial layers. Using [Sigmoid] → [ill conditioning]
$g (x) = (1 + e^{- x})^{- 1}$
$\nabla_{x} [g] = g (1 - g) \in (0, \tfrac{1}{4}]$
Saturation prevents Backprop from passing gradients
$g (x) \approx 0$ or $g (x) \approx 1 \to \nabla_{x} [g] \approx 0$
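
To make the saturation effect concrete, here is a minimal sketch that evaluates $g$ and $\nabla_{x}[g]$ at a few inputs. The function names `sigmoid` and `sigmoid_grad` and the sample inputs are illustrative choices, not part of the original note.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid g(x) = (1 + e^{-x})^{-1}."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid, written as g(x) * (1 - g(x))."""
    g = sigmoid(x)
    return g * (1.0 - g)


# The gradient peaks at x = 0 and shrinks toward 0 as |x| grows (saturation).
for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x={x:+5.1f}  g={sigmoid(x):.5f}  grad={sigmoid_grad(x):.5f}")
```

The gradient is largest at $x = 0$, where it equals $0.25$, and is close to zero once $|x|$ is large, so a saturated unit passes almost no error signal back.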
Weight matrices are usually initialized with random values $|w_{ji}| \ll 1$
- gradient magnitudes decay exponentially with depth → rate set by the max eigenvalue
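
The exponential decay can be sketched with a deliberately simplified scalar chain, one sigmoid unit per layer, so that each backward step multiplies the delta by $w \cdot g'$. The function `backprop_delta_magnitudes`, the uniform initialization, and the depth and scale values are assumptions made for illustration. In a real network the per-layer factor is a Jacobian matrix and its largest eigenvalue (or singular value) plays the role of the scalar factor here.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic sigmoid g(x) = (1 + e^{-x})^{-1}."""
    return 1.0 / (1.0 + math.exp(-x))


def backprop_delta_magnitudes(depth: int, weight_scale: float, seed: int = 0) -> list[float]:
    """Return |delta| after each backward step through a scalar sigmoid chain.

    Hypothetical toy model: layer l computes a_l = sigmoid(w_l * a_{l-1}),
    with w_l drawn uniformly from [-weight_scale, weight_scale].
    """
    rng = random.Random(seed)
    weights = [rng.uniform(-weight_scale, weight_scale) for _ in range(depth)]

    # Forward pass: store the local sigmoid derivative g(1 - g) of each layer.
    activation = 0.5
    local_grads = []
    for w in weights:
        activation = sigmoid(w * activation)
        local_grads.append(activation * (1.0 - activation))

    # Backward pass: each layer scales the delta by w * g', with |w * g'| <= weight_scale / 4.
    delta = 1.0
    magnitudes = []
    for w, g_prime in zip(reversed(weights), reversed(local_grads)):
        delta *= w * g_prime
        magnitudes.append(abs(delta))
    return magnitudes


for step, mag in enumerate(backprop_delta_magnitudes(depth=10, weight_scale=0.5), start=1):
    print(f"layers back: {step:2d}  |delta| = {mag:.3e}")
```

Because every factor has magnitude at most `weight_scale / 4`, the delta shrinks at least geometrically with the number of layers traversed, which is the decay the bullet above describes.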
