Vanishing Gradient
- Deltas become smaller toward the initial layers when using [Sigmoid](Sigmoid.md) → [ill conditioning](ill conditioning.md)
- Saturated units have near-zero derivatives, which prevents Backprop from passing the error to earlier layers
- Weight matrices are usually initialized with random values
- Gradient magnitudes decay exponentially with depth → rate governed by the max eigenvalue of the weight matrices
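
A minimal sketch of the points above, assuming a toy chain with one sigmoid unit per layer and scalar weights drawn uniformly from [-1, 1]. The function name `backprop_deltas`, the depth of 10, and the input value are illustrative choices, not from the notes. In this scalar case the weight stands in for the max eigenvalue: each layer multiplies the delta by at most |w| · 0.25, because the sigmoid derivative never exceeds 0.25.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def backprop_deltas(weights, x=0.5):
    """Return |delta| per layer (index 0 = first layer) for a chain of scalar sigmoid units."""
    # Forward pass: a_l = sigmoid(w_l * a_{l-1}), keeping every activation.
    activations = []
    a = x
    for w in weights:
        a = sigmoid(w * a)
        activations.append(a)

    # Backward pass: start from an assumed error of 1.0 at the output activation.
    deltas = [0.0] * len(weights)
    grad = 1.0
    for layer in reversed(range(len(weights))):
        a = activations[layer]
        grad *= a * (1.0 - a)      # through the sigmoid: derivative a(1 - a) <= 0.25
        deltas[layer] = abs(grad)  # delta w.r.t. this layer's pre-activation
        grad *= weights[layer]     # through the weight to the previous activation
    return deltas


random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(10)]  # random initialization
deltas = backprop_deltas(weights)

for layer, d in enumerate(deltas):
    print(f"layer {layer}: |delta| = {d:.3e}")
```

With |w| ≤ 1, every step back toward the first layer shrinks the delta by at least a factor of 4, so the first layer's delta is bounded by 0.25^10. Larger weights slow the decay, but saturation then pushes the sigmoid derivative toward zero.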