Batch Normalization

  • bias=False for Linear/Conv2D layers that feed into BatchNorm, since its learned shift makes the bias redundant; bias=True for the output layer, which has no BatchNorm after it (see the first sketch after this list)
  • Normalizes each feature/channel to zero mean and unit variance over the mini-batch
  • Each layer's input distribution shifts as training updates the earlier layers; BatchNorm keeps those distributions similar
  • Reduces internal covariate shift: layers no longer have to keep re-adapting to a moving input distribution
  • During testing: use the running statistics saved during training instead of batch statistics (see the forward-pass sketch after this list)
  • Simplifies learning dynamics
    • Can use larger learning rate
    • Higher-order interactions between layers are suppressed: the normalized output's mean and std are fixed by the learned scale and shift, independent of the incoming activations, which makes training easier
  • Doesn't work well with small batches (the batch statistics become too noisy) and is awkward with RNNs (variable sequence lengths would need per-time-step statistics)
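
A minimal sketch of the bias placement described in the first bullet, assuming PyTorch-style layers (the channel sizes and block structure are illustrative):

```python
import torch.nn as nn

# The conv feeding into BatchNorm drops its bias: BatchNorm subtracts the
# batch mean and adds its own learned shift, so a preceding bias is redundant.
block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),  # BN follows
    nn.BatchNorm2d(64),
    nn.ReLU(),
)

# The output layer has no BatchNorm after it, so it keeps bias=True (the default).
head = nn.Linear(64, 10, bias=True)
```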
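
And a minimal NumPy sketch of the train/test behavior of the statistics; the function name, momentum value, and [batch, features] layout are assumptions for illustration, not any particular library's API:

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, running_mean, running_var,
                      training, momentum=0.1, eps=1e-5):
    """Illustrative per-feature batch norm; x has shape [batch, features]."""
    if training:
        # Normalize with the statistics of the current mini-batch.
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        # Keep an exponential moving average for use at test time.
        running_mean[:] = (1 - momentum) * running_mean + momentum * mean
        running_var[:] = (1 - momentum) * running_var + momentum * var
    else:
        # Testing: reuse the statistics saved during training.
        mean, var = running_mean, running_var
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta  # learned scale and shift
```

The running averages here are the "stats saved during training"; they also show why small batches hurt, since the per-batch mean and variance estimates get noisier as the batch shrinks.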

Why