No Bias Decay / No Learning Rate Decay tricks
- Weight decay is equivalent to Lp regularization (here L2): applying it to all parameters drives their values towards 0.
- Trick: apply the regularization only to the weights; leave the bias terms and Batch Normalization layers alone.
- LARS (Layer-wise Adaptive Rate Scaling)
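The no-bias-decay idea can be sketched as follows. This is a minimal illustration, assuming a plain SGD update and hypothetical parameter names (`fc.weight`, `fc.bias`, `bn.gamma`); in practice a framework's optimizer parameter groups would be used instead.

```python
def sgd_step(params, grads, lr=0.1, weight_decay=1e-4):
    """One plain-SGD step. The L2 penalty (weight decay) is applied only
    to 'weight' tensors; biases and BatchNorm parameters are left alone."""
    updated = {}
    for name, values in params.items():
        # gradient of (wd/2) * v^2 is wd * v, added only for weights
        decay = weight_decay if name.endswith("weight") else 0.0
        updated[name] = [v - lr * (g + decay * v)
                         for v, g in zip(values, grads[name])]
    return updated

# Toy parameters: a weight matrix row, a bias, and a BatchNorm scale (gamma).
params = {"fc.weight": [1.0, 1.0], "fc.bias": [1.0, 1.0], "bn.gamma": [1.0, 1.0]}
grads = {name: [0.0, 0.0] for name in params}  # zero grads isolate the decay term

out = sgd_step(params, grads)
# only fc.weight shrinks toward 0; fc.bias and bn.gamma are unchanged
```

With zero gradients, the update shows the decay term in isolation: the weight entries shrink slightly toward 0 each step, while the bias and BN parameters stay fixed.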