Linear Learning Rate Scaling

  • If [He Initialization ] is used, 0.1 is a good learning rate for batch size 256 and for a larger b, is okay