Cosine Learning Rate Decay

  • Used instead of learning rate warmup followed by a separate decay schedule
  • The rate decreases slowly at first, almost linearly in the middle, and slowly again near the end
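The shape described above comes from scaling the learning rate by half a cosine period. A common form is lr(t) = min_lr + (base_lr - min_lr) * 0.5 * (1 + cos(pi * t / T)), where T is the total number of steps. The sketch below assumes per-step updates. The names `cosine_lr`, `base_lr`, and `min_lr` are illustrative and not taken from any particular library. With `min_lr = 0` it reduces to 0.5 * (1 + cos(pi * t / T)) * base_lr.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Hypothetical helper: cosine decay from base_lr (step 0) to min_lr (step total_steps)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so steps outside [0, total_steps] stay at the endpoint values.
    t = min(max(step, 0), total_steps)
    # Factor goes 1 -> 0 along half a cosine period: flat at both ends, steepest in the middle.
    factor = 0.5 * (1.0 + math.cos(math.pi * t / total_steps))
    return min_lr + (base_lr - min_lr) * factor


if __name__ == "__main__":
    total = 100
    lrs = [cosine_lr(s, total, base_lr=0.1) for s in range(total + 1)]
    # How much the rate drops at each step: small at the start and end, largest mid-way.
    drops = [lrs[i] - lrs[i + 1] for i in range(total)]
    print(f"first drop {drops[0]:.6f}, middle drop {drops[total // 2]:.6f}, last drop {drops[-1]:.6f}")
```

Running it shows the per-step drop is tiny at the start, largest around the midpoint, and tiny again at the end. This matches the "slow, roughly linear, slow" description. The optional `min_lr` floor is an added assumption; it keeps the rate from reaching exactly zero.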