Optimization: Gradient Descent, AdaGrad, RMSprop, Adam, Learning Rate Decay (tricks), Early Stopping (tricks), …
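Of the optimizers listed above, Adam is the most widely used; as a minimal sketch (not any particular library's implementation), its update keeps exponentially decayed first and second moment estimates of the gradient, applies bias correction, and scales the step accordingly. The function name `adam_step` and the hyperparameter defaults below are illustrative assumptions, matching the commonly cited values.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Exponentially decayed first moment (mean) and second moment (uncentered variance)
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    # Bias correction: early estimates are pulled toward zero by the zero init
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    # Parameter update scaled by the (bias-corrected) moment estimates
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Usage: minimize f(x) = x^2 (gradient 2x) starting from x = 5.0
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.1)
```

The same loop structure extends to AdaGrad (accumulate `v` without decay, no first moment) and RMSprop (decayed `v`, no bias correction), which is why the three are usually taught together.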