Adam

  • Supervised learning
  • RMSProp + Momentum
  • Corrects bias in exponentially weighted averages
  • Struggles with a large number of parameters
  • Over-smooths the gradient
  • \begin{align} & s_n = \rho_1 s_{n-1} + (1-\rho_1) g_n \\ & r_n = \rho_2 r_{n-1} + (1-\rho_2) g_n \odot g_n \\ & \Theta_{n+1} = \Theta_n - \alpha \frac{s_n}{\epsilon + \sqrt{r_n}} \frac{\sqrt{1-\rho_2^n}}{1-\rho_1^n} \end{align}
  • s_n and r_n estimate the first and second moments of the gradient
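
A minimal NumPy sketch of one Adam update, to make the recursion concrete. It uses the standard bias correction, where the step is scaled by sqrt(1 - rho2^n) / (1 - rho1^n). The function names (`adam_step`, `minimize_quadratic`), the default hyperparameters, and the quadratic test objective are assumptions chosen for illustration, not part of these notes.

```python
import numpy as np


def adam_step(theta, grad, s, r, n, alpha=1e-3, rho1=0.9, rho2=0.999, eps=1e-8):
    """Apply one Adam update. `n` is the 1-based step count."""
    # Exponentially weighted average of the gradient (first moment).
    s = rho1 * s + (1 - rho1) * grad
    # Exponentially weighted average of the squared gradient (second moment).
    r = rho2 * r + (1 - rho2) * grad * grad
    # Bias correction for both averages, folded into a single scalar factor.
    correction = np.sqrt(1 - rho2 ** n) / (1 - rho1 ** n)
    theta = theta - alpha * correction * s / (eps + np.sqrt(r))
    return theta, s, r


def minimize_quadratic(steps=2000, alpha=0.01):
    """Hypothetical example: minimize ||theta - target||^2 with Adam."""
    target = np.array([3.0, -2.0])
    theta = np.zeros(2)
    s = np.zeros(2)  # first-moment estimate, initialized to zero
    r = np.zeros(2)  # second-moment estimate, initialized to zero
    for n in range(1, steps + 1):
        grad = 2.0 * (theta - target)
        theta, s, r = adam_step(theta, grad, s, r, n, alpha=alpha)
    return theta
```

With zero-initialized averages, the correction makes the very first step roughly alpha * sign(g) in each coordinate, regardless of the gradient's magnitude. That is the practical effect of the bias correction.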