Features

Dimensions

Wide

  • Harder to train
  • Needs more neurons (up to exponentially many for some functions)
  • Easy to parallelize (units within a layer are independent)
  • In the infinite-width limit, behaves like a Gaussian process
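The infinite-width Gaussian process point can be seen numerically: at a fixed input, the output of a randomly initialized one-hidden-layer net is a scaled sum over hidden units, so by the central limit theorem it becomes Gaussian as width grows. A minimal sketch (the `1/sqrt(width)` scaling and standard-normal initialization are assumptions of this illustration, not from the notes):

```python
import numpy as np

def sample_outputs(width, n_nets=2000, x=0.7, seed=0):
    # Each net: f(x) = (1/sqrt(width)) * v . tanh(w*x + b),
    # with all parameters drawn i.i.d. standard normal.
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((n_nets, width))
    b = rng.standard_normal((n_nets, width))
    v = rng.standard_normal((n_nets, width))
    return (v * np.tanh(w * x + b)).sum(axis=1) / np.sqrt(width)

# As width grows, the output distribution at a fixed input approaches
# a Gaussian with mean 0 -- the basis of the NN-as-GP view.
for width in (1, 10, 1000):
    outs = sample_outputs(width)
    print(width, round(outs.mean(), 3), round(outs.std(), 3))
```

Plotting a histogram of `outs` for small vs. large `width` makes the convergence to a bell curve visible.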

Deep

  • Easier to train
  • Needs less data
  • Needs only a linear number of neurons for functions that a shallow net needs exponentially many to represent
  • Difficult to parallelize (each layer depends on the previous one)
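The parallelization contrast between the two bullets above can be sketched directly: units within one wide layer are independent dot products (a single matmul, trivially parallel), while depth forms a sequential chain of layers that must run in order. The shapes and tanh nonlinearity below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)

# Width: one layer's 1024 units are computed in a single matmul,
# easily parallelized across units and hardware.
W_wide = rng.standard_normal((1024, 8))
h = np.tanh(W_wide @ x)  # all units at once

# Depth: each layer consumes the previous layer's output, so the
# 16 layers form a sequential dependency chain across depth.
layers = [rng.standard_normal((8, 8)) for _ in range(16)]
h_deep = x
for W in layers:  # must execute in order
    h_deep = np.tanh(W @ h_deep)
```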

Why Deep

  • Domain adaptation: learned features transfer across related tasks
  • Structure exploitation: layers capture hierarchical, compositional structure
  • Learns relevant features automatically rather than relying on hand-engineered ones

Random Things

  • A perceptron with one hidden layer is a universal function approximator
  • First-order optimization (e.g. SGD) tends to find the solutions that generalize best
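The universal-approximation bullet can be demonstrated in miniature: one hidden layer of random tanh features, with only the output weights fit by least squares, already approximates a smooth target like sin(x) closely. The feature count, weight scales, and target function are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(-np.pi, np.pi, 200)
target = np.sin(xs)

# One hidden layer of 100 random tanh units; fit only the output
# layer with least squares (no gradient training needed here).
n_hidden = 100
w = rng.standard_normal(n_hidden) * 2.0
b = rng.uniform(-np.pi, np.pi, n_hidden)
phi = np.tanh(np.outer(xs, w) + b)           # (200, n_hidden) activations
out_w, *_ = np.linalg.lstsq(phi, target, rcond=None)
pred = phi @ out_w

print("max abs error:", np.abs(pred - target).max())
```

With enough hidden units the error can be driven arbitrarily low on a compact interval, which is what the universal approximation theorem guarantees.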