Features
Dimensions
Wide
- Hard to train
- Needs more neurons (up to exponentially many to match a deep net)
- Easy to parallelize
- Infinitely wide → Gaussian process (the NNGP limit)
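The Gaussian-process point can be checked numerically: with 1/√width scaling on the output layer, the output of a randomly initialized one-hidden-layer net looks increasingly Gaussian as width grows. A minimal numpy sketch (the width, sample count, and input are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def wide_net_output(x, width):
    # One hidden tanh layer with random Gaussian weights;
    # output weights scaled by 1/sqrt(width) so the sum has finite variance.
    W1 = rng.standard_normal((width, x.shape[0]))
    b1 = rng.standard_normal(width)
    h = np.tanh(W1 @ x + b1)
    w2 = rng.standard_normal(width) / np.sqrt(width)
    return w2 @ h

x = np.ones(3)
# Distribution of the output across many random wide nets:
samples = np.array([wide_net_output(x, 10_000) for _ in range(2_000)])
# By the CLT this is approximately Gaussian (mean ~0, fixed variance).
print(f"mean={samples.mean():.3f} std={samples.std():.3f}")
```

This only illustrates the fixed-input marginal; the full NNGP statement is that outputs over any finite set of inputs are jointly Gaussian.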
Deep
- Easier to train
- Needs less data
- Linear number of neurons suffices (vs. exponential for a shallow net)
- Difficult to parallelize (layers compute sequentially)
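The "linear number of neurons" point can be illustrated with a classic depth-separation construction: a tent map is a 2-unit ReLU block, and stacking L of them (O(L) units total) yields a function with 2^L linear pieces, which a one-hidden-layer net would need on the order of 2^L units to match. A numpy sketch:

```python
import numpy as np

def tent(x):
    # Tent map as a tiny ReLU block: 2*relu(x) - 4*relu(x - 1/2).
    return 2 * np.maximum(x, 0) - 4 * np.maximum(x - 0.5, 0)

# Composing L tent blocks = a depth-L ReLU net with O(L) units.
x = np.linspace(0, 1, 2**13 + 1)   # dyadic grid so breakpoints land exactly
y = x.copy()
L = 5
for _ in range(L):
    y = tent(y)

# The composition has 2^L monotone pieces, hence 2^L - 1 interior extrema.
extrema = np.sum(np.diff(np.sign(np.diff(y))) != 0)
print(extrema)  # → 31
```

Doubling depth squares the oscillation count while only doubling the unit count; width alone cannot keep up.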
Why
- Domain adaptation: learned features transfer across tasks
- Structure exploitation: matches hierarchical structure in data
- Learns relevant features automatically
Random Things
- 1-hidden-layer perceptron → universal function approximator (universal approximation theorem)
- Best generalization → first-order optimization (e.g., SGD)
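The universal-approximation point can be demoed with a one-hidden-layer net. As a shortcut (not full backprop training), the sketch below fixes random hidden weights and solves only the output layer by least squares; the target function, width, and weight scales are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target to approximate on [-3, 3]:
x = np.linspace(-3, 3, 400)[:, None]
y = np.sin(2 * x).ravel()

width = 200
W = rng.standard_normal((1, width)) * 2     # random fixed hidden weights
b = rng.standard_normal(width) * 2
H = np.tanh(x @ W + b)                      # hidden-layer activations

# Train only the linear output layer (least squares):
w_out, *_ = np.linalg.lstsq(H, y, rcond=None)

err = np.max(np.abs(H @ w_out - y))
print(f"max abs error: {err:.4f}")          # typically small
```

With enough hidden units, one hidden layer can drive this error arbitrarily low on a compact interval, which is exactly what the theorem guarantees; it just says nothing about how many units or how much data that takes.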