Shallow vs deep networks

 
| Shallow Networks | Deep Networks |
| --- | --- |
| Single hidden layer | Multiple hidden layers |
| Can model any continuous function, but needs many hidden units to do so | A composition of shallow networks, so a few layers can represent more complex functions |
| With one input, one output and D hidden units: up to D + 1 linear regions, 3D + 1 params | With one input, one output and K layers of D hidden units: up to (D + 1)^K linear regions, 3D + 1 + (K - 1)D(D + 1) params |
| Flexibility of the function is limited by the number of parameters | Linear regions have complex dependencies and symmetries |
| More hidden units needed to achieve a given approximation | Fewer hidden units needed to achieve the same approximation (depth efficiency) |
| Processing data like images is almost infeasible | Practical: local image regions are processed in parallel, then information is gradually integrated from increasingly large regions |
| Harder to fit to data | Overparameterized deep models have a large family of roughly equivalent solutions that are easy to find |
| Tends to generalize less well to new data | Tends to generalize better than shallow networks |
| N/A | If the number of hidden units D in each of the K layers is the same, and D is an integer multiple of the input dimensionality D_i, then the maximum number of linear regions can be computed exactly |
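
The counts in the table can be checked with a few lines of arithmetic. This is a minimal sketch, not a definitive reference. The one-input formulas assume ReLU networks with D hidden units per layer: up to D + 1 regions and 3D + 1 parameters for a shallow network, up to (D + 1)^K regions and 3D + 1 + (K - 1)D(D + 1) parameters for a deep one. The function names are hypothetical. The general expression in `max_regions` is an assumption about which closed form is meant for the case where D is a multiple of the input dimensionality `D_i`; it does at least reduce to (D + 1)^K when `D_i = 1`.

```python
from math import comb


def shallow_params(D: int) -> int:
    """Parameters of a 1-input, 1-output network with one hidden layer of D units."""
    # D input weights + D hidden biases + D output weights + 1 output bias
    return 3 * D + 1


def shallow_regions(D: int) -> int:
    """Maximum number of linear regions: each hidden unit adds at most one joint."""
    return D + 1


def deep_params(D: int, K: int) -> int:
    """Parameters of a 1-input, 1-output network with K hidden layers of D units."""
    # Each of the K - 1 layer-to-layer connections has D*D weights and D biases.
    return 3 * D + 1 + (K - 1) * D * (D + 1)


def deep_regions(D: int, K: int) -> int:
    """Maximum number of linear regions for the 1-input deep network."""
    return (D + 1) ** K


def max_regions(D: int, K: int, D_i: int) -> int:
    """Assumed closed form for D_i inputs, valid only when D is a multiple of D_i."""
    if D % D_i != 0:
        raise ValueError("D must be an integer multiple of D_i")
    return (D // D_i + 1) ** (D_i * (K - 1)) * sum(comb(D, j) for j in range(D_i + 1))


if __name__ == "__main__":
    D, K = 10, 5
    budget = deep_params(D, K)
    # Largest shallow network that fits in the same parameter budget.
    D_shallow = (budget - 1) // 3
    print("deep:   ", budget, "params,", deep_regions(D, K), "regions")
    print("shallow:", shallow_params(D_shallow), "params,", shallow_regions(D_shallow), "regions")
```

With D = 10 and K = 5, the deep network uses 471 parameters for up to 161051 regions, while a shallow network within the same budget (156 hidden units, 469 parameters) reaches at most 157. This is the depth efficiency referred to in the table, with the caveat that the deep network's regions are not independent of one another.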