Composing shallow neural networks to get deep networks
Consider two shallow networks, each with three hidden units: the first maps an input x to an output y, and the second maps an input x′ to an output y′. We compose them by feeding the output of the first into the second, i.e. setting x′ = y.
With ReLU activations, the composed network is still a family of piecewise linear functions, but the number of linear regions is potentially greater than for a single shallow network with six hidden units.
For example, if the first network maps three ranges of x onto the same output range y ∈ [−1, 1], and the second network creates three linear regions within that range, the composition has 3 × 3 = 9 linear regions.
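A minimal numerical sketch of this region counting (the specific weights below are illustrative choices of my own, not taken from the source): the first network folds three input ranges onto y ∈ [−1, 1], the second creates three regions within that range, and counting slope changes of the composition recovers 9 linear regions.

```python
import numpy as np

def shallow_net(x, theta, phi):
    # theta[:, 0] are slopes and theta[:, 1] offsets of the three hidden units;
    # phi holds the output bias followed by the three output weights
    h = np.maximum(0.0, theta[:, 0, None] * x + theta[:, 1, None])  # ReLU activations
    return phi[0] + phi[1:] @ h                                     # linear combination

def count_regions(x, y, tol=1e-3):
    # count linear regions by counting jumps in the numerical slope
    slopes = np.diff(y) / np.diff(x)
    kinks = np.flatnonzero(np.abs(np.diff(slopes)) > tol)
    # a kink falling inside a grid interval can register twice; merge adjacent hits
    return 1 + np.count_nonzero(np.diff(kinks, prepend=-2) > 1)

# First network: maps each of the ranges [0,1], [1,2], [2,3] onto y in [-1, 1]
theta1 = np.array([[1.0, 0.0], [1.0, -1.0], [1.0, -2.0]])
phi1 = np.array([-1.0, 2.0, -4.0, 4.0])

# Second network: creates three linear regions within its input range [-1, 1]
theta2 = np.array([[1.0, 1.0], [1.0, 0.5], [1.0, 0.0]])
phi2 = np.array([0.0, 1.0, -2.0, 2.0])

x = np.linspace(0.0, 3.0, 30001)
y1 = shallow_net(x, theta1, phi1)   # first network: 3 linear regions on [0, 3]
y2 = shallow_net(y1, theta2, phi2)  # composition: 3 x 3 = 9 linear regions

print(count_regions(x, y1))  # 3
print(count_regions(x, y2))  # 9
```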
Deep neural networks
Composing two shallow networks (input → first network → second network) is a special case of a deep neural network with two hidden layers.
The output y of the first network (f1) is a linear combination of the activations at its hidden units, and the first operation of the second network (f2) is linear in that output. The composition of two linear operations is itself linear, so the mapping between the two sets of hidden units collapses into a single linear transformation, and f2(f1(x)) is equivalent to one deep network with two hidden layers.
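A short sketch of this collapse (the weights and variable names below are illustrative assumptions, not from the source): multiplying the first network's output weights by the second network's input weights gives a single weight matrix between the two hidden layers, and the composed and collapsed networks compute identical outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(0.0, z)

# First network f1: h1 = ReLU(theta x + b_theta), y = phi h1 + b_phi
theta, b_theta = rng.normal(size=(3, 1)), rng.normal(size=(3, 1))
phi, b_phi = rng.normal(size=(1, 3)), rng.normal(size=(1, 1))

# Second network f2: h2 = ReLU(psi y + b_psi), y' = omega h2 + b_omega
psi, b_psi = rng.normal(size=(3, 1)), rng.normal(size=(3, 1))
omega, b_omega = rng.normal(size=(1, 3)), rng.normal(size=(1, 1))

x = rng.normal(size=(1, 5))  # five scalar inputs as columns

# Composition of the two shallow networks
h1 = relu(theta @ x + b_theta)
y = phi @ h1 + b_phi
h2 = relu(psi @ y + b_psi)
y_composed = omega @ h2 + b_omega

# The two linear maps between the hidden layers collapse into one:
# psi (phi h1 + b_phi) + b_psi = (psi phi) h1 + (psi b_phi + b_psi)
W = psi @ phi            # combined weight matrix between the hidden layers
b = psi @ b_phi + b_psi  # combined bias
y_deep = omega @ relu(W @ h1 + b) + b_omega

print(np.allclose(y_composed, y_deep))  # True: same function, one deep network
```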