Composing shallow neural networks to get deep networks

  • Consider composing two networks, each with 3 hidden units: the first maps an input x to an output y, and the second takes y as its input
  • If we use a ReLU activation, the composition is also a family of piecewise linear functions
    • the number of linear regions is potentially greater than for a shallow network with six hidden units
  • The first network can map three ranges of x to the same output range, so the second network's function is replicated across each of them, giving up to 9 linear regions
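The region count above can be checked numerically. The sketch below builds two 3-hidden-unit ReLU networks with randomly chosen (hypothetical) weights, composes them, and counts linear regions along a 1D input grid by tracking where the ReLU activation pattern changes. The parameter names (`W1a`, `w2a`, etc.) and the seed are assumptions for illustration, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(0.0, z)

# hypothetical random parameters for two shallow networks, 3 hidden units each
W1a, b1a, w2a, b2a = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3), rng.normal()
W1b, b1b, w2b, b2b = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3), rng.normal()

def pattern(x):
    # which ReLUs are active in each network at input x
    h1_pre = W1a * x + b1a                  # first network's pre-activations
    y = w2a @ relu(h1_pre) + b2a            # first network's output
    h2_pre = W1b * y + b1b                  # second network's pre-activations
    return tuple(h1_pre > 0) + tuple(h2_pre > 0)

# a new linear region starts wherever the activation pattern changes
xs = np.linspace(-3, 3, 4001)
patterns = [pattern(x) for x in xs]
regions = 1 + sum(p != q for p, q in zip(patterns, patterns[1:]))
print("linear regions found on [-3, 3]:", regions)
```

The count depends on the random weights and the grid resolution; the point is only that the composition can produce more regions than either 3-unit network alone.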

Deep neural networks

  • Composing a shallow network with a second network (input → first network → second network) is a special case of a deep neural network
  • The output of the first network (f1) is a linear combination of the activations at its hidden units, and the first operations of the second network (f2) are linear in that output. Composing two linear functions gives another linear function, so these two steps collapse into a single linear layer
    • the result is a network with 2 hidden layers
    • So a network with two hidden layers can represent the family of functions formed by composing those networks
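The collapse argument above can be made concrete: merging f1's linear output with f2's linear first step produces one weight matrix between the two hidden layers. The sketch below uses hypothetical random parameters and verifies that the merged two-hidden-layer network computes exactly f2(f1(x)).

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda z: np.maximum(0.0, z)

# hypothetical parameters for two shallow networks, 3 hidden units each
W1a, b1a, w2a, b2a = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3), rng.normal()
W1b, b1b, w2b, b2b = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3), rng.normal()

def f1(x):
    return w2a @ relu(W1a * x + b1a) + b2a

def f2(y):
    return w2b @ relu(W1b * y + b1b) + b2b

# collapse f1's output step and f2's input step into one linear layer:
# W1b * (w2a . h1 + b2a) + b1b  =  outer(W1b, w2a) @ h1 + (W1b * b2a + b1b)
W_mid = np.outer(W1b, w2a)      # 3x3 weights between the two hidden layers
b_mid = W1b * b2a + b1b

def deep(x):
    h1 = relu(W1a * x + b1a)
    h2 = relu(W_mid @ h1 + b_mid)
    return w2b @ h2 + b2b

x = 0.7
print(f2(f1(x)), deep(x))   # the two values agree
```

The key design point is that `W_mid` is a rank-1 matrix here, because the two shallow networks communicate through a single scalar; a general two-hidden-layer network allows any 3×3 matrix, which is why it represents a strictly larger family of functions.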

Folding space

  • One way to think about composing networks is as folding the input space: the first network folds several input regions onto one another, and the second network then applies its function identically to each folded region
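A minimal sketch of this folding idea, assuming a hand-built example not taken from the notes: two ReLU units computing |x| fold the real line at zero, so any second network applied afterwards produces the same output for x and -x.

```python
import numpy as np

relu = lambda z: np.maximum(0.0, z)

def fold(x):
    # two ReLU units computing |x|: folds the input space at zero
    return relu(x) + relu(-x)

def f2(y):
    # an arbitrary (hypothetical) second piecewise linear network
    return relu(y - 1.0) - 0.5 * relu(y - 2.0)

for x in (-1.5, 1.5):
    print(x, f2(fold(x)))   # same value for x and -x: the fold makes them indistinguishable
```

Whatever shape f2 takes, its graph is mirrored across the fold, which is exactly how composition replicates linear regions.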