Universal Approximation Theorem

  • What this means is that given an input x and an output y, the NN can find a mapping between them. “Approximately”.

  • This is required when we have data that is not linearly separable.

  • So we take a non-linear function, for example the Sigmoid: σ(x) = 1 / (1 + e^(-x)).

  • Then we combine multiple such neurons in a way that lets us accurately model our problem. The end result is a complex function whose weights are distributed across many layers (see the sketch after this list).

  • The Universal Approximation Theorem states that

    a feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of ℝⁿ, under mild assumptions on the activation function.

  • a feed-forward network: take an input, apply a function, get an output, repeat

  • a single hidden layer: yes, you can use more, but theoretically one is enough

  • finite number of neurons: you can do it without needing an infinite computer

  • approximate continuous functions: continuous functions are anything which don’t have breaks/holes in between. This just says that it is possible to approximate the mapping which we talked about.

  • compact subsets of ℝⁿ: ℝⁿ is just the set of all real numbers (taken n at a time), and a compact subset is a closed, bounded piece of it, such as the interval [0, 1].

  • All this boils down to the fact that a neural network can approximate essentially any continuous relationship between an input and an output.
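
The theorem can be seen in action with a tiny network. Concretely, it says that a finite sum of the form Σᵢ αᵢ σ(wᵢ·x + bᵢ) can get arbitrarily close to a continuous target f(x) on a compact set. Below is a minimal sketch of exactly that: NumPy only, with the target function (sin), the interval [-π, π], the hidden-layer width, and the learning rate all being arbitrary choices made for illustration.

```python
# Sketch: a single sigmoid hidden layer approximating a continuous
# function (sin) on a compact subset of R, trained with plain gradient descent.
# All hyperparameters here are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Training data: a continuous function on a compact interval.
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)

# One hidden layer with a finite number of neurons.
hidden = 20
W1 = rng.normal(0, 1, (1, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0, 1, (hidden, 1))
b2 = np.zeros(1)

lr = 0.05
for step in range(20000):
    # Forward pass: input -> sigmoid hidden layer -> linear output.
    h = sigmoid(x @ W1 + b1)
    y_hat = h @ W2 + b2
    err = y_hat - y

    # Backward pass: gradients of the mean squared error.
    n = len(x)
    dW2 = h.T @ err / n
    db2 = err.mean(axis=0)
    dh = err @ W2.T * h * (1 - h)   # sigmoid derivative
    dW1 = x.T @ dh / n
    db1 = dh.mean(axis=0)

    # Gradient descent update.
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print("final MSE:", float((err ** 2).mean()))
```

As training runs, the mean squared error should drop, i.e. the single-hidden-layer sigmoid network gradually approximates sin on the compact interval, which is exactly the kind of approximation the theorem promises.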

Refs