Width Efficiency of Neural Networks
- there exist classes of wide, shallow networks that cannot be expressed by any narrow network whose depth stays below a polynomial bound
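- this polynomial lower bound for width efficiency is weaker than the exponential lower bound for depth efficiency (shallow networks can need exponentially more width to match deep ones), suggesting that depth matters more than width for expressive power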
- for networks with ReLU activation, the price of making the width small is only a linear increase in network depth
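
To make the width-for-depth trade concrete, here is a minimal sketch for the simplest case: a scalar-input ReLU network with one hidden layer of n units, re-expressed as a network of hidden width 5 and n + 1 hidden layers. This is a standard illustrative construction, not necessarily the one used in the results summarized above, and the names `wide_shallow` and `narrow_deep` and the five-channel state layout are assumptions made for this example.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def wide_shallow(x, w, b, a):
    """One hidden layer of width len(w): f(x) = sum_i a_i * relu(w_i * x + b_i)."""
    return float(a @ relu(w * x + b))

def narrow_deep(x, w, b, a):
    """Same function, computed with hidden width 5 and len(w) + 1 hidden layers."""
    # State layout: [x_pos, x_neg, h, acc_pos, acc_neg]. Every entry is >= 0,
    # so ReLU acts as the identity on the channels that are only carried forward.
    s = relu(np.array([x, -x, 0.0, 0.0, 0.0]))  # first hidden layer: x = x_pos - x_neg
    a_prev = 0.0
    for w_k, b_k, a_k in zip(w, b, a):
        W = np.array([
            [1.0, 0.0, 0.0, 0.0, 0.0],                # carry x_pos
            [0.0, 1.0, 0.0, 0.0, 0.0],                # carry x_neg
            [w_k, -w_k, 0.0, 0.0, 0.0],               # pre-activation of unit k
            [0.0, 0.0, max(a_prev, 0.0), 1.0, 0.0],   # add previous unit if its weight > 0
            [0.0, 0.0, max(-a_prev, 0.0), 0.0, 1.0],  # add previous unit if its weight < 0
        ])
        c = np.array([0.0, 0.0, b_k, 0.0, 0.0])
        s = relu(W @ s + c)
        a_prev = a_k
    # Linear output layer: fold in the last unit and recombine the two accumulators.
    return float(s[3] - s[4] + a_prev * s[2])

# Usage: both networks agree up to floating-point rounding.
rng = np.random.default_rng(0)
w, b, a = rng.normal(size=8), rng.normal(size=8), rng.normal(size=8)
print(wide_shallow(0.7, w, b, a), narrow_deep(0.7, w, b, a))
```

Each extra hidden unit of the wide network costs one extra layer of the narrow one, so in this special case depth grows linearly with the original width. Splitting the input and the running sum into nonnegative parts is a design choice that lets ReLU pass them through unchanged.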