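GELU (Gaussian Error Linear Unit) paper
A smoother ReLU: GELU(x) = xΦ(x), where Φ(x) is the cumulative distribution function (CDF) of the standard normal distribution.
It weights each input by its percentile under N(0, 1), rather than gating it by its sign as ReLU does.
$\mathrm{GELU}(x) = x\,P(X \le x) = x\,\Phi(x) = x \cdot \tfrac{1}{2}\left[1 + \operatorname{erf}\left(x/\sqrt{2}\right)\right]$, where $X \sim \mathcal{N}(0, 1)$.
Used in Transformer models such as GPT-3, BERT, and the Vision Transformer (ViT).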
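The formula above can be checked numerically with a minimal sketch. It assumes scalar float inputs and uses only the standard library's `math.erf`. The names `std_normal_cdf`, `gelu`, and `relu` are illustrative helpers, not taken from the paper or from any framework. The sketch compares GELU's percentile weighting with ReLU's sign gating.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Phi(x): fraction of N(0, 1) probability mass at or below x (x's percentile)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu(x: float) -> float:
    """Exact GELU: the input x weighted by its percentile Phi(x)."""
    return x * std_normal_cdf(x)


def relu(x: float) -> float:
    """ReLU: passes x unchanged if positive, otherwise zero (gating by sign only)."""
    return max(0.0, x)


if __name__ == "__main__":
    # Small negative inputs leak through GELU slightly. Large positive inputs
    # pass almost unchanged, because Phi(x) approaches 1.
    for x in (-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0):
        print(
            f"x={x:+.1f}  Phi(x)={std_normal_cdf(x):.4f}  "
            f"gelu={gelu(x):+.4f}  relu={relu(x):+.4f}"
        )
```

Unlike ReLU, GELU is smooth and slightly negative for negative inputs. For example, GELU(-1) is about -0.159, while ReLU(-1) is 0. For large |x| the two functions converge.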