GELU

Oct 14, 2025 · 1 min read

  • architecture


  • Paper
  • Smoother ReLU
  • xΦ(x), where Φ(x) is the standard normal CDF
  • Weights inputs by their percentile, rather than by sign as ReLU does
  • $\text{GELU}(x) = x\,P(X \le x) = x\,\Phi(x) = \frac{x}{2}\left[1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right]$, where $X \sim \mathcal{N}(0, 1)$ (see the sketch after this list)
  • Used in GPT-3, the Transformer, Vision Transformer, and BERT
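A minimal sketch of the exact formulation above, using Python's standard-library `erf`; the function name `gelu` and the sample inputs are illustrative, not from the paper:

```python
import math

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Unlike ReLU's hard cutoff at 0, GELU weights inputs smoothly by percentile,
# so small negative inputs still pass through with a small (negative) value.
for v in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"gelu({v:+.1f}) = {gelu(v):+.4f}")
```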

