GELU

Sep 18, 2024 · 1 min read

  • architecture


  • Paper
  • A smoother alternative to ReLU
  • Defined as xΦ(x), where Φ(x) is the CDF of the standard normal distribution
  • Weights inputs by their percentile, rather than gating them by sign like ReLU
  • GELU(x) = x P(X ≤ x) = x Φ(x) = (x/2)[1 + erf(x/√2)], where X ∼ N(0, 1)
  • Used in GPT-3, Transformer, Vision Transformer, and BERT
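The formula above can be sketched in a few lines of Python. This is an illustrative implementation, not taken from the paper's code: the exact form uses `math.erf`, and `gelu_tanh` is the common tanh-based approximation seen in BERT-style codebases.

```python
import math

def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF,
    # written via the error function: Phi(x) = (1 + erf(x / sqrt(2))) / 2.
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # Widely used tanh approximation of GELU.
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```

For large positive inputs GELU approaches the identity, for large negative inputs it approaches zero, and in between it gates each input softly by how likely a standard normal variable is to fall below it.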
