DistilBERT

Sep 18, 2024 · 1 min read

  • architecture

  • Paper: DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter
  • Available via Hugging Face
  • A general-purpose pre-trained version of BERT
  • 40% smaller, 60% faster, cheaper to pre-train, and retains 97% of BERT's language understanding capabilities
  • Uses knowledge distillation during the pre-training phase
  • Trained with a triple loss combining language modeling, distillation, and cosine-distance losses (see the sketch below)
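
A minimal sketch of that triple loss, assuming PyTorch-style tensors of MLM logits and hidden states from a BERT teacher and a DistilBERT student. The weighting coefficients (`alpha_*`) and the temperature are illustrative placeholders, not the paper's exact values.

```python
import torch
import torch.nn.functional as F

def triple_loss(student_logits, teacher_logits, student_hidden, teacher_hidden,
                labels, temperature=2.0, alpha_mlm=1.0, alpha_kd=1.0, alpha_cos=1.0):
    # 1. Masked language modeling loss on the student's own predictions
    #    (positions without a masked token carry label -100 and are ignored).
    mlm = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                          labels.view(-1), ignore_index=-100)

    # 2. Distillation loss: KL divergence between the temperature-softened
    #    teacher and student output distributions.
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  F.softmax(teacher_logits / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2

    # 3. Cosine-distance loss aligning student and teacher hidden states.
    target = torch.ones(student_hidden.size(0) * student_hidden.size(1),
                        device=student_hidden.device)
    cos = F.cosine_embedding_loss(student_hidden.view(-1, student_hidden.size(-1)),
                                  teacher_hidden.view(-1, teacher_hidden.size(-1)),
                                  target)

    return alpha_mlm * mlm + alpha_kd * kd + alpha_cos * cos
```

For inference or fine-tuning, the pre-trained weights can be loaded through the Hugging Face `transformers` library:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")
```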

Backlinks

  • _Index_of_Models
  • architecture
