DistilBERT

Sep 18, 2024 · 1 min read

  • architecture

  • Paper: DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter
  • Available via Hugging Face
  • A general-purpose pre-trained version of BERT
  • 40% smaller, 60% faster, cheaper to pre-train, and retains 97% of BERT's language understanding capabilities
  • Uses knowledge distillation during the pre-training phase
  • Trained with a triple loss combining language modeling, distillation, and cosine-distance losses (see the sketch below)
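
A minimal sketch of that triple loss, assuming PyTorch-style tensors of MLM logits and hidden states from a BERT teacher and a DistilBERT student. The weighting coefficients (`alpha_*`) and the temperature are illustrative placeholders, not the paper's exact values.

```python
import torch
import torch.nn.functional as F

def triple_loss(student_logits, teacher_logits, student_hidden, teacher_hidden,
                labels, temperature=2.0, alpha_mlm=1.0, alpha_kd=1.0, alpha_cos=1.0):
    # 1. Masked language modeling loss on the student's own predictions
    #    (positions without a masked token carry label -100 and are ignored).
    mlm = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                          labels.view(-1), ignore_index=-100)

    # 2. Distillation loss: KL divergence between the temperature-softened
    #    teacher and student output distributions.
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  F.softmax(teacher_logits / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2

    # 3. Cosine-distance loss aligning student and teacher hidden states.
    target = torch.ones(student_hidden.size(0) * student_hidden.size(1),
                        device=student_hidden.device)
    cos = F.cosine_embedding_loss(student_hidden.view(-1, student_hidden.size(-1)),
                                  teacher_hidden.view(-1, teacher_hidden.size(-1)),
                                  target)

    return alpha_mlm * mlm + alpha_kd * kd + alpha_cos * cos
```

For inference or fine-tuning, the pre-trained weights can be loaded through the Hugging Face `transformers` library:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")
```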

Backlinks

  • _Index_of_Models
  • architecture
