DistilBERT
- DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter
- Hugging Face
- general-purpose pre-trained version of BERT
- 40% smaller, 60% faster, cheaper to pre-train, and retains 97% of BERT's language understanding capabilities
- Knowledge Distillation during the pre-training phase
- triple loss combining masked language modeling, distillation (soft-target), and cosine-distance losses; see the sketch after this list
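
A minimal sketch of the triple loss, assuming PyTorch-style tensors. The loss weights, temperature value, and function name are illustrative placeholders, not the values or code used in the DistilBERT paper or training scripts.

```python
import torch
import torch.nn.functional as F


def triple_loss(student_logits, teacher_logits, labels,
                student_hidden, teacher_hidden,
                temperature=2.0, w_ce=1.0, w_mlm=1.0, w_cos=1.0):
    """Combine distillation, masked-LM, and cosine-distance losses (illustrative weights)."""
    # 1) Distillation loss: KL divergence between temperature-softened
    #    student and teacher distributions over the vocabulary.
    loss_ce = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # 2) Masked language modeling loss against the true masked tokens
    #    (labels use -100 at unmasked positions, the usual ignore index).
    loss_mlm = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )

    # 3) Cosine-distance loss aligning the directions of the student's
    #    and teacher's hidden-state vectors.
    student_flat = student_hidden.view(-1, student_hidden.size(-1))
    teacher_flat = teacher_hidden.view(-1, teacher_hidden.size(-1))
    target = torch.ones(student_flat.size(0))
    loss_cos = F.cosine_embedding_loss(student_flat, teacher_flat, target)

    return w_ce * loss_ce + w_mlm * loss_mlm + w_cos * loss_cos


if __name__ == "__main__":
    # Toy shapes: batch 2, sequence length 8, vocab 30522, hidden size 768.
    B, T, V, H = 2, 8, 30522, 768
    loss = triple_loss(
        student_logits=torch.randn(B, T, V),
        teacher_logits=torch.randn(B, T, V),
        labels=torch.randint(0, V, (B, T)),
        student_hidden=torch.randn(B, T, H),
        teacher_hidden=torch.randn(B, T, H),
    )
    print(loss)
```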