TinyBERT
- TinyBERT: Distilling BERT for Natural Language Understanding
- novel Transformer distillation method to accelerate inference and reduce model size while maintaining accuracy
- a two-stage framework specially designed for Knowledge Distillation (KD) of Transformer-based models: general distillation during pre-training, then task-specific distillation (with data augmentation) during fine-tuning
- the plentiful knowledge encoded in a large "teacher" BERT is transferred to a small "student" TinyBERT via layer-wise losses on embeddings, attention matrices, hidden states, and prediction logits (see the sketch after this list)
- GLUE: TinyBERT_4 (4 layers) retains over 96.8% of its teacher BERT_base's performance while being 7.5x smaller and 9.4x faster at inference
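
A minimal PyTorch sketch of the layer-wise distillation objective, assuming a fixed student-to-teacher layer mapping and a single shared projection (the class name, signature, and hyperparameters are illustrative assumptions; the paper learns a projection per layer pair and applies these losses across its two training stages):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TransformerDistillLoss(nn.Module):
    """Illustrative TinyBERT-style layer-wise distillation losses."""

    def __init__(self, student_dim: int, teacher_dim: int, temperature: float = 1.0):
        super().__init__()
        # Learned projection W_h: maps student hidden states into the
        # teacher's (larger) hidden space before comparing them.
        self.proj = nn.Linear(student_dim, teacher_dim, bias=False)
        self.temperature = temperature

    def forward(self, s_attn, t_attn, s_hidden, t_hidden, s_logits, t_logits):
        # Attention-based loss: MSE between student and teacher attention
        # matrices for each mapped layer pair (lists of [B, heads, T, T]).
        attn_loss = sum(F.mse_loss(a_s, a_t) for a_s, a_t in zip(s_attn, t_attn))

        # Hidden-state loss: project student states [B, T, d_s] into the
        # teacher space [B, T, d_t], then MSE; the embedding-layer loss
        # has the same form.
        hidn_loss = sum(
            F.mse_loss(self.proj(h_s), h_t)
            for h_s, h_t in zip(s_hidden, t_hidden)
        )

        # Prediction-layer loss on temperature-scaled logits; KL divergence
        # is used here, which matches the paper's soft cross-entropy up to
        # a constant (the teacher entropy), so the gradients agree.
        t = self.temperature
        pred_loss = F.kl_div(
            F.log_softmax(s_logits / t, dim=-1),
            F.softmax(t_logits / t, dim=-1),
            reduction="batchmean",
        ) * (t * t)

        return attn_loss + hidn_loss + pred_loss
```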