TinyBERT

Oct 14, 2025 · 1 min read

  • architecture

  • Paper: TinyBERT: Distilling BERT for Natural Language Understanding
  • Proposes a novel Transformer distillation method that accelerates inference and reduces model size while maintaining accuracy (see the sketch after this list)
  • The method is specially designed for knowledge distillation (KD) of Transformer-based models
  • The plentiful knowledge encoded in a large teacher BERT can be effectively transferred to the small student TinyBERT
  • Evaluated on the GLUE benchmark
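
TinyBERT's Transformer distillation fits the student to the teacher at three points: the attention matrices, the hidden states (through a learned projection, since the student is narrower than the teacher), and the output logits. Below is a minimal sketch of that combined objective, assuming PyTorch; the pre-aligned layer lists, the `proj` module, the temperature, and the equal weighting of the three terms are illustrative assumptions, not the paper's exact settings.

```python
import torch.nn.functional as F
from torch import nn

def transformer_distillation_loss(
    student_attn, teacher_attn,      # aligned lists of [batch, heads, seq, seq] tensors
    student_hidden, teacher_hidden,  # aligned lists of [batch, seq, d_s] / [batch, seq, d_t]
    proj: nn.Linear,                 # learned map from student width d_s to teacher width d_t
    student_logits, teacher_logits,  # [batch, num_classes]
    temperature: float = 1.0,
):
    # Attention-based loss: MSE between each student layer's attention
    # matrices and those of its mapped teacher layer.
    attn_loss = sum(F.mse_loss(s, t) for s, t in zip(student_attn, teacher_attn))

    # Hidden-state loss: project the narrower student states into the
    # teacher's width before comparing, i.e. MSE(H_s W, H_t).
    hidden_loss = sum(F.mse_loss(proj(s), t) for s, t in zip(student_hidden, teacher_hidden))

    # Prediction-layer loss: soften both logit distributions with a
    # temperature and match them (KL divergence is a common stand-in
    # for the paper's soft cross-entropy).
    pred_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2

    return attn_loss + hidden_loss + pred_loss
```

In the paper this layer-to-layer fitting is applied in two stages: general distillation on a large unlabeled corpus, then task-specific distillation on augmented task data.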
