TinyBERT

Sep 18, 2024 · 1 min read

  • architecture


  • Paper: TinyBERT: Distilling BERT for Natural Language Understanding
  • Proposes a novel Transformer distillation method that accelerates inference and reduces model size while maintaining accuracy
  • Specially designed for Knowledge Distillation (KD) of Transformer-based models
  • The knowledge encoded in a large teacher BERT is effectively transferred to a small student TinyBERT (a sketch of the loss follows this list)
  • Evaluated on GLUE: the 4-layer TinyBERT retains over 96% of its BERT-base teacher's performance while being ~7.5x smaller and ~9.4x faster at inference
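
A minimal PyTorch sketch of the objective described above: layer-wise distillation (MSE on attention matrices and on linearly projected hidden states) plus prediction-layer distillation (soft cross-entropy over temperature-scaled logits). The embedding-layer term from the paper is omitted for brevity, and the function and dict names (`distillation_loss`, `attn`, `hidden`, `logits`) are illustrative assumptions, not the authors' code; pairing of student layers with teacher layers is assumed to have happened already.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student, teacher, hidden_proj, temperature=1.0):
    """TinyBERT-style loss for one already-matched layer pair plus the
    prediction layer. The dict layout is hypothetical, not the paper's code."""
    # Attention distillation: MSE between student and teacher attention
    # matrices (the paper matches the unnormalised, pre-softmax scores).
    attn_loss = F.mse_loss(student["attn"], teacher["attn"])

    # Hidden-state distillation: a learned linear map lifts the student's
    # smaller hidden size into the teacher's space before the MSE.
    hidden_loss = F.mse_loss(hidden_proj(student["hidden"]), teacher["hidden"])

    # Prediction-layer distillation: soft cross-entropy between the
    # temperature-scaled logits of teacher and student.
    t_probs = F.softmax(teacher["logits"] / temperature, dim=-1)
    s_log_probs = F.log_softmax(student["logits"] / temperature, dim=-1)
    pred_loss = -(t_probs * s_log_probs).sum(dim=-1).mean()

    return attn_loss + hidden_loss + pred_loss

# Toy shapes: hidden size 768 for the teacher, 312 for the student
# (the TinyBERT-4 configuration), batch 2, sequence length 16.
proj = torch.nn.Linear(312, 768)
student = {"attn": torch.rand(2, 12, 16, 16),
           "hidden": torch.randn(2, 16, 312),
           "logits": torch.randn(2, 3)}
teacher = {"attn": torch.rand(2, 12, 16, 16),
           "hidden": torch.randn(2, 16, 768),
           "logits": torch.randn(2, 3)}
loss = distillation_loss(student, teacher, proj)
```

In the paper this objective is applied in two stages: general distillation on unlabeled text, then task-specific distillation on the downstream dataset.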

