AutoDistill
- AutoDistill: an End-to-End Framework to Explore and Distill Hardware-Efficient Language Models
- Large pre-trained language models perform well on natural language processing (NLP) tasks, but they are expensive to serve due to long serving latency and large memory usage.
- Knowledge Distillation is a common approach to compress these models, yet it still faces several challenges:
- handling fast-evolving models, considering serving performance, and optimizing for multiple objectives.
- AutoDistill is an end-to-end model distillation framework that integrates model architecture exploration and multi-objective optimization to build hardware-efficient NLP pre-trained models.
- It uses Bayesian Optimization to conduct a multi-objective Neural Architecture Search for selecting student model architectures.
- The proposed search comprehensively considers both prediction accuracy and serving latency on the target hardware.
- Serving latency is measured on TPUv4i accelerators.
- The searched student models are smaller than MobileBERT.
- Evaluation is performed on the GLUE benchmark.
- The resulting models achieve GLUE scores higher than BERT_BASE, DistilBERT, TinyBERT, NAS-BERT, and MobileBERT.
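The multi-objective search described above can be sketched in code. This is a minimal, self-contained illustration, not AutoDistill's implementation: the search space, the accuracy proxy, and the latency model below are all hypothetical stand-ins, and random sampling replaces the Bayesian Optimization loop the paper uses (a real system would swap the sampler for a BO acquisition function driven by a surrogate model). What the sketch does show is the core idea of evaluating candidate student architectures on two objectives, accuracy and latency, and keeping the Pareto-optimal set.

```python
import random

# Hypothetical search space for student architectures; the dimensions and
# values are illustrative, not AutoDistill's actual configuration.
SEARCH_SPACE = {
    "num_layers": [4, 6, 8, 12],
    "hidden_size": [256, 384, 512],
    "num_heads": [4, 8],
}

def sample_architecture(rng):
    """Draw one candidate architecture uniformly from the search space."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def proxy_accuracy(arch):
    """Stand-in for the expensive distill-then-evaluate step: a saturating
    function of model capacity (larger models score higher, with diminishing
    returns)."""
    capacity = arch["num_layers"] * arch["hidden_size"]
    return 1.0 - 1.0 / (1.0 + capacity / 2000.0)

def measured_latency_ms(arch):
    """Stand-in for profiling serving latency on the target hardware."""
    return 0.05 * arch["num_layers"] * arch["hidden_size"] / arch["num_heads"]

def pareto_front(evaluated):
    """Keep candidates not dominated in (accuracy: higher better,
    latency: lower better)."""
    front = []
    for arch, acc, lat in evaluated:
        dominated = any(
            a2 >= acc and l2 <= lat and (a2 > acc or l2 < lat)
            for _, a2, l2 in evaluated
        )
        if not dominated:
            front.append((arch, acc, lat))
    return front

# Evaluate a batch of sampled candidates on both objectives.
rng = random.Random(0)
evaluated = []
for _ in range(20):
    arch = sample_architecture(rng)
    evaluated.append((arch, proxy_accuracy(arch), measured_latency_ms(arch)))

# The Pareto front is the set of architectures a practitioner would then
# distill and evaluate for real.
for arch, acc, lat in pareto_front(evaluated):
    print(arch, round(acc, 3), round(lat, 2))
```

In the actual framework, each candidate's accuracy would come from distilling and evaluating a student model and its latency from measurement on the target hardware (TPUv4i in the paper), making each evaluation expensive; that cost is exactly why a sample-efficient optimizer such as Bayesian Optimization is used instead of the random sampling shown here.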