AutoDistill

  • AutoDistill: an End-to-End Framework to Explore and Distill Hardware-Efficient Language Models
  • Large pre-trained models have significantly improved performance on Natural Language Processing (NLP) tasks, but they are expensive to serve due to long serving latency and large memory usage
  • Knowledge Distillation can compress these models, but key challenges remain: handling fast-evolving models, considering serving performance, and optimizing for multiple objectives
  • AutoDistill is an end-to-end model distillation framework that integrates model architecture exploration with multi-objective optimization to build hardware-efficient pre-trained NLP models
  • Uses Bayesian Optimization to conduct multi-objective Neural Architecture Search (NAS) for selecting student model architectures
  • The proposed search comprehensively considers both prediction accuracy and serving latency on the target hardware
  • Evaluated on TPUv4i hardware, with MobileBERT as a baseline for accuracy and latency comparisons
  • On downstream tasks from the GLUE benchmark, the distilled model achieves a higher average score than BERT_BASE, DistilBERT, TinyBERT, NAS-BERT, and MobileBERT
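To make the multi-objective search concrete, here is a minimal sketch of selecting student architectures under two objectives (maximize accuracy, minimize latency). It is not the paper's implementation: the accuracy and latency proxies are hypothetical stand-ins for real measurements, and the Bayesian Optimization surrogate is replaced by random sampling so the example stays self-contained; the Pareto-dominance selection is the multi-objective part being illustrated.

```python
import random

def predict_latency(cfg):
    # Hypothetical latency proxy (arbitrary units): grows with depth and width.
    return 0.1 * cfg["layers"] * cfg["hidden"] / 64

def predict_accuracy(cfg):
    # Hypothetical accuracy proxy: larger students score higher,
    # with diminishing returns.
    return 1.0 - 1.0 / (1 + 0.01 * cfg["layers"] * cfg["hidden"] ** 0.5)

def dominates(a, b):
    # a dominates b if it is no worse in both objectives and strictly
    # better in at least one (maximize accuracy, minimize latency).
    return (a["acc"] >= b["acc"] and a["lat"] <= b["lat"]
            and (a["acc"] > b["acc"] or a["lat"] < b["lat"]))

def pareto_front(points):
    # Keep only candidates not dominated by any other candidate.
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

def search(n_trials=50, seed=0):
    # Sample student configurations and return the Pareto-optimal set.
    # A real BO loop would fit a surrogate model over (cfg -> objectives)
    # and pick the next cfg via an acquisition function instead of rng.
    rng = random.Random(seed)
    evaluated = []
    for _ in range(n_trials):
        cfg = {"layers": rng.choice([4, 6, 8, 12]),
               "hidden": rng.choice([128, 256, 384, 512])}
        evaluated.append({"cfg": cfg,
                          "acc": predict_accuracy(cfg),
                          "lat": predict_latency(cfg)})
    return pareto_front(evaluated)
```

The Pareto front returned by `search()` is the set of accuracy/latency trade-offs from which a final student model would be chosen for distillation and deployment.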