handling fast-evolving models, considering serving performance, and optimizing for multiple objectives.
end-to-end model distillation framework integrating model architecture exploration and multi-objective optimization for building hardware-efficient pre-trained NLP models
Bayesian Optimization to conduct multi-objective Neural Architecture Search for selecting student model architectures
proposed search jointly considers prediction accuracy and serving latency on the target hardware
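
To make the two-objective trade-off concrete, the following is a minimal sketch of Pareto filtering over candidate student architectures. It is not the Bayesian Optimization search itself, only an illustration of what it means to weigh accuracy against latency. The `Candidate` class, the candidate names, and all accuracy and latency values are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical student architecture with two measured objectives."""
    name: str
    accuracy: float    # higher is better
    latency_ms: float  # lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.latency_ms <= b.latency_ms
    strictly_better = a.accuracy > b.accuracy or a.latency_ms < b.latency_ms
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep every candidate that no other candidate dominates (input order preserved)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Illustrative values only; these are not measurements from any real model.
candidates = [
    Candidate("student_a", accuracy=0.82, latency_ms=12.0),
    Candidate("student_b", accuracy=0.85, latency_ms=20.0),
    Candidate("student_c", accuracy=0.80, latency_ms=15.0),  # dominated by student_a
    Candidate("student_d", accuracy=0.85, latency_ms=25.0),  # dominated by student_b
]

front = pareto_front(candidates)
print([c.name for c in front])
```

In this toy setting, `student_a` and `student_b` both survive because neither is better on both objectives. A multi-objective search would typically return such a set of trade-off points rather than a single winner, leaving the final choice to the latency budget of the deployment.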