
AutoDistill: An End-to-End Framework to Explore and Distill Hardware-Efficient Language Models

Nov 24, 2023

University of Illinois Urbana-Champaign and Google researchers introduce AutoDistill, an end-to-end fully automated model distillation framework that integrates model architecture exploration and multi-objective optimization for building hardware-efficient pretrained natural language processing models.

As AI-powered language models continue increasing in size, reducing serving cost has become an important research area. Knowledge distillation has emerged as a promising and effective method for model compression, but existing distillation methods can struggle with model-serving in today's massive datacenters, where they face challenges such as handling fast-evolving models, considering serving performance, and optimizing for multiple objectives.

To deal with these issues, a research team from the University of Illinois Urbana-Champaign and Google has introduced AutoDistill, an end-to-end fully automated model distillation framework that integrates model architecture exploration and multi-objective optimization for building hardware-efficient pretrained natural language processing (NLP) models.

The team summarizes their main contributions as:

AutoDistill is an end-to-end solution designed to generate optimized, task-agnostic pretrained language models for target hardware configurations. It takes user requirements, objectives and constraints as inputs that capture the key components for consideration, such as pretraining tasks, model design spaces, target hardware and evaluation metrics.
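To make this interface concrete, here is a minimal sketch of what such inputs might look like. The class, field names and values are illustrative assumptions based on the paper's description, not AutoDistill's actual (unreleased) API.

```python
# Hypothetical sketch of the kinds of inputs AutoDistill accepts; names and
# values are assumptions for illustration, not the framework's real interface.
from dataclasses import dataclass, field

@dataclass
class DistillationRequest:
    pretraining_task: str = "masked_language_modeling"  # plus next-sentence prediction
    target_hardware: str = "tpu_v4i"                     # hardware the student must serve on
    # Objectives to optimize and constraints to respect during the search.
    objectives: dict = field(default_factory=lambda: {
        "pretrain_accuracy": "maximize",
        "latency_ms": "minimize",
    })
    constraints: dict = field(default_factory=lambda: {
        "max_params_millions": 30,                       # model-size budget
    })
    # Design space: ranges of the student-architecture knobs to be explored.
    design_space: dict = field(default_factory=lambda: {
        "num_layers": (6, 24),
        "hidden_size": (128, 768),
        "num_attention_heads": (2, 12),
        "intermediate_size": (512, 3072),
    })
```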

The overall AutoDistill flow comprises three major stages: model exploration, flash distillation, and evaluation. Model exploration searches for better compressed models by considering the design space, evaluation metrics, and user-specified constraints. Flash distillation then grows the most promising candidate into a student model that learns from both the pretraining datasets and the teacher model; this stage is also responsible for regular distillation with the same teacher model but different training setups. The flash-distilled student model is then evaluated on the target tasks and hardware for prediction accuracy, next-sentence-prediction accuracy, and hardware performance. Once all desired metrics are collected, the information is passed back to the model exploration stage, where the search engine selects the optimal model for the next iteration.
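As a rough picture of how these three stages feed into one another, the following is a minimal sketch of the iterative loop, assuming user-supplied callables for each stage; the function names here are illustrative, not AutoDistill's actual API.

```python
# Minimal sketch of AutoDistill's three-stage loop (model exploration,
# flash distillation, evaluation); all callables are assumed placeholders.
def autodistill_loop(propose_candidate, flash_distill, evaluate, update_search, n_rounds=20):
    """propose_candidate() -> candidate architecture config (model exploration)
    flash_distill(config)  -> student briefly distilled from the teacher
    evaluate(student)      -> dict of metrics measured on target tasks/hardware
    update_search(config, metrics) -> feeds results back to the search engine
    """
    best_config, best_metrics = None, None
    for _ in range(n_rounds):
        config = propose_candidate()        # Stage 1: model exploration
        student = flash_distill(config)     # Stage 2: flash distillation
        metrics = evaluate(student)         # Stage 3: evaluation
        update_search(config, metrics)      # close the loop for the next iteration
        if best_metrics is None or metrics["pretrain_accuracy"] > best_metrics["pretrain_accuracy"]:
            best_config, best_metrics = config, metrics
    return best_config, best_metrics
```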

Notably, AutoDistill formulates the student model architecture search as a black-box optimization problem, integrating the Bayesian Optimization (BO) algorithm and the Vizier (Golovin et al., 2017) cloud-based black-box optimization service into its search engine. Because the fully automated and integrated evaluation stage measures the student model on the target hardware and datacenter software environment, the framework captures valid and precise hardware feedback.
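Since Vizier is a Google cloud service, the snippet below sketches a comparable black-box search with the open-source scikit-optimize library instead. It is a simplified stand-in: the design-space bounds and the scoring placeholder are assumptions, and it scalarizes what AutoDistill treats as a multi-objective problem.

```python
# Rough stand-in for the black-box student-architecture search using
# scikit-optimize's Bayesian Optimization (not AutoDistill's Vizier-based engine).
from skopt import gp_minimize
from skopt.space import Integer

# Hypothetical design-space bounds for the student architecture.
search_space = [
    Integer(6, 24, name="num_layers"),
    Integer(128, 768, name="hidden_size"),
    Integer(2, 12, name="num_attention_heads"),
]

def score_candidate(params):
    num_layers, hidden_size, num_heads = params
    # Placeholder objective: a real pipeline would flash-distill a student with
    # this architecture, measure accuracy and on-hardware latency, and fold them
    # into a single scalar to minimize (e.g. -(accuracy - latency_penalty)).
    return 0.0

# Run Bayesian Optimization over the design space; 25 trials is an arbitrary budget.
result = gp_minimize(score_candidate, search_space, n_calls=25, random_state=0)
print("best architecture found:", result.x)
```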

AutoDistill has several advantages over previous differentiable neural architecture search (DNAS) methods: 1) it avoids the enormous effort of training a large supernet beforehand on NLP pretraining tasks; 2) it scales better to a much larger design space; and 3) it can be easily extended to new objectives and to new models with different architecture configurations.

The team conducted extensive experiments to evaluate AutoDistill. On the General Language Understanding Evaluation (GLUE) benchmark, which comprises nine downstream natural language understanding tasks, AutoDistill achieved higher average scores than BERT-Base, DistilBERT, TinyBERT₆ and MobileBERT with significantly smaller model sizes. In experiments on Google's TPUv4i hardware, AutoDistill-generated models achieved up to 3.2 percent higher pretrained accuracy and up to 1.44x speedups in latency compared to MobileBERT.

Overall, AutoDistill improves both prediction accuracy and serving latency on target hardware, indicating its promise and potential for building next-generation hardware-efficient pretrained NLP models.

The paper AutoDistill: an End-to-End Framework to Explore and Distill Hardware-Efficient Language Models is on arXiv.

Author: Hecate He | Editor: Michael Sarazen

We know you don't want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.

