BinaryBERT

  • BinaryBERT: Pushing the Limit of BERT Quantization
  • demand for model compression techniques
  • weight binarization
  • a binary BERT is harder to train directly than a ternary counterpart due to its steep and complex loss landscape
  • ternary weight splitting
  • initializes BinaryBERT by equivalently splitting a half-sized ternary network, then fine-tunes for further refinement
  • the binary model thus inherits the good performance of the ternary one, and can be further enhanced by fine-tuning the new architecture after splitting
  • tailor the size of BinaryBERT based on the edge device constraints
  • GLUE
  • SQuAD
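
To make the "weight binarization" bullet concrete, here is a minimal sketch of the common scaled-sign binarization scheme (alpha chosen as the mean absolute weight, which minimizes the L2 error between the real-valued and binarized tensors). This is a generic illustration, not the paper's exact quantizer:

```python
import numpy as np

def binarize(w):
    """Binarize a real-valued weight tensor to {-alpha, +alpha}.

    Common scaled-sign scheme: alpha = mean(|w|) minimizes the L2 error
    between w and alpha * sign(w). Note np.sign maps exact zeros to 0;
    real weights are almost never exactly zero, so this is ignored here.
    """
    alpha = np.abs(w).mean()
    return alpha * np.sign(w)

w = np.array([0.3, -0.7, 0.1, -0.2])
w_bin = binarize(w)  # every entry becomes +/- 0.325
```

The binarized tensor can be stored as 1 bit per weight plus one scalar alpha per tensor, which is the source of the memory savings.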
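
The "ternary weight splitting" bullets can also be sketched. The idea is to split each ternary weight (values in {-alpha, 0, +alpha}) into two binary weights whose sum reproduces it exactly, so the doubled-width binary network starts from the ternary network's performance. This is a simplified illustration (the paper's scheme also splits the latent full-precision weights); the zero-splitting rule below is an assumption for the sketch:

```python
import numpy as np

def ternary_split(w_t, alpha):
    """Split a ternary tensor (values in {-alpha, 0, +alpha}) into two
    binary tensors b1, b2 with values in {-alpha/2, +alpha/2} such that
    b1 + b2 == w_t elementwise.

    Zeros are split into a cancelling (+alpha/2, -alpha/2) pair so that
    both outputs remain strictly binary.
    """
    half = alpha / 2.0
    b1 = np.where(w_t > 0, half, np.where(w_t < 0, -half, half))
    b2 = np.where(w_t > 0, half, np.where(w_t < 0, -half, -half))
    return b1, b2

w_t = np.array([1.0, 0.0, -1.0])
b1, b2 = ternary_split(w_t, alpha=1.0)  # b1 + b2 equals w_t exactly
```

Because the split is exact, the binary model's initial loss matches the ternary model's; fine-tuning then adapts the two binary halves independently.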