BinaryBERT

  • BinaryBERT: Pushing the Limit of BERT Quantization
  • demand for model compression techniques
  • weight binarization
  • a binary BERT is harder to train directly than a ternary counterpart due to its steep and complex loss landscape
  • ternary weight splitting
  • initializes BinaryBERT by equivalently splitting a half-sized ternary network, then fine-tunes for further refinement
  • the binary model thus inherits the good performance of the ternary one, and can be further enhanced by fine-tuning the new architecture after splitting
  • tailor the size of BinaryBERT based on the edge device constraints
  • GLUE
  • SQuAD
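
To make the "weight binarization" bullet concrete, here is a minimal sketch of the common scaled-sign binarization scheme (alpha chosen as the mean absolute weight, which minimizes the L2 error between the real-valued and binarized tensors). This is a generic illustration, not the paper's exact quantizer:

```python
import numpy as np

def binarize(w):
    """Binarize a real-valued weight tensor to {-alpha, +alpha}.

    Common scaled-sign scheme: alpha = mean(|w|) minimizes the L2 error
    between w and alpha * sign(w). Note np.sign maps exact zeros to 0;
    real weights are almost never exactly zero, so this is ignored here.
    """
    alpha = np.abs(w).mean()
    return alpha * np.sign(w)

w = np.array([0.3, -0.7, 0.1, -0.2])
w_bin = binarize(w)  # every entry becomes +/- 0.325
```

The binarized tensor can be stored as 1 bit per weight plus one scalar alpha per tensor, which is the source of the memory savings.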
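
The "ternary weight splitting" bullets can also be sketched. The idea is to split each ternary weight (values in {-alpha, 0, +alpha}) into two binary weights whose sum reproduces it exactly, so the doubled-width binary network starts from the ternary network's performance. This is a simplified illustration (the paper's scheme also splits the latent full-precision weights); the zero-splitting rule below is an assumption for the sketch:

```python
import numpy as np

def ternary_split(w_t, alpha):
    """Split a ternary tensor (values in {-alpha, 0, +alpha}) into two
    binary tensors b1, b2 with values in {-alpha/2, +alpha/2} such that
    b1 + b2 == w_t elementwise.

    Zeros are split into a cancelling (+alpha/2, -alpha/2) pair so that
    both outputs remain strictly binary.
    """
    half = alpha / 2.0
    b1 = np.where(w_t > 0, half, np.where(w_t < 0, -half, half))
    b2 = np.where(w_t > 0, half, np.where(w_t < 0, -half, -half))
    return b1, b2

w_t = np.array([1.0, 0.0, -1.0])
b1, b2 = ternary_split(w_t, alpha=1.0)  # b1 + b2 equals w_t exactly
```

Because the split is exact, the binary model's initial loss matches the ternary model's; fine-tuning then adapts the two binary halves independently.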