BART

  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
  • Denoising Autoencoder
  • pretraining for sequence-to-sequence models
  • trained by corrupting text with an arbitrary noising function, and learning a model to reconstruct the original text
  • generalizing BERT (due to the bidirectional encoder) and GPT (with the left-to-right decoder)
  • finds the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token (both noising schemes are sketched in code after this list)
  • With BERT, random tokens are replaced with masks, and the document is encoded bidirectionally. Missing tokens are predicted independently, so BERT cannot easily be used for generation.
  • With GPT, tokens are predicted auto-regressively (each new token is conditioned on the prior tokens), so GPT can be used for generation. The contrast between the two prediction styles is sketched after this list.
  • applies a noising scheme to the input document, corrupting it by replacing spans of text with mask symbols
  • effective when fine-tuned for text generation, but also works well for comprehension tasks (see the pretrained-model sketch after this list)
  • matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD
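
A minimal sketch of the two noising schemes mentioned above (sentence permutation and text in-filling), assuming whitespace-tokenized text; the Poisson(λ=3) span lengths and ~30% masking rate follow the paper, but the sampling loop itself is illustrative, not the authors' implementation:

```python
import numpy as np

MASK = "<mask>"

def sentence_permutation(sentences, rng):
    """Randomly shuffle the order of the sentences in a document."""
    order = rng.permutation(len(sentences))
    return [sentences[i] for i in order]

def text_infilling(tokens, rng, mask_ratio=0.3, lam=3.0):
    """Replace token spans with a single <mask> each.

    Span lengths are drawn from Poisson(lam); roughly mask_ratio of the
    tokens end up covered by masks.
    """
    out, i, masked = [], 0, 0
    budget = int(mask_ratio * len(tokens))
    while i < len(tokens):
        if masked < budget and rng.random() < mask_ratio:
            span = int(rng.poisson(lam))  # a 0-length span still inserts a mask
            out.append(MASK)
            i += span
            masked += span
        else:
            out.append(tokens[i])
            i += 1
    return out

rng = np.random.default_rng(0)
doc = ["the cat sat on the mat .", "it was a sunny day .", "birds sang outside ."]
shuffled = sentence_permutation(doc, rng)
corrupted = text_infilling(" ".join(shuffled).split(), rng)
print(" ".join(corrupted))  # corrupted encoder input; the original document is the decoder target
```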
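
To make the BERT/GPT contrast concrete, a small sketch using the Hugging Face transformers library (the bert-base-uncased and gpt2 checkpoints are illustrative choices, not part of the notes): a masked-LM head fills each mask position independently from bidirectional context, while a causal LM generates left to right, each token conditioned on the ones before it.

```python
from transformers import (AutoModelForCausalLM, AutoModelForMaskedLM,
                          AutoTokenizer)

# BERT-style: masked positions are predicted independently from bidirectional context.
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
enc = bert_tok("The capital of France is [MASK].", return_tensors="pt")
logits = bert(**enc).logits
mask_pos = (enc["input_ids"][0] == bert_tok.mask_token_id).nonzero()[0].item()
print(bert_tok.decode([logits[0, mask_pos].argmax().item()]))

# GPT-style: tokens are generated auto-regressively, left to right.
gpt_tok = AutoTokenizer.from_pretrained("gpt2")
gpt = AutoModelForCausalLM.from_pretrained("gpt2")
ids = gpt_tok("The capital of France is", return_tensors="pt").input_ids
out = gpt.generate(ids, max_new_tokens=5, do_sample=False)
print(gpt_tok.decode(out[0]))
```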
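
And a usage sketch of the denoising behaviour itself, assuming the Hugging Face transformers library and the facebook/bart-base checkpoint (a smaller stand-in for the models evaluated in the paper): the bidirectional encoder reads the corrupted input and the left-to-right decoder generates the reconstructed text.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Corrupted input: a span of text has been replaced with a single <mask> token.
text = "UN Chief Says There Is No <mask> in Syria"
inputs = tokenizer(text, return_tensors="pt")

# The decoder generates the reconstructed sequence auto-regressively.
ids = model.generate(inputs["input_ids"], num_beams=4, max_length=25)
print(tokenizer.batch_decode(ids, skip_special_tokens=True)[0])
```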