RoBERTa

  • RoBERTa: a Robustly Optimized BERT Pretraining Approach
  • evaluates a number of design decisions when pretraining BERT models
  • The authors find that BERT was significantly undertrained and that, when trained properly, it can match or exceed the performance of every model published after it.
  • performance can be substantially improved by (1) training the model longer, with bigger batches, over more data; (2) removing the next sentence prediction objective; (3) training on longer sequences; and (4) dynamically changing the masking pattern applied to the training data
  • the paper reports state-of-the-art results on three benchmarks:
  • GLUE
  • RACE
  • SQuAD
  • pretraining uses only the masked language model (MLM) objective; next sentence prediction is dropped
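
To make the dynamic masking idea concrete, here is a minimal sketch in plain Python. It is an illustration, not the authors' implementation: the function name `dynamic_mask`, the toy vocabulary, and the per-token independent sampling are assumptions made for this note. The 15% selection rate and the 80/10/10 split (replace with `[MASK]`, replace with a random token, keep unchanged) follow the original BERT recipe. The point is only that the mask is re-sampled every time a sequence is fed to the model, rather than fixed once during preprocessing.

```python
import random

MASK_TOKEN = "[MASK]"  # placeholder mask symbol for this sketch


def dynamic_mask(tokens, vocab, rng, mask_prob=0.15):
    """Return (masked_tokens, labels) using a freshly sampled mask.

    labels[i] holds the original token at selected positions and None
    elsewhere, so an MLM loss would only be computed where labels[i]
    is not None.
    """
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        # Simplification: select each position independently with mask_prob.
        if rng.random() >= mask_prob:
            continue
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK_TOKEN          # 80%: replace with the mask symbol
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)   # 10%: replace with a random token
        # remaining 10%: leave the token unchanged
    return masked, labels


tokens = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(tokens))  # toy vocabulary, assumed for illustration
rng = random.Random(0)

# Static masking would call dynamic_mask once and reuse the result every epoch.
# Dynamic masking calls it again each time the sequence is seen.
epoch_views = [dynamic_mask(tokens, vocab, rng) for _ in range(3)]
```

Each entry of `epoch_views` is an independent draw, so over many epochs the model sees the same sentence under different masks. In a real pipeline this step would typically run inside the data loader or batch collation step, so that no masked copy of the corpus is ever stored.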