XLNet

  • XLNet: Generalized Autoregressive Pretraining for Language Understanding
  • key idea: modeling bidirectional contexts while keeping an autoregressive formulation
  • denoising autoencoding based pretraining (e.g. BERT) achieves better performance than pretraining approaches based on autoregressive language modeling
  • However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy
  • generalized [autoregressive](autoregressive.md) pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order (thereby proposing a new objective called Permutation Language Modeling), and (2) overcomes the limitations of BERT thanks to its autoregressive formulation
  • uses a permutation language modeling objective to combine the advantages of autoregressive (AR) and autoencoding (AE) methods (see the sketch after this list)
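
A sketch of the permutation language modeling objective described above, using the paper's notation: $\mathcal{Z}_T$ is the set of all permutations of a length-$T$ index sequence, and $z_t$, $\mathbf{z}_{<t}$ are the $t$-th element and the first $t-1$ elements of a sampled permutation $\mathbf{z}$:

$$
\max_{\theta} \; \mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T} \left[ \sum_{t=1}^{T} \log p_{\theta}\!\left(x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}}\right) \right]
$$

A toy illustration (hypothetical helper, not from the XLNet codebase) of why this captures bidirectional context: under a sampled factorization order, each token is predicted from whatever positions precede it in that order, which can lie on either side of it in the original sequence.

```python
import random

# Hypothetical helper: sample one factorization order and show, for each
# prediction step, which positions form the conditioning context x_{z_<t}.
def permutation_conditioning(seq_len: int, seed: int = 0) -> None:
    rng = random.Random(seed)
    order = list(range(seq_len))
    rng.shuffle(order)  # one sampled factorization order z ~ Z_T
    print("factorization order:", order)
    for t, pos in enumerate(order):
        context = sorted(order[:t])  # positions visible when predicting x_{z_t}
        print(f"predict position {pos} given positions {context}")

permutation_conditioning(4)
# Over many sampled orders, every position is eventually predicted from
# contexts on both its left and right, so bidirectional context is learned
# without corrupting the input with [MASK] tokens.
```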