Transformer-XL

  • Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
  • Transformers have the potential to learn longer-term dependencies, but are limited by a fixed-length context in the language-modeling setting
  • learns dependencies beyond a fixed length without disrupting temporal coherence
  • introduces a segment-level recurrence mechanism and a novel relative positional encoding scheme (see the sketch after this list)
  • resolves the context fragmentation problem
  • enwik8
  • WikiText-103
  • One Billion Word
  • Penn Treebank
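
A minimal sketch of the segment-level recurrence idea: each layer attends over the current segment plus a cached memory of hidden states from previous segments, and gradients are stopped at the memory boundary. This is an illustrative PyTorch approximation, not the paper's implementation; names like `SimpleAttentionLayer`, `RecurrentEncoder`, and `mem_len` are invented here, and the relative positional encoding is omitted for brevity.

```python
import torch
import torch.nn as nn

class SimpleAttentionLayer(nn.Module):
    """One attention layer whose keys/values span memory + current segment."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU())

    def forward(self, x, mem):
        # Queries come from the current segment only; keys/values also
        # cover the cached hidden states of earlier segments.
        kv = torch.cat([mem, x], dim=1) if mem is not None else x
        out, _ = self.attn(x, kv, kv, need_weights=False)
        return self.ff(out) + x

class RecurrentEncoder(nn.Module):
    def __init__(self, n_layers=2, d_model=64, n_heads=4, mem_len=32):
        super().__init__()
        self.layers = nn.ModuleList(
            SimpleAttentionLayer(d_model, n_heads) for _ in range(n_layers))
        self.mem_len = mem_len

    def forward(self, x, mems=None):
        if mems is None:
            mems = [None] * len(self.layers)
        new_mems = []
        for layer, mem in zip(self.layers, mems):
            # Cache this layer's input as memory for the next segment;
            # detach() stops gradients flowing into previous segments.
            cached = x if mem is None else torch.cat([mem, x], dim=1)
            new_mems.append(cached[:, -self.mem_len:].detach())
            x = layer(x, mem)
        return x, new_mems

# Usage: process consecutive segments, carrying the memory forward.
enc = RecurrentEncoder()
mems = None
for segment in torch.randn(4, 3, 16, 64).unbind(1):  # 3 segments of length 16
    out, mems = enc(segment, mems)
```

Because the memory is carried across segments, the effective context grows beyond a single segment's length, which is how the recurrence avoids the context fragmentation that a fixed-length window causes.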