Longformer
- Longformer: The Long-Document Transformer
- Transformer
- Sliding Window Attention
- Dilated Sliding Window Attention
- Global and Sliding Window Attention
- attention mechanism that scales linearly with sequence length
- drop-in replacement for the standard self-attention
- local windowed attention combined with a task-motivated global attention
- text8
- enwik8
- consistently outperforms RoBERTa on long document tasks and sets new state-of-the-art results on WikiHop and TriviaQA
- Longformer-Encoder-Decoder (LED), a Longformer variant supporting long-document generative sequence-to-sequence tasks, shown effective on the arXiv summarization dataset
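The core idea in the list above — each token attends only to a fixed-size window of neighbors, so cost grows linearly in sequence length instead of quadratically — can be sketched in NumPy. This is a minimal illustrative sketch, not the paper's or any library's implementation; the function name, shapes, and window size are assumptions for the example.

```python
import numpy as np

def sliding_window_attention(q, k, v, window):
    """Sliding window attention sketch: query i attends only to keys
    within `window` positions on either side, so total work is
    O(n * window) rather than O(n^2) for full self-attention."""
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        # Restrict attention to the local window around position i.
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)
        # Numerically stable softmax over the windowed scores.
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ v[lo:hi]
    return out

rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v = rng.standard_normal((3, n, d))
out = sliding_window_attention(q, k, v, window=2)
print(out.shape)  # (16, 8)
```

The dilated and global variants in the list modify only which key positions each query may see: dilation skips every other position inside the window, and global attention lets a few task-chosen tokens attend to (and be attended by) the whole sequence.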