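Big Bird

Big Bird: Transformers for Longer Sequences. A key limitation of Transformer-based models is the quadratic complexity of full self-attention. Every token attends to every other token, so compute and memory grow as O(n²) in the sequence length n. Big Bird replaces full attention with a sparse attention mechanism that reduces this quadratic complexity to linear, O(n).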
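To see why sparsity gives linear cost, here is a minimal sketch. It assumes a simplified, per-token version of Big Bird's pattern, which combines a sliding window, a few global tokens and a few random keys per query. The original operates on blocks of tokens rather than single tokens. The function names (`bigbird_mask`, `num_pairs`, `sparse_attention`) and default parameters are hypothetical and chosen for illustration only. Each query scores only a bounded number of keys, apart from the fixed number of global rows, so the total number of scores grows linearly with n.

```python
import math
import random


def bigbird_mask(n, window=3, num_global=2, num_random=2, seed=0):
    """For each query i, return the sorted key indices it may attend to.

    Simplified, per-token Big Bird-style pattern (illustrative sketch):
      - sliding window: keys within window // 2 positions of i
      - global: the first num_global tokens attend to every key, and every
        query attends to them
      - random: up to num_random extra keys drawn per query
    """
    rng = random.Random(seed)
    half = window // 2
    allowed = []
    for i in range(n):
        if i < num_global:
            # Global tokens attend to the whole sequence.
            allowed.append(list(range(n)))
            continue
        keys = set(range(max(0, i - half), min(n, i + half + 1)))
        keys.update(range(min(num_global, n)))  # every query sees the global tokens
        # Random keys: rejection-sample new positions, capped so that
        # very short sequences cannot loop forever.
        target = min(len(keys) + num_random, n)
        attempts = 0
        while len(keys) < target and attempts < 100 * max(num_random, 1):
            keys.add(rng.randrange(n))
            attempts += 1
        allowed.append(sorted(keys))
    return allowed


def num_pairs(allowed):
    """Number of (query, key) scores computed, i.e. the attention cost."""
    return sum(len(keys) for keys in allowed)


def sparse_attention(q, k, v, allowed):
    """Scaled dot-product attention where query i scores only allowed[i]."""
    d = len(q[0])
    out = []
    for i, keys in enumerate(allowed):
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) for j in keys]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[j][c] for w, j in zip(weights, keys)) / total
            for c in range(len(v[0]))
        ])
    return out


if __name__ == "__main__":
    # Compare the sparse score count against n * n for full attention.
    for n in (128, 256, 512):
        print(n, num_pairs(bigbird_mask(n)), n * n)
```

Running the loop at the bottom shows the sparse count roughly doubling as n doubles, while n * n quadruples. Making `num_global`, `window` and `num_random` fixed constants, independent of n, is what keeps the cost linear.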