Sparse Transformer

Sep 18, 2024

architecture

paper
Uses Strided Attention
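
A minimal sketch of what a strided attention pattern can look like, assuming the two-head factorization described in the Sparse Transformer paper: one head attends to the most recent `stride` positions (plus the current one), and the other attends to every `stride`-th earlier position. The function name `strided_attention_masks` and its parameters `n` and `stride` are hypothetical, chosen only for illustration. This builds boolean masks only, not a full attention layer.

```python
def strided_attention_masks(n, stride):
    """Return (local, strided) boolean masks of shape n x n.

    mask[i][j] is True when query position i may attend to key position j.
    Both masks are causal (j <= i). Illustrative sketch, not a reference
    implementation.
    """
    # Head 1: the current position and up to `stride` positions before it.
    local = [
        [j <= i and (i - j) <= stride for j in range(n)]
        for i in range(n)
    ]
    # Head 2: earlier positions whose distance from i is a multiple of `stride`.
    strided = [
        [j <= i and (i - j) % stride == 0 for j in range(n)]
        for i in range(n)
    ]
    return local, strided


if __name__ == "__main__":
    local, strided = strided_attention_masks(n=8, stride=3)
    for name, mask in (("local", local), ("strided", strided)):
        print(name)
        for row in mask:
            # Print 1 where attention is allowed, 0 where it is masked out.
            print(" ".join("1" if allowed else "0" for allowed in row))
```

Combining the two heads lets any position reach any earlier position in two attention steps, while each head only looks at on the order of `stride` or `n / stride` keys instead of all `n`.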
