Subhaditya's KB

Strided Attention

Sep 18, 2024, 1 min read

architecture

paper
Sparse factorizations of the Attention matrix
Reduces the cost of attention from $O(n^2)$ to $O(n \sqrt{n})$
Recomputes Attention matrices during the backward pass to save memory
Fast Attention kernels
Works well for data with a periodic structure, such as images and music
For data without a periodic structure, such as text, the Strided pattern works less well: an element's spatial coordinates do not necessarily correlate with the positions where that element may be most relevant in the future
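
To make the pattern concrete, here is a minimal sketch of the two masks a strided factorization might use: one for a local window of recent positions, one for every `stride`-th earlier position. The function names, the default `stride` of roughly $\sqrt{n}$, and the exact window boundary (current position plus the `stride - 1` before it) are assumptions made for illustration, not details taken from the note or the paper's code.

```python
import math


def strided_attention_masks(n, stride=None):
    """Build (local, strided) causal boolean masks for a sequence of length n.

    local[i][j]   is True when j is among the last `stride` positions up to i.
    strided[i][j] is True when j <= i and (i - j) is a multiple of `stride`.
    """
    if stride is None:
        # Assumption: pick a stride close to sqrt(n).
        stride = max(1, math.isqrt(n))
    local = [[j <= i and i - j < stride for j in range(n)] for i in range(n)]
    strided = [[j <= i and (i - j) % stride == 0 for j in range(n)] for i in range(n)]
    return local, strided


def count_attended(local, strided):
    """Count (query, key) pairs allowed by either mask."""
    return sum(
        a or b
        for row_local, row_strided in zip(local, strided)
        for a, b in zip(row_local, row_strided)
    )


if __name__ == "__main__":
    n = 16
    local, strided = strided_attention_masks(n)
    # Print the combined pattern: '#' marks an attended position.
    for row_local, row_strided in zip(local, strided):
        print("".join("#" if a or b else "." for a, b in zip(row_local, row_strided)))
    print("sparse pairs:", count_attended(local, strided))
    print("dense causal pairs:", n * (n + 1) // 2)
```

With a stride near $\sqrt{n}$, each row allows at most about $2\sqrt{n}$ positions across the two masks, which is where the $O(n \sqrt{n})$ total comes from. For an image flattened row by row, setting `stride` to the row width makes the strided mask attend to the same column in earlier rows, which is why the pattern suits periodic data.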
