Mixed Chunk Attention

  • An efficient linear-approximation method that combines the benefits of partial and linear attention mechanisms; it is accelerator-friendly and highly competitive in quality.
  • The method operates on chunks of tokens and combines local (within-chunk) and global (between-chunk) attention spans.
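
The two spans above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: it assumes exact causal softmax attention inside each chunk (the "local" span), kernelized linear attention over all earlier chunks via a running state (the "global" span), the feature map phi(x) = elu(x) + 1, and a simple additive mix of the two parts; all of these choices are assumptions.

```python
import numpy as np

def mixed_chunk_attention(q, k, v, chunk_size):
    """Sketch of mixed chunk attention (illustrative, single head).

    Local span:  exact causal softmax attention within each chunk.
    Global span: linear attention over all previous chunks, carried
                 as a running (key-feature, value) state so cost stays
                 linear in sequence length.
    """
    T, d = q.shape
    assert T % chunk_size == 0, "sequence length must be a multiple of chunk_size"
    # Non-negative feature map: phi(x) = elu(x) + 1 (a common linear-attention choice)
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))

    out = np.zeros_like(v)
    S = np.zeros((d, v.shape[1]))   # running sum of phi(k)^T v over past chunks
    z = np.zeros(d)                 # running sum of phi(k) for normalization

    for start in range(0, T, chunk_size):
        sl = slice(start, start + chunk_size)
        qc, kc, vc = q[sl], k[sl], v[sl]

        # Local: causal softmax attention restricted to this chunk
        scores = qc @ kc.T / np.sqrt(d)
        mask = np.triu(np.ones((chunk_size, chunk_size), dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        local = weights @ vc

        # Global: linear attention over all earlier chunks via the running state
        qf = phi(qc)
        denom = qf @ z + 1e-6
        global_part = (qf @ S) / denom[:, None]

        # Combine spans (the real combination scheme is an assumption here)
        out[sl] = local + global_part

        # Fold this chunk's keys/values into the global state for later chunks
        kf = phi(kc)
        S += kf.T @ vc
        z += kf.sum(axis=0)
    return out
```

Because the global state only grows between chunks, each token attends exactly (softmax) to its own chunk and approximately (linearly) to everything before it, which is what makes the scheme both linear-cost and hardware-friendly: the per-chunk work is dense matrix multiplies of fixed size.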