Subhaditya's KB

Multiplicative Attention

Oct 14, 2025 · 1 min read

  • architecture

  • $f_{att}(h_i, s_j) = h_i^{\top} W_a s_j$
  • Since Additive Attention performs better as the dimensionality grows, a scaling factor is introduced, giving Scaled Dot Product Attention
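The score above can be sketched in NumPy; the function names, shapes, and the batched formulation (all pairs $(h_i, s_j)$ at once) are assumptions for illustration, not from the note:

```python
import numpy as np

def multiplicative_attention(H, S, W_a):
    """Multiplicative (bilinear) attention scores.

    H:   (n, d_h) matrix of encoder states h_i
    S:   (m, d_s) matrix of decoder states s_j
    W_a: (d_h, d_s) learned weight matrix

    Returns an (n, m) matrix of scores f_att(h_i, s_j) = h_i^T W_a s_j.
    """
    return H @ W_a @ S.T

def attention_weights(scores):
    """Softmax over encoder positions (axis 0) for each decoder step."""
    e = np.exp(scores - scores.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)
```

Using a single matrix product per score keeps the operation cheap compared with the extra feed-forward layer of Additive Attention.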
