
Multiplicative Attention

Sep 18, 2024 · 1 min read

  • architecture


  • $f_{att}(h_i, s_j) = h_i^T W_a s_j$
  • Since Additive Attention outperforms the unscaled dot product at high dimensionality, a scaling factor of $1/\sqrt{d}$ is introduced to compensate: see Scaled Dot Product Attention, and the sketch below.
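A minimal NumPy sketch of how the bilinear score $h_i^T W_a s_j$ could be computed over a whole sequence. The function name, array shapes, and the reading of $h$ as query states and $s$ as source states are illustrative assumptions, not part of the original note.

```python
import numpy as np

def multiplicative_attention(h, s, W_a):
    """Multiplicative (bilinear) attention: score_ij = h_i^T W_a s_j.

    h:   (n, d_h) query states          (shapes are illustrative)
    s:   (m, d_s) source states
    W_a: (d_h, d_s) learned weight matrix
    Returns attention weights (n, m) and context vectors (n, d_s).
    """
    scores = h @ W_a @ s.T                        # (n, m) bilinear scores
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over source positions j
    context = weights @ s                         # (n, d_s) weighted sum of sources
    return weights, context

# Toy usage with random states
rng = np.random.default_rng(0)
h = rng.normal(size=(2, 4))    # 2 queries of dim 4
s = rng.normal(size=(5, 6))    # 5 source states of dim 6
W_a = rng.normal(size=(4, 6))
weights, context = multiplicative_attention(h, s, W_a)
print(weights.shape, context.shape)  # (2, 5) (2, 6)
```

For comparison, Scaled Dot Product Attention drops $W_a$ (requiring matching dimensions) and divides the raw scores by $\sqrt{d}$ before the softmax.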


