Subhaditya's KB

Dot Product Attention

Sep 18, 2024 · 1 min read

architecture

Luong et al., 2015
$f_{att}(h_{i}, s_{j}) = h_{i}^{T} s_{j}$
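Equivalent to Multiplicative Attention with the trainable weight matrix replaced by the identity matrix, so it has no trainable parameters and requires $h_{i}$ and $s_{j}$ to have the same dimension
Without scaling, the scores grow in magnitude with the hidden dimension and can saturate the Softmax, which is what Scaled Dot Product Attention corrects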
$h_{i}$ is the encoder hidden state at position $i$ and $s_{j}$ is the decoder hidden state at step $j$
A type of Attention Alignment
The final attention weights are obtained by applying Softmax to the scores over all encoder positions
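
The scoring and Softmax steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not code from Luong et al.: the function names `dot_product_scores` and `softmax`, and the toy two-dimensional states, are assumptions made for the example.

```python
import math

def dot_product_scores(encoder_states, decoder_state):
    """Score every encoder state h_i against one decoder state s_j as h_i^T s_j."""
    return [
        sum(h_k * s_k for h_k, s_k in zip(h, decoder_state))
        for h in encoder_states
    ]

def softmax(scores):
    """Turn raw scores into weights that sum to 1 (shifted by the max for stability)."""
    max_score = max(scores)
    exps = [math.exp(x - max_score) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy data (hypothetical): three encoder states and one decoder state, all of dimension 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [1.0, 0.0]

scores = dot_product_scores(encoder_states, decoder_state)
weights = softmax(scores)
```

Because there is no weight matrix, the encoder and decoder states must share the same dimension; the encoder states most aligned with the decoder state receive the largest weights.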