Attention Alignment

  • Given an input sequence $\mathbf{x} = (x_1, \dots, x_n)$ and an output sequence $\mathbf{y} = (y_1, \dots, y_m)$
    • The encoder is a bidirectional recurrent network with a forward hidden state $\overrightarrow{h}_i$ and a backward one $\overleftarrow{h}_i$
    • Concatenating them gives the annotation $h_i$, which represents both the preceding and following words
      • $h_i = [\overrightarrow{h}_i^\top; \overleftarrow{h}_i^\top]^\top$, $i = 1, \dots, n$
      • The decoder has hidden state $s_t = f(s_{t-1}, y_{t-1}, c_t)$ for the output word at position $t$, for $t = 1, \dots, m$
        • The context vector $c_t = \sum_{i=1}^{n} \alpha_{t,i} h_i$ is a sum of the hidden states of the input sequence, weighted by alignment scores
        • How well the two words $y_t$ and $x_i$ are aligned is given by $\mathrm{score}(s_{t-1}, h_i) = v_a^\top \tanh\!\left(W_a [s_{t-1}; h_i]\right)$
        • Taking a softmax over these scores gives the alignment weights: $\alpha_{t,i} = \dfrac{\exp(\mathrm{score}(s_{t-1}, h_i))}{\sum_{i'=1}^{n} \exp(\mathrm{score}(s_{t-1}, h_{i'}))}$
  • $v_a$ and $W_a$ are the learned attention parameters
  • $h_i$ is the hidden state of the encoder
  • $s_t$ is the hidden state of the decoder
  • The matrix of alignment scores $\alpha_{t,i}$ shows how strongly each source word $x_i$ aligns with each target word $y_t$
    • Final scores are calculated with a softmax
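The steps above can be sketched numerically. This is a minimal NumPy illustration of one decoder step of additive (Bahdanau-style) attention, assuming randomly initialised parameters and made-up dimensions; in a real model $W_a$ and $v_a$ would be learned and $h$, $s_{t-1}$ would come from the encoder and decoder RNNs.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical dimensions: n source words, encoder/decoder/attention sizes.
n, enc_dim, dec_dim, attn_dim = 5, 8, 6, 4
rng = np.random.default_rng(0)

# Encoder annotations h_i (stand-ins for the concatenated fwd/bwd states)
h = rng.normal(size=(n, enc_dim))
# Previous decoder hidden state s_{t-1}
s_prev = rng.normal(size=(dec_dim,))

# Learned attention parameters (randomly initialised here)
W_a = rng.normal(size=(attn_dim, dec_dim + enc_dim))
v_a = rng.normal(size=(attn_dim,))

# Alignment scores: score(s_{t-1}, h_i) = v_a^T tanh(W_a [s_{t-1}; h_i])
scores = np.array(
    [v_a @ np.tanh(W_a @ np.concatenate([s_prev, h_i])) for h_i in h]
)

# Softmax turns scores into weights alpha_{t,i} that sum to 1
alpha = softmax(scores)

# Context vector c_t: weighted sum of the encoder annotations
c_t = alpha @ h
```

The weights `alpha` form one row of the alignment matrix; stacking the rows over all decoder steps $t$ gives the full source-target alignment matrix.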