Transformer

  • Encoder-decoder architecture
  • Autoregressive: the decoder's outputs are fed back as its inputs at the next step
  • The decoder can access all of the encoder's hidden states, not only the hidden state from the last time step
  • During decoding, consider the pairwise relationship between the current decoder state and each hidden state returned by the encoder
    • Some input words are relevant to the current step; others are not
  • Combine the encoder's hidden states into a context vector that reflects how relevant each part of the input sequence is to the current decoding step
  • Attention
  • Basic Transformer
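
The pairwise scoring and context-vector steps in the notes above can be sketched roughly as follows. This is only an illustration, not the full Transformer: it assumes scaled dot-product scoring (one common choice; the notes do not say which scoring function is used), a single decoder state, and no learned projections. The names `softmax`, `attend`, `encoder_states`, and `decoder_state` are made up for the example.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step.

    Assumption: relevance is scored with a scaled dot product between
    the decoder state and each encoder hidden state.
    """
    d = len(decoder_state)
    # Pairwise score between the decoder state and every encoder state.
    scores = [
        sum(q * k for q, k in zip(decoder_state, h)) / math.sqrt(d)
        for h in encoder_states
    ]
    # Normalize scores so relevant positions get more weight.
    weights = softmax(scores)
    # Context vector: weighted sum of all encoder hidden states.
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(d)
    ]
    return weights, context

# Toy input: three encoder hidden states of dimension 2 (hypothetical values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [1.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
```

Here the first and third encoder states point in the same direction as the decoder state along its first axis, so they receive equal weights that are larger than the weight on the second state. In an autoregressive decoder, this computation would be repeated at every step with the updated decoder state.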

Nice Little Blogs