Chapter 12 - Transformers

Sep 30, 2024 · 1 min read

  • architecture

  • Dot Product Attention
  • weight sharing → reuse the same weights for every input token (see the sketch after this list)
  • Self Attention
  • Basic Transformer
  • Position Encoding
  • Scaled Dot Product Attention
  • Multi Head Attention
  • Layer Normalization
  • Tokenizer
  • Embedding
  • Encoder Decoder Attention
  • BERT
  • Transfer Learning
  • Self Supervised
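
A minimal NumPy sketch of scaled dot-product self-attention, assuming illustrative names (X, W_q, W_k, W_v): the same projection weights are reused for every input token, and the dot products are scaled by sqrt(d_k) before the softmax. The small layer_norm helper shows the per-token normalization applied around each transformer sub-layer. This is a sketch of the standard mechanism, not the book's reference code.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """X: (n_tokens, d_model) -> (n_tokens, d_v)."""
    Q = X @ W_q                            # queries: same weights for every token
    K = X @ W_k                            # keys
    V = X @ W_v                            # values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # scaled dot products
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                     # each output is a weighted mix of values

def layer_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each token's features to zero mean / unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
out = layer_norm(self_attention(X, W_q, W_k, W_v))   # (5, 8)
```

Multi-head attention repeats this with several independent sets of projections and concatenates the head outputs. Because attention itself is permutation-invariant, a position encoding is added to the token embeddings; below is a sketch of the classic fixed sinusoidal form (assuming an even d_model):

```python
import numpy as np

def sinusoidal_position_encoding(n_tokens, d_model):
    """Fixed sin/cos encodings; each position gets a unique phase pattern."""
    pos = np.arange(n_tokens)[:, None]          # (n, 1)
    i = np.arange(0, d_model, 2)[None, :]       # (1, d/2), even feature indices
    angles = pos / (10000 ** (i / d_model))     # geometric range of wavelengths
    pe = np.zeros((n_tokens, d_model))
    pe[:, 0::2] = np.sin(angles)                # even dims: sine
    pe[:, 1::2] = np.cos(angles)                # odd dims: cosine
    return pe

pe = sinusoidal_position_encoding(5, 8)         # added to the (5, 8) token embeddings
```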

Uses

  • Named entity recognition
  • Text span prediction
  • Sentiment Analysis
  • GPT, GPT-3
  • Autoregressive (see the decoding sketch after this list)
  • Masked Autoencoders
  • Masked Language Modeling
  • Generative Models
  • Seq2Seq
  • Vision Transformer
  • Swin Transformer
  • Long Short Term Memory (LSTM)
  • GLUE
  • SQuAD
  • Teacher Forcing
  • Position Encoding
  • Big Bird
  • CLIP
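
A hedged sketch of GPT-style autoregressive decoding: `model` is a hypothetical callable (a stand-in for any decoder-only transformer, not a real library API) that maps a token-id sequence to logits over the vocabulary. At training time, teacher forcing replaces the model's own samples with the ground-truth previous tokens.

```python
import numpy as np

def generate(model, prompt_ids, max_new_tokens, eos_id=None):
    """Greedy decoding: predict one token, append it, and repeat."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model(np.array(ids))     # hypothetical: next-token logits
        next_id = int(np.argmax(logits))  # greedy; top-k / nucleus sampling are common too
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break                         # stop at end-of-sequence
    return ids
```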
