Chapter 12 - Transformers
- Dot-Product Attention
- weight sharing → the same projection weights are applied to every input token, so the layer works for any sequence length
- Self Attention
- Basic Transformer
- Position Encoding (sketch after this list)
- Scaled Dot-Product Attention (self-attention sketch after this list)
- Multi-Head Attention (sketch after this list)
- Layer Normalization (sketch after this list)
- Tokenizer
- Embedding (tokenizer + embedding sketch after this list)
- Encoder-Decoder Attention
- BERT
- Transfer Learning
- Self-Supervised Learning
- Uses
- Named entity recognition
- Text span prediction
- Sentiment Analysis
- GPT, GPT-3
- Autoregressive (causal-mask sketch after this list)
- Masked Autoencoders
- Masked Language Modeling (masking sketch after this list)
- Generative Models
- Seq2Seq
- Vision Transformer
- Swin Transformer
- Long Short-Term Memory (LSTM)
- GLUE
- SQuAD
- Teacher Forcing
- Big Bird
- CLIP
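
Sketches for the items flagged above. All are toy-sized Python/NumPy illustrations under assumed names and dimensions, not the chapter's own code.

Tokenizer and Embedding: production models use subword tokenizers (BPE, WordPiece); a whitespace tokenizer stands in here just to show the id-lookup data flow. The text, vocabulary, and random embedding matrix are made up:

```python
import numpy as np

text = "the cat sat on the mat"
tokens = text.split()                        # toy whitespace tokenizer (real models: BPE/WordPiece)
vocab = {w: i for i, w in enumerate(sorted(set(tokens)))}
ids = np.array([vocab[w] for w in tokens])   # token ids, shape (n,)

rng = np.random.default_rng(1)
E = rng.normal(size=(len(vocab), 8))         # embedding matrix, shape (|V|, d); learned in practice
X = E[ids]                                   # row lookup -> (n, d) input to the first transformer layer
print(ids, X.shape)
```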
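Position Encoding: self-attention is order-agnostic, so position information is added to the token embeddings. A sketch of the sinusoidal scheme from Vaswani et al., assuming an even embedding width d:

```python
import numpy as np

def sinusoidal_positions(n, d):
    # one row per position; sin/cos pairs over geometrically spaced frequencies (d must be even)
    pos = np.arange(n)[:, None]              # (n, 1)
    i = np.arange(0, d, 2)[None, :]          # (1, d/2)
    angles = pos / (10000.0 ** (i / d))      # (n, d/2)
    pe = np.zeros((n, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

print(sinusoidal_positions(5, 8).shape)      # (5, 8); added to the token embeddings
```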
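Dot-product self-attention and its scaling: each token's query is compared against every token's key, the scores are divided by sqrt(d_k) so the softmax does not saturate as the dimension grows, and the resulting weights average the values. Sizes and projections below are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))   # subtract max for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (n, d) -- n tokens, each a d-dimensional embedding
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                  # weight sharing: same projections for every token
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # scaled dot products, shape (n, n)
    A = softmax(scores, axis=-1)                      # row i: attention of token i over all tokens
    return A @ V                                      # weighted sum of values

rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)            # (5, 8)
```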
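Multi-Head Attention runs several lower-dimensional attention operations in parallel and mixes the concatenated outputs with one more projection. This reuses self_attention and the toy tensors from the previous sketch; the head count is an arbitrary choice:

```python
H, dk = 2, d // 2                                     # 2 heads, each half the model width
heads = [tuple(rng.normal(size=(d, dk)) for _ in range(3)) for _ in range(H)]
Wo = rng.normal(size=(H * dk, d))                     # output projection back to model width

def multi_head_attention(X, heads, Wo):
    outs = [self_attention(X, Wq, Wk, Wv) for Wq, Wk, Wv in heads]  # run heads independently
    return np.concatenate(outs, axis=-1) @ Wo         # (n, H*dk) -> (n, d)

print(multi_head_attention(X, heads, Wo).shape)       # (5, 8)
```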
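Layer Normalization standardizes each token's feature vector to zero mean and unit variance across features, then applies a learned scale and shift. Reusing X and d from the attention sketch, with identity-initialized gamma/beta for illustration:

```python
def layer_norm(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)               # per-token mean over features
    var = x.var(axis=-1, keepdims=True)               # per-token variance over features
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

gamma, beta = np.ones(d), np.zeros(d)                 # learned parameters; identity init here
print(layer_norm(X, gamma, beta).std(axis=-1))        # ~1 for every token
```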
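Autoregressive decoders (GPT) use masked self-attention: position i may only attend to positions ≤ i, so the model can be trained to predict the next token at every position in parallel. A self-contained sketch of the causal mask:

```python
import numpy as np

def causal_self_attention(X, Wq, Wk, Wv):
    n = X.shape[0]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    future = np.triu(np.ones((n, n), dtype=bool), k=1)   # True above the diagonal = future positions
    scores = np.where(future, -np.inf, scores)           # -inf -> zero weight after the softmax
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (e / e.sum(axis=-1, keepdims=True)) @ V

rng = np.random.default_rng(2)
n, d = 5, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(causal_self_attention(X, Wq, Wk, Wv).shape)        # (5, 8)
```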
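Masked Language Modeling (BERT's self-supervised objective) hides a random subset of tokens and trains the model to recover them. Just the corruption step, following BERT's 15% selection rate and 80/10/10 replacement recipe; the sentence is made up:

```python
import random

random.seed(0)
tokens = "the cat sat on the mat and looked around".split()
corrupted, targets = [], []
for t in tokens:
    if random.random() < 0.15:                    # select ~15% of positions (BERT's rate)
        targets.append(t)                         # the model must predict the original token here
        r = random.random()
        if r < 0.8:
            corrupted.append("[MASK]")            # 80%: replace with the [MASK] token
        elif r < 0.9:
            corrupted.append(random.choice(tokens))  # 10%: replace with a random token
        else:
            corrupted.append(t)                   # 10%: leave unchanged
    else:
        targets.append(None)                      # no loss at unselected positions
        corrupted.append(t)
print(corrupted)
```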