Basic Transformer
- Feed-forward blocks: two dense (MLP) layers with a ReLU in between.
- Residual connections wrap each sublayer (attention and feed-forward).
- Uses attention between positions.
- Embedding layers map between one-hot token IDs and dense vector representations.
- Input representation = token embedding + positional encoding.
- The feed-forward network is position-wise: the same two layers are applied independently at every position.
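A minimal NumPy sketch of the position-wise feed-forward sublayer described above (two dense layers with a ReLU, plus a residual connection). Dimensions `d_model=8` and `d_ff=32` are illustrative choices, not from the notes; real Transformers typically use larger values (e.g. 512 and 2048 in the original paper), and layer norm is omitted here for brevity.

```python
import numpy as np

def relu(x):
    # Elementwise ReLU nonlinearity
    return np.maximum(0.0, x)

def position_wise_ffn(x, w1, b1, w2, b2):
    # Two dense layers with ReLU in between; because the weights are
    # shared across positions and applied row-wise, the block acts
    # independently on each sequence position.
    return relu(x @ w1 + b1) @ w2 + b2

# Illustrative sizes (assumptions, not from the notes)
d_model, d_ff, seq_len = 8, 32, 4
rng = np.random.default_rng(0)

x = rng.normal(size=(seq_len, d_model))          # one token vector per row
w1 = rng.normal(size=(d_model, d_ff)) * 0.1      # expand to d_ff
b1 = np.zeros(d_ff)
w2 = rng.normal(size=(d_ff, d_model)) * 0.1      # project back to d_model
b2 = np.zeros(d_model)

# Residual connection: add the sublayer output back to its input
out = x + position_wise_ffn(x, w1, b1, w2, b2)
print(out.shape)  # (4, 8) -- shape is preserved, as the residual requires
```

Because the block is position-wise, running it on the whole sequence gives the same result as running it on each position separately, which is easy to verify with the sketch above.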