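Position Encoding

Transformers are feed-forward: there is no recurrence, so the model has no built-in notion of token order. So we need a way to inject position into the sequence:

PE(pos, 2i) = \sin\left( \frac{pos}{10000^{2i / d_{model}}} \right)

PE(pos, 2i+1) = \cos\left( \frac{pos}{10000^{2i / d_{model}}} \right)

Here pos is the token's position in the sequence, i indexes a pair of embedding dimensions, and d_{model} is the embedding size.

Conceptually, this adds word order to a sentence. Something like ("Hello", 1), ("from", 2), ("me", 3).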
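
The sinusoidal formula above can be sketched in a few lines of plain Python. This is a minimal illustration and not a reference implementation: the function name `positional_encoding`, the choice of `d_model=4`, and the 0-based positions are assumptions made for the example (the notes count positions from 1).

```python
import math

def positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model nested list of sinusoidal encodings.

    Hypothetical helper for illustration; positions start at 0 here.
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for k in range(d_model):
            i = k // 2  # dimensions 2i and 2i+1 share the same frequency
            angle = pos / (10000 ** (2 * i / d_model))
            # even dimensions use sin, odd dimensions use cos
            pe[pos][k] = math.sin(angle) if k % 2 == 0 else math.cos(angle)
    return pe

tokens = ["Hello", "from", "me"]
pe = positional_encoding(len(tokens), d_model=4)

# Pair each token with its position vector, analogous to ("Hello", 1), ...
for tok, row in zip(tokens, pe):
    print(tok, [round(x, 3) for x in row])
```

Each token gets a vector rather than a single integer, and in practice that vector is typically added elementwise to the token's embedding, so `d_model` would match the embedding size.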