Seq2Seq

Basic RNN Architectures
Long term dependency Issues
Even if hidden state vector has a high dimensionality, cannot hold all info
Sequence to Sequence Learning with Neural Networks
encoder-decoder learning to map sequences to sequences
multilayered Long Short-Term Memory [LSTM)](Long Short Term Memory (LSTM|Long Short Term Memory (LSTM|LSTM)](LSTM)](Long Short Term Memory (LSTM|Long Short Term Memory (LSTM|LSTM).md).md)
large deep LSTM with a limited vocabulary can outperform a standard statistical machine translation (SMT)-based system whose vocabulary is unlimited on a large-scale MT task
WMT14
BLEU score
reversing the order of the words in all source sentences (but not target sentences) improved the LSTM’s performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier

Subhaditya's KB