math exam with dependent questions, e.g. b) depends on a), c) on b), and so on
if a) is wrong, all subsequent answers are wrong as well
teacher forcing: after we answer question a), the teacher grades it against the correct solution and then gives us the correct answer for a) to continue with
the situation is similar in sequence generation with an RNN
each prediction is conditioned on the previous one, so a single wrong prediction corrupts all subsequent ones
the network cannot simply memorize the targets or look into the future
the ground truth is only fed as the previous input y_{t-1}, never as the current target y_t
the loss does not need to be computed at each timestep; it suffices to collect the model's predictions over the sequence and compute the loss against the true targets at the end
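The teacher-forced training pass can be sketched with a toy numpy RNN. All names, sizes, and the start token below are illustrative assumptions, not a specific library API; the point is only the control flow: each step consumes the ground-truth previous token, predictions are collected, and the loss is computed once at the end.

```python
import numpy as np

# Toy RNN parameters (random stand-ins, sizes chosen arbitrarily)
rng = np.random.default_rng(0)
V, H = 5, 8                        # vocab size, hidden size
Wxh = rng.normal(0, 0.1, (H, V))   # input-to-hidden weights
Whh = rng.normal(0, 0.1, (H, H))   # hidden-to-hidden weights
Why = rng.normal(0, 0.1, (V, H))   # hidden-to-output weights

def one_hot(i):
    v = np.zeros(V)
    v[i] = 1.0
    return v

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def teacher_forced_loss(targets):
    """Run the RNN over a sequence with teacher forcing.

    At step t the input is the ground-truth token targets[t-1],
    never the model's own previous prediction."""
    h = np.zeros(H)
    probs = []                     # collect per-step predictions
    prev = 0                       # assumed start-of-sequence token
    for y_t in targets:
        h = np.tanh(Wxh @ one_hot(prev) + Whh @ h)
        probs.append(softmax(Why @ h))
        prev = y_t                 # teacher forcing: feed ground truth
    # loss computed once from the list of collected predictions
    return -sum(np.log(p[y]) for p, y in zip(probs, targets)) / len(targets)

loss = teacher_forced_loss([1, 3, 2, 4])
```

With untrained random weights the predictions are near uniform, so the loss sits near log(V); a real training loop would backpropagate through this quantity.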
pros
training converges faster: early in training the model's own predictions are very poor, and conditioning on them would compound errors
cons
no ground truth labels are available during inference, so teacher forcing cannot be used there
discrepancy between training and inference conditions (known as exposure bias): the model only ever saw ground-truth inputs during training but must condition on its own predictions at inference
can lead to poor model performance and instability
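The inference-time discrepancy above can be made concrete with a free-running (autoregressive) generation loop. This is a self-contained sketch with random stand-in weights, assuming the same toy setup as a teacher-forced trainer would use; the key difference is the last line of the loop, where the model's own prediction is fed back in place of the unavailable ground truth.

```python
import numpy as np

# Toy RNN parameters (random stand-ins, sizes chosen arbitrarily)
rng = np.random.default_rng(0)
V, H = 5, 8                        # vocab size, hidden size
Wxh = rng.normal(0, 0.1, (H, V))
Whh = rng.normal(0, 0.1, (H, H))
Why = rng.normal(0, 0.1, (V, H))

def one_hot(i):
    v = np.zeros(V)
    v[i] = 1.0
    return v

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Free-running inference: no targets exist, so unlike training the
# network conditions on inputs it produced itself -- a distribution
# of inputs it never encountered under teacher forcing.
h = np.zeros(H)
token = 0                          # assumed start-of-sequence token
generated = []
for _ in range(4):
    h = np.tanh(Wxh @ one_hot(token) + Whh @ h)
    p = softmax(Why @ h)
    token = int(p.argmax())        # feed the model's OWN prediction back
    generated.append(token)
```

One early mistake here shifts every later hidden state, which is exactly the error-propagation problem the exam analogy describes.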