Un-LSTM

Due to the powerful ability of modeling long-term dynamic in videos, LSTM is used in both the encoder and decoder [37].
Since its superior ability to model temporal dynamics, most of them use LSTM or LSTM variant to encode temporal dynamics in videos or to infer the future frames [37], [146], [147], [164], [165]
can be employed for self-supervised feature learning without using human- annotations
encoder-decoder pipeline in which the encoder to model spatial and temporal features from the given video clips and the decoder to generate future frames based on feature extracted by encoder.

Subhaditya's KB