Un-LSTM

  • Due to the powerful ability of modeling long-term dynamic in videos, LSTM is used in both the encoder and decoder [37].
  • Since its superior ability to model temporal dynamics, most of them use LSTM or LSTM variant to encode temporal dynamics in videos or to infer the future frames [37], [146], [147], [164], [165]
  • can be employed for self-supervised feature learning without using human- annotations
  • encoder-decoder pipeline in which the encoder to model spatial and temporal features from the given video clips and the decoder to generate future frames based on feature extracted by encoder.