Learning with Labels Generated by Hard-coded Programs
Applying hard-coded programs is another way to automatically generate semantic labels, such as saliency, foreground masks, contours, and depth, for images and videos.
With these programs, very large-scale datasets with generated semantic labels can be used for self-supervised feature learning.
Various hard-coded programs have been applied to generate labels for self-supervised learning, including methods for foreground object segmentation [81], edge detection [47], and relative depth prediction [92].
Pathak et al. proposed to learn features by training a ConvNet to segment foreground objects in each frame of a video, where the labels are the masks of moving objects in the video [81].
Li et al. proposed to learn features by training a ConvNet for edge prediction, where the labels are motion edges obtained from optical flow fields.
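To illustrate what a hard-coded labeling program can look like, the sketch below produces foreground masks from a video by thresholding differences between consecutive frames. This is a deliberately simplified stand-in, not the motion-segmentation algorithm used by Pathak et al. [81]; the function name `motion_mask_labels` and the `threshold` value are hypothetical. The resulting masks would serve as per-pixel targets for a segmentation ConvNet.

```python
import numpy as np

def motion_mask_labels(frames: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    """Hypothetical hard-coded labeler: mark pixels whose intensity changes
    between consecutive frames as moving foreground.

    frames: array of shape (T, H, W) with grayscale values in [0, 1].
    Returns a boolean array of shape (T - 1, H, W); mask t marks pixels that
    changed between frame t and frame t + 1.
    """
    diffs = np.abs(np.diff(frames, axis=0))  # |frame[t+1] - frame[t]|
    return diffs > threshold

# Toy video: a bright 2x2 square moving one pixel to the right per frame.
video = np.zeros((3, 6, 6))
for t in range(3):
    video[t, 2:4, 1 + t:3 + t] = 1.0

masks = motion_mask_labels(video)
print(masks.shape)              # (2, 6, 6)
print(int(masks[0].sum()))      # 4
```

Only the leading and trailing columns of the moving square are marked here, which is one reason practical pipelines rely on stronger motion segmentation than plain frame differencing.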
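A motion-edge label can be sketched in the same spirit: pixels where the optical flow field changes abruptly are likely to lie on object boundaries. The code below assumes a precomputed flow field of shape (H, W, 2) and uses a simple gradient-magnitude threshold; the name `motion_edges` and the threshold are hypothetical, and the edge extraction in Li et al.'s method may differ.

```python
import numpy as np

def motion_edges(flow: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Hypothetical hard-coded labeler: pixels where the optical flow field
    changes sharply are labeled as motion edges.

    flow: array of shape (H, W, 2) holding the (u, v) displacement per pixel.
    Returns a boolean edge map of shape (H, W).
    """
    squared = np.zeros(flow.shape[:2])
    for c in range(2):  # accumulate squared spatial gradients of u and v
        gy, gx = np.gradient(flow[..., c])
        squared += gx ** 2 + gy ** 2
    return np.sqrt(squared) > threshold

# Toy flow: the right half of the image moves 2 px to the right, the left half is static.
flow = np.zeros((4, 6, 2))
flow[:, 3:, 0] = 2.0

edges = motion_edges(flow)
print(edges[0].tolist())  # [False, False, True, True, False, False]
```

The edge appears at the boundary between the static and moving regions, even though the image intensities themselves are not used at all.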
After GAN-based methods achieved breakthrough results in image generation, researchers employed GANs to generate videos [85], [86], [144].
VideoGAN
To model the motion of objects in videos, VideoGAN uses a two-stream generator: one stream models the static regions of the video as background, and the other models the moving objects as foreground.
The generated video is the combination of the outputs of the foreground and background streams.
VideoGAN assumes that each random vector in the latent space represents one whole video clip.
Tulyakov et al. argue that this assumption makes the generation more difficult.
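One way to write this combination, and as we understand it the form used in VideoGAN, is a per-pixel spatio-temporal mask m that blends a foreground video f with a static background image b replicated over time: v = m ⊙ f + (1 − m) ⊙ b. The NumPy sketch below shows only that composition step, with random arrays standing in for the outputs of the two generator streams; the function name `compose_video` and the array layout are assumptions.

```python
import numpy as np

def compose_video(foreground, background, mask_logits):
    """Blend a moving foreground with a static background.

    foreground:  (T, H, W, C) frames from the foreground stream
    background:  (H, W, C) single image from the background stream
    mask_logits: (T, H, W, 1) unnormalized mask from the foreground stream
    Returns (T, H, W, C): m * f + (1 - m) * b, with b repeated over time.
    """
    m = 1.0 / (1.0 + np.exp(-mask_logits))  # sigmoid -> mask values in (0, 1)
    b = background[None, ...]                # add a time axis so b broadcasts over T frames
    return m * foreground + (1.0 - m) * b

T, H, W, C = 4, 8, 8, 3
rng = np.random.default_rng(0)
fg = rng.uniform(-1, 1, size=(T, H, W, C))
bg = rng.uniform(-1, 1, size=(H, W, C))

# Very negative logits -> mask close to 0 -> output is essentially the static background.
video = compose_video(fg, bg, np.full((T, H, W, 1), -50.0))
print(video.shape)                                           # (4, 8, 8, 3)
print(np.allclose(video, np.broadcast_to(bg, video.shape)))  # True
```

Because the background is a single image, any motion in the output must come from the foreground stream and the mask, which is what lets the two streams specialize.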
MoCoGAN
MoCoGAN represents a video as a combination of two subspaces, disentangling the content and the motion in videos [86]:
The content subspace: each vector sampled from it represents one identity.
The motion subspace: a trajectory through it represents the motion of that identity.
With these two subspaces, the network is able to generate videos with a higher Inception Score than VideoGAN.
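The separation between the two subspaces shows up in how latent codes are sampled: one content code is drawn per video and repeated for every frame, while a sequence of motion codes is generated step by step. The sketch below assumes small illustrative dimensions and replaces MoCoGAN's learned recurrent network with a fixed random tanh recurrence; `sample_mocogan_latents` and its weights are hypothetical. Each row of the result would be fed to a per-frame image generator.

```python
import numpy as np

def sample_mocogan_latents(num_frames, dim_content=4, dim_motion=3, rng=None):
    """Sketch of MoCoGAN-style latent codes for one video.

    A single content code is shared by every frame (the identity), while the
    motion codes form a trajectory produced by a recurrent update. MoCoGAN
    learns this recurrence; here fixed random weights stand in for it.
    Returns an array of shape (num_frames, dim_content + dim_motion).
    """
    rng = np.random.default_rng() if rng is None else rng
    z_content = rng.normal(size=dim_content)                     # sampled once per video
    W_h = rng.normal(scale=0.5, size=(dim_motion, dim_motion))   # hypothetical recurrence weights
    W_e = rng.normal(scale=0.5, size=(dim_motion, dim_motion))   # hypothetical noise weights
    h = np.zeros(dim_motion)
    rows = []
    for _ in range(num_frames):
        eps = rng.normal(size=dim_motion)        # fresh noise at each time step
        h = np.tanh(W_h @ h + W_e @ eps)         # next point on the motion trajectory
        rows.append(np.concatenate([z_content, h]))
    return np.stack(rows)

z = sample_mocogan_latents(num_frames=5, rng=np.random.default_rng(0))
print(z.shape)                           # (5, 7)
print(np.allclose(z[:, :4], z[0, :4]))   # True: the content code is identical in every frame
```

Keeping the content code fixed across frames is what lets the same identity persist throughout the clip while only the motion part of the code changes.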
The generator learns to map latent vectors from the latent space to videos, while the discriminator learns to distinguish real-world videos from generated ones.
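The adversarial game described above can be made concrete as a pair of losses. The sketch below assumes the standard binary cross-entropy formulation with the non-saturating generator loss, computed on raw discriminator scores; the function names are hypothetical, and individual video GANs may use variants (MoCoGAN, for example, uses both an image discriminator and a video discriminator).

```python
import numpy as np

def bce_with_logits(logits, targets):
    # Numerically stable binary cross-entropy on raw (pre-sigmoid) scores.
    return np.mean(np.maximum(logits, 0) - logits * targets
                   + np.log1p(np.exp(-np.abs(logits))))

def gan_losses(d_real_logits, d_fake_logits):
    """Discriminator and generator losses from discriminator scores.

    d_real_logits: discriminator scores on real videos
    d_fake_logits: discriminator scores on generated videos
    """
    # Discriminator: label real videos 1 and generated videos 0.
    d_loss = (bce_with_logits(d_real_logits, np.ones_like(d_real_logits))
              + bce_with_logits(d_fake_logits, np.zeros_like(d_fake_logits)))
    # Generator (non-saturating): try to make generated videos be scored as real.
    g_loss = bce_with_logits(d_fake_logits, np.ones_like(d_fake_logits))
    return d_loss, g_loss

# An undecided discriminator (all scores 0) gives d_loss = 2 ln 2 and g_loss = ln 2.
d_loss, g_loss = gan_losses(np.zeros(4), np.zeros(4))
print(round(d_loss, 4), round(g_loss, 4))  # 1.3863 0.6931
```

As the discriminator becomes confident, its loss approaches zero while the generator's loss grows, which is the signal that pushes the generator toward more realistic videos.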
After video-generation training on a large-scale unlabeled dataset is complete, the parameters of the discriminator can be transferred to other downstream tasks [85].
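Transferring the discriminator typically amounts to keeping its feature-extraction layers and replacing the real/fake output with a task-specific head. The sketch below assumes parameters stored in a flat dictionary with a hypothetical naming scheme (`features.*` for the trunk, `head.*` for the real/fake output); `transfer_discriminator` is likewise hypothetical, and in practice the new head, and possibly the trunk, would then be fine-tuned on labeled data.

```python
import numpy as np

def transfer_discriminator(disc_params, num_classes, rng=None):
    """Build a downstream classifier from a pretrained video discriminator.

    disc_params: dict of parameter arrays; keys starting with "features."
                 form the convolutional trunk, keys starting with "head."
                 form the real/fake output layer (hypothetical naming).
    Returns a new dict that copies the trunk and adds a freshly initialized
    classification head with num_classes outputs.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Copy the trunk so fine-tuning does not modify the original discriminator.
    params = {k: v.copy() for k, v in disc_params.items() if k.startswith("features.")}
    feat_dim = disc_params["head.weight"].shape[1]  # width of the trunk's output features
    params["classifier.weight"] = rng.normal(scale=0.01, size=(num_classes, feat_dim))
    params["classifier.bias"] = np.zeros(num_classes)
    return params

# Toy discriminator: one 3D conv layer (out, in, t, h, w) and a 1-unit real/fake head.
disc = {
    "features.conv1.weight": np.ones((16, 3, 3, 3, 3)),
    "head.weight": np.ones((1, 16)),
    "head.bias": np.zeros(1),
}
clf = transfer_discriminator(disc, num_classes=101, rng=np.random.default_rng(0))
print(sorted(clf))  # ['classifier.bias', 'classifier.weight', 'features.conv1.weight']
```

Dropping the real/fake head is the key design choice: that layer is specific to the adversarial task, whereas the trunk's spatio-temporal features are what the downstream task is expected to benefit from.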