📄 Cross Modal-based Methods
📄 Downstream Task
📄 Ego-motion
📄 Feature Map Visualization
📄 Free Semantic Label-based Method
📄 Human Action Recognition
📄 Image Classification
📄 Image Generation with Colorization
📄 Image Generation with Inpainting
📄 Image Generation with Super Resolution
📄 Image Generation
📄 Kernel Visualization
📄 Learning from RGB-Flow Correspondence
📄 Learning from Video Colorization
📄 Learning from Video Prediction
📄 Learning from Visual-Audio Correspondence
📄 Learning with Context Similarity
📄 Learning with Labels Generated by Game Engines
📄 Learning with Labels Generated by Hard-code Programs
📄 Learning with Spatial Context Structure
📄 Nearest Neighbor Retrieval
📄 Object Detection
📄 Pretext Task
📄 Pretext Tasks
📄 Pseudo Label
📄 Self Supervised Survey
📄 Self-supervised Learning
📄 Semantic Segmentation
📄 Semi Supervised
📄 Semi-Supervised Learning Formulation
📄 Smoothness
📄 Spatial Context Structure
📄 Spatiotemporal Convolutional Neural Network
📄 Supervised Learning Formulation
📄 Temporal Context Structure
📄 Temporal order recognition
📄 Temporal order verification
📄 Video Generation
📄 Weakly Supervised Learning Formulation
📄 Weakly-supervised Learning