SceneNet RGB-D
- large synthetic indoor video dataset of 5 million rendered RGB-D images from over 15K trajectories in synthetic layouts, with random but physically simulated object poses
- pixel-level annotations for scene understanding tasks (semantic segmentation, instance segmentation, object detection) and for geometric computer vision tasks (optical flow, depth estimation, camera pose estimation, 3D reconstruction)
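
One reason pixel-level instance labels also serve object detection is that bounding boxes can be derived from them directly. The sketch below illustrates that on a toy mask. The mask layout (a 2D grid of integer instance ids with 0 as background) and the name `boxes_from_instance_mask` are assumptions made for illustration and do not describe the dataset's actual file format or tooling.

```python
def boxes_from_instance_mask(mask, background=0):
    """Return {instance_id: (x_min, y_min, x_max, y_max)} for a 2D grid of ids.

    `mask` is a list of rows; each entry is an integer instance id.
    The background id is an assumed convention, not taken from the dataset.
    """
    boxes = {}
    for y, row in enumerate(mask):
        for x, inst in enumerate(row):
            if inst == background:
                continue
            if inst not in boxes:
                # First pixel seen for this instance: box is a single point.
                boxes[inst] = (x, y, x, y)
            else:
                # Grow the box to include the new pixel.
                x0, y0, x1, y1 = boxes[inst]
                boxes[inst] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


# Toy 4x5 instance mask (hypothetical ids, 0 = background).
toy_mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 0, 2],
    [3, 0, 0, 0, 2],
]
boxes = boxes_from_instance_mask(toy_mask)
```

Here `boxes` maps each instance id to inclusive pixel coordinates, so the same per-pixel labels yield both segmentation masks and detection boxes without separate annotation.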