Cross-Modal-based Methods
Train ConvNets to verify whether two different channels of input data correspond to each other. Tasks in this family:
Visual-Audio Correspondence Verification
RGB-Flow Correspondence Verification
Egomotion
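The correspondence-verification idea can be sketched in a few lines. This is a minimal illustration, not any particular paper's implementation: plain Python lists stand in for the feature vectors a ConvNet would produce for each channel, and the names `make_pairs`, `correspondence_prob`, and `bce_loss` are hypothetical helpers introduced here. Positive pairs take both channels from the same clip, negative pairs take the second channel from a different clip, and a binary cross-entropy loss scores the prediction.

```python
import math
import random


def make_pairs(n_clips, rng):
    """Build (first_channel_idx, second_channel_idx, label) triples.

    label 1: both channels come from the same clip (corresponding).
    label 0: the second channel comes from a different clip.
    Assumes n_clips >= 2 so that a mismatched clip always exists.
    """
    pairs = []
    for i in range(n_clips):
        pairs.append((i, i, 1))  # matched pair
        j = rng.choice([k for k in range(n_clips) if k != i])
        pairs.append((i, j, 0))  # mismatched pair
    return pairs


def correspondence_prob(feat_a, feat_b):
    """Predicted probability that two feature vectors correspond.

    Assumption: a sigmoid over the dot product is used as the verifier;
    a real model would typically learn a fusion network instead.
    """
    score = sum(a * b for a, b in zip(feat_a, feat_b))
    return 1.0 / (1.0 + math.exp(-score))


def bce_loss(prob, label, eps=1e-12):
    """Binary cross-entropy for a single prediction."""
    return -(label * math.log(prob + eps)
             + (1 - label) * math.log(1.0 - prob + eps))
```

In a training loop, the two feature vectors would come from the two channel-specific networks (for example a video frame and its audio segment, or an RGB frame and its optical flow), and the loss would be averaged over all pairs returned by `make_pairs`. No labels are needed beyond knowing which clip each sample came from.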