FLAVA
- FLAVA: a Foundational Language and Vision Alignment Model
- a foundational vision-and-language alignment model that performs well on all three target modalities at once: 1) vision, 2) language, and 3) combined vision & language
- approach: a single holistic, universal model that serves as a “foundation” and targets all modalities at once
- evaluated on a wide range of 35 tasks spanning these target modalities
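To make the "one model for all modalities" idea concrete, here is a minimal toy sketch. It assumes the model is organized, at a high level, as an image encoder, a text encoder, and a multimodal encoder that fuses the two. Everything else is hypothetical and for illustration only: the class `ToyFoundationModel`, the `embed` and `mean_pool` helpers, the hidden size `DIM`, and the concatenation-based fusion. None of this is FLAVA's actual implementation or API. It shows only how a single model can answer vision-only, language-only, and vision-and-language inputs.

```python
from typing import Dict, List, Optional

Vector = List[float]
DIM = 4  # hypothetical toy hidden size; real encoders are far wider


def embed(token: str) -> Vector:
    # Toy deterministic "embedding": character codes folded into DIM slots.
    vec = [0.0] * DIM
    for i, ch in enumerate(token):
        vec[i % DIM] += ord(ch) / 100.0
    return vec


def mean_pool(states: List[Vector]) -> Vector:
    # Stand-in for reading out one summary vector per sequence.
    return [sum(col) / len(states) for col in zip(*states)]


class ToyFoundationModel:
    """Hypothetical sketch: one model object serving all three modality types."""

    def image_encoder(self, patches: List[str]) -> List[Vector]:
        # Encodes image patches (represented here as strings) into hidden states.
        return [embed("img:" + p) for p in patches]

    def text_encoder(self, tokens: List[str]) -> List[Vector]:
        # Encodes text tokens into hidden states.
        return [embed("txt:" + t) for t in tokens]

    def multimodal_encoder(self, img: List[Vector], txt: List[Vector]) -> List[Vector]:
        # Fuses both unimodal sequences. Here they are only concatenated; a real
        # model would run further transformer layers over the joint sequence.
        return img + txt

    def forward(self, patches: Optional[List[str]] = None,
                tokens: Optional[List[str]] = None) -> Dict[str, Vector]:
        # Returns one summary vector per modality type the input supports.
        out: Dict[str, Vector] = {}
        img = self.image_encoder(patches) if patches else None
        txt = self.text_encoder(tokens) if tokens else None
        if img:
            out["vision"] = mean_pool(img)
        if txt:
            out["language"] = mean_pool(txt)
        if img and txt:
            out["vision_language"] = mean_pool(self.multimodal_encoder(img, txt))
        return out


if __name__ == "__main__":
    model = ToyFoundationModel()
    print(sorted(model.forward(patches=["p0", "p1"])))  # ['vision']
    print(sorted(model.forward(tokens=["a", "cat"])))  # ['language']
    print(sorted(model.forward(patches=["p0"], tokens=["a", "cat"])))
    # ['language', 'vision', 'vision_language']
```

The design point is that the same object, with shared encoders, produces outputs for every target modality. Downstream vision, language, or vision-and-language tasks would then attach heads to whichever output they need.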