FLAVA

  • FLAVA: a Foundational Language and Vision Alignment Model
  • foundational vision and language alignment model that performs well on all three target modalities: 1) vision, 2) language, and 3) vision & language
  • use a single holistic, universal model as a “foundation” that targets all modalities at once
  • evaluated on a wide range of 35 tasks spanning these target modalities