multilingual speech recognition, speech translation and language identification
goal: a speech recognition system should work reliably out of the box across a broad range of environments, without supervised fine-tuning of a decoder for every deployment distribution
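limitation of unsupervised pre-training (e.g. wav2vec 2.0): it yields strong audio encoders but no equally high-quality pre-trained decoder, so a supervised fine-tuning stage is still needed.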
680,000 hours of labeled audio data
audio broken into 30-second segments, each paired with the subset of the transcript that occurs
within that time segment.
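A minimal sketch of the segment/transcript pairing, under stated assumptions: the transcript is taken to be a list of timed cues (start, end, text), and cues that straddle a 30-second boundary are simply dropped, since the notes do not say how boundary-crossing text is handled. The `Cue` type and `pair_segments` function are hypothetical names for illustration only, not the paper's code.

```python
from dataclasses import dataclass
from typing import List, Tuple

SEGMENT_SECONDS = 30.0  # fixed window length from the notes


@dataclass
class Cue:
    """Hypothetical timed transcript cue (times in seconds)."""
    start: float
    end: float
    text: str


def pair_segments(
    cues: List[Cue], duration: float, segment_seconds: float = SEGMENT_SECONDS
) -> List[Tuple[float, float, str]]:
    """Split [0, duration) into fixed windows and attach the cues inside each one."""
    pairs = []
    start = 0.0
    while start < duration:
        end = min(start + segment_seconds, duration)
        # Assumption: keep only cues lying entirely inside the window;
        # cues that cross a window boundary are discarded.
        text = " ".join(c.text for c in cues if c.start >= start and c.end <= end)
        pairs.append((start, end, text))
        start += segment_seconds
    return pairs


cues = [
    Cue(0.0, 4.0, "hello"),
    Cue(5.0, 12.0, "world"),
    Cue(28.0, 33.0, "straddle"),  # crosses the 30 s boundary, dropped
    Cue(31.0, 40.0, "again"),
]
for window in pair_segments(cues, duration=65.0):
    print(window)
```

Dropping boundary-crossing cues keeps every (audio, text) pair aligned at the cost of losing some text; splitting or reassigning such cues are equally plausible alternatives the notes leave open.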