Whisper

Audio-to-Text converter
Multilingual speech recognition, translation, and language identification
Goal: a speech recognition system should work reliably out of the box in a broad range of environments, without requiring supervised fine-tuning of a decoder for every deployment distribution.
Weakness of earlier unsupervised pre-training of audio encoders: lack of a high-quality pre-trained decoder.
Trained on 680,000 hours of labeled audio data
Audio is broken into 30-second segments, each paired with the subset of the transcript that occurs within that time segment.
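
The segmentation idea can be sketched roughly as below. This is only an illustration, not the actual Whisper data pipeline: the word-level timestamp format, the function name `segment_transcript`, and the rule of assigning a word to the window in which it starts are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical input format: (start_sec, end_sec, text) for each word.
Word = Tuple[float, float, str]


def segment_transcript(
    duration: float, words: List[Word], window: float = 30.0
) -> List[Tuple[float, float, str]]:
    """Split [0, duration) into fixed windows and attach the words starting in each."""
    segments = []
    start = 0.0
    while start < duration:
        end = min(start + window, duration)
        # Assumption: a word belongs to the window that contains its start time.
        text = " ".join(w for (s, _e, w) in words if start <= s < end)
        segments.append((start, end, text))
        start += window
    return segments


words = [
    (0.5, 1.0, "hello"),
    (29.0, 29.5, "world"),
    (31.0, 31.4, "again"),
    (65.0, 65.5, "bye"),
]
segments = segment_transcript(70.0, words)
for seg in segments:
    print(seg)
```

A window with no words keeps an empty string, so silent or speech-free segments remain visible instead of being dropped.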
Encoder-decoder transformer

Subhaditya's KB