Whisper
- Audio-to-Text converter
- multi-lingual speech recognition, translation and language identification
- goal of a speech recognition system should be to work reliably out of the box in a broad range of environments without requiring supervised fine-tuning of a decoder for every deployment distribution
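- motivation: unsupervised pre-training yields high-quality audio encoders, but the lack of an equally high-quality pre-trained decoder means a fine-tuning stage is still needed.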
- trained on 680,000 hours of labeled audio data
- audio is broken into 30-second segments, each paired with the subset of the transcript that occurs within that time segment.
- encoder-decoder transformer
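
A minimal sketch of the 30-second segmentation described in the notes, to make the pairing of audio windows and transcript text concrete. This is not Whisper's actual preprocessing code: the `Word` record, the `segment_transcript` helper, and the rule that a word belongs to the window in which it starts are all assumptions made for illustration.

```python
from dataclasses import dataclass

SEGMENT_SECONDS = 30.0  # segment length stated in the notes


@dataclass
class Word:
    """Hypothetical timestamped transcript word (not a Whisper type)."""
    text: str
    start: float  # seconds from the start of the audio
    end: float


def segment_transcript(words, audio_duration, segment_seconds=SEGMENT_SECONDS):
    """Return one (start, end, text) tuple per fixed-length window."""
    segments = []
    start = 0.0
    while start < audio_duration:
        end = min(start + segment_seconds, audio_duration)
        # Assumption: a word is assigned to the window in which it starts.
        text = " ".join(w.text for w in words if start <= w.start < end)
        segments.append((start, end, text))
        start += segment_seconds
    return segments


if __name__ == "__main__":
    words = [
        Word("hello", 1.0, 1.5),
        Word("world", 29.5, 30.2),
        Word("again", 45.0, 45.4),
    ]
    for seg in segment_transcript(words, audio_duration=70.0):
        print(seg)
```

Windows that contain no words come back with an empty transcript, and the last window is shortened to the end of the audio. How a real pipeline handles words that straddle a window boundary is a design choice this sketch does not settle.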