dGSLM
- Generative Spoken Dialogue Language Modeling
- first “textless” model able to generate audio samples of naturalistic spoken dialogues
- unsupervised spoken unit discovery coupled with a dual-tower Transformer architecture with cross-attention, trained on 2000 hours of two-channel raw conversational audio (Fisher dataset) without any text or labels
- generates speech, laughter, and other paralinguistic signals in the two channels simultaneously and reproduces naturalistic turn-taking
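
The dual-tower idea in the notes above can be illustrated with a toy sketch. This is not the dGSLM implementation: it assumes one tower per channel running the same layer, single-head attention with no learned projections, and a causal mask on both self- and cross-attention. Feed-forward blocks, layer norm, and the discrete-unit embedding step are left out. The names `causal_attention` and `dual_tower_layer` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Scaled dot-product attention where step t only sees steps <= t."""
    dim = len(queries[0])
    out = []
    for t, q in enumerate(queries):
        visible_keys = keys[: t + 1]
        visible_values = values[: t + 1]
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in visible_keys
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * v[d] for w, v in zip(weights, visible_values))
             for d in range(dim)]
        )
    return out


def add(x, y):
    """Element-wise sum of two sequences of vectors (residual connection)."""
    return [[xi + yi for xi, yi in zip(rx, ry)] for rx, ry in zip(x, y)]


def dual_tower_layer(chan_a, chan_b):
    """One simplified layer applied identically to both channels.

    Assumption for this sketch: both towers run the same computation,
    so swapping the inputs simply swaps the outputs.
    """
    # Self-attention: each tower attends to its own channel's history.
    self_a = add(chan_a, causal_attention(chan_a, chan_a, chan_a))
    self_b = add(chan_b, causal_attention(chan_b, chan_b, chan_b))
    # Cross-attention: each tower queries the other channel's states.
    out_a = add(self_a, causal_attention(self_a, self_b, self_b))
    out_b = add(self_b, causal_attention(self_b, self_a, self_a))
    return out_a, out_b


# Toy time-aligned inputs: 3 frames per channel, 2-dimensional vectors.
chan_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
chan_b = [[0.5, 0.5], [1.0, 0.0], [0.0, 0.0]]
out_a, out_b = dual_tower_layer(chan_a, chan_b)
```

Because every attention call is causal, the output for a frame in one channel depends only on current and past frames of both channels. That property is what lets a model of this shape generate the two channels step by step in parallel.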