dGSLM

  • Generative Spoken Dialogue Language Modeling (dGSLM)
  • first “textless” model able to generate audio samples of naturalistic spoken dialogues
  • combines unsupervised spoken unit discovery with a dual-tower Transformer architecture with cross-attention, trained on 2000 hours of two-channel raw conversational audio (Fisher dataset) without any text or labels
  • generates speech, laughter and other paralinguistic signals in the two channels simultaneously and reproduces naturalistic turn-taking
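
To make the "dual-tower with cross-attention" idea concrete, here is a minimal pure-Python sketch. It is not the dGSLM implementation: the function names (`cross_attend`, `dual_tower_layer`) are hypothetical, there are no learned projections, self-attention and feed-forward sublayers are omitted, and the causal mask on the other channel is an assumption made so that generation can stay autoregressive. It only shows how each channel's states can pull in context from the other channel through the same shared function.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, other):
    """Scaled dot-product attention from one channel onto the other.

    Position t of `queries` only sees positions 0..t of `other`
    (causal masking; an assumption of this sketch).
    """
    d = len(queries[0])
    out = []
    for t, q in enumerate(queries):
        visible = other[: t + 1]
        weights = softmax([dot(q, k) / math.sqrt(d) for k in visible])
        out.append(
            [sum(w * v[i] for w, v in zip(weights, visible)) for i in range(d)]
        )
    return out


def dual_tower_layer(chan_a, chan_b):
    """Apply the same cross-attention step to both towers.

    Each tower adds (residual connection) the context it gathered
    from the other speaker's channel.
    """
    ctx_a = cross_attend(chan_a, chan_b)
    ctx_b = cross_attend(chan_b, chan_a)
    new_a = [[h + c for h, c in zip(hs, cs)] for hs, cs in zip(chan_a, ctx_a)]
    new_b = [[h + c for h, c in zip(hs, cs)] for hs, cs in zip(chan_b, ctx_b)]
    return new_a, new_b


# Toy input: two time steps per channel, 2-dimensional unit embeddings.
speaker_a = [[1.0, 0.0], [0.0, 1.0]]
speaker_b = [[0.0, 2.0], [2.0, 0.0]]
out_a, out_b = dual_tower_layer(speaker_a, speaker_b)
print(len(out_a), len(out_b))
```

Because both towers call the same function, the two speakers are treated symmetrically, which is the design property that lets one model generate both channels in parallel.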