S2ST

  • Direct Speech-to-speech Translation with Discrete Units
  • direct speech-to-speech translation (S2ST) model that translates speech from one language to speech in another language without relying on intermediate text generation
  • applies a self-supervised discrete speech encoder to the target speech
  • trains a sequence-to-sequence speech-to-unit translation model to predict the discrete representations of the target speech
  • When target text transcripts are available, the authors design a joint speech and text training framework that lets the model generate dual-modality output (speech and text) simultaneously in the same inference pass
  • Evaluated on the Fisher Spanish-English dataset
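
To make the "discrete representations of the target speech" concrete, here is a minimal sketch of how encoder frames might be turned into a unit sequence. It assumes the units come from nearest-centroid assignment over frame-level features (for example, k-means centroids fit on a self-supervised encoder's outputs) and that consecutive repeats are collapsed afterwards. Neither detail is stated in these notes, and the centroids, frames, and function names below are hypothetical toy values for illustration only.

```python
from itertools import groupby
from typing import List, Sequence


def nearest_unit(frame: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid closest to `frame` (squared Euclidean distance)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(centroids)), key=lambda k: sq_dist(frame, centroids[k]))


def speech_to_units(frames: Sequence[Sequence[float]],
                    centroids: Sequence[Sequence[float]]) -> List[int]:
    """Map every feature frame to a discrete unit id (one id per frame)."""
    return [nearest_unit(f, centroids) for f in frames]


def collapse_repeats(units: Sequence[int]) -> List[int]:
    """Merge runs of identical consecutive units into a single unit."""
    return [u for u, _ in groupby(units)]


# Toy stand-ins: real encoder features would be high-dimensional,
# and a real codebook would hold many more centroids.
centroids = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
frames = [[0.1, -0.1], [0.2, 0.0], [0.9, 1.2], [3.8, 4.1], [4.2, 3.9]]

units = speech_to_units(frames, centroids)  # one unit per frame
reduced = collapse_repeats(units)           # repeats merged

print(units)    # [0, 0, 1, 2, 2]
print(reduced)  # [0, 1, 2]
```

In this picture, the speech-to-unit translation model is trained with the source speech as input and a unit sequence like `units` (or `reduced`) as the prediction target. A separate unit-to-waveform stage, not shown here, would then turn predicted units back into audio.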