Data Augmentation for Spoken Language Understanding

  • Comparing Data Augmentation and Annotation Standardization to Improve End-to-end Spoken Language Understanding Models
  • All-neural end-to-end (E2E) Spoken Language Understanding (SLU) models can outperform traditional compositional SLU models, but they require high-quality training data with both audio and annotations
  • E2E models struggle on “golden utterances”, which are essential for defining and supporting features but may lack sufficient training data
  • the paper compares two data-centric AI methods for improving performance on golden utterances:
  • (1) improving the annotation quality of existing training utterances, and (2) augmenting the training data with varying amounts of synthetic data
  • both data-centric approaches to improving E2E SLU achieved the desired effect, although data augmentation was much more powerful than annotation standardization.
  • data augmentation reduced the intent recognition error rate (IRER) on the golden utterance test set by 93% relative to the baseline, without negatively impacting other test metrics
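The notes above do not specify how the synthetic training data was generated, so as a rough illustration of what waveform-level augmentation of a spoken utterance can look like, here is a minimal NumPy-only sketch. All function names (`speed_perturb`, `add_noise`, `augment`) and the perturbation ranges are hypothetical choices, not the paper's method:

```python
import numpy as np

def speed_perturb(wav, factor):
    """Resample the waveform by `factor` via linear interpolation
    (factor > 1 speeds up and shortens; factor < 1 slows down)."""
    n_out = int(round(len(wav) / factor))
    old_idx = np.linspace(0, len(wav) - 1, num=n_out)
    return np.interp(old_idx, np.arange(len(wav)), wav)

def add_noise(wav, snr_db, rng):
    """Mix in white noise at the requested signal-to-noise ratio (dB)."""
    sig_power = np.mean(wav ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wav.shape)
    return wav + noise

def augment(wav, n_copies, rng):
    """Produce `n_copies` randomly perturbed variants of one utterance."""
    out = []
    for _ in range(n_copies):
        factor = rng.uniform(0.9, 1.1)   # mild tempo change
        snr_db = rng.uniform(10, 30)     # moderate noise levels
        out.append(add_noise(speed_perturb(wav, factor), snr_db, rng))
    return out

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 16000)              # 1 s of "audio" at 16 kHz
wav = np.sin(2 * np.pi * 440 * t)         # stand-in utterance
variants = augment(wav, n_copies=3, rng=rng)
print(len(variants))                      # 3 augmented copies
```

Each augmented copy keeps the original transcript and annotation, which is how such synthetic examples can multiply the training data for rare golden utterances without new labeling effort.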