Data Augmentation for Spoken Language
- Comparing Data Augmentation and Annotation Standardization to Improve End-to-end Spoken Language Understanding Models
- All-neural end-to-end (E2E) Spoken Language Understanding (SLU) models can improve performance over traditional compositional SLU models, but face the challenge of requiring high-quality training data with both audio and annotations
- E2E models also struggle on “golden utterances”, which are essential for defining and supporting features but may lack sufficient training data
- compares two data-centric AI methods for improving performance on golden utterances
- the two methods are improving the annotation quality of existing training utterances (annotation standardization) and augmenting the training data with varying amounts of synthetic data (see the mixing sketch after this list)
- both data-centric approaches to improving E2E SLU achieved the desired effect, although data augmentation was much more powerful than annotation standardization.
- this leads to a 93% relative improvement in intent recognition error rate (IRER) on their golden utterance test set over the baseline, with no negative impact on other test metrics (a relative-improvement example follows the list)
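
The paper does not include implementation details here; as a rough sketch of the augmentation setup, the snippet below mixes synthetic utterances into the real training data at a chosen ratio. The function name, the example format, and the specific ratios are assumptions for illustration, not the authors' code.

```python
import random

def augment_training_set(real_examples, synthetic_examples, synth_ratio, seed=0):
    """Mix synthetic utterances into the real training set.

    synth_ratio expresses the amount of synthetic data as a fraction of the
    real training set size (e.g. 0.5 adds one synthetic example for every
    two real ones). The paper only says "varying amounts" of synthetic data
    were tried, so the ratios here are illustrative.
    """
    rng = random.Random(seed)
    n_synth = min(len(synthetic_examples), int(len(real_examples) * synth_ratio))
    combined = list(real_examples) + rng.sample(synthetic_examples, n_synth)
    rng.shuffle(combined)
    return combined

# Hypothetical usage: each example could be an (audio, transcript, annotation) tuple.
# train_sets = {r: augment_training_set(real, synth, r) for r in (0.1, 0.5, 1.0)}
```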
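
To make the “93% relative to the baseline” figure concrete, the snippet below computes relative IRER improvement; the baseline and improved IRER values are hypothetical, not numbers from the paper.

```python
def relative_irer_improvement(baseline_irer, new_irer):
    """Relative improvement = (baseline - new) / baseline."""
    return (baseline_irer - new_irer) / baseline_irer

# Hypothetical numbers: a baseline IRER of 10% dropping to 0.7% is a
# 93% relative improvement.
assert abs(relative_irer_improvement(0.10, 0.007) - 0.93) < 1e-6
```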