All-neural end-to-end (E2E) Spoken Language Understanding (SLU) models can outperform traditional compositional SLU models, but they require high-quality training data containing both audio and annotations.
In particular, these models struggle on "golden utterances", which are essential for defining and supporting features but may lack sufficient training data.
The authors compare two data-centric AI methods for improving performance on golden utterances: improving the annotation quality of existing training utterances, and augmenting the training data with varying amounts of synthetic data.
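The second method above, mixing synthetic examples into the real training data in controlled amounts, can be sketched roughly as follows. The function and the (audio, intent) pair format here are hypothetical illustrations, not the paper's actual pipeline:

```python
import random

def augment_training_set(real_examples, synthetic_examples, synthetic_fraction):
    """Mix a chosen amount of synthetic (audio, annotation) pairs into the
    real training data. synthetic_fraction is relative to the real set size,
    so 0.5 adds half as many synthetic examples as there are real ones."""
    n_synthetic = int(len(real_examples) * synthetic_fraction)
    sampled = random.sample(synthetic_examples,
                            min(n_synthetic, len(synthetic_examples)))
    return real_examples + sampled

# Hypothetical data: 100 real recordings, 500 TTS-generated ones,
# all annotated with an intent label.
real = [("audio_%d.wav" % i, "PlayMusic") for i in range(100)]
synthetic = [("tts_%d.wav" % i, "PlayMusic") for i in range(500)]

augmented = augment_training_set(real, synthetic, 0.5)
print(len(augmented))  # 100 real + 50 synthetic = 150
```

Varying `synthetic_fraction` across training runs is one simple way to study how the amount of synthetic data affects performance on the golden-utterance test set.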
Both data-centric approaches improved E2E SLU as intended, although data augmentation proved far more powerful than annotation standardization.
Data augmentation reduced the intent recognition error rate (IRER) on their golden-utterance test set by 93% relative to the baseline, without negatively affecting other test metrics.
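To make the headline number concrete, a 93% relative reduction means the augmented model's error rate is 7% of the baseline's. A minimal sketch of both metrics, with hypothetical error rates (the paper's absolute IRER values are not given here):

```python
def intent_error_rate(predictions, references):
    """IRER: fraction of utterances whose predicted intent is wrong."""
    errors = sum(p != r for p, r in zip(predictions, references))
    return errors / len(references)

def relative_improvement(baseline_irer, new_irer):
    """Relative error-rate reduction; 0.93 corresponds to '93% relative'."""
    return (baseline_irer - new_irer) / baseline_irer

# Hypothetical example: baseline IRER of 10%, augmented model IRER of 0.7%
# gives a 93% relative improvement.
print(relative_improvement(0.10, 0.007))
```

Note that a large relative improvement can correspond to a small absolute change when the baseline error rate is already low, which is why reporting both is common practice.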