The paper uses data augmentation to improve the performance of deep neural network (DNN) embeddings for speaker recognition.
The DNN, which is trained to discriminate between speakers, maps variable-length utterances to fixed-dimensional embeddings called x-vectors.
Although prior studies have found that embeddings leverage large-scale training datasets better than i-vectors, it can be challenging to collect substantial quantities of labeled data for training.
The authors therefore use data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness.
Their data augmentation strategy employs additive noise and reverberation.
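The additive-noise half of this strategy can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `add_noise`, the 10 dB target, and the synthetic sine and Gaussian signals standing in for real speech and noise recordings are all assumptions made for the example.

```python
import numpy as np


def add_noise(speech, noise, snr_db):
    """Mix `noise` into `speech` at a target signal-to-noise ratio (dB).

    Hypothetical helper for illustration; assumes both inputs are 1-D
    float arrays at the same sample rate and that `noise` is not silent.
    """
    # Repeat the noise clip if needed, then trim it to the speech length.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)

    # Choose the gain so that
    # 10 * log10(speech_power / (scale**2 * noise_power)) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise


# Synthetic stand-ins for real recordings (assumed 16 kHz sample rate).
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noise = rng.standard_normal(4000)
noisy = add_noise(speech, noise, snr_db=10.0)
```

Each clean utterance can be mixed with different noise clips and SNR values to produce several distinct training copies, which is how augmentation multiplies the amount of training data.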
Reverberation involves convolving room impulse responses (RIRs) with the audio.
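The convolution step can be sketched as below. This is a minimal example under stated assumptions: `add_reverb` is a hypothetical helper, the RIR is a synthetic exponentially decaying noise burst rather than a measured or simulated room response, and peak normalization and trimming to the original length are choices made for the example, not details taken from the paper.

```python
import numpy as np


def add_reverb(speech, rir):
    """Convolve `speech` with a room impulse response.

    Hypothetical helper for illustration; assumes 1-D float arrays at
    the same sample rate and a non-zero `rir`.
    """
    # Normalize the RIR so its largest tap has magnitude 1.
    rir = rir / np.max(np.abs(rir))
    # Full convolution adds a reverberant tail; trim it so the output
    # has the same length as the input utterance.
    return np.convolve(speech, rir, mode="full")[: len(speech)]


# Synthetic stand-ins (assumed 8 kHz sample rate, 1 s of audio).
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(8000) / 8000)
# Toy RIR: Gaussian noise shaped by an exponential decay envelope.
rir = rng.standard_normal(800) * np.exp(-np.arange(800) / 100.0)
reverberant = add_reverb(speech, rir)
```

For long utterances or long RIRs, an FFT-based convolution such as `scipy.signal.fftconvolve` computes the same result more efficiently than direct convolution.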