X Vectors

X-Vectors: Robust DNN Embeddings for Speaker Recognition
Uses data augmentation to improve the performance of deep neural network (DNN) embeddings for speaker recognition
The DNN is trained to discriminate between speakers and maps variable-length utterances to fixed-dimensional embeddings called x-vectors
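
How a variable-length utterance can end up as a fixed-size vector is easier to see in code. The sketch below assumes a statistics-pooling step (per-dimension mean and standard deviation over frames), which the note itself does not spell out. The name `stats_pool` and the toy frame values are hypothetical, and a real system would apply this to learned frame-level DNN outputs rather than raw lists.

```python
import math


def stats_pool(frames):
    """Collapse a list of frame vectors (num_frames x feat_dim) into one
    fixed-length vector: per-dimension means followed by per-dimension
    population standard deviations. Output length is always 2 * feat_dim."""
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


# Two toy "utterances" of different lengths (2 frames vs. 4 frames),
# each with 2-dimensional frame features (made-up values).
short_utt = [[1.0, 2.0], [3.0, 4.0]]
long_utt = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0], [6.0, 1.0]]

print(len(stats_pool(short_utt)), len(stats_pool(long_utt)))
```

Both calls return a vector of length 4 regardless of how many frames went in, which is the property that lets later layers produce a fixed-dimensional embedding.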
Prior studies have found that embeddings leverage large-scale training datasets better than i-vectors, but collecting substantial quantities of labeled training data can be challenging
The authors therefore use data augmentation, consisting of added noise and reverberation, as an inexpensive way to multiply the amount of training data and improve robustness
Their data augmentation strategy employs additive noise and reverberation
Reverberation involves convolving room impulse responses (RIRs) with the audio
The RIRs are the simulated ones described by Ko et al.
The reverberation itself is performed with the multicondition training tools in the Kaldi ASpIRE recipe
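
The convolution step in the reverberation notes above can be sketched in a few lines. This is only an illustration of the idea, not the Kaldi tooling: `convolve` and `reverberate` are hypothetical helper names, the three-tap RIR is made up, and trimming the result to the original length is an assumption made here for convenience.

```python
def convolve(signal, rir):
    """Full linear convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def reverberate(signal, rir):
    """Convolve the signal with a room impulse response, then trim the
    reverberant tail so the output has the same length as the input
    (the trimming is an assumption of this sketch)."""
    if not signal or not rir:
        raise ValueError("signal and rir must be non-empty")
    return convolve(signal, rir)[: len(signal)]


# A single click followed by silence, and a toy decaying RIR (made-up values).
click = [1.0, 0.0, 0.0, 0.0]
toy_rir = [1.0, 0.5, 0.25]

print(reverberate(click, toy_rir))
```

Feeding in a single click makes the effect visible: the output is the RIR itself, i.e. the click followed by its decaying echoes.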
For additive noise, they use the MUSAN dataset
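
Additive-noise augmentation is usually described in terms of a target signal-to-noise ratio (SNR), so a small sketch may help. Everything here is an assumption for illustration: the helper names `power` and `add_noise`, the toy sample values, the 10 dB target, and the choice to loop the noise when it is shorter than the speech. The note does not say which SNRs or mixing details the authors used.

```python
import math


def power(samples):
    """Mean squared amplitude of a list of samples."""
    return sum(v * v for v in samples) / len(samples)


def add_noise(speech, noise, snr_db):
    """Mix noise into speech so the result has the requested SNR in dB.
    The noise is repeated and cut to match the speech length."""
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    reps = len(speech) // len(noise) + 1
    noise = (noise * reps)[: len(speech)]
    p_speech = power(speech)
    p_noise = power(noise)
    if p_noise == 0.0:
        raise ValueError("noise is silent; cannot scale it to a target SNR")
    # Choose scale so that p_speech / (scale**2 * p_noise) == 10 ** (snr_db / 10).
    scale = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + scale * n for s, n in zip(speech, noise)]


speech = [1.0, -1.0, 1.0, -1.0]   # toy speech samples (made up)
noise = [0.5, 0.5]                # toy noise clip, shorter than the speech
mixed = add_noise(speech, noise, snr_db=10.0)

# Recover the added noise and check the SNR that was actually achieved.
residual = [m - s for m, s in zip(mixed, speech)]
achieved_snr = 10.0 * math.log10(power(speech) / power(residual))
print(round(achieved_snr, 6))
```

Scaling the noise rather than the speech keeps the speech level unchanged, so the same clean utterance can be reused with several noise clips and SNR values to multiply the training data.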
A PLDA classifier is used in the x-vector framework to make the final decision, similar to i-vector systems
X-vectors are compared with i-vector baselines on Speakers in the Wild and NIST SRE 2016 Cantonese, where they achieve superior performance
