Speaker Verification

Deep Neural Networks for Small Footprint Text-dependent Speaker Verification
Investigates the use of deep neural networks (DNNs) to train speaker embeddings for a small-footprint, text-dependent speaker verification task
Uses stacked filterbank features as input
During speaker enrollment, the trained DNN extracts speaker-specific features/embeddings by averaging the activations of the last hidden layer (called deep vectors, or “d-vectors” for short); the averaged d-vector is taken as the speaker model
During evaluation, a d-vector is extracted for each utterance and compared with the enrolled speaker model by computing the [cosine distance](cosine distance.md) between the test d-vector and the claimed speaker’s d-vector, similar to the i-vector framework
A verification decision is made by comparing the distance to a threshold
The DNN-based speaker verification system performs well compared with a popular i-vector system on a small-footprint, text-dependent speaker verification task
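
The enrollment and verification steps noted above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes the last-hidden-layer activations are already available as NumPy arrays of shape (frames, units), and the function names, the per-utterance averaging order, and the threshold value of 0.3 are all hypothetical choices made for the example.

```python
import numpy as np


def d_vector(hidden_activations):
    """Average last-hidden-layer activations (frames x units) into one d-vector."""
    return np.asarray(hidden_activations, dtype=float).mean(axis=0)


def enroll(utterance_activations):
    """Build a speaker model by averaging the d-vectors of the enrollment utterances.

    Assumption: each utterance is averaged first, then the utterance-level
    d-vectors are averaged; the note itself only says activations are averaged.
    """
    return np.mean([d_vector(a) for a in utterance_activations], axis=0)


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (0 means same direction)."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def verify(test_activations, speaker_model, threshold=0.3):
    """Accept the claimed identity if the distance is at or below the threshold.

    The threshold here is a placeholder; in practice it would be tuned on
    held-out data for the desired operating point.
    """
    return cosine_distance(d_vector(test_activations), speaker_model) <= threshold
```

In use, `enroll` would be called once per speaker with a list of activation arrays, and `verify` once per test utterance against the stored model of the claimed speaker. Lowering the threshold makes the system stricter (fewer false accepts, more false rejects).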

Subhaditya's KB