Real-world audio recordings are often degraded by factors such as noise, reverberation, and equalization distortion. The goal of this work is to transform recorded speech to sound as though it had been recorded in a studio. The method uses an end-to-end feed-forward WaveNet architecture, trained with multi-scale adversarial discriminators in both the time domain and the time-frequency domain. It relies on the deep feature matching losses of the discriminators to improve the perceptual quality of the enhanced speech.
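A deep feature matching loss compares the intermediate activations that a discriminator produces for the enhanced signal against those it produces for the clean reference, rather than comparing waveforms directly. A minimal NumPy sketch of the idea follows; the toy "discriminator" (a stack of linear layers with ReLU) and the per-layer L1 averaging are illustrative assumptions, not the exact configuration used in this work:

```python
import numpy as np

def discriminator_features(x, weights):
    """Toy stand-in for a discriminator: a stack of linear layers with
    ReLU activations, returning every intermediate activation, since
    feature matching operates on all hidden layers."""
    feats = []
    h = x
    for W in weights:
        h = np.maximum(h @ W, 0.0)  # linear layer + ReLU
        feats.append(h)
    return feats

def feature_matching_loss(enhanced, clean, weights):
    """Mean L1 distance between the discriminator activations of the
    enhanced signal and those of the clean reference, averaged over
    layers (an assumed form of the deep feature loss)."""
    f_e = discriminator_features(enhanced, weights)
    f_c = discriminator_features(clean, weights)
    return float(np.mean([np.mean(np.abs(a - b)) for a, b in zip(f_e, f_c)]))

rng = np.random.default_rng(0)
weights = [rng.standard_normal((16, 16)) * 0.1 for _ in range(3)]
clean = rng.standard_normal(16)
enhanced = clean + 0.05 * rng.standard_normal(16)  # slightly degraded copy

print(feature_matching_loss(enhanced, clean, weights))  # small positive value
print(feature_matching_loss(clean, clean, weights))     # exactly 0.0
```

In a full adversarial setup this loss would be summed over several discriminators operating at different scales and in both the waveform and spectrogram domains; the sketch shows only the core comparison for a single discriminator.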