Friday, 14th of February

11h (room R2014, 660 building) (see location)

Stéphane RIVAUD

(Sony)

Perceptual GAN for audio synthesis

Generative Adversarial Networks constitute a family of frameworks for unsupervised learning of implicit generative models.
While GANs produce astonishing results on image synthesis, state-of-the-art audio synthesis faces its own difficulties.
We present a formulation of adversarial learning that emphasizes the perceptual quality of generated audio samples.
Our formulation makes it possible to include a priori knowledge of human audio perception and naturally exhibits a trade-off between accuracy and numerical stability during gradient descent; we propose a heuristic to optimize this trade-off.

Our formulation of the learning problem is based on the Kantorovich-Rubinstein duality theorem and is a natural generalization of the Wasserstein GAN.
It allows us to get rid of the checkerboard artifacts while using transposed convolutions to synthesize audio waveforms.
We also show that the model successfully discovers degrees of variability that are relevant from a music production point of view.
We take advantage of this by developing a kick drum sampler where we smartly navigate through the latent space.
Such a design paradigm is flexible enough to be adapted to existing workflows, and was greatly appreciated by professionals.

However, it still needs theoretical and practical investigation to confirm the underlying claims.
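
For reference, the Kantorovich-Rubinstein duality mentioned above is usually written, in the standard Wasserstein GAN setting, as shown below. The symbols are assumptions introduced here for illustration ($P_r$ for the data distribution, $P_g$ for the generator's distribution, $f$ for a 1-Lipschitz critic); the perceptual generalization presented in the talk is not reproduced here.

```latex
% Kantorovich-Rubinstein dual form of the Wasserstein-1 distance
% (standard WGAN setting; notation is illustrative, not taken from the talk)
W_1(P_r, P_g) \;=\; \sup_{\|f\|_{L} \le 1}
  \; \mathbb{E}_{x \sim P_r}\big[ f(x) \big]
  \;-\; \mathbb{E}_{x \sim P_g}\big[ f(x) \big]
```

In this form, the critic $f$ is trained to maximize the difference of expectations, while the generator is trained to minimize it.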


All TAU seminars: here