UnSE: Unsupervised Speech Enhancement Using Optimal Transport (Online Supplement)

Authors

Wenbin Jiang, Fei Wen, Yifan Zhang, Kai Yu

Abstract

Most deep learning-based speech enhancement methods rely on supervised learning, which requires large amounts of paired noisy and clean training data. However, synthesized training data covers only some realistic environments, and in some scenarios it is difficult or nearly impossible to collect pairs of noisy and ground-truth clean speech. To address this problem, we propose an unsupervised speech enhancement method that does not require any paired noisy-to-clean training data. Specifically, based on the optimal transport criterion, the speech enhancement model is trained in an unsupervised manner using only a fidelity loss computed on the noisy speech and a distribution divergence loss that minimizes the divergence between the model output and (unpaired) clean speech. Experimental results show that the proposed unsupervised method achieves performance competitive with supervised methods on the VCTK+DEMAND benchmark and better performance on the CHiME4 benchmark.
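The two-term objective described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the mean-squared-error form of the fidelity term, the weight `lam`, and the use of scalar samples are all assumptions made for illustration. In particular, the closed-form 1-D Wasserstein distance below stands in for the divergence that the actual model estimates with a discriminator.

```python
def fidelity_loss(enhanced, noisy):
    """Mean squared error between the model output and its noisy input.

    Assumption: the abstract does not give the form of the fidelity term;
    MSE is used here purely as an example.
    """
    return sum((e - n) ** 2 for e, n in zip(enhanced, noisy)) / len(enhanced)


def wasserstein_1d(xs, ys):
    """Exact Wasserstein-1 distance between two equal-size 1-D samples.

    In one dimension the optimal transport plan matches sorted samples,
    so no pairing between xs and ys is needed.
    """
    if len(xs) != len(ys):
        raise ValueError("this closed form assumes equal sample counts")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def unsupervised_objective(enhanced, noisy, clean_unpaired, lam=1.0):
    """Divergence to unpaired clean data plus a weighted fidelity term.

    `lam` is a hypothetical trade-off weight, not a value from the paper.
    """
    divergence = wasserstein_1d(enhanced, clean_unpaired)
    return divergence + lam * fidelity_loss(enhanced, noisy)


if __name__ == "__main__":
    enhanced = [0.0, 1.0, 2.0]
    noisy = [0.0, 1.0, 4.0]
    clean_unpaired = [3.0, 1.0, 2.0]  # order is irrelevant: the data is unpaired
    print(unsupervised_objective(enhanced, noisy, clean_unpaired, lam=0.5))
```

The divergence term compares distributions rather than individual utterances, which is why the clean samples need not correspond to the noisy ones; the fidelity term keeps the output tied to the content of the noisy input.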

Datasets

  • The VCTK+DEMAND dataset is used for the demo.
  • The processed audio samples of the test set are available at the repository (VCTK).

Setups

  • The neural network architectures of the denoising model (i.e., the generator) and the discriminator are detailed in generator.py and discriminator.py, respectively.
  • The configurations of both models are detailed in model_arch.py.

Compared methods

  • OMLSA: Noise Spectrum Estimation in Adverse Environments: Improved Minima Controlled Recursive Averaging
  • SEGAN: Speech Enhancement Generative Adversarial Network
  • SASEGAN: Self-Attention Generative Adversarial Network for Speech Enhancement
  • DOTN: Discriminator-Constrained Optimal Transport Network

Audio Samples

    Model\id(noise) p257_006(cafe) p257_073(living) p257_286(bus) p232_227(office) p232_378(psquare)
    Clean
    Noisy
    OMLSA
    SEGAN
    SASEGAN
    DOTN
    Proposed

    Spectrograms of the samples in the first column (except clean)