Project Details
Deep Neural Networks for Nonlinear Multichannel Speech Enhancement
Applicant
Professor Dr.-Ing. Timo Gerkmann
Subject Area
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Communication Technology and Networks, High-Frequency Technology and Photonic Systems, Signal Processing and Machine Learning for Information Technology
Term
since 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 508337379
In this project, we explore how the flexible nonlinear modelling capacity of deep neural networks can be employed to push the performance of multichannel speech enhancement algorithms beyond the limits imposed by traditional linear beamforming. To understand speech in noisy environments, a growing number of hearing-impaired human listeners in our aging society, as well as human-machine interfaces, rely on speech enhancement algorithms. These aim to improve speech quality and intelligibility by suppressing background noise and other unwanted effects such as reverberation. In a multichannel setting, algorithms can leverage spatial information in addition to exploiting the tempo-spectral characteristics of the noisy signal. Traditionally, this has been done by concatenating a linear spatial filter, a so-called beamformer, and a possibly nonlinear and machine learning-based spectral single-channel postfilter. In contrast, statistical analyses and experimental evaluations of our preliminary work reveal that a joint spatial-spectral nonlinear filter may outperform the traditional approach if the noise is non-Gaussian. However, the estimation of the parameters of such analytical estimators has proven to be difficult in practice. Consequently, this project targets the development and analysis of robust joint spatial-spectral nonlinear filters using deep neural networks as flexible and powerful nonlinear function approximators. For this, concepts from information theory, statistical signal processing, and machine learning are combined. Upon success, this project may pave the way towards a novel class of nonlinear multichannel speech signal processing schemes and is thus of high relevance both for academia and industry.
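The traditional pipeline described above, a linear beamformer followed by a single-channel spectral postfilter, can be sketched for a single frequency bin as follows. This is only an illustrative baseline, not the project's method: it assumes an MVDR beamformer and a Wiener postfilter as representative choices, and it assumes the noise covariance matrix, steering vector, and speech power are already known. The function names (mvdr_weights, wiener_gain, enhance_bin) are hypothetical.

```python
import numpy as np


def mvdr_weights(noise_cov, steering):
    """MVDR weights w = R^{-1} d / (d^H R^{-1} d) for one frequency bin.

    noise_cov: (M, M) noise covariance matrix R (assumed known here).
    steering:  (M,) steering vector d towards the target (assumed known here).
    """
    rinv_d = np.linalg.solve(noise_cov, steering)
    return rinv_d / (steering.conj() @ rinv_d)


def wiener_gain(speech_psd, noise_psd):
    """Single-channel Wiener gain from speech and noise power."""
    return speech_psd / (speech_psd + noise_psd)


def enhance_bin(x, noise_cov, steering, speech_psd):
    """Linear spatial filter followed by a spectral postfilter.

    x: (M,) or (M, T) noisy multichannel STFT coefficients of one bin.
    """
    w = mvdr_weights(noise_cov, steering)
    y = w.conj() @ x  # beamformer output w^H x (linear spatial filtering)
    # Noise power remaining after the beamformer: w^H R w
    residual_noise_psd = np.real(w.conj() @ noise_cov @ w)
    return wiener_gain(speech_psd, residual_noise_psd) * y
```

In this sketch the spatial and spectral stages are strictly separated, and the spatial stage is linear in the microphone signals. A joint spatial-spectral nonlinear filter, as targeted by the project, would instead map the multichannel input to the enhanced signal in a single nonlinear step.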
DFG Programme
Research Grants