Project Details

Neurobiological and algorithmic mechanisms of audiovisual speech processing

Subject Area Human Cognitive and Systems Neuroscience
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term since 2023
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 523344822
 
Oral communication is of fundamental importance in many areas of life. However, understanding speech can be challenging, especially in noisy or reverberant environments such as a busy pub or restaurant. Such complex acoustic conditions pose a particular problem for people with hearing impairments, for whom the issue persists even with hearing aids, since current devices struggle to enhance speech-in-noise comprehension for their wearers. A potential avenue for speech-in-noise enhancement is the use of other sensory modalities. In particular, seeing a speaker's face can significantly aid comprehension even in adverse listening conditions, such as when several competing talkers are present, and can reduce listening effort. Similarly, automatic speech recognition can be significantly improved when the acoustic signal is supplemented by a video of the speaker's moving face.

However, the neurobiological mechanisms by which humans integrate visual with auditory information for improved speech comprehension remain largely unclear. We also still lack precise computational measures of the intelligibility of audiovisual speech. Moreover, using visual information for speech enhancement in a hearing aid requires the generation of artificial moving faces from speech, a problem that has only recently begun to be investigated.

This project aims to address these gaps in knowledge and capability through a combined effort of neurobiological research and algorithmic development, and will therefore be led jointly by two experts, one from each area. In a first step, we will investigate computational measures of audiovisual speech intelligibility and compare them to the neurocognitive mechanisms of audiovisual speech comprehension, assessed through behavioural measurements and electroencephalographic (EEG) recordings from human participants. In a second step, we will employ these measures to create synthesized facial animations that are paired with a speech signal and optimized for enhancing speech comprehension in complex acoustic conditions. As a third goal, we will investigate the effects of the synthesized facial animations on listening effort and compare them to computational measures of listening effort for these audiovisual signals.

The proposed research will contribute to clarifying the neurobiological mechanisms of audiovisual speech processing, especially the role of cortical tracking of speech rhythms. It will also provide improved computer algorithms for predicting the intelligibility and listening effort of audiovisual speech. Finally, the project will develop facial animations optimized to enhance speech-in-noise comprehension, with potential applications in future audiovisual hearing aids.
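The central neural concept here, cortical tracking of speech rhythms, refers to the alignment of low-frequency brain activity with the slow amplitude fluctuations (the envelope) of speech. As a minimal sketch of how such tracking is commonly quantified, the Python snippet below correlates a band-pass filtered EEG signal with lagged copies of the speech envelope. All signals are synthetic placeholders, and the sampling rate, frequency bands, and lag range are illustrative assumptions rather than values from this project; analyses in practice often use a temporal response function fitted by ridge regression instead of the simple lagged correlation shown here.

import numpy as np
from scipy.signal import hilbert, butter, filtfilt

fs = 100                      # analysis sampling rate after downsampling (Hz)
t = np.arange(0, 60, 1 / fs)  # 60 s of data

# Placeholder "speech": amplitude-modulated noise with a ~4 Hz syllabic rhythm.
rng = np.random.default_rng(0)
modulation = 1 + 0.5 * np.sin(2 * np.pi * 4 * t)
speech = modulation * rng.standard_normal(t.size)

# Acoustic envelope via the Hilbert transform, then low-pass filtered.
envelope = np.abs(hilbert(speech))
b, a = butter(3, 8 / (fs / 2), btype="low")
envelope = filtfilt(b, a, envelope)

# Placeholder "EEG": a delayed, noisy copy of the envelope (cortical responses
# typically lag the stimulus by roughly 100 ms).
delay = int(0.1 * fs)
eeg = np.roll(envelope, delay) + 2.0 * rng.standard_normal(t.size)

# Band-pass the EEG to the delta/theta range (1-8 Hz), where envelope
# tracking is usually strongest.
b, a = butter(3, [1 / (fs / 2), 8 / (fs / 2)], btype="band")
eeg_filt = filtfilt(b, a, eeg)

# Correlate the envelope with the EEG at a range of lags; the peak indicates
# the tracking strength and the latency of the cortical response.
lags_ms, corrs = [], []
for lag in range(0, int(0.3 * fs)):  # 0-300 ms
    r = np.corrcoef(envelope[: t.size - lag], eeg_filt[lag:])[0, 1]
    lags_ms.append(1000 * lag / fs)
    corrs.append(r)

best = int(np.argmax(corrs))
print(f"peak tracking r = {corrs[best]:.2f} at lag {lags_ms[best]:.0f} ms")

In a real experiment, the envelope would be extracted from the recorded speech stimulus and the EEG would come from preprocessed multi-channel recordings; the same correlation logic then indicates how strongly, and at what latency, the cortex follows the speech rhythm.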
DFG Programme Research Grants