Project Details

Iterative Information Fusion in Automatic Speech Recognition According to the Turbo Principle

Subject Area Electronic Semiconductors, Components and Circuits, Integrated Systems, Sensor Technology, Theoretical Electrical Engineering
Term from 2019 to 2023
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 414091002
 
The intelligent fusion of information plays a major role in two opposing megatrends of information technology: (1) decentralization (internet, internet of things, decentralized network control, sensor networks, Industry 4.0, ...) and, more recently in the field of automatic speech recognition, (2) centralization (Siri, Google Home, Amazon Alexa, YouTube). Both trends have in common that multiple information sources are used. The approach may be multimodal (e.g., audiovisual speech recognition with microphone and camera) or unimodal (speech recognition with microphone signals only). A unimodal approach may operate on multiple channels or on a single channel, in the latter case using, e.g., information from two different feature representations.

In prior work of the applicant, the turbo principle known from digital communications for the iterative fusion of information has been successfully transferred to automatic speech recognition (ASR). One objective of this project is to further explore the still largely untapped potential of turbo information fusion in ASR. It is not only capable of fusing feature representations very well, but can also fuse acoustic models. Since modelling in ASR is now performed with deep neural networks, and a variety of network topologies are currently under research, fusion is an active research topic, but high-performance fusion approaches rarely come with modularity. Because modularity in information fusion is almost indispensable in both trends (1) and (2), this project shall further develop turbo information fusion to become completely modular, thereby demonstrating high flexibility and relevance for a wide range of applications.

A further objective is to gain a deeper understanding of the iteratively operating turbo information fusion. Why does it perform so well? And how does its performance relate to the statistical dependence of the information sources? Controlled experiments with synthetic data that allow perfect modelling shall provide answers. In addition, the EXIT charts known from digital communications, which are both useful and theoretically demanding, shall be developed further with the ultimate goal of predicting the performance of turbo information fusion. Moreover, the EXIT analysis tool shall make it possible to design the fusion such that a high-quality recognition result is indeed obtained after only a few iterations.

Finally, we plan to explore ASR with turbo information fusion of more than two information sources or recognizers. Besides the fusion of several complementary models, the scenario of spatially distributed microphones and ASR systems is of interest: can turbo information fusion obtain a performance gain from spatially distributed microphones in, e.g., a reverberant environment?
DFG Programme Research Grants
 
 
