Simultaneous Interpretation of Lectures from/into German
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Summary of Project Results
Speech translation (ST) is one of the most challenging research tasks, yet also one of the most attractive from an application point of view. In this project, KIT addressed one of the most demanding conditions for speech translation: streaming speech translation of lectures from and into German. By identifying the main issues (data sparsity, quality and latency of the components, domain mismatches, etc.) and then researching novel, advanced techniques to tackle them, we have built a high-quality lecture translation system. The results were presented at well-known international conferences and led to successful participation in several international evaluation campaigns. For example, our English speech recognition achieves super-human performance on a standard test set of conversational speech at low latency, and our multilingual translation system pioneered the idea of making the learned common representation interlingual. In addition, the techniques were integrated into a real-world speech translation application, the KIT Lecture Translator. Besides being a useful tool for lecturers and students, our speech translation framework also helps us collect more lecture data and user feedback, opening the way for further research on how to leverage such data to improve lecture translation systems. We also built an initial prototype model for direct speech translation, which underlines the need for the community to build more and larger end-to-end speech translation corpora. Within the project, we also made significant contributions to one of the most actively researched questions in the speech and speech translation community at the moment: the comparison between end-to-end ASR and hybrid ASR, and between end-to-end speech translation and cascaded speech translation. The developed techniques are thus a valuable contribution to reducing the gap between the different approaches.
The most valuable lesson learned from this project is how to foresee and estimate the potential of research directions and to pursue modern, advanced research along them. Doing this early enough allows us to contribute substantially to the research community and to achieve high-quality research and applications.
Project-Related Publications (Selection)
-
„Building Real-time Speech Recognition without CMVN“, In Proceedings of the 20th International Conference on Speech and Computer (SPECOM 2018). Leipzig, Germany – September 2018
Thai Son Nguyen, Matthias Sperber, Sebastian Stüker, Alexander Waibel
-
„Inspection of Multilingual Neural Machine Translation“, In Proceedings of the International Conference on Language Resources and Evaluation 2018 (LREC 2018). Miyazaki, Japan – May 2018
Carlos Mullov, Jan Niehues, Alexander Waibel
-
„KIT Lecture Translator: Multilingual Speech Translation with One-Shot Learning“, In Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018). Santa Fe, New Mexico, USA – August 2018
Florian Dessloch, Thanh-Le Ha, Markus Müller, Jan Niehues, Thai-Son Nguyen, Ngoc-Quan Pham, Elizabeth Salesky, Matthias Sperber, Sebastian Stüker, Thomas Zenkel, Alexander Waibel
-
„Towards one-shot learning for rare-word translation with external experts“, In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation (WNMT 2018). Melbourne, Australia – July 2018
Ngoc-Quan Pham, Jan Niehues, Alexander Waibel
-
„Improving Zero-shot Translation with Language-Independent Constraints“, In Proceedings of the 4th Conference on Machine Translation (WMT 2019). Florence, Italy – August 2019
Ngoc-Quan Pham, Jan Niehues, Thanh-Le Ha, Alexander Waibel
-
„Toward Cross-Domain Speech Recognition with End-to-End Models“, In Proceedings of the Workshop on Life-Long Learning for Spoken Language Systems 2019 (LifeLongNLP 2019). Singapore, Singapore – December 2019
Thai-Son Nguyen, Sebastian Stüker, Alexander Waibel
-
„Relative Positional Encoding for Speech Recognition and Direct Translation“, In Proceedings of the 21st Annual Conference of the International Speech Communication Association (Interspeech 2020). Shanghai, China (online event) – October 2020
Ngoc-Quan Pham, Thanh-Le Ha, Tuan-Nam Nguyen, Thai-Son Nguyen, Elizabeth Salesky, Sebastian Stüker, Jan Niehues, Alexander Waibel
-
„Super-Human Performance in Online Low-Latency Recognition of Conversational Speech“, In Proceedings of the 22nd Annual Conference of the International Speech Communication Association (Interspeech 2021). Brno, Czech Republic (online/onsite event) – August 2021
Thai-Son Nguyen, Sebastian Stüker, Alex Waibel