Harvesting Lexical-Semantic Knowledge from Dynamic and Linguistic Sources and Integrating It into Question Answering for Discursive Knowledge Acquisition in E-Learning
Final Report Abstract
The two main goals of QA-EL are to leverage lexical-semantic information from collaboratively created and expert-built resources and to use it in question answering (QA) for educational purposes. During the project we made significant progress towards both goals. We published a comprehensive analysis of lexical information in collaboratively created and expert-built resources and developed a novel method for aligning senses in WordNet and Wikipedia that advanced the state of the art. Generalizing this method to aligning senses across arbitrary lexical resources led to the creation of the large-scale lexical-semantic resource UBY, which we published together with open-source software for creating the UBY database and accessing the information it contains. We then successfully leveraged lexical-semantic knowledge for information retrieval-based QA. We harnessed question paraphrases from the social Q&A site WikiAnswers and used them to evaluate pre-processing strategies and various similarity metrics for identifying question paraphrases. Identifying paraphrases is required for retrieving related questions in a QA framework that defines answer retrieval as the retrieval of question paraphrases and their corresponding answers from social Q&A sites. Our corpus-based work laid important foundations for QA-EL: we created question corpora from social Q&A sites and analysed them with respect to question type, subjectivity, and quality. This work included a detailed study of the requirements of educational QA systems that use data from social media, which identified the focal points of our research in educational QA. In this study, we identified the quality of user-generated discourse as an essential requirement for educational natural language processing. Quality is particularly important for question answering in educational settings, as unreliable answers can confuse learners and obstruct the learning process.
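To make the paraphrase-based answer retrieval framework concrete, here is a minimal Python sketch. It assumes a bag-of-words cosine similarity with lowercasing and stopword removal as the pre-processing strategy; the report evaluated several pre-processing strategies and similarity metrics that are not detailed here, so this is only one illustrative combination. The names `preprocess`, `cosine`, `retrieve_answer`, the stopword list, and the 0.5 threshold are hypothetical, not taken from the project's implementation.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list: the report does not specify which
# pre-processing steps were used.
STOPWORDS = {"a", "an", "the", "is", "are", "what", "how", "do", "does", "of", "to", "in"}


def preprocess(text: str) -> list[str]:
    """Lowercase, split into alphanumeric tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_answer(query: str, archive: list[tuple[str, str]], threshold: float = 0.5):
    """Treat answer retrieval as paraphrase retrieval: find the archived
    question most similar to the query and return (question, answer, score).

    Returns None when no archived question reaches the (hypothetical)
    threshold, rather than returning a weak match.
    """
    q_vec = Counter(preprocess(query))
    best = None
    for question, answer in archive:
        score = cosine(q_vec, Counter(preprocess(question)))
        if best is None or score > best[2]:
            best = (question, answer, score)
    if best is None or best[2] < threshold:
        return None
    return best
```

For example, with an archive containing "How do I calculate the area of a circle?", the query "how to calculate the area of a circle" retrieves that question and its answer with a cosine score of about 0.87, while an unrelated query returns None. Declining to answer below a threshold is one simple way to avoid passing unreliable answers on to learners.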
We adapted our research programme accordingly: although this topic was not explicitly part of our original project plan, we were able to additionally address quality assessment and text simplification of user-generated discourse. Some of the challenges originally laid out in the QA-EL work plan, namely the development of new metrics for semantic relatedness and semantic similarity, had to be addressed ahead of time in the work of other researchers. We therefore extended word-based semantic relatedness models to phrase-based similarities generated by monolingual translation models. In this work, we exploited new opportunities opened up by collaborative lexical-semantic resources: we extracted a corpus of parallel definitions from traditional and collaborative knowledge sources such as Wikipedia, Wiktionary, and WordNet. We used this corpus to train a monolingual translation model, which we successfully employed for question retrieval on social Q&A data from the educational domain. An overarching goal of the QA-EL project was to develop natural language processing tools and data sets according to high engineering standards. For instance, we developed DKPro-UGD, a UIMA-based toolkit for cleansing user-generated discourse data, which is a prerequisite for the linguistic processing of social media data. We applied the tool to prepare data from social Q&A sites, Wikipedia, FAQs, and lecture slides as knowledge sources for an educational QA system, and we designed and implemented an efficient demonstration QA system using these data sets. QA-EL opened up new research questions for educational QA, which we will address in the future in collaboration with the German Institute for International Educational Research and Educational Information (DIPF).
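The idea behind using a monolingual translation model for question retrieval can be illustrated with a simplified translation-based language model, in which a query word may be "generated" from a related word in an archived question. This is a sketch under stated assumptions, not the model trained in QA-EL: the table `TRANSLATION` holds hand-written toy probabilities standing in for those learned from the parallel definition corpus, and the mixing weights `beta` and `lam`, the 1e-6 floor for unseen words, and the function names `score` and `rank` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy table of T(w | t): the probability that query word w is a
# "translation" of document word t. In QA-EL such probabilities were learned
# from parallel definitions; these values are invented and not normalized.
TRANSLATION = {
    ("car", "automobile"): 0.6,
    ("automobile", "car"): 0.6,
    ("fix", "repair"): 0.5,
    ("repair", "fix"): 0.5,
}


def score(query: list[str], doc: list[str], collection: Counter, coll_len: int,
          beta: float = 0.5, lam: float = 0.2) -> float:
    """Log-likelihood of the query under a translation-smoothed model of a
    non-empty archived question `doc`:
      P_mx(w|D) = (1 - beta) * P_ml(w|D) + beta * sum_t T(w|t) * P_ml(t|D)
      P(w|D)    = (1 - lam) * P_mx(w|D) + lam * P_ml(w|C)
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    total = 0.0
    for w in query:
        p_ml = doc_counts[w] / doc_len
        # Probability mass reaching w via translation from words in the question.
        p_trans = sum(TRANSLATION.get((w, t), 0.0) * c / doc_len
                      for t, c in doc_counts.items())
        p_mx = (1 - beta) * p_ml + beta * p_trans
        # Collection model, floored so unseen query words do not yield log(0) (assumption).
        p_coll = max(collection[w] / coll_len, 1e-6)
        total += math.log((1 - lam) * p_mx + lam * p_coll)
    return total


def rank(query: str, questions: list[str], beta: float = 0.5, lam: float = 0.2) -> list[str]:
    """Rank archived questions by translation-based score, best first."""
    tokenized = [q.lower().split() for q in questions]
    collection = Counter(t for doc in tokenized for t in doc)
    coll_len = sum(collection.values())
    q_tokens = query.lower().split()
    scored = [(score(q_tokens, d, collection, coll_len, beta, lam), text)
              for d, text in zip(tokenized, questions)]
    return [text for _, text in sorted(scored, key=lambda x: x[0], reverse=True)]
```

With this toy table, the query "fix car" ranks "how to repair an automobile" above "how to bake bread" even though the query shares no words with either question; with `beta=0` the translation component is switched off and both questions receive identical scores. This lexical gap between differently worded questions is what translation-based retrieval is meant to bridge.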
Publications
- 2009. Combining Lexical Semantic Resources with Question & Answer Archives for Translation-Based Answer Finding. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (ACL-IJCNLP 2009), pp. 728-736, Suntec, Singapore
Delphine Bernhard and Iryna Gurevych
- 2009. Educational Question Answering based on Social Media Content. In: V. Dimitrova et al. (Eds.): Proceedings of the 14th International Conference on Artificial Intelligence in Education. Building learning systems that care: From knowledge representation to affective modelling (AIED 2009), Frontiers in Artificial Intelligence and Applications 200, pp. 133-140, IOS Press, Amsterdam, The Netherlands
Iryna Gurevych, Delphine Bernhard, Kateryna Ignatova and Cigdem Toprak
- 2010. A Monolingual Tree-based Translation Model for Sentence Simplification. In: Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), pp. 1353-1361, Beijing, China
Zhemin Zhu, Delphine Bernhard and Iryna Gurevych
- 2010. Combining Probabilistic and Translation-Based Models for Information Retrieval based on Word Sense Annotations. In: C. Peters et al. (Eds.): Multilingual Information Access Evaluation I - Text Retrieval Experiments, LNCS 6241, pp. 120-127, Springer-Verlag Berlin/Heidelberg, Germany
Elisabeth Wolf, Delphine Bernhard and Iryna Gurevych
- 2011. The People’s Web meets Linguistic Knowledge: Automatic Sense Alignment of Wikipedia and WordNet. In: Proceedings of the 9th International Conference on Computational Semantics (IWCS), pp. 205-214, Oxford, UK
Elisabeth Niemann (née Wolf) and Iryna Gurevych
- 2012. UBY - A Large-Scale Unified Lexical-Semantic Resource Based on LMF. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), pp. 580-590, Avignon, France
Iryna Gurevych, Judith Eckle-Kohler, Silvana Hartmann, Michael Matuschek, Christian M. Meyer and Christian Wirth