Mining Lexical-Semantic Knowledge from Dynamic and Linguistic Sources and Integration into Question Answering for Discourse-Based Knowledge Acquisition in eLearning

Applicant Professor Dr. Iryna Gurevych

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing

Term from 2007 to 2015

Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 37353858

Final Report Year 2014

Final Report Abstract

The two main goals of QA-EL are leveraging lexical-semantic information from collaboratively created and expert-built resources and using it in question answering (QA) for educational purposes. During the project we made significant progress towards both goals. We published a comprehensive analysis of lexical information in collaboratively created and expert-built resources and developed a novel method for aligning senses in WordNet and Wikipedia that advanced the state of the art. A generalization of this method to aligning senses in arbitrary lexical resources led to the creation of the large-scale lexical-semantic resource UBY, which we published together with open-source software for creating the UBY database and accessing its contents.
We then successfully leveraged lexical-semantic knowledge for information retrieval-based QA. We harnessed question paraphrases from the social Q&A site WikiAnswers and used them to evaluate pre-processing strategies and various similarity metrics for identifying question paraphrases. Identifying paraphrases is required to retrieve related questions in a QA framework that defines answer retrieval as the retrieval of question paraphrases and their corresponding answers from social Q&A sites.
We laid important foundations for QA-EL in our corpus-based work: we created question corpora from social Q&A sites and analysed them with respect to question type, subjectivity, and quality. This work entailed a detailed study of the requirements of educational QA systems that use data from social media, which identified the focal points for our research in educational QA. In this study, we identified the quality of user-generated discourse as an essential requirement for educational natural language processing: it is particularly important for question answering in educational settings, as unreliable answers can confuse learners and obstruct the learning process.
We adapted our research programme accordingly: although this topic was not explicitly part of our original project plan, we additionally addressed quality assessment and text simplification of user-generated discourse. Some of the challenges originally laid out in the QA-EL work plan were addressed ahead of time in other researchers' work. This concerns the development of new metrics for semantic relatedness and semantic similarity. We therefore extended word-based semantic relatedness models to phrase-based similarities generated by applying monolingual translation models. In this work, we exploited new opportunities opened up by collaborative lexical-semantic resources by extracting a corpus of parallel definitions from traditional and collaborative knowledge sources, e.g. Wikipedia, Wiktionary and WordNet. We used this corpus to train a monolingual translation model, which we successfully employed for question retrieval on social Q&A data from the educational domain.
An overarching goal of the QA-EL project was the development of natural language processing tools and data sets according to high engineering standards. For instance, we developed DKPro-UGD, a UIMA-based toolkit for cleansing user-generated discourse, which is a prerequisite for linguistically processing data from social media. The toolkit was applied in preparing data from social Q&A sites, Wikipedia, FAQs and lecture slides as knowledge sources for an educational QA system. We designed and implemented an efficient demo question answering system using these data sets. QA-EL opened up new research questions for educational QA that we will address in the future in collaboration with the German Institute for International Educational Research and Educational Information (DIPF).

Publications

2009. Combining Lexical Semantic Resources with Question & Answer Archives for Translation-Based Answer Finding. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (ACL-IJCNLP 2009), pp. 728-736, Suntec, Singapore
Delphine Bernhard and Iryna Gurevych
2009. Educational Question Answering based on Social Media Content. In V. Dimitrova et al. (Eds.): Proceedings of the 14th International Conference on Artificial Intelligence in Education. Building learning systems that care: From knowledge representation to affective modelling (AIED 2009), Frontiers in Artificial Intelligence and Applications 200, pp. 133-140, IOS Press, Amsterdam, The Netherlands
Iryna Gurevych, Delphine Bernhard, Kateryna Ignatova and Cigdem Toprak
2010. A Monolingual Tree-based Translation Model for Sentence Simplification. In: Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), pp. 1353-1361, Beijing, China
Zhemin Zhu, Delphine Bernhard and Iryna Gurevych
2010. Combining Probabilistic and Translation-Based Models for Information Retrieval based on Word Sense Annotations. In: C. Peters et al. (Eds.): Multilingual Information Access Evaluation I - Text Retrieval Experiments, LNCS 6241, pp. 120-127, Springer-Verlag Berlin/Heidelberg, Germany
Elisabeth Wolf, Delphine Bernhard and Iryna Gurevych
2011. The People’s Web meets Linguistic Knowledge: Automatic Sense Alignment of Wikipedia and WordNet. In: Proceedings of the 9th International Conference on Computational Semantics (IWCS), pp. 205-214, Oxford, UK
Elisabeth Niemann (née Wolf) and Iryna Gurevych
2012. UBY - A Large-Scale Unified Lexical-Semantic Resource Based on LMF. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), pp. 580-590, Avignon, France
Iryna Gurevych, Judith Eckle-Kohler, Silvana Hartmann, Michael Matuschek, Christian M. Meyer and Christian Wirth
