Comprehensive Modeling of Conversational Contributions in Prose Texts
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Final Report Abstract
In sum, the project has made contributions on a number of levels. The first level comprises advances in the modelling of quotation detection itself, namely robust models (study 1) and rich datasets (study 2). The second level is in line with the trend in computational linguistics to unify previously distinct tasks, and concerns a better understanding of quotation detection as a more general information extraction task involving span detection (study 4) and slot filling (study 3). The third level is the application of these ideas in digital humanities (study 5). The uptake of the models and software that we developed by other research projects demonstrates that quotation detection, as we defined it, can now be carried out at a reasonable level of accuracy and robustness. The next frontier consists in integrating such "local" quotation information into a "global" understanding of a complete conversation or, beyond the conversation, into relations between actors. Some of these questions are being addressed in our currently ongoing project, MARDY (Modeling argumentation dynamics), which aims at building discourse networks (linking political actors and the claims that they make) from newspaper reports. However, as we note above, newspaper reports are considerably more formulaic in their use of reported speech, and the integration of quotation detection into a more global understanding of literary texts is, to our knowledge, still an open problem. At the interpersonal level, one outcome of the project was the establishment of personal contacts between the Theoretical Computational Linguistics group at IMS Stuttgart and the chair of Romance Philology (Literary Studies), Hanno Ehrlicher, at Tübingen University, based on a shared interest in the analysis of quotations in historical corpora.
Publications
- DERE: A task and domain-independent slot filling framework for declarative relation extraction. Proceedings of EMNLP. Brussels, Belgium, 2018
Heike Adel, Laura Ana Maria Bostan, Sean Papay, Sebastian Padó and Roman Klinger
(See online at https://doi.org/10.18653/v1/D18-2008)
- Quotation Detection and Classification with a Corpus-Agnostic Model. Proceedings of RANLP. Varna, Bulgaria, 2019
Sean Papay and Sebastian Padó
(See online at https://doi.org/10.26615/978-954-452-056-4_103)
- Dissecting Span Identification Tasks with Performance Prediction. Proceedings of EMNLP, pages 4881–4895
Sean Papay, Roman Klinger and Sebastian Padó
(See online at https://doi.org/10.18653/v1/2020.emnlp-main.396)
- RiQuA: A Corpus of Rich Quotation Annotation for English Literary Text. Proceedings of LREC, pages 835–841
Sean Papay and Sebastian Padó