Project Details

Comprehensive Modeling of Conversational Contributions in Prose Texts

Subject Area General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term from 2017 to 2022
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 350397899
 
Final Report Year 2022

Final Report Abstract

In sum, the project has made contributions on three levels. The first is advances in the specific modelling of quotation detection, namely robust models (study 1) and rich datasets (study 2). The second, in line with the trend in computational linguistics to unify previously distinct tasks, concerns a better understanding of quotation detection as a more general information extraction task involving span detection (study 4) and slot filling (study 3). The third is the application of these ideas in digital humanities (study 5). The uptake of the models and software that we developed by other research projects demonstrates that quotation detection, as we defined it, can now be carried out at a reasonable level of accuracy and robustness. The next frontier consists in integrating such "local" quotation information into a "global" understanding of a complete conversation or, beyond the conversation, into relations between actors. Some of these questions are being addressed in our ongoing project, MARDY (Modeling argumentation dynamics), which aims at building discourse networks (linking political actors and the claims that they make) from newspaper reports. However, as we note above, newspaper reports are considerably more formulaic in their use of reported speech, and the integration of quotation detection into a more global understanding of literary texts is, to our knowledge, still an open problem. At the interpersonal level, one outcome of the project is the establishment of personal contacts between the Theoretical Computational Linguistics group at IMS Stuttgart and the chair of Romance Philology (Literary Studies), Hanno Ehrlicher, at Tübingen University, based on a shared interest in the analysis of quotations in historical corpora.

Publications

  • DERE: A task and domain-independent slot filling framework for declarative relation extraction. Proceedings of EMNLP. Brussels, Belgium, 2018
    Heike Adel, Laura Ana Maria Bostan, Sean Papay, Sebastian Padó and Roman Klinger
    (See online at https://doi.org/10.18653/v1/D18-2008)
  • Quotation Detection and Classification with a Corpus-Agnostic Model. Proceedings of RANLP. Varna, Bulgaria, 2019
    Sean Papay and Sebastian Padó
    (See online at https://doi.org/10.26615/978-954-452-056-4_103)
  • Dissecting Span Identification Tasks with Performance Prediction. Proceedings of EMNLP, pages 4881–4895
    Sean Papay, Roman Klinger and Sebastian Padó
    (See online at https://doi.org/10.18653/v1/2020.emnlp-main.396)
  • RiQuA: A Corpus of Rich Quotation Annotation for English Literary Text. Proceedings of LREC, pages 835–841
    Sean Papay and Sebastian Padó
 
 
