Interactive distributed corpus exploration and annotation infrastructure for large corpora and knowledge-bases

Applicants Dr.-Ing. Richard Eckart de Castilho; Professor Dr. Iryna Gurevych

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing

Term from 2016 to 2022

Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 315979217

Project Description

The goal of this project is a research infrastructure for corpus annotation that scales to large text document collections by flexibly building subcorpora. The infrastructure addresses the needs of computational linguists and corpus linguists for a generic tool to perform selective semantic annotation tasks within and across documents. Such an infrastructure is important because it enables the targeted exploitation of the huge amounts of digitally available text for linguistic analysis. The infrastructure should support the expert user in exploring large document collections, in setting up an annotation scheme, and in extracting task-specific subcorpora from a large background corpus. The annotation of the corpora should be flexibly distributable to remotely working annotation teams of different qualification levels and backgrounds. Their work should be supported through prioritisation and annotation suggestions based on machine learning technology, so that a large corpus with high-quality annotations can be created efficiently for training and evaluating the respective algorithms. Thus, the infrastructure should enable the annotation of the same corpus from multiple perspectives by multiple researchers and annotation teams working in parallel. Custom corpora should be importable by the users as needed. Further functionality is needed to maintain and expand the knowledge bases used during the semantic annotation tasks as well as to connect to external standard knowledge bases.

DFG Programme Research data and software (Scientific Library Services and Information Systems)
