
Event-Based Exploration and Analysis of Linked Open Data

Subject area: Security and Dependability, Operating, Communication and Distributed Systems
Funding: 2014 to 2018
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 249438185
 
Year of creation: 2018

Summary of Project Results

Driven by the continuous and almost exponential increase of textual data on the Web, novel and sophisticated methods for extracting, exploring, and analyzing information from such texts remain major challenges in many research disciplines. While there is a plethora of "interesting" information to be extracted from text, the focus of this joint project between researchers from the TU Ilmenau and Heidelberg University was on events. Event descriptions, typically comprising a location, a time, and the actors involved in an event, provide an important means to analyze and explore complex phenomena. They are a key ingredient for constructing timelines, detecting geographic or temporal hot spots of activity, or simply providing a chronological summary for a location or actor. In this project, called EventAE (for Event Analysis and Exploration), we contributed to this challenge by providing the community with
• a comprehensive pipeline to extract event information from diverse types of textual documents,
• an event query and analysis framework called STARK (for Spatio-Temporal data Analytics on spaRK), and
• a publicly accessible event data repository containing large-scale data sets related to events as well as locations and actors.
The event extraction pipeline combines standard tools for Named Entity Recognition (NER) with novel methods for constructing event specifications from both English and German texts, in particular by modeling correlations among event components using networks. The STARK framework supports querying and processing spatio-temporal data more comprehensively and efficiently than existing approaches. It furthermore serves as the core backend for querying, analyzing, and exploring the event data extracted using the NER pipeline.
The event repository provides the community with diverse data sets, ranging from gazetteer-like specifications of locations and actors (in combination with Wikidata) to event specifications extracted from diverse text sources such as Wikipedia and German and English news outlets. All components are readily accessible through the project website and can be used for extensions and comparisons. We envision that the results of this project will not only have an impact on future research in event extraction and exploration from text data, but also that the framework developed in this project will provide other research communities with an infrastructure to perform text analysis tasks more effectively. The interest in event-related information is not specific to computer science, computational linguistics, or NLP, but has been and continues to be of central importance in many fields in the social sciences, the humanities, and medicine. These communities continue to struggle with an ever-growing number of text corpora that need to be explored efficiently with regard to domain-specific research questions. We hope that the results, methods, and tools developed in this project provide these and other communities with an easy entry into the field of data science and text analysis. Collaborations with researchers from the respective communities during this project have clearly shown the need for, the utility of, and in particular the interest in the results obtained in this joint project.

