
Adaptive and Scalable Techniques for Event Detection in Twitter Data Streams

Subject area: Security and Dependability, Operating, Communication and Distributed Systems; Software Engineering and Programming Languages
Funding: 2015 to 2018
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 275968728
 
Year of creation: 2019

Summary of Project Results

This project addressed the development of adaptive event detection techniques for Twitter. In particular, we focussed on the task of first story detection, i.e., the detection of general, previously unknown events. Even though Twitter, with its 330 million monthly active users who produce over 500 million tweets per day, is an influential source of information, topic detection and tracking on it involves several new challenges. In comparison to traditional news media articles, Twitter “documents” are much shorter and contain a substantial amount of spam, advertising, typos, slang, etc. We planned to follow an empirical approach and study how different configuration settings of several existing event detection techniques affect the results they produce. Based on this understanding of the interplay between parameter settings and result quality, we proposed to design methods for automatic parameter adjustment, enabling a technique to adapt to quantitative and qualitative changes in the input Twitter data stream. At this point, we encountered unforeseen challenges that prevented us from conducting the work program as planned. Our experiments showed that existing event detection techniques are highly unstable w.r.t. small variations in parameter values, which prevented us from understanding the aforementioned interplay. As a consequence, we deviated from the original work program and began to address these challenges in two ways. First, we started to investigate how event detection techniques could be made more stable. Our idea was to process the same Twitter data stream with different parameter settings in parallel and then report only those events that had been detected by multiple parallel workers. This approach ultimately proved unsuccessful, as the sets of events reported by different detectors were often completely disjoint.
Nevertheless, we were able to abstract this idea of using parallelization to improve result quality into a general building block for data stream processing and demonstrate its benefits in other application domains. Second, we began an effort to render research on event detection techniques for Twitter more reproducible. Apart from defining general measures that can be used to qualitatively and quantitatively compare different techniques, we proposed a benchmark to evaluate the performance and stability of techniques. This benchmark consists of a data generator that produces a data stream with the same statistical properties as the Twitter data stream and a ground truth of events that should be reported by any event detection technique applied to it. We hope that these proposals will be adopted by the research community working on event detection techniques for Twitter and help to evaluate the contribution made by future approaches more systematically.
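The idea of reporting only events confirmed by multiple parallel workers can be illustrated with a minimal sketch. It is not the project's implementation: the function name `consensus_events`, the `min_votes` threshold, and the representation of an event as a plain string label are all assumptions made for illustration. In particular, it assumes that events from different detectors can be matched by exact equality, which real detectors would not guarantee.

```python
from collections import Counter
from typing import Hashable, Iterable, Set


def consensus_events(reports: Iterable[Set[Hashable]], min_votes: int = 2) -> Set[Hashable]:
    """Return the events reported by at least `min_votes` detectors.

    `reports` holds one set of event labels per parallel detector instance
    (hypothetical representation; each instance would use different parameters).
    """
    votes: Counter = Counter()
    for detected in reports:
        # A set guarantees that each detector votes at most once per event.
        votes.update(set(detected))
    return {event for event, count in votes.items() if count >= min_votes}


# Hypothetical output of three detector instances for the same time window.
reports = [
    {"earthquake", "election"},
    {"earthquake", "concert"},
    {"earthquake", "election", "strike"},
]
print(sorted(consensus_events(reports)))  # ['earthquake', 'election']

# When the reported sets are completely disjoint, no event reaches the threshold.
disjoint = [{"a"}, {"b"}, {"c"}]
print(consensus_events(disjoint))  # set()
```

The second call shows the failure mode described above: if the detectors' result sets do not overlap, a voting scheme of this kind reports nothing at all.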
