Project Details
Algorithms for Data Stream Processing
Applicant
Dr. Lasse Kliemann
Subject Area
Theoretical Computer Science
Term
from 2012 to 2018
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 216752274
Massive and continuous data is generated by varied sources, such as network switches, satellite imagery, sensor networks, web-click and transaction logs, and web crawlers, and requires efficient, on-the-fly analysis for early warning of critical or significant behavior. For example, for an internet service provider, it is useful to have an early warning of distributed denial-of-service attacks being perpetrated in the network. For massive graphs whose edges are incrementally discovered by some crawling mechanism, it is useful to detect the emergence of new communities or clusters and the dispersion of older ones. Many such real-time monitoring applications are tolerant of errors and require statistically significant rather than exact answers.

The data stream model has emerged as a useful computational abstraction for massive data analysis. In this model an algorithm is typically given a relatively small amount of memory to summarize a large (and often rapidly arriving) dataset and is usually allowed a single pass or a small number of passes over the data. Here "small" means small as a function of the size of the data, which is very large. The available memory is used to keep a summary of the data, usually in randomized form; this summary is used to answer global queries on the dataset (a minimal illustration of such a summary follows below). The model has its origins in work by Morris, by Munro and Paterson, and by Flajolet and Martin. In the 1990s, the streaming model gained popularity through the seminal work of Alon, Matias and Szegedy and a simultaneous growth in applications for monitoring massive data from various quarters, including networking and sensor networks. A survey of data stream computations may be found in Muthukrishnan.

The goals of this project are to develop new algorithms and refined analytical methods through an interactive process between theory and experiment in the framework of Algorithm Engineering (AE) for the following problems:
1. Design of space-efficient algorithms for empirical estimation of entropy and small moments, using our Taylor polynomial estimator.
2. Design of universal sketches for frequency moments.
3. Providing a fast source of random bits and de-randomization techniques to improve the success of data streaming systems.
4. Design of pass-efficient algorithms for spectral sparsifiers over graph streams.
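To make the model concrete, the following is a minimal Python sketch of the classical Alon, Matias and Szegedy (AMS) estimator for the second frequency moment F2 (the sum of squared item frequencies), one of the small randomized summaries mentioned above. It is an illustration, not code from this project: the names ams_f2 and sign and the parameters num_groups and group_size are invented for the example, and a keyed cryptographic hash stands in for the 4-wise independent hash family assumed in the formal analysis.

```python
import hashlib
import random
import statistics

def sign(seed: int, item: str) -> int:
    # Pseudo-random +/-1 sign for the pair (seed, item). The AMS analysis
    # only needs a 4-wise independent hash family; this illustration uses
    # a keyed cryptographic hash as a stand-in.
    digest = hashlib.blake2b(item.encode(), key=seed.to_bytes(8, "big"),
                             digest_size=1).digest()
    return 1 if digest[0] & 1 else -1

def ams_f2(stream, num_groups=5, group_size=16, seed=0):
    # Single-pass AMS estimator for F2 = sum over items of frequency^2.
    # Memory is num_groups * group_size integer counters, independent of
    # the stream length. Each counter Z sums the +/-1 signs of all
    # arriving items, so E[Z^2] = F2; averaging within a group reduces
    # the variance, and the median across groups boosts the success
    # probability (median of means).
    rng = random.Random(seed)
    seeds = [[rng.getrandbits(64) for _ in range(group_size)]
             for _ in range(num_groups)]
    counters = [[0] * group_size for _ in range(num_groups)]
    for item in stream:  # the single pass over the data
        for g in range(num_groups):
            for j in range(group_size):
                counters[g][j] += sign(seeds[g][j], item)
    means = [statistics.fmean(z * z for z in row) for row in counters]
    return statistics.median(means)

# Frequencies 5, 3 and 2 give the exact value F2 = 25 + 9 + 4 = 38;
# the output is a noisy but unbiased estimate near that value.
stream = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
print(ams_f2(stream))
```

The median-of-means structure is what makes the error statistically controllable: enlarging group_size tightens the variance of each group's estimate, while adding groups and taking their median drives down the failure probability, all with memory that does not grow with the length of the stream.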
DFG Programme
Priority Programmes
Subproject of
SPP 1307: Algorithm Engineering