Project Details
Practical and Parallel Text Compression for Highly Repetitive Data
Applicant
Professor Dr. Johannes Christian Fischer
Subject Area
Theoretical Computer Science
Term
since 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 501086801
We want to develop practical algorithms for compressing highly repetitive data that overcome the shortcomings of currently common compressors such as gzip or bzip2. These have been established in the 1990s and targeted hardware that was standard in those days; their main disadvantage is that they do not capture repetitions of substrings that are far apart. The first goal is to design and engineer a compression tool that does also benefit from such long range repetitions, but still has only moderate memory requirements.As a second goal, we want to exploit the shared-memory parallelism present in virtually any CPU in order to speed up compression, without losing too much compression ratio. Here, we want to have a broader look at compression algorithms, and in particular include grammar compressors which offer excellent opportunitie for parallelization.In the ideal case, both ideas to make better use of modern resources will be integrated into production-ready software repositories (like Linux distributions) so that end consumers can benefit easily from our algorithm engineering efforts.
DFG Programme
Research Grants