Project Details

Learning-Based Wavelet Video Coding Using Deep Adaptive Lifting

Subject Area Communication Technology and Networks, High-Frequency Technology and Photonic Systems, Signal Processing and Machine Learning for Information Technology
Term since 2021
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 461649014
 
Learning-based methods from artificial intelligence have been applied successfully in many areas of image and video processing. In lossy image compression, too, they have achieved significant gains in rate-distortion performance over classic image coders. Rate-distortion performance describes the maximum compression achievable at a given reproduction fidelity. Classic image and video coders, moreover, are built around the concept of variable rates, which allows a single codec to provide different bit rates depending on the desired reconstruction quality. A typical use case is serving networks with varying channel capacities with the same codec. Current end-to-end trained learning-based methods are characterized by good signal adaptivity, which yields better compression performance than classic approaches. A crucial disadvantage, however, is the limited understanding of how the neural networks work, because deep learning architectures are usually not designed systematically but manually, by trial and error. In addition, training is computationally expensive, since variable rates are usually obtained by training multiple models separately. This research proposal therefore aims to develop a novel variable-rate learning-based video coder using motion-compensated wavelet lifting. Besides rate adaptivity, the coder achieves spatial and temporal scalability, resulting in a fully scalable bit stream. The method is based on the lifting structure, which has the advantage that any non-linear operation can be applied without compromising the reconstruction property of the transform. This makes it possible to implement neural networks within the lifting structure and thereby increase the efficiency of the wavelet lifting.
Learned wavelet coefficients are expected to achieve better signal adaptivity and data compaction. In contrast to end-to-end trained methods, the proposed approach has the further advantage that the behavior of the neural networks is easier to understand, owing to the well-known architecture of the lifting structure. Changes in the network architectures can thus be tracked and interpreted directly. Moreover, the lifting structure allows a fully in-place calculation that needs no auxiliary memory. Applying such a deep adaptive lifting structure to video compression has not been considered so far and represents a promising new concept for learning-based video compression, combining variable rates and high interpretability in one model.
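The reconstruction property of the lifting structure mentioned above can be illustrated with a minimal one-dimensional sketch. It is not the project's coder: the function names, the integer arithmetic, and the hand-written `predict` and `update` operators are assumptions chosen for illustration, standing in for the neural networks that the proposal places inside the lifting steps. The point it shows is that the inverse transform only repeats the same operators with the signs flipped, so the operators themselves never have to be inverted and may be arbitrarily non-linear.

```python
def split(signal):
    """Split an even-length signal into even- and odd-indexed samples."""
    return signal[0::2], signal[1::2]


def predict(even):
    """Hypothetical non-linear predictor (stand-in for a learned network):
    floored average of two even neighbours, clipped to the range 0..255."""
    n = len(even)
    out = []
    for i in range(n):
        right = even[min(i + 1, n - 1)]  # repeat the last sample at the border
        out.append(max(0, min(255, (even[i] + right) // 2)))
    return out


def update(detail):
    """Hypothetical non-linear update operator: rounded quarter-sum of two
    neighbouring detail coefficients."""
    out = []
    for i in range(len(detail)):
        left = detail[max(i - 1, 0)]  # repeat the first sample at the border
        out.append((left + detail[i] + 2) // 4)
    return out


def forward(signal, predict, update):
    """One lifting level: returns (approximation, detail) coefficients."""
    even, odd = split(signal)
    detail = [o - p for o, p in zip(odd, predict(even))]
    approx = [e + u for e, u in zip(even, update(detail))]
    return approx, detail


def inverse(approx, detail, predict, update):
    """Undo one lifting level by running the steps backwards with flipped signs."""
    even = [a - u for a, u in zip(approx, update(detail))]
    odd = [d + p for d, p in zip(detail, predict(even))]
    signal = [0] * (len(even) + len(odd))
    signal[0::2] = even
    signal[1::2] = odd
    return signal


x = [12, 14, 200, 198, 50, 52, 90, 10]
approx, detail = forward(x, predict, update)
assert inverse(approx, detail, predict, update) == x
```

Integer samples are used so that the round trip is exact; with floating-point operators the reconstruction would hold only up to rounding unless both directions compute bit-identical operator outputs. For readability the sketch allocates new lists, whereas the in-place calculation described above would overwrite the odd samples with detail coefficients and the even samples with approximation coefficients.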
DFG Programme Research Grants
International Connection Australia
Cooperation Partner Professor Dr. David Taubman
 
 
