Project Details
Hierarchical models for the recognition of human activities in video data
Applicant
Dr. Hildegard Kühne
Subject Area
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term
from 2016 to 2020
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 311269674
With the growing amount of video data recorded and distributed every day, there is also a growing need for automated processing. To address the complexity of these data, video-based action recognition needs to advance from the simple classification of pre-segmented clips, each containing a single clearly defined activity, towards the analysis of longer video sequences. First approaches to the recognition of such sequences have been made, but they usually assume a strictly linear timeline and disregard the specific hierarchical nature of human activities. The proposed project fills this gap by focusing on the analysis of temporal hierarchies of human activities in video sequences. It is assumed that human activities are made up of basic building blocks that can be subsumed over several stages to form larger, meaningful activities. In this context, this work aims at the exploration of hierarchical temporal structures for video-based action recognition, with the goal of analyzing and recognizing complex human activities in videos. To transfer hierarchical models to real action recognition scenarios, a three-stage approach is proposed. First, a bottom-up recognition system for human actions based on small temporal entities will be built. These entities will be concatenated and pooled over several temporal layers to build a high-level representation. The system will be built on generative models, as they have been applied successfully to similar problems. Second, to avoid the costly task of labeling data, semi- and unsupervised training procedures will be implemented and evaluated. To this end, existing unlabeled training material will be segmented and clustered, either in a semi- or unsupervised way, and the resulting units will form the input for an automatically generated grammar and language model.
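The bottom-up pooling of small temporal entities over several layers can be pictured as repeated temporal aggregation of frame-level features. The following sketch is purely illustrative and not the project's actual architecture; the function names, window sizes, and use of mean pooling are assumptions chosen for brevity.

```python
import numpy as np

def pool_layer(features, window, stride):
    """Mean-pool consecutive temporal units into higher-level units.
    (Mean pooling is an illustrative choice, not the project's method.)"""
    pooled = []
    for start in range(0, len(features) - window + 1, stride):
        pooled.append(np.mean(features[start:start + window], axis=0))
    return np.array(pooled)

def hierarchical_representation(frame_features, layers=((4, 2), (4, 2))):
    """Stack pooling layers: frames -> short entities -> activity-level units."""
    rep = frame_features
    for window, stride in layers:
        rep = pool_layer(rep, window, stride)
    return rep

# Toy example: 32 frames, each described by an 8-dimensional feature vector.
frames = np.random.rand(32, 8)
high_level = hierarchical_representation(frames)
# 32 frames -> 15 low-level entities -> 6 high-level units, dimension preserved.
```

Each successive layer covers a longer temporal extent, so the top-level units summarize entire sub-activities rather than individual frames.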
The resulting training procedure should be able to segment a given training set into small parts, to combine them by clustering, and to build an overall representation of the activity domain by generating a grammar over the defined entities. Third, as the system provides generative temporal modeling, these properties will be exploited for integrating context knowledge: the generative nature of the overall model allows context to be integrated easily, in the form of probability distributions, at any stage of the recognition process, and the temporal modeling allows not only the integration of context but also its assessment over time, e.g. in the form of object states. The final system should provide both the hierarchical recognition and analysis of human actions with regard to environmental context and the training routines needed to apply the model to a large variety of datasets and application domains. We hope that the system will provide new ways to deal with the challenges of analyzing complex activities over time and that it will enable new applications in this field.
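Integrating context as a probability distribution in a generative model can be sketched as a simple Bayesian reweighting step: the model's per-action likelihoods are multiplied by a context prior and renormalized. The action labels, scores, and the specific combination rule below are hypothetical examples, not the project's actual formulation.

```python
import numpy as np

def integrate_context(action_likelihoods, context_prior):
    """Combine per-action likelihoods with a context prior (e.g. derived
    from observed object states) and renormalize to a posterior."""
    posterior = action_likelihoods * context_prior
    return posterior / posterior.sum()

# Hypothetical action classes and scores for one video segment.
actions = ["pour", "stir", "cut"]
likelihoods = np.array([0.5, 0.3, 0.2])  # from the recognition model
context = np.array([0.1, 0.2, 0.7])      # context prior, e.g. a knife is visible
posterior = integrate_context(likelihoods, context)
# The context shifts the decision towards "cut" despite its low visual score.
```

Because the combination is just a product of distributions, such a context term can be inserted at any stage of a generative recognition pipeline, which is the property the third stage aims to exploit.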
DFG Programme
Research Grants
Co-Investigator
Professor Dr. Jürgen Gall