Project Details
Synthetic data and unsupervised annotations for data-driven video analysis.
Applicant
Professor Dr. Daniel Cremers, since 11/2022
Subject Area
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term
since 2021
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 413611294
Computer vision is becoming more and more important as autonomous cars and robots start to appear in our cities. Any autonomous vehicle should be able to interpret the dynamic scene around it in order to act accordingly. Deep learning has recently pushed computer vision forward, enabling the detection and segmentation of thousands of objects in an image. It is therefore natural to take a data-driven approach to the dynamic scene understanding problem, i.e., to understanding the dynamic objects, in particular pedestrians, around a robot or autonomous car. The main problem we face when following a data-driven video analysis paradigm is the lack of data: video datasets are considerably smaller than image datasets, and neural networks are notoriously data-hungry. Adding the temporal component poses two important challenges for precise annotation: (i) it is harder to obtain high-quality, temporally consistent annotations, and (ii) much more data is needed to cover spatial as well as temporal diversity. For these reasons, it is of high interest to study whether synthetic data can be used to train neural networks that operate on spatio-temporal video data.
Our main goal is to study novel ways of training spatio-temporal neural network models without relying on huge amounts of carefully annotated video data. Since deep learning already handles the spatial component, i.e., a single image, well, the emphasis of this project is on the temporal domain. We propose three distinct approaches to advance the state of the art in data-driven video analysis without the use of large-scale, expensively annotated video data:
A. Leveraging synthetic data in the temporal domain – The goal is to learn motion models from synthetic data that generalize to real videos. We will study the latent representations of synthetic and real temporal data in order to better train models only on synthetic data that transfer to real scenarios.
B. Understanding and bridging the domain gap for spatio-temporal data – We will study and understand the reasons for the domain gap between synthetic and real data, focusing especially on the temporal (motion) domain. We will then turn towards closing the gap with the help of generative adversarial networks. The main goal is to generate synthetic motions closer to real motions in an unsupervised way, so that motion generation and motion model training happen at the same time (a minimal code sketch of this idea follows below).
C. Motion analogies for weakly supervised annotations – We propose to use motion as a cue for weak supervision of 2D image annotations. The main idea is to establish motion analogies between annotated and non-annotated videos and later perform automatic label transfer.
Training with synthetic data, real data, and potentially weak annotations will be the final goal of the project.
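To make approach B more concrete, below is a minimal, hypothetical PyTorch-style sketch of the underlying idea: a refiner network nudges synthetic motion clips (here, simple 2D trajectories) towards the real-motion distribution via an adversarial loss, while a downstream motion model is trained on the refined clips using the synthetic labels that come for free. All module names, shapes, and hyperparameters are illustrative assumptions, not the project's actual architecture.

```python
# Hypothetical sketch (not the project's actual method): adversarially refine
# synthetic motion so that a motion model trained on it transfers to real data.
import torch
import torch.nn as nn

class Refiner(nn.Module):
    """Maps a synthetic motion clip of shape (B, T, 2) to a 'realistic' one."""
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, dim))

    def forward(self, x):
        # residual refinement keeps the refined motion close to the input
        return x + self.net(x)

class Discriminator(nn.Module):
    """Scores whether a motion clip looks real (high logit) or synthetic."""
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        _, h = self.rnn(x)        # h: (num_layers, B, hidden)
        return self.head(h[-1])   # per-clip logit, shape (B, 1)

def training_step(refiner, disc, task_model, opt_ref, opt_disc,
                  syn_motion, syn_labels, real_motion, task_loss_fn):
    """One adversarial refinement step. `opt_ref` is assumed to optimize the
    parameters of both `refiner` and `task_model`; only synthetic clips carry
    labels, real clips are used unlabelled."""
    bce = nn.BCEWithLogitsLoss()

    # 1) discriminator update: real motion vs. refined synthetic motion
    refined = refiner(syn_motion).detach()
    d_loss = bce(disc(real_motion), torch.ones(real_motion.size(0), 1)) \
           + bce(disc(refined), torch.zeros(refined.size(0), 1))
    opt_disc.zero_grad()
    d_loss.backward()
    opt_disc.step()

    # 2) refiner + task-model update: fool the discriminator while keeping
    #    the free synthetic labels useful for the downstream motion task
    refined = refiner(syn_motion)
    adv_loss = bce(disc(refined), torch.ones(refined.size(0), 1))
    task_loss = task_loss_fn(task_model(refined), syn_labels)
    opt_ref.zero_grad()
    (adv_loss + task_loss).backward()
    opt_ref.step()
    return d_loss.item(), task_loss.item()
```

In this sketch, "unsupervised" refers to the real videos: the discriminator only needs unlabelled real motion, and motion refinement and motion-model training happen jointly, which is the spirit of approach B.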
DFG Programme
Research Units
Subproject of
FOR 2987: Learning and Simulation in Visual Computing
Former Applicant
Professor Dr.-Ing. Laura Leal-Taixe, until 11/2022