Project Details
Spatio-Temporal Hypercolumns for Instance-based Semantic Segmentation in Video
Applicant
Professor Dr.-Ing. Thomas Brox
Subject Area
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term
from 2017 to 2021
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 387723725
Video segmentation is one of the most challenging open problems in computer vision. Although multiple approaches have been proposed in the literature, state-of-the-art algorithms are still far from reaching human-level performance on realistic, unconstrained videos. We propose a two-year research programme that studies the interaction between video segmentation and object recognition, thereby introducing category-specific information to improve the video segmentation process. As the starting point of our approach, we will generalize state-of-the-art algorithms for static generic segmentation and semantic segmentation to the spatio-temporal domain by taking optical flow estimation into account. During the first year, we will seek an effective combination of our recent works, Convolutional Oriented Boundaries (COB) and FlowNet, in order to build a spatio-temporal video segmentation algorithm that combines local and spatial information with temporal consistency. Once we have extracted a consistent spatio-temporal video segmentation, we will propagate the pixel labels across frames through trajectory motion affinities and build a spatio-temporal representation of objects and surfaces, which we call Convolutional Temporal Tubes (CTT). During the second year, we will extend our previous work on Hypercolumns [3] by instantiating a spatio-temporal hypercolumn framework on the CTT, in order to refine the spatial support of objects and surfaces given their semantic characteristics while preserving temporal consistency. The final objective of this two-year project is a representation of a video in terms of spatio-temporal regions that are stable over time while being aware of semantics and of individual object instances.
The realization of our research programme is expected to bridge the gap between human and computer performance in video segmentation on current benchmarks. These results will enable further research in scene and object structure recovery, 3D reconstruction, video understanding, and action and object recognition, among many other applications. The project also seeks to strengthen scientific exchange between Germany and Colombia and will be conducted in close collaboration between researchers in both countries.
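As a rough illustration of the label propagation step mentioned in the abstract, the sketch below warps a per-pixel label map from one frame to the next using a dense optical flow field. It is a deliberate simplification and not the project's method: it uses the flow directly with nearest-neighbour lookup instead of trajectory motion affinities, and the function name `propagate_labels`, the backward-flow convention, and the `unlabeled` marker are assumptions made for this example only.

```python
import numpy as np


def propagate_labels(labels, flow, unlabeled=-1):
    """Warp an integer label map from frame t to frame t+1 (hypothetical helper).

    labels: (H, W) integer array of labels at frame t.
    flow:   (H, W, 2) array in a backward convention (an assumption):
            flow[y, x] = (dx, dy) points from pixel (x, y) in frame t+1
            to its source location in frame t.
    Pixels whose source falls outside the frame receive `unlabeled`.
    """
    h, w = labels.shape
    ys, xs = np.mgrid[0:h, 0:w]

    # Nearest-neighbour source coordinates in frame t.
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)

    # Only look up labels for sources that lie inside the image.
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)

    out = np.full((h, w), unlabeled, dtype=labels.dtype)
    out[valid] = labels[src_y[valid], src_x[valid]]
    return out
```

Applied frame by frame, such a warp would carry labels through a video, although errors accumulate wherever the flow is unreliable, which is one reason a more robust affinity-based propagation may be preferable in practice.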
DFG Programme
Research Grants
International Connection
Colombia
Partner Organisation
Universidad de los Andes
Cooperation Partner
Professor Pablo Andres Arbelaez Escalante, Ph.D.