Project Details

Making Machine Learning on Static and Dynamic 3D Data Practical

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term from 2019 to 2023
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 405799936
 
In the last five years, advances in deep learning have led to significant progress in enabling computers to understand the real world from visual input, opening up many opportunities ranging from robotics to virtual and augmented reality, as well as medical and Industry 4.0 applications. Most of these machine learning architectures are convolutional neural networks (CNNs), which are able to learn powerful features from images and even generate highly realistic pictures from scratch using generative adversarial networks (GANs). In the 2D image domain, we have seen tremendous success in both discriminative and generative tasks.

Unfortunately, for 3D data, e.g., data obtained from 3D scans on autonomous cars, research is still in its infancy. This 3D direction requires further exploration, as our world is inherently three-dimensional (e.g., humans see with two eyes), and even four-dimensional when the temporal domain is considered. In fact, performing scene understanding in 3D has significant advantages; for instance, a machine learning approach does not need to learn viewpoint invariance and thus requires less training data. However, the additional third dimension (and fourth for dynamics) comes at significant computational and memory overhead, which has so far been the major bottleneck in these applications.

In this proposal, we address this shortcoming by developing efficient machine learning algorithms for 3D and 4D data analysis. In particular, we will develop deep learning architectures and training methods capable of efficiently modeling different types of static and dynamic 3D data representations, including sparse spatial and temporal representations on voxel volumes, RGB-D images, point clouds, multi-view images, and meshes. We will further construct new datasets designed for our scenario, captured from the real world as well as synthetically generated with simulated renderings, augmented to reduce the reality gap between artificial and real data. Finally, we will develop new neural network architectures designed for discriminative and generative applications embedded in spatial and, specifically, temporal domains. To showcase our learning methods, we will apply them to static and dynamic 3D reconstruction tasks, as well as semantic scene understanding in 3D and 4D, with an emphasis on fusing the spatial and temporal domains.
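The memory overhead mentioned above can be made concrete. The following is a minimal sketch (not taken from the proposal; grid resolution, channel count, and occupancy rate are illustrative assumptions) comparing the footprint of a dense voxel grid with a sparse coordinate/feature representation of the kind used by sparse 3D CNNs:

```python
# Minimal sketch (illustrative assumptions, not from the proposal):
# memory cost of a dense voxel grid versus a sparse representation
# that stores only occupied voxels.

resolution = 128          # voxels per side of a cubic grid
channels = 32             # feature channels per voxel
occupancy = 0.05          # 3D scans typically fill only a few percent of space

# Dense grid: every cell stores a feature vector, occupied or not.
dense_bytes = resolution ** 3 * channels * 4          # float32
print(f"dense grid:  {dense_bytes / 2**20:8.1f} MiB")  # ~256 MiB

# Sparse representation: store only occupied voxels as integer
# (x, y, z) coordinates plus their feature vectors.
n_occupied = int(resolution ** 3 * occupancy)
sparse_bytes = n_occupied * (3 * 4 + channels * 4)    # int32 coords + float32 feats
print(f"sparse grid: {sparse_bytes / 2**20:8.1f} MiB")  # ~14 MiB
```

The gap widens further in 4D, where a dense spatio-temporal grid multiplies the cost by the number of time steps while the occupied fraction stays small.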
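As a hypothetical illustration of learning directly on one of the listed representations, the sketch below shows a PointNet-style encoder for unordered point clouds in PyTorch; the architecture and layer sizes are assumptions for illustration, not the project's actual networks:

```python
# Hypothetical sketch: a PointNet-style encoder that learns directly on an
# unordered point cloud. Layer sizes are illustrative, not from the proposal.
import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Shared per-point MLP, applied identically to every point.
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, num_points, 3) xyz coordinates
        per_point = self.mlp(points)
        # Max-pool over points: a symmetric function, so the output is
        # invariant to the ordering of the input points.
        return per_point.max(dim=1).values   # (batch, feat_dim)

cloud = torch.rand(2, 1024, 3)               # two clouds of 1024 points
print(PointEncoder()(cloud).shape)           # torch.Size([2, 256])
```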
DFG Programme Research Grants
International Connection Russia
Partner Organisation Russian Science Foundation
Cooperation Partner Professor Dr. Evgeny Burnaev
 
 
