Motion and Depth Estimation From Spatio-temporally Coded Video
Final Report Abstract
When a 3D scene is imaged through a lens, objects at different distances from the camera are recorded with different sharpness of detail. The further an object is placed from the focal plane of the camera, the blurrier it appears in the image, and the shape of the blur depends on the shape of the aperture. During the fellowship we developed and evaluated an algorithm to find optimized aperture shapes for depth estimation in various setups, e.g., the classical depth-from-defocus setting or the recent depth from a single defocused image. Our criterion is intuitive, can be optimized quickly, and the resulting aperture shapes outperform all aperture shapes known in the literature to date.

During the fellowship we also contributed to the state of the art in depth estimation from a single image. We extended the depth estimation capability of a current algorithm to both sides of the focal plane. The approach profits from asymmetric coded aperture masks and increases the volume of distinguishable depth levels considerably.

The main aim of the project was motion estimation on defocused images. For objects moving in depth, the motion also changes the appearance of the objects due to depth-dependent defocus blur. We introduced a new motion estimation algorithm that models these changes in appearance, and with it we could successfully estimate correspondences between coded aperture video frames. This enables the estimation of the 3D motion of objects from a single monocular video sequence. We also contributed to the reconstruction of sharp images from coded-aperture-encoded frames by making use of the segmentation provided by our depth estimation algorithm and of the preceding and succeeding video frames. Coded aperture video can thus be used not only to obtain depth and motion estimates of the scene, but also an estimate of its true underlying appearance.

In the last part of the fellowship we considered the effect of motion blur on video sequences. When the shutter of a camera is open for an extended period of time and objects move relative to the camera, objects with different motions are recorded with different amounts of motion blur: the faster an object moves, the blurrier it appears in the direction of its motion. We encoded the motion blur of objects by opening and closing the shutter of a video camera during the exposure time of each frame. We evaluated classical optical flow algorithms, as well as adaptations of the algorithms from the other parts of the fellowship, for the accuracy of the motion estimates they deliver on this input data. Given motion estimates that describe the displacement of the objects during each frame of the video sequence, we developed a reconstruction method that is more flexible than the current state of the art in image deblurring. Unlike previous methods, it can deal with nearly arbitrary motion of an arbitrary number of objects.

In summary, we considered spatial and temporal encodings of a moving 3D scene that provide additional information on scene depth and motion. We developed algorithms to extract this additional information, and algorithms to reformat the encoded signal to its conventional representation. Thus we can obtain depth, motion and texture from a single input video without requiring any additional measurement devices. The sketches below illustrate, in simplified form, the imaging models behind these encodings.
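To make the relationship between depth and blur concrete, the following minimal Python sketch evaluates the standard thin-lens model that underlies depth from defocus. The function name, parameter values and units are our own illustration, not code from the project:

    def blur_diameter(depth, focal_length=0.05, f_number=2.0, focus_depth=1.0):
        # Diameter of the blur circle on the sensor for a point at distance
        # `depth` (thin-lens model, all quantities in metres).
        aperture = focal_length / f_number
        # Image-side distances from the thin-lens equation 1/f = 1/d + 1/v.
        v_focus = 1.0 / (1.0 / focal_length - 1.0 / focus_depth)
        v_point = 1.0 / (1.0 / focal_length - 1.0 / depth)
        return aperture * abs(v_point - v_focus) / v_point

    # The blur grows with the distance from the focal plane at 1 m; with a
    # coded aperture the blur additionally takes on the aperture's shape.
    for depth in (0.5, 0.8, 1.0, 1.5, 3.0):
        print(f"{depth:4.1f} m -> {1e3 * blur_diameter(depth):5.2f} mm blur")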
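The abstract does not spell out how a depth value is read off a coded-aperture image. A common scheme in this line of work is hypothesis testing: deconvolve the image with the aperture mask scaled to each candidate depth and keep the depth whose kernel explains the observation best. The sketch below is our illustration of that scheme under a circular-convolution assumption, not the fellowship's exact algorithm:

    import numpy as np

    def hypothesis_error(blurred, kernel, snr=1e-2):
        # Wiener-deconvolve under one depth hypothesis, re-blur the
        # estimate, and measure how well the hypothesis explains the image.
        K = np.fft.fft2(kernel, s=blurred.shape)
        B = np.fft.fft2(blurred)
        S = B * np.conj(K) / (np.abs(K) ** 2 + snr)  # Wiener estimate
        residual = np.fft.ifft2(S * K) - blurred
        return float(np.mean(np.abs(residual) ** 2))

    def estimate_depth(blurred, kernels):
        # `kernels[i]` is the aperture mask scaled to candidate depth i;
        # return the index of the best-fitting depth.
        return int(np.argmin([hypothesis_error(blurred, k) for k in kernels]))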
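Motion estimation between coded aperture frames must compare image regions that carry different defocus blur when an object moves in depth. One way to model this, used here purely as an illustrative reading of the approach described above, is blur equalization: cross-blur the two frames under a pair of depth hypotheses so that brightness constancy holds again for the correct pair:

    from scipy.signal import fftconvolve

    def equalized_difference(frame0, frame1, k0, k1):
        # Cross-blur: give each frame the defocus of the other frame's
        # depth hypothesis; for the correct hypotheses (and after
        # compensating the displacement) the results agree up to noise.
        a = fftconvolve(frame0, k1, mode='same')
        b = fftconvolve(frame1, k0, mode='same')
        return a - b

Searching jointly over displacements and depth-hypothesis pairs that minimize this residual would then yield the 3D correspondences mentioned above.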
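For the temporal coding in the last part, fluttering the shutter within one exposure replaces the box-shaped motion blur kernel of a conventional camera with the binary shutter code stretched along the motion. A minimal simulation, assuming purely horizontal motion and a hypothetical broadband code, might look as follows:

    import numpy as np

    def coded_exposure(frame, code, pixels_per_exposure):
        # Accumulate the moving frame only during the sub-intervals in
        # which the binary shutter code is 1 (open).
        acc = np.zeros(frame.shape, dtype=float)
        n_open = 0
        for t, bit in enumerate(code):
            if bit:
                shift = round(t * pixels_per_exposure / len(code))
                acc += np.roll(frame, shift, axis=1)  # horizontal motion
                n_open += 1
        return acc / max(n_open, 1)

    # A hypothetical code; unlike a fully open shutter, it preserves high
    # frequencies in the blur kernel and keeps deblurring well conditioned.
    frame = np.zeros((64, 64)); frame[30:34, 10:14] = 1.0
    blurred = coded_exposure(frame, [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1], 32)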
Publications
- A. Sellent, P. Favaro, "Optimising Aperture Shapes for Depth Estimation", Proc. VMV 2013, September 2013
- A. Sellent, P. Favaro, "Coded Aperture Flow", Proc. GCPR 2014, September 2014
- A. Sellent, P. Favaro, "Optimized Aperture Shapes for Depth Estimation", Pattern Recognition Letters, Volume 40, April 2014
- A. Sellent, P. Favaro, "Which Side of the Focal Plane are You on?", Proc. ICCP 2014, May 2014