% bsc-thesis/abstract.tex
\begin{abstract}
Many state-of-the-art energy-minimization approaches to optical flow and scene
flow estimation rely on a (piecewise) rigid scene model, in which the scene is
represented as an ensemble of distinct, rigidly moving components, a static
background, and a moving camera.
By constraining the optimization problem with a physically sound scene model,
these approaches enable highly accurate motion estimation.
With the advent of deep learning methods, it has become popular to repurpose
generic deep networks for classical computer vision problems involving
pixel-wise estimation.
Following this trend, many recent end-to-end deep learning approaches to optical
flow and scene flow directly predict full-resolution flow fields with
a generic network for dense, pixel-wise prediction, thereby ignoring the
inherent structure of the underlying motion estimation problem and any physical
constraints within the scene.
We introduce a scalable end-to-end deep learning approach to dense motion
estimation that respects the structure of the scene as a composition of
distinct objects, thus combining the representation-learning benefits of
end-to-end deep networks with a physically plausible scene model.
Building on recent advances in region-based convolutional networks (R-CNNs),
we integrate motion estimation with instance segmentation.
Given two consecutive frames from a monocular RGBD camera,
our resulting end-to-end deep network detects objects with accurate per-pixel
masks and estimates the 3D motion of each detected object between the frames.
By additionally estimating the global camera motion within the same network,
we compose a dense optical flow field from the instance-level and global motion
predictions.
We demonstrate the feasibility of our approach on the KITTI 2015 optical flow
benchmark.
\end{abstract}