\begin{abstract}
Many state-of-the-art energy-minimization approaches to optical flow and scene
flow estimation rely on a rigid scene model, in which the scene is decomposed
into an ensemble of distinct, rigidly moving components, a static background,
and a moving camera.
By constraining the optimization problem with a physically sound scene model,
these approaches achieve state-of-the-art motion estimation accuracy.
With the advent of deep learning methods, it has become popular to re-purpose
generic deep networks for classical computer vision problems involving
pixel-wise estimation.
Following this trend, many recent end-to-end deep learning approaches to optical
flow and scene flow directly predict full-resolution flow fields with a generic
network for dense, pixel-wise prediction, thereby ignoring the inherent
structure of the underlying motion estimation problem and any physical
constraints within the scene.
We introduce a scalable end-to-end deep learning approach to dense motion
estimation that models the scene as a composition of distinct objects, thus
combining the representation learning benefits and speed of end-to-end deep
networks with a physically plausible scene model.
Building on recent advances in region-based convolutional networks (R-CNNs),
we integrate motion estimation with instance segmentation.
Given two consecutive frames from a monocular RGB-D camera, our resulting
end-to-end deep network detects objects with accurate per-pixel masks and
estimates the 3D motion of each detected object between the frames.
The same network additionally estimates the global camera motion, allowing us
to compose a dense optical flow field from the instance-level and global
motion predictions.
\end{abstract}
\renewcommand{\abstractname}{Zusammenfassung}
\begin{abstract}
\todo{german abstract}
\end{abstract}