Merge branch 'master' of github.com:simonmeister/bsc-thesis

Simon Meister 2017-11-06 17:20:33 +01:00
commit e832c23983
3 changed files with 15 additions and 6 deletions


@@ -89,7 +89,7 @@ predict $\sin(\alpha)$, $\sin(\beta)$, $\sin(\gamma)$ and $t_t^{cam}$ in the same
\subsection{Supervision}
-\paragraph{Per-RoI supervision with motion ground truth}
+\paragraph{Per-RoI supervision with 3D motion ground truth}
The most straightforward way to supervise the object motions is to use ground truth
motions computed from ground truth object poses, which is generally
only practical when training on synthetic datasets.
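As a sketch of what such a term could look like, assuming that each RoI $k$ yields
predicted rotation sines, a translation $t_k$ and a pivot $p_k$ with ground truth
counterparts, and that an $\ell_1$ penalty is used (the exact penalty may differ):
\[
\mathcal{L}_{\mathrm{motion}} = \sum_k \Bigg( \sum_{\theta \in \{\alpha_k, \beta_k, \gamma_k\}} \big|\sin\theta - \sin\theta^{gt}\big|
+ \big\|t_k - t_k^{gt}\big\|_1 + \big\|p_k - p_k^{gt}\big\|_1 \Bigg)
\]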
@@ -124,7 +124,7 @@ We supervise the camera motion with ground truth analogously to the
object motions, with the only difference being that we only have
a rotation and translation, but no pivot term for the camera motion.
-\paragraph{Per-RoI supervision \emph{without} motion ground truth}
+\paragraph{Per-RoI supervision \emph{without} 3D motion ground truth}
A more general way to supervise the object motions is a re-projection
loss similar to the unsupervised loss in SfM-Net \cite{SfmNet},
which we can apply to coordinates within the object bounding boxes,
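A minimal sketch of such a loss, assuming per-pixel depth $d(x)$ (predicted or
ground truth), camera intrinsics $K$, the per-RoI motion $(R_k, t_k, p_k)$,
homogeneous pixel coordinates $\tilde{x}$, and $\pi$ denoting perspective
projection; this follows the general SfM-Net recipe and need not match the exact
formulation used here:
\[
\mathcal{L}_{\mathrm{reproj}} = \sum_k \sum_{x \in B_k} \Big| I_t(x) - I_{t+1}\Big(\pi\big(K\,(R_k(X(x) - p_k) + p_k + t_k)\big)\Big) \Big|,
\qquad X(x) = d(x)\,K^{-1}\tilde{x},
\]
where $B_k$ is the set of pixel coordinates inside bounding box $k$.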


@@ -1,7 +1,8 @@
\subsection{Summary}
-We have introduced an extension on top of region-based convolutional networks to enable object motion estimation
-in parallel to instance segmentation.
-\todo{complete}
+We have introduced an extension on top of region-based convolutional networks to enable 3D object motion estimation
+in parallel to instance segmentation, given two consecutive frames. Additionally, our network estimates the 3D
+motion of the camera between frames. Based on this, we compose optical flow from 3D motions in an end-to-end manner.
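Read as a sketch, with the notation of the re-projection term above, this
composition assigns each pixel $x$ inside object box $B_k$ the flow induced by
that object's rigid motion,
\[
F(x) = \pi\big(K\,(R_k(X(x) - p_k) + p_k + t_k)\big) - x,
\]
and applies the estimated camera motion analogously to the remaining background pixels.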
\subsection{Future Work}
\paragraph{Predicting depth}
@@ -28,3 +29,11 @@ On Cityscapes, we could continue training the instance segmentation components to
improve detection and masks and avoid forgetting instance segmentation.
As an alternative to this training scheme, we could investigate training on a pure
instance segmentation dataset with unsupervised warping-based proxy losses for the motion (and depth) prediction.
+\paragraph{Temporal consistency}
+A next step after the two aforementioned ones could be to extend our network to exploit more than two
+temporally consecutive frames, which has previously been shown to be beneficial in the
+context of scene flow \cite{TemporalSF}.
+In fact, by incorporating recurrent neural networks, e.g. LSTMs \cite{LSTM},
+into our architecture, we could enable temporally consistent motion estimation
+from image sequences of arbitrary length.
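To make this concrete, here is a minimal TensorFlow sketch of such a recurrent
extension, assuming per-frame backbone feature maps as input; the layer choice
(Keras ConvLSTM2D), all shapes, and the dense stand-in motion head are
illustrative assumptions, not part of the thesis implementation:
\begin{verbatim}
# Sketch only: recurrent aggregation of per-frame features with a
# ConvLSTM, so motion estimates can stay consistent across a sequence.
# Shapes and names are hypothetical, not taken from the thesis code.
import tensorflow as tf

T, H, W, C = 8, 24, 78, 256  # sequence length, feature map shape (illustrative)

frames = tf.keras.Input(shape=(T, H, W, C))  # per-frame backbone features

# return_sequences=True yields one temporally consistent feature map
# per frame instead of only the final state.
hidden = tf.keras.layers.ConvLSTM2D(
    filters=128, kernel_size=3, padding='same',
    return_sequences=True)(frames)

# Stand-in dense motion head: 3 rotation sines + 3 translation
# components per spatial location and frame.
motion = tf.keras.layers.Conv3D(filters=6, kernel_size=1)(hidden)

model = tf.keras.Model(frames, motion)
\end{verbatim}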


@@ -5,7 +5,7 @@ computations. To make our code flexible and easy to extend, we build on
the TensorFlow Object Detection API \cite{TensorFlowObjectDetection}, which provides a Faster R-CNN baseline
implementation.
On top of this, we implemented Mask R-CNN and the Feature Pyramid Network (FPN)
-as well all extensions for motion estimation and related evaluations
+as well as extensions for motion estimation and related evaluations
and post-processing. In addition, we generated all ground truth for
Motion R-CNN in the form of TFRecords from the raw Virtual KITTI
data to enable fast loading during training.
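As a hedged illustration of this conversion step (the field names, helper
function, and file name below are hypothetical; only the tf.train.Example and
TFRecordWriter API is taken from TensorFlow):
\begin{verbatim}
# Sketch: pack one Virtual KITTI training example into a TFRecord.
# All field names are illustrative, not the thesis schema.
import tensorflow as tf

def bytes_feature(value):
    # Wrap already-serialized bytes (e.g. a PNG-encoded image or a
    # raw float buffer) as a tf.train.Feature.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def write_example(writer, image_t, image_t1, depth, motions):
    example = tf.train.Example(features=tf.train.Features(feature={
        'image/t':    bytes_feature(image_t),
        'image/t1':   bytes_feature(image_t1),
        'gt/depth':   bytes_feature(depth),
        'gt/motions': bytes_feature(motions),
    }))
    writer.write(example.SerializeToString())

with tf.io.TFRecordWriter('vkitti_train.tfrecord') as writer:
    # ... iterate over the raw Virtual KITTI frames and call
    # write_example(...) for each consecutive frame pair ...
    pass
\end{verbatim}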