\subsection{Motivation \& Goals}

% introduce problem to solve
% mention classical non-deep-learning works, then say it would be nice to go end-to-end deep

Recently, SfM-Net \cite{} introduced an end-to-end deep learning approach for predicting depth
and dense optical flow in monocular image sequences, based on estimating the 3D motion of individual objects and the camera.
SfM-Net predicts a set of binary full-image masks specifying the object membership of each pixel, using a standard encoder-decoder
network for pixel-wise prediction. A fully connected network branching off the encoder predicts a 3D motion for each object.
However, due to the fixed number of object masks, it can only predict a small number of motions and
often fails to segment the pixels into the correct masks or assigns background pixels to object motions.

Thus, this approach is unlikely to scale to dynamic scenes with a potentially
large number of diverse objects, as its instance segmentation technique is inflexible.

A scalable approach to instance segmentation based on region-based convolutional networks
was recently introduced with Mask R-CNN \cite{}, which inherits from Faster R-CNN the ability to detect
a large number of objects from a large number of classes at once
and additionally predicts pixel-precise segmentation masks for each detected object.

We propose \emph{Motion R-CNN}, which combines the scalable instance segmentation capabilities of
Mask R-CNN with the end-to-end 3D motion estimation approach introduced with SfM-Net.
To this end, we integrate 3D motion prediction for individual objects into the per-RoI R-CNN head,
in parallel to classification and bounding box refinement.

\subsection{Related Work}

\paragraph{Deep optical flow estimation}
\paragraph{Deep scene flow estimation}
\paragraph{Structure from motion}
SfM-Net, SE3 Nets,


Behl2017ICCV