We have introduced an extension of region-based convolutional networks that enables object motion estimation
in parallel to instance segmentation.
\subsection{Future Work}
\paragraph{Predicting depth}
In most cases, we want to work with RGB frames without depth available.
To do so, we could integrate depth prediction into our network by branching off a
depth network from the backbone in parallel to the RPN, as in Figure \ref{}.
Although single-frame monocular depth prediction with deep networks has already
been demonstrated with some success,
our two-frame input should allow the network to exploit epipolar
geometry and produce a more reliable depth estimate.
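For reference, the epipolar constraint can be stated as follows; the notation is assumed here for illustration and is not taken from the rest of this work.
Let $K$ denote the camera intrinsics and $(R, t)$ the relative camera motion between the two frames,
such that a static scene point moves from $X_1$ to $X_2 = R X_1 + t$ in camera coordinates,
and let $\tilde{x}_1, \tilde{x}_2$ be the homogeneous pixel coordinates of its two projections. Then
\[
\tilde{x}_2^\top K^{-\top} [t]_\times R \, K^{-1} \tilde{x}_1 = 0,
\]
where $[t]_\times$ is the skew-symmetric cross-product matrix of $t$.
This relation only holds for static points; for pixels on a moving object, the relative motion would additionally have to account for the object motion.
Note also that depth is only recoverable up to the scale of $t$ from this constraint alone.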
\paragraph{Training on real-world data}
Due to the amount of supervision required by the different components of the network
and the complexity of the optimization problem,
we have so far trained Motion R-CNN only on the simple synthetic Virtual KITTI dataset.
A next step would be training on a more realistic dataset.
For example, we could first pre-train the RPN on an object detection dataset such as
Cityscapes. Once the RPN works reliably, we could alternate training steps
between, for example, Cityscapes and the KITTI stereo and optical flow datasets.
On KITTI stereo and flow, we could run the instance segmentation component in test mode and only apply
the motion (and depth prediction) losses, as no instance segmentation ground truth exists.
On Cityscapes, we could continue training the full instance segmentation Mask R-CNN to
improve detection and masks and to avoid forgetting effects.
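One way to write down the alternating scheme sketched above is as a step-dependent objective.
The symbols are assumptions for illustration only: $k$ indexes training steps,
$L_{\mathrm{rpn}}$, $L_{\mathrm{det}}$ and $L_{\mathrm{mask}}$ stand for the usual Mask R-CNN loss terms,
and $L_{\mathrm{motion}}$ and $L_{\mathrm{depth}}$ for the motion and (optional) depth losses:
\[
L^{(k)} = \left\{
\begin{array}{ll}
L_{\mathrm{rpn}} + L_{\mathrm{det}} + L_{\mathrm{mask}} & \mbox{if step $k$ draws a Cityscapes batch,} \\
L_{\mathrm{motion}} \; (+\, L_{\mathrm{depth}}) & \mbox{if step $k$ draws a KITTI batch.}
\end{array}
\right.
\]
How the two kinds of steps are interleaved, for example strictly alternating or at some fixed ratio, is left open here and would have to be determined experimentally.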
As an alternative to this training scheme, we could investigate training on a pure
instance segmentation dataset with unsupervised warping-based proxy losses for the motion (and depth) prediction.
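A typical form of such a proxy loss, given here only as a sketch in assumed notation, is a photometric error between the first frame and the second frame warped back to it:
\[
L_{\mathrm{warp}} = \frac{1}{|\Omega|} \sum_{x \in \Omega} \rho\bigl( I_{t+1}(x + w(x)) - I_t(x) \bigr),
\]
where $I_t$ and $I_{t+1}$ are the two input frames, $w(x)$ is the optical flow at pixel $x$ induced by the predicted depth, camera motion and object motions,
$\Omega$ is the set of pixels whose warped position lies inside the image, and $\rho$ is a robust penalty function.
The choice of $\rho$ and the handling of occluded pixels are not fixed by this sketch.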