commit 266dd4179e
parent eafc297758

WIP
@@ -1,4 +1,4 @@
-Here, we will give a more detailed description of previous works
+In this section, we will give a more detailed description of previous works
 we directly build on and other prerequisites.
 
 \subsection{Optical flow and scene flow}
@@ -50,16 +50,17 @@ most popular deep networks for object detection, and have recently also been app
 \paragraph{R-CNN}
 Region-based convolutional networks (R-CNNs) \cite{RCNN} use a non-learned algorithm external to a standard encoder CNN
 for computing \emph{region proposals} in the shape of 2D bounding boxes, which represent regions that may contain an object.
-For each of the region proposals, the input image is cropped at the proposed region and the crop is
+For each of the region proposals, the input image is cropped using the region's bounding box and the crop is
 passed through a CNN, which performs classification of the object (or non-object, if the region shows background). % and box refinement!
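To make the per-proposal structure concrete, the pipeline can be sketched as follows (an editorial sketch in Python; \texttt{propose\_regions} and \texttt{cnn\_classify} are hypothetical stand-ins for the external proposal algorithm and the classification network, not part of the cited systems):
\begin{verbatim}
import numpy as np

def propose_regions(image):
    # Hypothetical stand-in for the external, non-learned proposal
    # algorithm (e.g. selective search); yields (x1, y1, x2, y2) boxes.
    h, w = image.shape[:2]
    return [(0, 0, w // 2, h // 2), (w // 4, h // 4, w, h)]

def cnn_classify(crop):
    # Hypothetical stand-in for the per-crop CNN forward pass.
    return "object" if crop.mean() > 0.5 else "background"

def rcnn_detect(image):
    # One full CNN forward pass per proposal -- the per-region cost
    # that Fast R-CNN later removes.
    return [(box, cnn_classify(image[box[1]:box[3], box[0]:box[2]]))
            for box in propose_regions(image)]

print(rcnn_detect(np.random.rand(128, 128, 3)))
\end{verbatim}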
 
 \paragraph{Fast R-CNN}
 The original R-CNN involves computing one forward pass of the CNN for each of the region proposals,
-which is costly, as there is generally a large amount of proposals.
+which is costly, as there is generally a large number of proposals.
 Fast R-CNN \cite{FastRCNN} significantly reduces computation by performing only a single forward pass with the whole image
 as input to the CNN (compared to the sequential input of crops in the case of R-CNN).
 Then, fixed-size crops are taken from the compressed feature map of the image,
-collected into a batch and passed into a small Fast R-CNN
+each corresponding to one of the proposal bounding boxes.
+The crops are collected into a batch and passed into a small Fast R-CNN
 \emph{head} network, which performs classification and prediction of refined boxes for all regions in one forward pass.
 This technique is called \emph{RoI pooling}. % TODO explain how RoI pooling converts full image box coords to crop ranges
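Regarding the TODO above, the conversion from full-image box coordinates to crop ranges can be sketched as follows (an editorial sketch, assuming a single backbone feature map with a known cumulative stride such as 16; real implementations differ in rounding and bin-edge details):
\begin{verbatim}
import numpy as np

def roi_pool(features, box, stride=16, out_size=7):
    # features: (C, H, W) backbone feature map; box: (x1, y1, x2, y2)
    # in full-image pixel coordinates; stride: assumed cumulative
    # downsampling factor of the backbone.
    C, H, W = features.shape
    # Project image coordinates onto the feature map by dividing by
    # the stride, rounding outwards, and clamping to the map extent.
    x1 = min(max(int(np.floor(box[0] / stride)), 0), W - 1)
    y1 = min(max(int(np.floor(box[1] / stride)), 0), H - 1)
    x2 = min(max(int(np.ceil(box[2] / stride)), x1 + 1), W)
    y2 = min(max(int(np.ceil(box[3] / stride)), y1 + 1), H)
    crop = features[:, y1:y2, x1:x2]
    h, w = crop.shape[1:]
    # Max-pool an out_size x out_size grid of bins to a fixed size.
    out = np.empty((C, out_size, out_size), dtype=features.dtype)
    for i in range(out_size):
        for j in range(out_size):
            ys = slice(i * h // out_size,
                       max((i + 1) * h // out_size, i * h // out_size + 1))
            xs = slice(j * w // out_size,
                       max((j + 1) * w // out_size, j * w // out_size + 1))
            out[:, i, j] = crop[:, ys, xs].max(axis=(1, 2))
    return out

pooled = roi_pool(np.random.rand(256, 38, 50), box=(100, 80, 300, 240))
\end{verbatim}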
 Thus, given region proposals, the per-region computation is reduced to a single pass through the complete network,
@@ -75,7 +76,7 @@ and again, improved accuracy.
 This unified network operates in two stages.
 In the \emph{first stage}, one forward pass is performed on the \emph{backbone} network,
 which is a deep feature encoder CNN with the original image as input.
-Next, the \emph{backbone} features are passed into a small, fully convolutional \emph{Region Proposal Network (RPN)} head, which
+Next, the \emph{backbone} output features are passed into a small, fully convolutional \emph{Region Proposal Network (RPN)} head, which
 predicts objectness scores and regresses bounding boxes at each of its output positions.
 At any position, bounding boxes are predicted as offsets relative to a fixed set of \emph{anchors} with different
 aspect ratios.
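For reference, the offset parameterization from the Faster R-CNN paper (quoted here as an editorial aside) writes the regression targets, with $(x, y, w, h)$ the predicted box center and size and $(x_a, y_a, w_a, h_a)$ the corresponding anchor, as
\begin{align*}
t_x &= (x - x_a)/w_a, & t_y &= (y - y_a)/h_a,\\
t_w &= \log(w/w_a), & t_h &= \log(h/h_a).
\end{align*}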
@@ -84,10 +85,9 @@ For each anchor at a given position, the objectness score tells us how likely th
 The region proposals can then be obtained as the $N$ highest-scoring anchor boxes.
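A minimal editorial sketch of this selection step (non-maximum suppression, which practical pipelines apply before keeping the top $N$, is omitted here):
\begin{verbatim}
import numpy as np

def top_n_proposals(boxes, scores, n):
    # boxes: (A, 4) predicted boxes, scores: (A,) objectness scores;
    # keep the n highest-scoring boxes as region proposals.
    order = np.argsort(scores)[::-1][:n]
    return boxes[order], scores[order]

boxes, scores = top_n_proposals(np.random.rand(1000, 4),
                                np.random.rand(1000), n=300)
\end{verbatim}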
 
 The \emph{second stage} corresponds to the original Fast R-CNN head network, performing classification
-and bounding box refinement for each region proposal.
+and bounding box refinement for each region proposal. % TODO verify that it isn't modified
 As in Fast R-CNN, RoI pooling is used to crop one fixed-size feature map for each of the region proposals.
 
-
 \paragraph{Mask R-CNN}
 Faster R-CNN and the earlier systems detect and classify objects at bounding box granularity.
 However, it can be helpful to know class and object (instance) membership of all individual pixels,
@@ -101,5 +101,10 @@ In addition to extending the original Faster R-CNN head, Mask R-CNN also introdu
 variant based on Feature Pyramid Networks \cite{FPN}.
 Figure \ref{} compares the two Mask R-CNN head variants.
 
+\paragraph{Feature Pyramid Networks}
+\todo{TODO}
+
+\paragraph{Supervision of the RPN}
+\todo{TODO}
+\paragraph{Supervision of the RoI head}
+\todo{TODO}
 
bib.bib
@@ -182,7 +182,7 @@
 @inproceedings{CensusTerm,
 author = {Fridtjof Stein},
 title = {Efficient Computation of Optical Flow Using the Census Transform},
-booktitle = {DAGM},
+booktitle = {{DAGM} Symposium},
 year = {2004}}
 
 @inproceedings{DeeperDepth,
@@ -1,3 +1,4 @@
 \subsection{Summary}
 We have introduced an extension on top of region-based convolutional networks to enable object motion estimation
 in parallel to instance segmentation.
+\todo{complete}
@@ -10,9 +10,8 @@ if technically feasible, as camera sensors are cheap and ubiquitous.
 For example, in autonomous driving, it is crucial to not only know the position
 of each obstacle, but to also know if and where the obstacle is moving,
 and to use sensors that will not make the system too expensive for widespread use.
-There are many other applications.. %TODO(make motivation wider)
 
-A promising approach for 3D scene understanding in these situations are deep neural
+A promising approach for 3D scene understanding in situations like these is the use of deep neural
 networks, which have recently achieved breakthroughs in object detection, instance segmentation and classification
 in still images and are increasingly being applied to video data.
 A key benefit of end-to-end deep networks is that they can, in principle,