Review: ION (Inside-Outside Net), 2nd Runner Up in 2015 COCO Detection (Object Detection)

From Vanilla RNN to 4-Direction IRNN

In the original Fast R-CNN, ROI pooling is performed at conv5 only.

3.1. Vanilla RNN (Plain Tanh RNN)

In a vanilla RNN (plain tanh RNN), tanh is used as the activation of the hidden state:

$$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)$$

3.2. 4-Direction IRNN

Prof. Hinton's team proposed the IRNN, a kind of RNN that is composed of ReLUs and initialized with the identity matrix. (If interested, see the arXiv paper "A Simple Way to Initialize Recurrent Networks of Rectified Linear Units".)

Since the recurrence is done within an image, it is a lateral RNN. For example, a rightward pass moves along a row from left to right, updating each cell from its left neighbor. This step is repeated for each row (right/left) or column (down/up).

A naive 4-direction IRNN runs a ReLU-based IRNN in each of the four directions. The right-direction case is described here; the left, up and down directions are similar.

To make it efficient, the authors simplify the naive 4-direction IRNN:

First, the hidden-to-output transitions are merged into a single conv, i.e. concatenation followed by a 1×1 convolution.

Second, the input-to-hidden transition is also simplified to a 1×1 convolution, and this conv is shared by the 4 recurrent transitions.

Finally, two modified IRNNs are stacked together to extract the context features. (Figure: Two Stacked IRNN)

After the above modifications, the RNN equation for the right direction becomes:

$$h^{right}_{i,j} \leftarrow \max\left(W^{right}_{hh} h^{right}_{i,j-1} + h^{right}_{i,j},\ 0\right)$$

As we can see, there is no input x, because the input is implicitly handled by the 1×1 convolution.

It is also found that W can be removed, since detection performance stays similar:

$$h^{right}_{i,j} \leftarrow \max\left(h^{right}_{i,j-1} + h^{right}_{i,j},\ 0\right)$$

This is like an accumulator, but with a ReLU after each step. Hence, the IRNN consists of repeated steps of: accumulate, ReLU, accumulate, etc.
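To make the "accumulate, ReLU, accumulate" idea concrete, here is a minimal NumPy sketch of the simplified 4-direction IRNN. It is only an illustration under assumptions, not the authors' implementation: the function names (`irnn_sweep_right`, `irnn_four_directions`, `conv1x1`) are hypothetical, the feature map is assumed to be laid out as (channels, height, width), and the hidden state before the first cell of each row or column is assumed to be zero.

```python
import numpy as np

def irnn_sweep_right(h, w_hh=None):
    """Right-direction sweep over a (C, H, W) map (hypothetical helper).

    Each column j is updated from its left neighbour:
        h[:, :, j] <- max(W_hh @ h[:, :, j-1] + h[:, :, j], 0)
    With w_hh=None the recurrent matrix is dropped, leaving accumulate + ReLU.
    """
    out = h.astype(float)  # astype returns a fresh array, so the input is untouched
    # Assumption: the state before the first column is zero, so only ReLU applies.
    out[:, :, 0] = np.maximum(out[:, :, 0], 0.0)
    for j in range(1, out.shape[2]):
        prev = out[:, :, j - 1]            # shape (C, H)
        if w_hh is not None:
            prev = w_hh @ prev             # (C, C) @ (C, H) -> (C, H)
        out[:, :, j] = np.maximum(prev + out[:, :, j], 0.0)
    return out

def irnn_four_directions(x):
    """Run the W-free sweep in all four directions and concatenate on channels."""
    right = irnn_sweep_right(x)
    left = irnn_sweep_right(x[:, :, ::-1])[:, :, ::-1]
    xt = x.transpose(0, 2, 1)              # swap rows and columns
    down = irnn_sweep_right(xt).transpose(0, 2, 1)
    up = irnn_sweep_right(xt[:, :, ::-1])[:, :, ::-1].transpose(0, 2, 1)
    return np.concatenate([right, left, down, up], axis=0)   # (4C, H, W)

def conv1x1(x, w):
    """1x1 convolution: mix channels at every position. w has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

# One channel, one row: the right sweep accumulates left to right with ReLU.
x = np.array([[[1.0, -2.0, 3.0, -1.0]]])
print(irnn_sweep_right(x))                 # [[[1. 0. 3. 2.]]]
context = conv1x1(irnn_four_directions(x), np.ones((2, 4)))
print(context.shape)                       # (2, 1, 4)
```

In the one-row example, the negative input at the second position pulls the running sum down to zero, and the ReLU keeps it from going negative before accumulation continues. The left, up and down passes reuse the same sweep on flipped or transposed views, and the final `conv1x1` plays the role of the merged hidden-to-output step (concatenation followed by a 1×1 convolution).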
