Review: SegNet (Semantic Segmentation)

Encoder-Decoder Architecture, Using Max Pooling Indices to Upsample; Outperforms FCN, DeepLabv1 and DeconvNet

SegNet by Authors (https://www.youtube.com/watch?v=CxanE_W46ts)

In this story, SegNet, by the University of Cambridge, is briefly reviewed.

Originally, it was submitted to CVPR 2015, but in the end it was not published there (its 2015 arXiv tech report version still received over 100 citations).

Instead, it was published in 2017 TPAMI, and now has more than 1800 citations.

The first author has since become the Director of Deep Learning and AI at Magic Leap Inc.

(SH Tsang @ Medium)

Below is the demo from the authors:

SegNet by Authors (https://www.youtube.com/watch?v=CxanE_W46ts)

There is also an interesting demo in which we can choose a random image, or even upload our own image, to try SegNet (http://mi.eng.cam.ac.uk/projects/segnet/demo.php). I have tried it as below:

The segmentation result for a road scene image that I found on the internet

Outline
1. Encoder-Decoder Architecture
2. Differences from DeconvNet and U-Net
3. Results

1. Encoder-Decoder Architecture

SegNet has an encoder network and a corresponding decoder network, followed by a final pixelwise classification layer.



Encoder

At the encoder, convolutions and max pooling are performed. There are 13 convolutional layers from VGG-16. (The original fully connected layers are discarded.) While doing 2×2 max pooling, the corresponding max pooling indices (locations) are stored.
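The index-storing step above can be sketched in PyTorch (a minimal illustration, not the authors' code; the tensor sizes are illustrative):

```python
import torch
import torch.nn as nn

# 2x2 max pooling that also returns the argmax indices,
# as SegNet's encoder stores them for later upsampling.
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)

x = torch.randn(1, 64, 360, 480)   # e.g. a VGG-16 feature map at CamVid resolution
pooled, indices = pool(x)          # indices: flat location of each max within x

print(pooled.shape)                # torch.Size([1, 64, 180, 240])
print(indices.shape)               # torch.Size([1, 64, 180, 240])
```

Only these integer indices, not the full feature maps, need to be kept for the decoder.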



Decoder

Upsampling Using Max-Pooling Indices

At the decoder, upsampling and convolutions are performed. During upsampling, the max pooling indices stored at the corresponding encoder layer are recalled to upsample, as shown above. Finally, a K-class softmax classifier is used to predict the class for each pixel.
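One decoder step can be sketched as follows (an assumed PyTorch equivalent, not the authors' code; the channel count and K = 12 are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Encoder side: pool and keep the indices.
x = torch.randn(1, 64, 8, 8)
pooled, idx = nn.MaxPool2d(2, 2, return_indices=True)(x)

# Decoder side: unpool with the stored indices (sparse map),
# then convolve with trainable filters to densify it.
up = F.max_unpool2d(pooled, idx, kernel_size=2, stride=2)
dense = nn.Conv2d(64, 64, 3, padding=1)(up)

# K-class softmax classifier over the channel dimension, one distribution per pixel.
K = 12  # illustrative; CamVid has 11 classes plus void
logits = nn.Conv2d(64, K, 1)(dense)
probs = F.softmax(logits, dim=1)

assert up.shape == x.shape
assert probs.shape == (1, K, 8, 8)
```

Note that the unpooled map is sparse (non-max positions are zero); the subsequent convolutions are what fill it in.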


2. Differences from DeconvNet and U-Net

DeconvNet and U-Net have structures similar to SegNet.



Differences from DeconvNet

DeconvNet uses a similar upsampling approach, called unpooling. However, DeconvNet has fully connected layers, which make the model larger.



Differences from U-Net

U-Net is used for biomedical image segmentation. Instead of using pooling indices, the entire feature maps are transferred from encoder to decoder, then concatenated before performing convolution. This makes the model larger and requires more memory.
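The contrast between the two skip styles can be made concrete with a small sketch (an assumption for illustration, not from either paper's code):

```python
import torch

enc_feat = torch.randn(1, 64, 16, 16)   # encoder feature map at some scale
dec_feat = torch.randn(1, 64, 16, 16)   # decoder feature map at the same scale

# U-Net style: the whole encoder map is copied across and concatenated,
# doubling the channel count and the memory needed for the skip.
merged = torch.cat([enc_feat, dec_feat], dim=1)
print(merged.shape)   # torch.Size([1, 128, 16, 16])

# SegNet style: only the integer pooling indices (same spatial size as the
# *pooled* map) are transferred, which is far cheaper than copying the
# full-resolution float feature map itself.
```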


3. Results

Two datasets are tried: the CamVid dataset for road scene segmentation, and the SUN RGB-D dataset for indoor scene segmentation.



CamVid Dataset for Road Scene Segmentation

As shown above, SegNet obtains very good results for many classes. It obtains the highest global average accuracy (G), class average accuracy (C), mIoU, and boundary F1-measure (BF), outperforming FCN, DeepLabv1 and DeconvNet.



SUN RGB-D Dataset for Indoor Scene Segmentation

Only RGB is used; the depth (D) information is not used. Again, SegNet outperforms FCN, DeconvNet, and DeepLabv1, falling only slightly behind DeepLabv1 on mIoU. It has higher accuracy for large-size classes and lower accuracy for small-size classes.



Memory and Inference Time

SegNet is slower than FCN and DeepLabv1 because it contains a decoder, but faster than DeconvNet because it does not have fully connected layers. SegNet has low memory requirements during both training and testing, and its model size is much smaller than that of FCN and DeconvNet.

References
[2015 arXiv] [SegNet] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling
[2017 TPAMI] [SegNet] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
