Review: WRNs — Wide Residual Networks (Image Classification)

2.1. The Design of the ResNet Block

(Table: WRN-d-2 (k=2), error rate (%) on the CIFAR-10 dataset)

B(3;3): the original «basic» block, shown in the first figure (a)
B(3;1;3): one extra 1×1 layer in between the two 3×3 layers
B(1;3;1): a «straightened» bottleneck, with the same dimensionality for all convolutions
B(1;3): alternating 1×1 and 3×3 convolutions
B(3;1): alternating 3×3 and 1×1 convolutions
B(3;1;1): a Network-in-Network-style block

B(3;3) has the smallest error rate (5.73%). Note: the depths (numbers of layers) differ between variants in order to keep their parameter counts close to each other.

2.2. Wider, Shallower Networks

With a shallower network, training time can be shorter, since GPUs parallelize the computation within each layer no matter how wide it is. WRN is also the first paper to obtain an error rate below 20% on CIFAR-100 without any strong data augmentation!

3.2. Dropout

(Figure: dropout in the original ResNet (left) and dropout in WRNs (right))

Top: with dropout, a consistent gain is obtained across different depths, widening factors k, and datasets.
Bottom right: with dropout, the training loss is higher but the test error is lower, meaning that dropout successfully reduces overfitting.

3.3. ImageNet & COCO

(Table: single-crop, single-model validation error on ImageNet)

The networks above obtain accuracy similar to the original ResNets with 2× fewer layers.
WRN-50-2-Bottleneck: outperforms ResNet-152 while having 3× fewer layers, which means training is significantly faster.
WRN-34-2: outperforms ResNet-152 and Inception-v4-based models.

3.4. Training Time

(Table: training time per batch, with a batch size of 32, on CIFAR-10)

WRN-16-10 and WRN-28-10: training time is much lower than that of the 1004-layer Pre-Activation ResNet, with a lower error rate.
WRN-40-4: training time is lower than that of the 164-layer Pre-Activation ResNet, with a lower error rate.

Training can take a couple of days or even weeks. Indeed, much recent research still focuses on reducing the training time or the number of epochs needed. In WRNs, the shallower design reduces the training time, at the expense of an increased number of parameters due to the widening of the network.

References

[2016 BMVC] [WRNs] Wide Residual Networks

My Related Reviews on Image Classification

[LeNet] [AlexNet] [ZFNet] [VGGNet] [SPPNet] [PReLU-Net] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [Inception-v3] [Inception-v4] [Xception] [MobileNetV1] [ResNet] [Pre-Activation ResNet] [RiR] [Stochastic Depth] [DenseNet]
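The trade-off above (fewer layers, more parameters from widening) can be checked with a little arithmetic. Below is a sketch in plain Python; the helper name is my own, and it counts only the 3×3 convolution weights of a WRN built from B(3;3) blocks, ignoring batch norm, biases, the shortcut 1×1 convolutions, and the final classifier:

```python
def wrn_conv3x3_params(depth: int, k: int) -> int:
    """Count 3x3 conv weights in a WRN-depth-k built from B(3;3) blocks.

    Structure, as described in the WRN paper: a 3->16 stem conv, then
    three groups of n = (depth - 4) / 6 blocks with widths 16k, 32k, 64k.
    Each B(3;3) block holds two 3x3 convs (in * out * 9 weights each).
    Batch norm, biases, shortcut convs and the classifier are ignored.
    """
    assert (depth - 4) % 6 == 0, "WRN depth must be of the form 6n + 4"
    n = (depth - 4) // 6
    total = 3 * 16 * 9                  # stem: 3 RGB channels -> 16
    in_ch = 16
    for width in (16 * k, 32 * k, 64 * k):
        for _ in range(n):
            total += in_ch * width * 9  # first 3x3 conv of the block
            total += width * width * 9  # second 3x3 conv of the block
            in_ch = width
    return total

print(wrn_conv3x3_params(28, 10))  # ~36.2M, near the ~36.5M quoted for WRN-28-10
print(wrn_conv3x3_params(28, 1))   # ~0.36M for the same depth at k = 1
```

Widening by k multiplies each block's cost by roughly k², which is why WRN-28-10 carries about 100× the conv weights of a k = 1 network of the same depth, yet trains faster than a thousand-layer thin ResNet.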
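On the dropout result: WRNs insert dropout inside the residual block, between the two 3×3 convolutions, rather than on the identity path as in the original ResNet. A minimal, framework-agnostic sketch of the inverted-dropout operation itself (the function name and list-of-activations shape are illustrative, not from the paper):

```python
import random

def inverted_dropout(xs, p, rng):
    """Inverted dropout: zero each activation with probability p and
    scale the survivors by 1/(1-p), so the expected activation is
    unchanged and no extra rescaling is needed at test time."""
    if not 0.0 <= p < 1.0:
        raise ValueError("drop probability must be in [0, 1)")
    return [0.0 if rng.random() < p else x / (1.0 - p) for x in xs]

rng = random.Random(0)
acts = [1.0] * 1000  # stand-in for activations after the first 3x3 conv
dropped = inverted_dropout(acts, 0.3, rng)

# In a WRN B(3;3) block this sits between the two convolutions, i.e.
# conceptually: out = conv2(dropout(relu(bn(conv1(x))))) + shortcut(x)
```

Because roughly p of the activations are zeroed each step, the training loss rises, but the regularization lowers test error, which is the pattern reported in the bottom-right plot.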
