Fig. 1. Basic composition of a standard convolutional neural network
Fig. 2. Diagram of the max pooling process
Fig. 3. Structural diagram of the VGGNet model
Fig. 4. Inception module in the GoogLeNet network
Fig. 5. Residual module in the ResNet network
Fig. 6. Sample images from several datasets and their corresponding semantic segmentation results. (a) PASCAL VOC 2012; (b) PASCAL-CONTEXT; (c) MICROSOFT COCO; (d) CITYSCAPES
Fig. 7. Classification of common deep learning semantic segmentation methods
Fig. 8. Processing diagram of the FCN network
Fig. 9. Structural diagram of the SegNet model
Fig. 10. Effect of CRF refinement iterations in DeepLab. (a) Ground truth; (b) CNN output; (c) after 1 CRF iteration; (d) after 2 CRF iterations; (e) after 10 CRF iterations
Fig. 11. Comparison of three models: CRFasRNN, FCN-8s, and DeepLab
Fig. 12. Diagram of the pyramid pooling module in PSPNet
Fig. 13. Diagram of the multi-scale CNN architecture proposed by Roy
Fig. 14. Structural diagram of the ReSeg model
Fig. 15. Diagram of the GRU computation process
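The max pooling operation depicted in Fig. 2 can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a single-channel 2D feature map, a 2×2 window, stride 2 and no padding (a common configuration; the figure itself may use other settings). The function name `max_pool2d` is hypothetical and not taken from any particular framework.

```python
from typing import List

Matrix = List[List[float]]


def max_pool2d(x: Matrix, size: int = 2, stride: int = 2) -> Matrix:
    """Max pooling over a single-channel 2D feature map, with no padding.

    Each output cell is the maximum of a size x size window; windows start
    every `stride` pixels, and windows that would run past the border are
    dropped ("valid" pooling).
    """
    rows, cols = len(x), len(x[0])
    out_rows = (rows - size) // stride + 1
    out_cols = (cols - size) // stride + 1
    out: Matrix = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            r0, c0 = i * stride, j * stride
            # Collect every value in the current window and keep the largest.
            window = [x[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size)]
            row.append(max(window))
        out.append(row)
    return out


if __name__ == "__main__":
    feature_map = [
        [1, 3, 2, 1],
        [4, 6, 5, 0],
        [7, 2, 9, 8],
        [1, 0, 3, 4],
    ]
    print(max_pool2d(feature_map))  # [[6, 5], [7, 9]]
```

Each 2×2 window is reduced to its largest value, so the 4×4 map shrinks to 2×2 while keeping the strongest activation in each region.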
Item | LeNet-5 | AlexNet | VGGNet | GoogLeNet | ResNet |
---|---|---|---|---|---|
Year | 1994 | 2012 | 2014 | 2014 | 2015 |
Layers | 7 | 8 | 19 | 22 | 152 |
Conv layers | 2 | 5 | 16 | 21 | 151 |
Kernel size | 5 | 11, 5, 3 | 3 | 7, 1, 3, 5 | 7, 1, 3, 5 |
Linear layers | 3 | 3 | 3 | 1 | 1 |
Linear layer sizes | 120, 84, 10 | 4096, 4096, 1000 | 4096, 4096, 1000 | 1000 | 1000 |
Activation function | Sigmoid | ReLU | ReLU | ReLU | ReLU |
Classifier | Multi-layer perceptron | Softmax | Softmax | Softmax | Softmax |
Data augmentation | × | √ | √ | √ | √ |
Batch normalization | × | × | × | × | √ |
Local response normalization | × | √ | × | √ | × |
Graphics processing unit (GPU) | × | √ | √ | √ | √ |
Inception module | × | × | × | √ | × |
Dropout | × | √ | √ | √ | √ |
Top-5 error | N/A | 16.4% | 7.32% | 6.67% | 3.57% |

Table 1. Summary of common image classification networks
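The last row of Table 1 reports top-5 error: a prediction counts as correct if the true class is among the five highest-scoring classes. A minimal sketch of that computation follows, assuming each sample comes with a list of per-class scores and an integer class label. The helper `top_k_error` is hypothetical, and the scores in the example are toy values, not outputs of any network in the table.

```python
from typing import Sequence


def top_k_error(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 5
) -> float:
    """Fraction of samples whose true label is not among the k highest-scoring classes."""
    if not labels:
        raise ValueError("need at least one sample")
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices ranked by score, highest first; keep the first k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


if __name__ == "__main__":
    # Two samples over six classes (toy scores, not real model outputs).
    scores = [
        [0.10, 0.20, 0.30, 0.40, 0.50, 0.60],
        [0.10, 0.20, 0.30, 0.40, 0.50, 0.60],
    ]
    labels = [0, 1]
    print(top_k_error(scores, labels, k=5))  # 0.5
    print(top_k_error(scores, labels, k=1))  # 1.0
```

Setting `k=1` gives top-1 error. Because a larger k can only add hits, top-5 error is never higher than top-1 error for the same predictions.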
Dataset | Classes | Training samples | Validation samples | Test samples | Purpose | Year |
---|---|---|---|---|---|---|
PASCAL VOC 2012[18] | 21 | 1464 | 1449 | 1452 | Generic | 2012 |
PASCAL VOC 2012+[19] | 21 | 10582 | 1449 | 1452 | Generic | 2014 |
PASCAL-CONTEXT[20] | 540 | 4998 | 5105 | - | Generic | 2014 |
PASCAL-PERSON-PART[20] | 6 | 1716 | - | 1817 | Person | 2014 |
PASCAL-COW-PART[21] | 4 | 294 | - | 227 | Cow | 2015 |
SBD[22] | 21 | 8498 | 2857 | - | Generic | 2011 |
MICROSOFT COCO[23] | 80+ | 82783 | 40504 | 81434 | Generic | 2014 |
CITYSCAPES (fine)[24] | 19 | 2975 | 500 | 1525 | Urban | 2015 |
CITYSCAPES (coarse)[24] | 19 | 22973 | 500 | - | Urban | 2015 |
CAMVID[25-26] | 32 | 361 | 100 | 233 | Driving | 2009 |
KITTI-Ros[27] | 11 | 170 | - | 46 | Driving | 2015 |
KITTI-Zhang[28] | 10 | 140 | - | 112 | Driving | 2015 |

Table 2. Summary of common semantic segmentation datasets
Model name | Year | Architecture | Accuracy | Efficiency | Training | Contribution |
---|---|---|---|---|---|---|
FCN[32] | 2015 | VGG-16 (FCN) | C | C | C | Forerunner |
SegNet[33] | 2017 | VGG-16 + Decoder | A | B | C | Encoder-decoder |
DeepLab[34-37] | 2017 | VGG-16 + ResNet-101 | A | C | C | Standalone CRF, atrous convolutions |
CRFasRNN[38] | 2015 | FCN-8s | C | B | A | CRF reformulated as RNN |
ParseNet[39] | 2015 | VGG-16 | A | C | C | Global context feature fusion |
SharpMask[40] | 2016 | DeepMask | A | C | C | Top-down refinement module |
PSPNet[41] | 2016 | ResNet-101 | A | B | C | Pyramid pooling module |
Multi-scale-CNN-Raj[42] | 2015 | VGG-16 (FCN) | A | C | C | Multi-scale architecture |
Multi-scale-CNN-Eigen[43] | 2015 | Custom | A | C | C | Multi-scale sequential refinement |
Multi-scale-CNN-Roy[44] | 2016 | Multi-scale-CNN-Eigen | A | C | C | Multi-scale coarse-to-fine refinement |
Multi-scale-CNN-Bian[45] | 2016 | FCN | B | C | B | Independently trained multi-scale FCNs |
ReSeg[46] | 2016 | VGG-16 + ReNet | B | C | C | Extension of ReNet to semantic segmentation |
LSTM-CF[47] | 2016 | Fast R-CNN + DeepMask | A | C | C | Fusion of contextual information from multiple sources |
RCNN[48] | 2014 | MDRNN | A | B | C | Different input sizes, image context |
2D-LSTM[49] | 2015 | MDRNN | B | B | C | Image context modelling |
DAG-RNN[50] | 2015 | Elman network | A | C | C | Graph image structure for context modelling |
MINC-CNN[51] | 2015 | GoogLeNet (FCN) | C | C | C | Patchwise CNN, standalone CRF |
DeepMask[52] | 2015 | VGG-A | A | C | C | Proposal generation for segmentation |

Table 3. Summary of common deep learning semantic segmentation methods