• Laser & Optoelectronics Progress
  • Vol. 56, Issue 15, 150003 (2019)
Xiangfu Zhang, Jian Liu*, Zhangsong Shi, Zhonghong Wu, and Zhi Wang
Author Affiliations
  • College of Weapons Engineering, Naval University of Engineering, Wuhan, Hubei 430032, China
    DOI: 10.3788/LOP56.150003
    Xiangfu Zhang, Jian Liu, Zhangsong Shi, Zhonghong Wu, Zhi Wang. Review of Deep Learning-Based Semantic Segmentation[J]. Laser & Optoelectronics Progress, 2019, 56(15): 150003
    Fig. 1. Diagram of basic composition of standard convolutional neural network
    Fig. 2. Diagram of max pooling process
    Fig. 3. Structural diagram of VGGNet model
    Fig. 4. Inception module in GoogLeNet network
    Fig. 5. Residual module in ResNet network
    Fig. 6. Partial dataset images and corresponding semantic segmentation effect diagrams. (a) PASCAL VOC 2012; (b) PASCAL-CONTEXT; (c) MICROSOFT COCO; (d) CITYSCAPES
    Fig. 7. Classification of common deep learning semantic segmentation methods
    Fig. 8. FCN network processing diagram
    Fig. 9. Structural diagram of SegNet model
    Fig. 10. Effects of using CRF tuning iterations in DeepLab. (a) GT; (b) CNNout; (c) CRFit1; (d) CRFit2; (e) CRFit10
    Fig. 11. Comparison of three models of CRFasRNN, FCN-8s, and DeepLab
    Fig. 12. Diagram of pyramid pooling module in PSPNet
    Fig. 13. Diagram of multi-scale CNN network architecture proposed by Roy
    Fig. 14. Structural diagram of ReSeg model
    Fig. 15. Diagram of GRU calculation process
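    Item | LeNet5 | AlexNet | VGGNet | GoogLeNet | ResNet
    Year | 1994 | 2012 | 2014 | 2014 | 2015
    Layers | 7 | 8 | 19 | 22 | 152
    Convolutional layers | 2 | 5 | 16 | 21 | 151
    Kernel size | 5 | 11, 5, 3 | 3 | 7, 1, 3, 5 | 7, 1, 3, 5
    Linear layers | 3 | 3 | 3 | 1 | 1
    Linear layer size | 120, 84, 10 | 4096, 4096, 1000 | 4096, 4096, 1000 | 1000 | 1000
    Activation function | Sigmoid | ReLU | ReLU | ReLU | ReLU
    Classifier | Multi-layer perceptron | Softmax | Softmax | Softmax | Softmax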
    Data augmentation | ×
    Batch normalization | ××××
    Local response normalization | ×××
    Graphics processing unit | ×
    Inception | ××××
    Dropout | ×
    Top-5 error | N/A | 16.4% | 7.32% | 6.67% | 3.57%
    Table 1. Information summary of common image classification networks
    Dataset | Classes | Samples (training) | Samples (validation) | Samples (test) | Purpose | Year
    PASCAL VOC 2012 [18] | 21 | 1464 | 1449 | 1452 | Generic | 2012
    PASCAL VOC 2012+ [19] | 21 | 10582 | 1449 | 1452 | Generic | 2014
    PASCAL-CONTEXT [20] | 540 | 4998 | 5105 | - | Generic | 2014
    PASCAL-PERSON-PART [20] | 6 | 1716 | - | 1817 | Person | 2014
    PASCAL-COW-PART [21] | 4 | 294 | - | 227 | Cow | 2015
    SBD [22] | 21 | 8498 | 2857 | - | Generic | 2011
    MICROSOFT COCO [23] | 80+ | 82783 | 40504 | 81434 | Generic | 2014
    CITYSCAPES (fine) [24] | 19 | 2975 | 500 | 1525 | Urban | 2015
    CITYSCAPES (coarse) [24] | 19 | 22973 | 500 | - | Urban | 2015
    CAMVID [25-26] | 32 | 361 | 100 | 233 | Driving | 2009
    KITTI-Ros [27] | 11 | 170 | - | 46 | Driving | 2015
    KITTI-Zhang [28] | 10 | 140 | - | 112 | Driving | 2015
    Table 2. Information summary of common semantic segmentation datasets
    Model name | Year | Architecture | Accuracy | Efficiency | Training | Contribution
    FCN [32] | 2015 | VGG-16 (FCN) | C | C | C | Forerunner
    SegNet [33] | 2017 | VGG-16 + Decoder | A | B | C | Encoder-decoder
    DeepLab [34-37] | 2017 | VGG-16 + ResNet-101 | A | C | C | Standalone CRF, atrous convolutions
    CRFasRNN [38] | 2015 | FCN-8s | C | B | A | CRF reformulated as RNN
    ParseNet [39] | 2015 | VGG-16 | A | C | C | Global context feature fusion
    SharpMask [40] | 2016 | DeepMask | A | C | C | Top-down refinement module
    PSPNet [41] | 2016 | ResNet-101 | A | B | C | Pyramid pooling module
    Multi-scale-CNN-Raj [42] | 2015 | VGG-16 (FCN) | A | C | C | Multi-scale architecture
    Multi-scale-CNN-Eigen [43] | 2015 | Custom | A | C | C | Multi-scale sequential refinement
    Multi-scale-CNN-Roy [44] | 2016 | Multi-scale-CNN-Eigen | A | C | C | Multi-scale coarse-to-fine refinement
    Multi-scale-CNN-Bian [45] | 2016 | FCN | B | C | B | Independently trained multi-scale FCNs
    ReSeg [46] | 2016 | VGG-16 + ReNet | B | C | C | Extension of ReNet to semantic segmentation
    LSTM-CF [47] | 2016 | Fast R-CNN + DeepMask | A | C | C | Fusion of contextual information from multiple sources
    RCNN [48] | 2014 | MDRNN | A | B | C | Different input sizes, image context
    2D-LSTM [49] | 2015 | MDRNN | B | B | C | Image context modelling
    DAG-RNN [50] | 2015 | Elman network | A | C | C | Graph image structure for context modelling
    MINC-CNN [51] | 2015 | GoogLeNet (FCN) | C | C | C | Patchwise CNN, standalone CRF
    DeepMask [52] | 2015 | VGG-A | A | C | C | Proposal generation for segmentation
    Table 3. Information summary of common deep learning semantic segmentation methods