• Laser & Optoelectronics Progress
  • Vol. 58, Issue 12, 1200002 (2021)
Longfei Wang and Chunman Yan*
Author Affiliations
  • School of Physics and Electronic Engineering, Northwest Normal University, Lanzhou, Gansu 730030, China
    DOI: 10.3788/LOP202158.1200002
    Longfei Wang, Chunman Yan. Review on Semantic Segmentation of Road Scenes[J]. Laser & Optoelectronics Progress, 2021, 58(12): 1200002
    Fig. 1. Development history of image semantic segmentation
    Fig. 2. Structural diagram of fully convolutional network[24]
    Fig. 3. Semantic segmentation method based on strong supervision
    Fig. 4. Schematic of dilated convolution[25]. (a) Ordinary convolution; (b) dilated convolution with dilation rate of 2; (c) dilated convolution with dilation rate of 4
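The dilated (expansion) convolution in Fig. 4 can be illustrated with a short sketch. It assumes a 3×3 kernel, stride 1, and no padding. The function names are illustrative and do not come from the paper. A kernel of size k with dilation rate d spans k + (k - 1)(d - 1) pixels per axis, so rates 1, 2, and 4 span 3, 5, and 9 pixels, and stacking those three layers gives a 15-pixel receptive field.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in pixels, per axis) covered by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv2d(image, kernel, dilation=1):
    """Valid-mode 2-D dilated cross-correlation on nested lists.

    Assumptions of this sketch: square kernel, stride 1, no padding.
    """
    k = len(kernel)
    span = effective_kernel_size(k, dilation)
    out_h = len(image) - span + 1
    out_w = len(image[0]) - span + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    # Kernel taps are spaced `dilation` pixels apart.
                    total += kernel[i][j] * image[row + i * dilation][col + j * dilation]
            out_row.append(total)
        output.append(out_row)
    return output


def stacked_receptive_field(kernel_size: int, dilations) -> int:
    """Receptive field of stride-1 layers applied one after another."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


# A 9x9 test image and an all-ones 3x3 kernel (both hypothetical).
image = [[float(r * 9 + c) for c in range(9)] for r in range(9)]
ones = [[1.0] * 3 for _ in range(3)]
for rate in (1, 2, 4):
    out = dilated_conv2d(image, ones, rate)
    print(rate, effective_kernel_size(3, rate), len(out), len(out[0]))
```

The number of kernel weights stays at nine for every rate; only the spacing between taps changes, which is why dilation enlarges the receptive field without adding parameters.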
    Fig. 5. Structural diagram of SegNet network[52]
    Fig. 6. Weakly supervised and unsupervised semantic segmentation methods
    Type | Advantage | Disadvantage
    Strongly supervised | High segmentation accuracy based on densely annotated datasets | Excessive dependence on densely annotated datasets, inability to transfer, and poor segmentation accuracy for unknown scenes
    Weakly supervised | Only image-level annotated datasets required to complete training | Large number of datasets needed, long training time, and lower accuracy than strong supervision
    Unsupervised | Independent of manually and densely annotated datasets; strongly adaptable to unknown environments | Difficult to adapt; high segmentation accuracy not yet achieved
    Table 1. Comparison of advantages and disadvantages of strongly supervised, weakly supervised, and unsupervised semantic segmentation methods
    Dataset | Year | Number of categories | Total amount of data | Area | Environment
    CamVid[86] | 2009 | 32 | 700 | Europe | Day
    KITTI[87] | 2013 | 10 | - | Germany and America | Day
    Oxford RobotCar[88] | 2014 | - | 2×10^7 | Oxford | All weather conditions
    Cityscapes[89] | 2016 | 34 | 20000 | Germany, Switzerland, and France | Spring, summer, and autumn
    SYNTHIA[90] | 2016 | 11 | 13407 | - | Various scenes
    Comma.ai | 2016 | - | - | America | -
    Mapillary Vistas[91] | 2017 | 66 | 25000 | America, Europe, Africa, Asia, and Oceania | Complex weather
    ApolloScape[92] | 2018 | 28 | 143906 | China | Complex weather
    BDD100K[93] | 2018 | 10 | 10000 | Multiple cities around the world | Various scenes
    Udacity’s Driving[94] | 2018 | 3, 8 | 9420, 15000 | - | -
    NuScenes | 2019 | 23 | 14×10^5 | Boston and Singapore | Day
    D2-City | 2019 | 12 | - | China | Complex weather
    Waymo[95] | 2019 | - | 3000 | America | Complex weather
    Table 2. Common autonomous driving datasets
    Dataset | Summary
    KUL Belgium Traffic Sign[96] | Dataset of traffic signs in Belgium
    German Traffic Sign[97] | German traffic annotated dataset
    STSD[98] | More than 20,000 images containing 3488 traffic signs
    LISA[99] | 7855 annotations with more than 6610 frames
    Tsinghua-Tencent 100K[100] | Dataset with 100000 pictures, including 30000 traffic sign examples
    Table 3. Common traffic sign datasets
    Method | Year | Contribution
    Normalized cut | 2000 | Dividing graph into k subgraphs by minimizing the normalized cut cost
    Grab cut | 2004 | Using image texture and boundary information, with a small amount of manual intervention, to obtain better foreground and background segmentation
    GPB-UCM | 2011 | Using probability of each pixel being an edge, detecting target contour, generating contour map, and completing segmentation; complex steps and high complexity
    Random Decision Forest | 2016 | Combining multiple decision trees into a classifier
    MCG | 2017 | On basis of GPB-UCM, combining the generated multiple contour segmentation blocks with a random forest classifier to get predicted objects
    Table 4. Analysis and summary of traditional image semantic segmentation methods[9]
    Method | Model | Year | Key technology | PGM | Dataset | mIoU /%
    Method based on enlarging receptive field: method based on dilated convolution | DeepLab v1 | 2014 | Upsampling and structure prediction | CRF | PASCAL VOC 2012, Cityscapes | 71.6, 63.1
     | ENet | 2016 | Factorized filters and dilated convolution | - | Cityscapes, CamVid | 58.3, 51.3
     | DRN | 2017 | Dilated convolution | - | - | -
    Method based on optimizing convolution structure | Deformable | 2017 | Deformable convolution | - | PASCAL VOC 2012 | 75.3
     | MobileNet V1 | 2017 | Depthwise separable convolution | - | COCO | 70.6
     | MobileNet V2 | 2018 | Improved depthwise separable convolution | - | COCO | 71.7
     | TuSimple | 2018 | Dense upsampling convolution and hybrid dilated convolution | - | PASCAL VOC 2012 | 83.1
    Method based on probability graphical model | DSM | 2016 | Modeling CRF through CNN | CRF | PASCAL VOC 2012 | 78.0
     | C&G | 2016 | Embedding CRF into CNN | CRF | PASCAL VOC 2012 | 78.1
     | DPN | 2015 | Integrating CNN with MRF | MRF | PASCAL VOC 2012 | 77.5
     | QO | 2016 | Quadratic optimization | G-CRF | PASCAL VOC 2012 | 80.2
     | HOCRF+ | 2016 | Embedding CRF into CNN | HOCRF | PASCAL VOC 2012 | 77.9
    Method based on feature fusion: method based on ASPP | DeepLab v3 | 2017 | Improved dilated convolution and improved ASPP | CRF | PASCAL VOC 2012 | 86.9
     | DeepLab v3+ | 2018 | ASPP module with separable convolution and skip-connection fusion of features from different levels | - | PASCAL VOC 2012, Cityscapes | 89.0, 82.1
     | ICNet | 2017 | Cascaded model and feature fusion | - | Cityscapes, CamVid | 70.6, 67.1
     | DenseASPP | 2018 | ASPP and densely connected networks to improve receptive field | - | Cityscapes | 80.6
     | DMNet | 2019 | Dynamic convolution module and context-aware correlation filter | - | PASCAL VOC 2012 | 84.4
     | APCNet | 2019 | GLA and ACM | - | PASCAL VOC 2012 | 84.2
    Method based on attention mechanism | PSANet | 2018 | Attention mechanism | - | PASCAL VOC 2012, Cityscapes | 85.7, 80.1
     | CCNet | 2018 | Dilated convolution and feature weighted fusion | - | Cityscapes | 81.4
     | BiSeNet | 2018 | Spatial path and context path | - | Cityscapes, CamVid | 78.9, 68.7
     | ACNet | 2019 | Three-parallel-branch architecture and attention assistant module integrating attention mechanism | - | NYUDv2 | 48.3
     | DANet | 2019 | Dilated convolution, deconvolution, and feature weighted fusion | - | PASCAL VOC 2012, Cityscapes | 82.6, 81.5
    Method based on encoding and decoding | SegNet | 2015 | Deconvolution, upsampling, and dropout layer | - | CamVid | 55.6
     | DeconvNet | 2015 | Deconvolution and unpooling | - | PASCAL VOC 2012 | 69.6
     | RefineNet | 2017 | Bilinear interpolation, skip connection, and residual connection | - | Cityscapes | 73.6
     | GCN+ | 2017 | Large kernel convolution and global convolution network | - | PASCAL VOC 2012, Cityscapes | 82.2, 76.9
     | DFANet | 2019 | Deep feature aggregation network | - | Cityscapes, CamVid | 70.3, 64.7
     | DUpsampling | 2019 | Fusion of different resolution features | - | PASCAL VOC 2012 | 88.1
     | SDN | 2019 | Capturing multi-scale context information to ensure fine recovery of target location information | - | PASCAL VOC 2012, CamVid | 86.6, 71.8
    Method based on RNN | rCNN | 2014 | Multi-size input window | - | SIFT Flow | -
     | 2D-LSTM | 2015 | Four different directions of RNN | - | SIFT Flow | -
     | ReSeg | 2016 | Extension of ReNet | - | CamVid | -
    Method based on GAN | - | 2016 | GAN adversarial training | - | PASCAL VOC 2012 | 54.3
     | - | 2016 | GAN domain adaptation | - | Cityscapes | 67.8
    Table 5. Analysis and summary of image semantic segmentation method based on strong supervision
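The accuracy columns in these tables are reported as mIoU (mean intersection over union), but this page does not give the formula. The sketch below shows the usual definition: for each class, IoU = TP / (TP + FP + FN), averaged over the classes that occur. It assumes flat lists of integer class labels and skips classes absent from both ground truth and prediction. The function names are illustrative. Benchmark implementations typically accumulate the confusion matrix over a whole dataset and handle ignore labels, which this sketch does not.

```python
def confusion_matrix(truth, pred, num_classes):
    """matrix[t][p] counts pixels with ground-truth class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        matrix[t][p] += 1
    return matrix


def mean_iou(truth, pred, num_classes):
    """Mean of per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    matrix = confusion_matrix(truth, pred, num_classes)
    ious = []
    for c in range(num_classes):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                                   # class c pixels missed
        fp = sum(matrix[r][c] for r in range(num_classes)) - tp    # other pixels labeled c
        denom = tp + fp + fn
        if denom:  # skip classes absent from both truth and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with six pixels and three classes (hypothetical labels).
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(f"mIoU = {100 * mean_iou(truth, pred, 3):.1f} %")
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5, which the tables would report as 50.0 in the "mIoU /%" column.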
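    Model | Parameter | Time /ms | mIoU /%
    FCN-8 | - | 500 | 63.1
    DeepLab | 250.8 | 4000 | 63.1
    SegNet | 29.5 | 89.2 | 57
    CRF-RNN | - | 700 | 74.7
    ENet | 0.4 | 135.4 | 57
    DeepLab v2 | 44 | 4000 | 70.4
    PSPNet | 250.8 | 1288 | 81.2
    DUC + HDC | - | 900 | 80.1
    DenseASPP | 28.6 | 500 | 80.6
    ESPNet | 0.4 | - | 60.3
    BiSeNet1 | 5.8 | 13 | 68.4
    BiSeNet2 | 49 | 21 | 74.7
    DeepLab v3+ | 200+ | 600 | 82.1
    ICNet | 26.5 | 33 | 69.5
    DFANet | 7.8 | 10 | 71.3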
    Table 6. Speed analysis of algorithms[5]
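The "Time /ms" column of Table 6 gives per-image inference time, which is often easier to compare as frames per second (fps = 1000 / time in ms). A small sketch of that conversion follows, using three time values as read from the table. The 30 fps real-time threshold is an assumption of this example, not a figure from the paper, and the function names are illustrative.

```python
def frames_per_second(time_ms: float) -> float:
    """Convert per-image inference time in milliseconds to frames per second."""
    if time_ms <= 0:
        raise ValueError("time_ms must be positive")
    return 1000.0 / time_ms


def is_real_time(time_ms: float, min_fps: float = 30.0) -> bool:
    """True if the model reaches `min_fps` (assumed threshold, not from the paper)."""
    return frames_per_second(time_ms) >= min_fps


# Inference times in ms, as read from Table 6.
times_ms = {"ICNet": 33, "DenseASPP": 500, "PSPNet": 1288}
for model, time_ms in times_ms.items():
    print(f"{model}: {frames_per_second(time_ms):.1f} fps, real time: {is_real_time(time_ms)}")
```

Such a comparison is only meaningful when the times were measured on the same hardware and input resolution, which this page does not specify.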
    Supervision information | Model | Year | Key technology | PGM | Dataset | mIoU /%
    Bounding-box level | BoxSup | 2015 | MCG | - | PASCAL VOC 2012/PASCAL-CONTEXT | 75.2/40.5
     | DeepCut | 2016 | CRF | CRF | - | -
    Scribble level | WTP | 2016 | Objectness | - | PASCAL VOC 2012 | 49.1
     | ScribbleSup | 2015 | Superpixel | CRF | PASCAL VOC 2012 | 71.3
    Image level | MIL | 2015 | MCG | - | ImageNet | 42.0
     | CCNN | 2015 | Class size | - | PASCAL VOC 2012 | 42.4
     | SEC | 2016 | Saliency detection algorithm | CRF | PASCAL VOC 2012 | 50.7
     | STC | 2015 | Saliency detection algorithm | CRF | PASCAL VOC 2012 | 49.8
     | AugFeed | 2016 | MCG | CRF | PASCAL VOC 2012 | 54.34
     | EM | 2017 | Saliency detection algorithm | CRF | PASCAL VOC 2012 | 58.71
    Image level and pixel level | Decoupled | 2015 | - | CRF | PASCAL VOC 2012 | 66.6
    Image level, bounding-box level, and pixel level | WeaklySemi | 2015 | - | CRF | PASCAL VOC 2012 | 73.9
    Table 7. Analysis and summary of image semantic segmentation method based on weak supervision[5]
    Model | Year | Key technology | Dataset | mIoU /%
    FCNWild | 2016 | Domain adaptive full convolution adversarial training | Cityscapes | 27.1
    ADDA | 2017 | Adversarial training | NYU Depth v2 | -
    FCAN | 2018 | Image domain adaptive network and feature adaptive network | Cityscapes | 47.75
    Table 8. Analysis and summary of unsupervised image semantic segmentation methods[9]