Review on Semantic Segmentation of Road Scenes

Longfei Wang; Chunman Yan

doi:10.3788/LOP202158.1200002

| Type | Advantage | Disadvantage |
| --- | --- | --- |
| Strongly supervised | High segmentation accuracy when trained on densely annotated datasets | Excessive dependence on densely annotated datasets, poor transferability, and poor segmentation accuracy on unknown scenes |
| Weakly supervised | Only image-level annotated datasets are required for training | Requires a large amount of data and long training time; accuracy is lower than with strong supervision |
| Unsupervised | Independent of densely hand-annotated datasets and strongly adaptable to unknown environments | Difficult to adapt; high segmentation accuracy has not yet been achieved |

Dataset

Year

Number of categories

Total amount of data

Area

Environment

CamVid^[86]

2009

700

Europe

Day

KITTI^[87]

2013

Germany and America

Day

Oxford Robotcar^[88]

2014

2×10⁷

Oxford

All weather conditions

Cityscapes^[89]

2016

20000

Germany, Switzerland, and France

Spring, summer, and autumn

SYNTHIA^[90]

2016

13407

Various scenes

Comma.ai

2016

America

Mapillary Vistas^[91]

2017

25000

America, Europe, Africa, Asia, and Oceania

Complex weather

ApolloScape^[92]

2018

143906

China

Complex weather

BDD100K^[93]

2018

10000

Multiple cities around the world

Various scenes

Udacity’s Driving^[94]

2018

3, 8

9420, 15000

NuScenes

2019

14×10⁵

Boston and Singapore

Day

D²-City

2019

China

Complex weather

Waymo^[95]

2019

3000

America

Complex weather

| Dataset | Summary |
| --- | --- |
| KUL Belgium Traffic Sign^[96] | Dataset of traffic signs in Belgium |
| German Traffic Sign^[97] | Annotated dataset of German traffic signs |
| STSD^[98] | More than 20,000 images containing 3488 traffic signs |
| LISA^[99] | 7855 annotations across more than 6610 frames |
| Tsinghua-Tencent 100K^[100] | 100000 images, including 30000 traffic sign instances |

| Method | Year | Contribution |
| --- | --- | --- |
| Normalized cut | 2000 | Partitioning the graph into k subgraphs by minimizing the normalized cut cost |
| Grab cut | 2004 | Using image texture and boundary information, together with a small amount of manual intervention, to obtain better foreground and background segmentation |
| GPB-UCM | 2011 | Using the probability of each pixel being an edge to detect target contours and generate a contour map from which the segmentation is completed; the procedure has many steps and high complexity |
| Random Decision Forest | 2016 | Combining multiple decision trees into a single classifier |
| MCG | 2017 | Building on GPB-UCM, combining the multiple contour-based segmentation blocks it generates with a random forest classifier to obtain the predicted objects |
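
The normalized cut entry above can be made concrete with a small sketch. It assumes the standard two-way criterion, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), and only evaluates that cost for a given partition; it does not perform the eigenvector-based minimization or the extension to k subgraphs. The function name `ncut_value` and the toy affinity matrix are illustrative assumptions, not taken from the reviewed paper.

```python
import numpy as np

def ncut_value(W, mask):
    """Two-way normalized-cut cost for the partition A = mask, B = ~mask.

    W is a symmetric affinity matrix; mask marks the nodes assigned to A.
    """
    W = np.asarray(W, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    cut = W[mask][:, ~mask].sum()   # total weight of edges crossing the partition
    assoc_a = W[mask].sum()         # weight from nodes in A to all nodes
    assoc_b = W[~mask].sum()        # weight from nodes in B to all nodes
    return cut / assoc_a + cut / assoc_b

# Hypothetical 4-node graph: two tightly linked pairs joined by weak edges.
W = np.array([
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
])
good = ncut_value(W, [True, True, False, False])   # cuts only the weak edges
bad = ncut_value(W, [True, False, True, False])    # cuts the strong edges
print(f"good split: {good:.3f}, bad split: {bad:.3f}")
```

The partition that separates the two strongly connected pairs gets the lower cost, which is the behavior the criterion is designed to reward.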

Method

Model

Year

Key technology

PGM

Dataset

mIoU /%

Method based on enlarging receptive field

Method based on dilated convolution

DeepLab v1

2014

Upsampling and structured prediction

CRF

PASCAL VOC 2012, Cityscapes

71.6, 63.1

ENet

2016

Factorized filters and dilated convolution

Cityscapes, CamVid

58.3, 51.3

DRN

2017

Dilated convolution

Method based on optimizing convolution structure

Deformable

2017

Deformable convolution

PASCAL VOC 2012

75.3

MobileNet V1

2017

Depthwise separable convolution

COCO

70.6

MobileNet V2

2018

Improved depthwise separable convolution

COCO

71.7

TuSimple

2018

Dense upsampling convolution (DUC) and hybrid dilated convolution (HDC)

PASCAL VOC 2012

83.1

Method based on probabilistic graphical model

DSM

2016

Modeling CRF through CNN

CRF

PASCAL VOC 2012

78.0

C&G

2016

Embedding CRF into CNN

CRF

PASCAL VOC 2012

78.1

DPN

2015

Integrating CNN with MRF

MRF

PASCAL VOC 2012

77.5

2016

Quadratic optimization

G-CRF

PASCAL VOC 2012

80.2

HOCRF+

2016

Embedding CRF into CNN

HOCRF

PASCAL VOC 2012

77.9

Method based on feature fusion

Method based on ASPP

DeepLab v3

2017

Improved dilated convolution and improved ASPP

CRF

PASCAL VOC 2012

86.9

DeepLab v3+

2018

ASPP module with separable convolution and skip-connection fusion of features from different levels

PASCAL VOC 2012, Cityscapes

89.0, 82.1

ICNet

2017

Cascaded model and feature fusion

Cityscapes, CamVid

70.6, 67.1

DenseASPP

2018

ASPP and densely connected networks to improve receptive field

Cityscapes

80.6

DMNet

2019

Dynamic convolution module and context-aware correlation filter

PASCAL VOC 2012

84.4

APCNet

2019

GLA and ACM

PASCAL VOC 2012

84.2

Method based on attention mechanism

PSANet

2018

Attention mechanism

PASCAL VOC 2012, Cityscapes

85.7, 80.1

CCNet

2018

Dilated convolution and feature weighted fusion

Cityscapes

81.4

BiSeNet

2018

Spatial path and context path

Cityscapes, CamVid

78.9, 68.7

ACNet

2019

Three-branch parallel architecture and an auxiliary attention module that integrates the attention mechanism

NYUDv2

48.3

DANet

2019

Dilated convolution, deconvolution, and feature weighted fusion

PASCAL VOC 2012, Cityscapes

82.6, 81.5

Method

Model

Year

Key technology

PGM

Dataset

mIoU /%

Method based on encoding and decoding

SegNet

2015

Deconvolution, upsampling and dropout layer

CamVid

55.6

DeconvNet

2015

Deconvolution and unpooling

PASCAL VOC 2012

69.6

RefineNet

2017

Bilinear interpolation, skip connections, and residual connections

Cityscapes

73.6

GCN+

2017

Large kernel convolution and global convolution network

PASCAL VOC 2012, Cityscapes

82.2, 76.9

DFANet

2019

Deep feature aggregation network

Cityscapes, CamVid

70.3, 64.7

DUpsampling

2019

Fusion of different resolution features

PASCAL VOC 2012

88.1

SDN

2019

Capturing multi-scale context information to ensure fine recovery of target location information

PASCAL VOC 2012, CamVid

86.6, 71.8

Method based on RNN

rCNN

2014

Multi-scale input windows

SIFT Flow

2D-LSTM

2015

RNNs running in four different directions

SIFT Flow

ReSeg

2016

Extension of ReNet

CamVid

Method based on GAN

2016

GAN adversarial training

PASCAL VOC 2012

54.3

2016

GAN domain adaptation

Cityscapes

67.8
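
Several entries in the table above rely on dilated convolution to enlarge the receptive field. The sketch below shows the arithmetic behind that idea, assuming stride-1 layers and the usual effective kernel size k + (k - 1)(d - 1) for kernel size k and dilation rate d. The helper names and the layer configurations are illustrative assumptions and do not correspond to any specific model in the table.

```python
def effective_kernel(kernel_size, dilation):
    """Spatial extent covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def receptive_field(layers):
    """Receptive field of a stack of stride-1 convolutions.

    layers is a list of (kernel_size, dilation) pairs, applied in order.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += effective_kernel(kernel_size, dilation) - 1
    return rf

# Three 3x3 layers without dilation versus dilation rates 1, 2, 4.
plain = receptive_field([(3, 1), (3, 1), (3, 1)])
dilated = receptive_field([(3, 1), (3, 2), (3, 4)])
print(plain, dilated)  # 7 15
```

Both stacks use the same number of weights per layer, yet the dilated stack covers roughly twice the spatial extent, which is why dilation is attractive when downsampling would otherwise discard resolution.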

Model

Parameter

Time /ms

mIoU /%

FCN-8

500

63.1

DeepLab

250.8

4000

63.1

SegNet

29.5

89.2

CRF-RNN

700

74.7

ENet

0.4

135.4

DeepLab v2

4000

70.4

PSPNet

250.8

1288

81.2

DUC + HDC

900

80.1

DenseASPP

28.6

500

80.6

ESPNet

0.4

60.3

BiSeNet1

5.8

68.4

BiSeNet2

74.7

DeepLab v3+

200+

600

82.1

ICNet

26.5

69.5

DFANet

7.8

71.3

Supervision information

Model

Year

Key technology

PGM

Dataset

mIoU /%

Bounding-box level

BoxSup

2015

MCG

PASCAL VOC 2012/PASCAL-CONTEXT

75.2/40.5

DeepCut

2016

CRF

Scribble level

WTP

2016

Objectness

PASCAL VOC 2012

49.1

ScribbleSup

2015

Superpixel

CRF

PASCAL VOC 2012

71.3

Image level

MIL

2015

MCG

ImageNet

42.0

CCNN

2015

Class Size

PASCAL VOC 2012

42.4

SEC

2016

Saliency detection algorithm

CRF

PASCAL VOC 2012

50.7

STC

2015

Saliency detection algorithm

CRF

PASCAL VOC 2012

49.8

AugFeed

2016

MCG

CRF

PASCAL VOC 2012

54.34

2017

Saliency detection algorithm

CRF

PASCAL VOC 2012

58.71

Image level and pixel level

Decoupled

2015

CRF

PASCAL VOC 2012

66.6

Image level, bounding-box level, and pixel level

WeaklySemi

2015

CRF

PASCAL VOC 2012

73.9

Model

Year

Key technology

Dataset

mIoU /%

FCNWild

2016

Domain-adaptive fully convolutional adversarial training

Cityscapes

27.1

ADDA

2017

Adversarial training

NYU Depth v2

FCAN

2018

Image domain adaptive network and feature adaptive network

Cityscapes

47.75
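
The mIoU column used throughout these tables is the mean, over classes, of intersection over union between predicted and ground-truth label maps. A minimal sketch of that computation follows, assuming integer class labels and the common definition IoU = TP / (TP + FP + FN). Individual benchmarks may differ in details such as ignored labels, which this sketch does not handle; the function names and the toy label maps are illustrative assumptions.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows index the ground-truth class, columns the predicted class."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    index = y_true * num_classes + y_pred
    counts = np.bincount(index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mean_iou(y_true, y_pred, num_classes):
    """Mean of per-class IoU, skipping classes absent from both maps."""
    cm = confusion_matrix(y_true, y_pred, num_classes)
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp   # TP + FP + FN per class
    valid = union > 0
    return float((tp[valid] / union[valid]).mean())

# Hypothetical 2x3 label maps with three classes.
y_true = np.array([[0, 0, 1], [1, 2, 2]])
y_pred = np.array([[0, 1, 1], [1, 2, 0]])
score = mean_iou(y_true, y_pred, num_classes=3)
print(f"mIoU = {100 * score:.1f}%")  # mIoU = 50.0%
```

In this toy case the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5.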

Longfei Wang, Chunman Yan. Review on Semantic Segmentation of Road Scenes[J]. Laser & Optoelectronics Progress, 2021, 58(12): 1200002
