Real-time urban street view semantic segmentation based on cross-layer aggregation network

Zhiqiang HOU; Minjie CHENG; Sugang MA; Minjie QU; Xiaobao YANG

doi:10.37188/OPE.20243208.1212

Optics and Precision Engineering
Vol. 32, Issue 8, 1212 (2024)

Zhiqiang HOU^1,2, Minjie CHENG^1,2,*, Sugang MA^1,2, Minjie QU^1,2, and Xiaobao YANG^1,2

Author Affiliations

¹Xi'an University of Posts and Telecommunications, Institute of Computer, Xi'an 702, China

²Xi'an University of Posts and Telecommunications, Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an 71011, China

Zhiqiang HOU, Minjie CHENG, Sugang MA, Minjie QU, Xiaobao YANG. Real-time urban street view semantic segmentation based on cross-layer aggregation network[J]. Optics and Precision Engineering, 2024, 32(8): 1212

Fig. 1. Overall structure of the Cross-Layer Aggregation Network (CLANet)

Fig. 2. Cross-Layer Aggregation Module (CLAM)

Fig. 3. Comparison diagram between DAPPM and CLA-PPM

Fig. 4. Multi-Scale Fusion Module (MSFM)

Fig. 5. Accuracy-speed comparison on the Cityscapes test set

Fig. 6. Accuracy-speed comparison on the CamVid test set

Fig. 7. Visual segmentation results of the Cityscapes dataset

Fig. 8. Visual segmentation results of the CamVid dataset

Baseline	DAPPM	CLA-PPM (Sparsity=2)	CLA-PPM (Sparsity=3)	FLOPs/G	Params/M	mIoU/%	Speed/FPS
√				97.7	14.2	74.5	96
√	√			98.8	15.53	75.2	83
√		√		98.0	14.68	75.0	89
√			√	97.9	14.52	74.9	90
√		√	√	98.2	15.0	75.3	85

Table 1. Ablation study of CLA-PPM on the Cityscapes validation set

Baseline	CLAM	CLA-PPM	MSFM	FLOPs/G	Params/M	mIoU/%	Speed/FPS
√				97.7	14.2	74.5	96
√	√			103.1	14.5	75.2	86
√		√		98.2	15.0	75.3	85
√			√	98.3	14.22	74.8	93
√	√	√		103.6	15.32	75.7	77
√		√	√	98.8	15.04	75.5	83
√	√		√	103.7	14.53	75.4	84
√	√	√	√	104.3	15.35	76.0	75

Table 2. Ablation study of CLANet on the Cityscapes validation set
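
The accuracy and speed cost of each module combination can be read off Table 2 by subtracting the baseline row. The following is a minimal Python sketch with the rows copied from the table; the dictionary layout and the function name `deltas_vs_baseline` are illustrative assumptions and do not come from the article.

```python
# Rows copied from Table 2.
# Key: tuple of enabled modules; value: (FLOPs/G, Params/M, mIoU/%, Speed/FPS).
ABLATION = {
    (): (97.7, 14.2, 74.5, 96),  # baseline only
    ("CLAM",): (103.1, 14.5, 75.2, 86),
    ("CLA-PPM",): (98.2, 15.0, 75.3, 85),
    ("MSFM",): (98.3, 14.22, 74.8, 93),
    ("CLAM", "CLA-PPM"): (103.6, 15.32, 75.7, 77),
    ("CLA-PPM", "MSFM"): (98.8, 15.04, 75.5, 83),
    ("CLAM", "MSFM"): (103.7, 14.53, 75.4, 84),
    ("CLAM", "CLA-PPM", "MSFM"): (104.3, 15.35, 76.0, 75),
}


def deltas_vs_baseline(table):
    """Return {modules: (mIoU change in points, FPS change)} relative to the baseline row.

    Hypothetical helper written for this illustration.
    """
    base = table[()]
    result = {}
    for modules, row in table.items():
        if not modules:
            continue  # skip the baseline itself
        miou_gain = round(row[2] - base[2], 2)
        fps_change = row[3] - base[3]
        result[modules] = (miou_gain, fps_change)
    return result


if __name__ == "__main__":
    for modules, (miou_gain, fps_change) in deltas_vs_baseline(ABLATION).items():
        print("+".join(modules), miou_gain, fps_change)
```

For the full configuration (CLAM, CLA-PPM, and MSFM together), this gives a gain of 1.5 mIoU points for a drop of 21 FPS relative to the baseline.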

Method	Reference	Resolution	mIoU/% (Val)	mIoU/% (Test)	FPS (PyTorch)	FPS (TensorRT)
GAS [19]	CVPR 2020	769×1537	—	71.8	108.4	—
HMSeg [20]	BMVC 2020	768×1536	—	74.3	83.2	—
DCNet [21]	ICPR 2021	512×1024	—	71.2	142	—
HyperSeg-M [22]	CVPR 2021	1024×2048	76.2	75.8	36.9	—
RELAXNet [23]	Neurocomputing 2022	512×1024	—	74.8	64	—
FPANet C [24]	Appl. Intell. 2022	1024×2048	—	75.9	31	—
BiAttnNet [25]	SPL 2022	512×1024	—	74.7	89.2	—
LETNet [26]	T-ITS 2023	512×1024	—	72.8	150	—
SRDENet [27]	IET Comput. Vis. 2023	512×1024	—	75.4	65	—
BiSeNetV2 [28]	IJCV 2021	512×1024	73.4	72.6	—	156
BiSeNetV2-L [28]	IJCV 2021	512×1024	75.8	75.3	—	47.3
FasterSeg [29]	arXiv 2019	1024×2048	73.1	71.5	—	163.9
STDC1-Seg50 [9]	CVPR 2021	512×1024	72.2	71.9	—	250.4
STDC1-Seg75 [9]	CVPR 2021	768×1536	74.5	75.3	—	126.7
CPANet-T50 [30]	CAC 2022	512×1024	—	72.5	—	234.5
BiSeNetV3-50 [31]	Neurocomputing 2023	512×1024	73.4	73.5	—	244.3
CLANet-50 (Ours)	—	512×1024	73.3	73.0	143	294
CLANet-75 (Ours)	—	768×1536	76.0	75.8	75	164

Table 3. Comparison of accuracy and speed on the Cityscapes dataset
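
The mIoU/% columns in these tables report mean intersection over union. As a reminder of the standard definition, the sketch below computes it from a class confusion matrix in plain Python. The function name `mean_iou` and the choice to skip classes that never occur are assumptions made for this illustration; the article's exact evaluation protocol (for example, which labels are ignored) is not given in this section.

```python
def mean_iou(confusion):
    """Mean IoU in percent from a square confusion matrix.

    confusion[t][p] counts pixels whose true class is t and predicted class is p.
    Per-class IoU is TP / (TP + FP + FN); classes with no pixels in either
    the ground truth or the prediction are skipped (an assumption of this sketch).
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp  # true class c, predicted as another class
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp  # predicted c, true class differs
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    if not ious:
        raise ValueError("confusion matrix contains no pixels")
    return 100.0 * sum(ious) / len(ious)


if __name__ == "__main__":
    # Toy two-class example: each class has 3 correct pixels and 1 confused pixel.
    print(mean_iou([[3, 1], [1, 3]]))
```

In the toy example each class has IoU 3 / (3 + 1 + 1) = 0.6, so the mean is 60%.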

Method	Reference	Resolution	mIoU/%	FPS (PyTorch)	FPS (TensorRT)
CAS [32]	CVPR 2019	720×960	71.8	169	—
GAS [19]	CVPR 2020	720×960	72.8	153.1	—
DSANet [33]	Expert Syst. Appl. 2021	720×960	69.9	75.3	—
FSFNet [34]	IEEE Trans. Instrum. Meas. 2021	720×960	75.1	91	—
RELAXNet [23]	Neurocomputing 2022	720×960	71.2	79	—
FPANet B [24]	Appl. Intell. 2022	720×960	72.9	88	—
LETNet [26]	T-ITS 2023	720×960	70.5	200	—
SRDENet [27]	IET Comput. Vis. 2023	720×960	74.8	78.3	—
BiSeNetV2 [28]	IJCV 2021	720×960	72.4	—	124.5
BiSeNetV2-L [28]	IJCV 2021	720×960	73.2	—	32.7
STDC1-Seg50 [9]	CVPR 2021	720×960	73.0	—	197.6
CPANet-T [30]	CAC 2022	720×960	73.9	—	213.9
BiSeNetV3 [31]	Neurocomputing 2023	720×960	75.1	—	198.4
CLANet (Ours)	—	720×960	74.8	116	239

Table 4. Comparison of accuracy and speed on the CamVid dataset
