Semantic Segmentation for Road Scene Based on Multiscale Feature Fusion

Qingming Yi; Wenting Zhang; Min Shi; Jialin Shen; Aiwen Luo

doi:10.3788/LOP220914

Abstract

Many existing semantic segmentation networks struggle to balance parameter count, inference speed, and accuracy. To address this imbalance, this study develops a lightweight network model based on multiscale feature information fusion (MIFNet). MIFNet is built on an encoder-decoder architecture. In the encoder, a split strategy and asymmetric convolution are used to design a lightweight bottleneck structure for feature extraction, and a spatial attention mechanism and the Laplace edge detection operator are introduced to fuse spatial and edge information into richer features. In the decoder, a new design introduces a channel attention mechanism to recover the size and detail information of the feature map and complete the semantic segmentation task. With only approximately 0.82 M parameters, MIFNet achieves accuracies of 73.1% and 67.7% on the Cityscapes and CamVid test sets, respectively, at inference speeds of up to 73.68 frame/s and 85.16 frame/s, respectively, on a single GTX 1080Ti GPU. The results show that the method strikes a good balance among parameter count, inference speed, and accuracy, yielding lightweight, fast, and accurate semantic segmentation.
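
The abstract names asymmetric convolution and the Laplace edge detection operator but does not spell either out. The sketch below illustrates the general ideas only and is not the authors' implementation. It assumes that "asymmetric convolution" means factorizing a k x k convolution into a k x 1 convolution followed by a 1 x k convolution, and it assumes the common 4-neighbour form of the discrete Laplacian kernel. The function names, the channel width of 64, and the toy image are all hypothetical.

```python
def conv_params(k_h, k_w, c_in, c_out):
    """Weight count of a standard 2D convolution (bias omitted)."""
    return k_h * k_w * c_in * c_out


def asymmetric_pair_params(k, channels):
    """Weight count of a k x 1 convolution followed by a 1 x k convolution."""
    return conv_params(k, 1, channels, channels) + conv_params(1, k, channels, channels)


# 4-neighbour discrete Laplacian (assumed variant; the paper may use another).
LAPLACE_KERNEL = [
    [0, 1, 0],
    [1, -4, 1],
    [0, 1, 0],
]


def laplace_edges(image):
    """Apply the 3x3 Laplacian to a 2D list of numbers, without padding."""
    height, width = len(image), len(image[0])
    response = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            acc = 0
            for dy in range(3):
                for dx in range(3):
                    acc += LAPLACE_KERNEL[dy][dx] * image[y + dy - 1][x + dx - 1]
            row.append(acc)
        response.append(row)
    return response


if __name__ == "__main__":
    # Hypothetical width of 64 channels: the factorized pair needs 2k/k^2
    # of the weights of the full k x k convolution (two thirds for k = 3).
    print(conv_params(3, 3, 64, 64), asymmetric_pair_params(3, 64))

    # Toy image with a vertical step edge between columns 1 and 2.
    step = [[0, 0, 1, 1] for _ in range(4)]
    print(laplace_edges(step))
```

Under these assumptions the weight saving grows with kernel size, since the factorized pair scales with 2k rather than k squared. The Laplacian response is zero in flat regions and changes sign across the step, which is the property that makes it useful as an edge cue.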