• Opto-Electronic Engineering
  • Vol. 51, Issue 1, 230304-1 (2024)
Hao Hang, Yingping Huang*, Xurui Zhang, and Xin Luo
Author Affiliations
  • School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
    DOI: 10.12086/oee.2024.230304
    Hao Hang, Yingping Huang, Xurui Zhang, Xin Luo. Design of Swin Transformer for semantic segmentation of road scenes[J]. Opto-Electronic Engineering, 2024, 51(1): 230304-1
    Fig. 1. Network architecture
    Fig. 2. Swin Transformer architecture
    Fig. 3. Swin Transformer block
    Fig. 4. Patch Merging module
    Fig. 5. FCM module
    Fig. 6. AFM module
    Fig. 7. Comparison of segmentation effects of multiple methods in Cityscapes scenes
    Fig. 8. Comparison of ablation experiment effects
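    | Item | Configuration | Item | Configuration |
    |---|---|---|---|
    | CPU | AMD5600Xd | CPU cores | 6 |
    | GPU | NVIDIA RTX3070 | Clock frequency | 3.7 GHz |
    | Memory | 32 GB | GPU memory | 11 GB |
    | Operating system | Ubuntu 18.04 | Programming language | Python 3.7 |
    | Deep learning framework | PyTorch 1.10.0 | CUDA | 10.2 |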
    Table 1. Experimental environment
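    | Classes | FCN | PSPNet | UNet | DeepLabv3 | SwinT | Ours |
    |---|---|---|---|---|---|---|
    | Road | 97.1 | 98.0 | 98.0 | 98.1 | 98.0 | 98.1 |
    | Sidewalk | 79.9 | 81.8 | 84.2 | 84.5 | 84.7 | 86.2 |
    | Building | 89.3 | 91.1 | 91.1 | 91.7 | 91.4 | 91.6 |
    | Wall | 44.2 | 48.2 | 48.7 | 51.2 | 54.4 | 55.5 |
    | Fence | 48.3 | 50.3 | 51.5 | 53.6 | 57.3 | 59.9 |
    | Pole | 30.6 | 45.7 | 48.2 | 50.3 | 55.5 | 57.2 |
    | Traffic Light | 44.7 | 50.0 | 51.7 | 53.7 | 61.9 | 63.2 |
    | Traffic Sign | 56.8 | 62.3 | 65.8 | 68.2 | 73.5 | 74.4 |
    | Vegetation | 87.1 | 89.2 | 90.1 | 90.1 | 90.2 | 92.4 |
    | Terrain | 60.4 | 62.8 | 65.3 | 64.2 | 61.3 | 63.2 |
    | Sky | 90.8 | 94.2 | 93.8 | 95.3 | 94.2 | 95.1 |
    | Person | 64.1 | 71.2 | 72.6 | 74.5 | 75.5 | 76.9 |
    | Rider | 38.2 | 45.6 | 46.1 | 49.5 | 55.7 | 55.9 |
    | Car | 90.4 | 92.0 | 92.2 | 92.6 | 93.8 | 93.5 |
    | Truck | 51.3 | 68.5 | 63.4 | 74.4 | 73.6 | 72.5 |
    | Bus | 72.0 | 80.3 | 77.6 | 83.2 | 79.4 | 79.9 |
    | Train | 74.4 | 77.4 | 78.5 | 81.5 | 77.7 | 78.1 |
    | Motorcycle | 52.5 | 50.1 | 55.5 | 53.5 | 56.5 | 59.2 |
    | Bicycle | 59.1 | 60.1 | 63.4 | 64.2 | 71.2 | 73.2 |
    | MIoU/% | 64.92 | 69.28 | 70.45 | 73.71 | 73.17 | 75.18 |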
    Table 2. IoU and MIoU of various models on the Cityscapes dataset
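    | Classes | FCN | PSPNet | UNet | DeepLabv3 | SwinT | Ours |
    |---|---|---|---|---|---|---|
    | Road | 98.1 | 98.5 | 98.8 | 99.1 | 99.1 | 99.1 |
    | Sidewalk | 89.9 | 89.3 | 90.2 | 92.0 | 91.2 | 92.7 |
    | Building | 96.3 | 94.7 | 96.1 | 96.2 | 96.5 | 96.8 |
    | Wall | 52.2 | 72.1 | 60.7 | 73.1 | 71.4 | 72.3 |
    | Fence | 60.3 | 69.3 | 68.5 | 72.5 | 71.4 | 74.6 |
    | Pole | 36.6 | 74.7 | 59.2 | 74.3 | 74.1 | 77.7 |
    | Traffic Light | 56.7 | 72.0 | 62.7 | 69.2 | 70.4 | 72.1 |
    | Traffic Sign | 68.8 | 79.3 | 75.8 | 76.5 | 76.7 | 79.3 |
    | Vegetation | 94.1 | 93.2 | 95.1 | 93.6 | 95.3 | 97.7 |
    | Terrain | 74.4 | 79.8 | 78.3 | 78.1 | 79.2 | 80.3 |
    | Sky | 95.8 | 97.2 | 97.8 | 97.5 | 97.5 | 97.9 |
    | Person | 77.1 | 82.2 | 84.6 | 84.2 | 86.3 | 87.9 |
    | Rider | 58.2 | 68.6 | 55.1 | 71.2 | 72.4 | 73.7 |
    | Car | 96.4 | 96.0 | 96.2 | 96.3 | 97.6 | 97.6 |
    | Truck | 62.3 | 79.5 | 76.4 | 75.5 | 73.5 | 76.2 |
    | Bus | 85.0 | 87.3 | 89.6 | 91.7 | 85.6 | 87.7 |
    | Train | 78.4 | 83.4 | 92.5 | 88.4 | 79.3 | 82.9 |
    | Motorcycle | 66.5 | 73.5 | 67.5 | 77.5 | 77.3 | 79.2 |
    | Bicycle | 77.1 | 73.1 | 80.4 | 76.2 | 80.3 | 84.2 |
    | MPA/% | 74.64 | 79.97 | 80.06 | 82.31 | 81.59 | 84.83 |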
    Table 3. PA and MPA of various models on the Cityscapes dataset
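    The page does not spell out how the IoU and PA columns in Tables 2 and 3 are computed. The sketch below assumes the standard definitions over a pixel-level confusion matrix: per-class IoU = TP / (TP + FP + FN) and per-class PA = TP / (TP + FN), with MIoU and MPA as unweighted means over classes. The function names `per_class_metrics` and `mean_ignoring_nan` and the 2-class confusion matrix are hypothetical, for illustration only; the authors' evaluation code may differ.

```python
def per_class_metrics(conf):
    """Return (iou, pa) lists from a square confusion matrix.

    conf[i][j] = number of pixels whose ground-truth class is i
    and whose predicted class is j (assumed layout).
    """
    n = len(conf)
    iou, pa = [], []
    for c in range(n):
        tp = conf[c][c]
        gt = sum(conf[c])                          # pixels whose true class is c
        pred = sum(conf[r][c] for r in range(n))   # pixels predicted as c
        union = gt + pred - tp                     # TP + FP + FN
        iou.append(tp / union if union else float("nan"))
        pa.append(tp / gt if gt else float("nan"))
    return iou, pa


def mean_ignoring_nan(values):
    """Unweighted mean over classes, skipping classes that never occur."""
    valid = [v for v in values if v == v]  # NaN is the only value != itself
    return sum(valid) / len(valid)


# Hypothetical 2-class confusion matrix, not data from the paper.
conf = [[50, 10],
        [5, 35]]
iou, pa = per_class_metrics(conf)
miou = mean_ignoring_nan(iou)
mpa = mean_ignoring_nan(pa)
print(f"MIoU = {100 * miou:.2f}%, MPA = {100 * mpa:.2f}%")
```

    Under these definitions a class with many false positives can score a high PA but a low IoU, which is one reason the two tables can rank methods differently for the same class.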
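    | Method | MIoU/% | MPA/% | Param/M | FLOPs/G | FPS |
    |---|---|---|---|---|---|
    | FCN | 64.92 | 74.64 | 34.90 | 66.38 | 58.61 |
    | PSPNet | 69.28 | 79.97 | 51.86 | 152.97 | 81.25 |
    | UNet | 70.45 | 80.06 | 49.10 | 166.92 | 54.52 |
    | DeepLabv3 | 73.71 | 82.31 | 68.37 | 235.37 | 36.59 |
    | SwinT | 73.17 | 81.59 | 121.25 | 297.57 | 12.22 |
    | Ours | 75.18 | 84.83 | 123.77 | 305.46 | 14.83 |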
    Table 4. Performance comparison of various semantic segmentation algorithms
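    As a reading aid for Table 4, the short sketch below computes the difference between the proposed model ("Ours") and the SwinT baseline for each column. The two rows are copied from the table; the helper name `deltas` and the variable names are hypothetical.

```python
# Rows copied from Table 4: (MIoU/%, MPA/%, Param/M, FLOPs/G, FPS)
columns = ("MIoU/%", "MPA/%", "Param/M", "FLOPs/G", "FPS")
table4 = {
    "SwinT": (73.17, 81.59, 121.25, 297.57, 12.22),
    "Ours": (75.18, 84.83, 123.77, 305.46, 14.83),
}


def deltas(baseline, candidate):
    """Column-wise difference candidate - baseline, rounded to 2 decimals."""
    return {
        name: round(c - b, 2)
        for name, b, c in zip(columns, table4[baseline], table4[candidate])
    }


diff = deltas("SwinT", "Ours")
for name, value in diff.items():
    print(f"{name}: {value:+.2f}")
```

    With the table's figures, this gives +2.01 MIoU and +3.24 MPA percentage points for an additional 2.52 M parameters and 7.89 GFLOPs.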
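    | Experiment | AFM / FCM / ASPP | MIoU/% | MPA/% |
    |---|---|---|---|
    | 1 | × / × / × | 73.1 | 81.6 |
    | 2 | two ×, one √ (column positions lost) | 73.8 | 80.5 |
    | 3 | one ×, two √ (column positions lost) | 74.9 | 83.3 |
    | 4 | √ / √ / √ | 75.2 | 84.8 |
    Note: "√" indicates that the network includes the structure; "×" indicates that the structure is removed from the network.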
    Table 5. Ablation experiment