• Infrared and Laser Engineering
  • Vol. 53, Issue 5, 20240026 (2024)
Qi Chao, Yandong Zhao, and Shengbo Liu
Author Affiliations
  • School of Engineering, Beijing Forestry University, Beijing 100080, China
    DOI: 10.3788/IRLA20240026
    Qi Chao, Yandong Zhao, Shengbo Liu. Multi-modal-fusion-based 3D semantic segmentation algorithm[J]. Infrared and Laser Engineering, 2024, 53(5): 20240026
    Fig. 1. Multi-modal network
    Fig. 2. Image feature generation network
    Fig. 3. Information loss during voxel downsampling
    Fig. 4. Point cloud feature generation network
    Fig. 5. Dynamic feature fusion module
    Fig. 6. Visualization of the data augmentation strategy. (a) Original point cloud data; (b) complete point cloud of the augmented instance object (tree); (c) viewpoint of the acquisition device after the point cloud is pasted; (d) original image data, where green dots mark the projection of the instance tree point cloud onto the image; (e) tree foreground image; (f) pasting effect of the tree foreground image (shown at the paste position for ease of observation); (g) points (green dots) whose projections from the pasted tree point cloud match the image mask; (h) points of the pasted tree point cloud whose projections do not match the image mask (green dots); (i) points of the tree point cloud that match the image after mapping correction
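    The point-to-mask consistency check of Fig. 6(g)-(i) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the camera intrinsics `K`, the LiDAR-to-camera extrinsics `T_cam_lidar`, and the binary `instance_mask` of the pasted tree foreground are assumed inputs obtained from calibration and the pasting step.

```python
import numpy as np

def project_to_image(points_lidar, T_cam_lidar, K):
    """Project LiDAR points (N, 3) into pixel coordinates.
    T_cam_lidar: 4x4 LiDAR-to-camera extrinsics; K: 3x3 camera intrinsics (assumed known)."""
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])  # homogeneous coords
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]                          # camera frame
    in_front = pts_cam[:, 2] > 1e-6                                     # positive depth only
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)                    # perspective divide
    return uv, in_front

def split_by_mask(points_lidar, instance_mask, T_cam_lidar, K):
    """Split pasted-instance points into those whose projection lands inside the
    pasted foreground mask (Fig. 6(g)) and those that miss it (Fig. 6(h)); the
    latter are candidates for removal or mapping correction (Fig. 6(i))."""
    h, w = instance_mask.shape
    uv, in_front = project_to_image(points_lidar, T_cam_lidar, K)
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    matched = np.zeros(len(points_lidar), dtype=bool)
    matched[valid] = instance_mask[v[valid], u[valid]] > 0
    return points_lidar[matched], points_lidar[~matched]
```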
    Fig. 7. GT-Paste[11] data augmentation. (a) Original point cloud scene; (b) pasted point cloud scene, where purple and red mark the points that must be filtered out due to occlusion; (c) point cloud scene after filtering; (d) original image scene; (e) pasted image scene; (f) image scene after occlusion handling
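    One common way to realize the occlusion filtering of Fig. 7(b)-(c) is a range-image depth test: for each viewing direction only the return closest to the sensor is kept, so scene points hidden behind the pasted instance (and pasted points hidden behind the scene) are removed. The sketch below uses assumed spherical-projection parameters (64 x 2048 range image, 0.3 m tolerance) and is not necessarily the exact filtering rule of GT-Paste[11].

```python
import numpy as np

def range_image_coords(points, h=64, w=2048, fov_up=3.0, fov_down=-25.0):
    """Map points (N, 3) to range-image pixels; the vertical FOV values are
    placeholders for a typical spinning LiDAR."""
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(points[:, 2] / np.clip(r, 1e-6, None))
    up, down = np.deg2rad(fov_up), np.deg2rad(fov_down)
    col = ((0.5 * (1.0 - yaw / np.pi)) * w).astype(int) % w
    row = np.clip((up - pitch) / (up - down) * h, 0, h - 1).astype(int)
    return row, col, r

def filter_occlusions(scene_pts, pasted_pts, tol=0.3):
    """Keep, per range-image pixel, only points within `tol` metres of the nearest
    return, dropping scene points occluded by the pasted instance and vice versa."""
    all_pts = np.vstack([scene_pts, pasted_pts])
    row, col, r = range_image_coords(all_pts)
    nearest = np.full((64, 2048), np.inf)
    np.minimum.at(nearest, (row, col), r)        # per-pixel closest range
    keep = r <= nearest[row, col] + tol
    n = len(scene_pts)
    return scene_pts[keep[:n]], pasted_pts[keep[n:]]
```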
    Fig. 8. Qualitative results of the model. (a) and (d) visualize false positives of the baseline (the first row of the ablation experiment); (b) and (e) visualize false positives of the final model in this paper (the fourth row of the ablation experiment); (c) and (f) show the ground truth
    Method         | mIoU | Car  | Truck | Pedestrian | Bicycle | Road | Motorcycle | Barrier | Vegetation | Speed/ms
    SquSegv3[24]   | 53.8 | 92.8 | 36.8  | 63.4       | 25.7    | 91.1 | 21.1       | 14.2    | 85.1       | 97
    KPconv[4]      | 58.2 | 93.5 | 37.7  | 71.9       | 39.4    | 89.7 | 23.5       | 25.1    | 84.8       | -
    (AF)2S3Net[14] | 62.0 | 93.2 | 41.6  | 73.1       | 45.5    | 90.6 | 39.9       | 26.0    | 86.7       | 270
    SPVCNN[8]      | 63.3 | 95.8 | 44.8  | 74.4       | 42.1    | 91.3 | 46.4       | 28.6    | 87.5       | 63
    Fus3DSeg[13]   | 64.3 | 96.1 | 48.1  | 67.3       | 43.7    | 93.0 | 48.1       | 30.2    | 88.3       | -
    Ours           | 66.7 | 94.1 | 49.6  | 79.3       | 47.8    | 90.9 | 52.6       | 31.2    | 88.4       | 88
    Table 1. Performance comparison with other algorithms
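    The mIoU reported in Table 1 is the class-wise intersection-over-union averaged over all classes. A minimal reference computation from flat prediction and ground-truth label arrays (the class indices in the toy example are placeholders) is:

```python
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """Per-class IoU and their mean (mIoU) from flat integer label arrays."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        ious.append(inter / union if union > 0 else np.nan)  # skip classes absent from both
    ious = np.asarray(ious)
    return ious, np.nanmean(ious)

# toy example with 3 classes (0: car, 1: pedestrian, 2: vegetation)
pred = np.array([0, 0, 1, 2, 2, 1])
gt   = np.array([0, 1, 1, 2, 2, 2])
ious, miou = per_class_iou(pred, gt, num_classes=3)
```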
    Depth estimate | VPS network | DFM | PointAugment | mIoU | Car <25 m | Car >25 m | Pedestrian <25 m | Pedestrian >25 m | Vegetation <25 m | Vegetation >25 m
                   |             |     |              | 62.8 | 95.2      | 86.4      | 79.3             | 62.7             | 90.3             | 79.8
    √              | √           |     |              | 64.4 | 97.1      | 89.8      | 81.2             | 67.4             | 91.1             | 82.4
                   |             | √   | √            | 64.6 | 97.3      | 89.9      | 82.8             | 69.6             | 91.2             | 82.1
    √              | √           | √   | √            | 66.7 | 97.6      | 90.2      | 85.3             | 73.3             | 92.3             | 83.5
    Table 2. Ablation experiment
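    Table 2 additionally splits each class IoU by distance from the sensor (within and beyond 25 m). One way to evaluate such a split, sketched here as an assumption rather than the paper's exact protocol, is to mask points by range before accumulating the IoU statistics:

```python
import numpy as np

def class_iou_by_range(points, pred, gt, class_id, threshold=25.0):
    """IoU of one class computed separately for points nearer and farther
    than `threshold` metres from the sensor origin."""
    dist = np.linalg.norm(points[:, :3], axis=1)
    out = {}
    for name, m in (("<25 m", dist < threshold), (">25 m", dist >= threshold)):
        inter = np.sum((pred[m] == class_id) & (gt[m] == class_id))
        union = np.sum((pred[m] == class_id) | (gt[m] == class_id))
        out[name] = inter / union if union > 0 else float("nan")
    return out
```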
    Network | Car  | Pedestrian | Vegetation
    CN      | 93.2 | 75.7       | 87.1
    VPS     | 94.1 | 79.3       | 88.4
    Table 3. Comparison of voxel feature extraction network effects
    Method                  | mAP
    Baseline                | 55.4%
    Baseline + GT-Paste[11] | 57.0%
    Baseline + PointAugment | 57.2%
    Table 4. Comparison of object detection results