Qi Chao, Yandong Zhao, Shengbo Liu. Multi-modal-fusion-based 3D semantic segmentation algorithm[J]. Infrared and Laser Engineering, 2024, 53(5): 20240026


Fig. 1. Multi-modal network

Fig. 2. Image feature generation network

Fig. 3. Information loss during voxel downsampling

Fig. 4. Point cloud feature generation network

Fig. 5. Dynamic feature fusion module

Fig. 6. Visualization of the data augmentation strategy. (a) Original point cloud data; (b) complete point cloud of the augmented instance object (a tree); (c) viewpoint of the acquisition device after the tree point cloud is pasted into the scene; (d) original image data, with green dots marking the projection of the instance tree point cloud onto the image; (e) foreground image of the tree; (f) pasting effect of the tree foreground image (the image is shown at the pasting position for ease of observation); (g) points (green dots) whose projections from the pasted tree point cloud match the image mask; (h) points in the pasted tree point cloud whose projections do not match the image mask (green dots); (i) points matched between the tree point cloud and the image after mapping correction
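The point-mask matching step illustrated in Fig. 6(g)-(i) can be sketched as follows: each pasted 3D point is projected into the image with the camera calibration, and only points whose projections land inside the pasted instance mask are kept as consistent. This is a hedged illustration, not the authors' code; the function names, the calibration layout (`lidar_to_cam` as a 4x4 extrinsic, `cam_intrinsics` as a 3x3 pinhole matrix), and the binary `mask` array are all assumptions.

```python
# Minimal sketch of the point/mask consistency check in Fig. 6(g)-(i).
# All names and matrix conventions are assumptions for illustration.
import numpy as np

def project_to_image(points, lidar_to_cam, cam_intrinsics):
    """Project Nx3 LiDAR points into pixel coordinates (u, v) plus depth."""
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])  # Nx4 homogeneous
    cam_pts = (lidar_to_cam @ pts_h.T).T[:, :3]                 # camera frame, Nx3
    depth = cam_pts[:, 2]
    uv = (cam_intrinsics @ cam_pts.T).T                         # Nx3
    uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)            # perspective divide
    return uv, depth

def mask_consistent_points(points, mask, lidar_to_cam, cam_intrinsics):
    """Keep pasted points whose projections fall inside the pasted 2D mask.

    mask: HxW boolean array, True inside the pasted instance (Fig. 6(f)).
    Returns a boolean keep-mask separating Fig. 6(g) from Fig. 6(h) points.
    """
    uv, depth = project_to_image(points, lidar_to_cam, cam_intrinsics)
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    h, w = mask.shape
    in_view = (depth > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    consistent = np.zeros(len(points), dtype=bool)
    consistent[in_view] = mask[v[in_view], u[in_view]]
    return consistent
```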
Fig. 7. GT-Paste [11] data augmentation diagram. (a) Original point cloud scene; (b) point cloud scene after pasting, where purple and red mark the points that must be filtered for occlusion; (c) point cloud scene after occlusion filtering; (d) original image scene; (e) image scene after pasting; (f) image scene after occlusion relationships are processed
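One common way to realize the occlusion filtering shown in Fig. 7(b)-(c) is to rasterize the merged (scene plus pasted) point cloud into a coarse spherical range image and keep only the nearest point in each angular bin. The sketch below is a generic approximation under that assumption; the bin resolutions and function name are illustrative, and the paper's actual filtering procedure may differ.

```python
# Hedged sketch of nearest-point occlusion filtering after GT-Paste-style
# augmentation: per spherical angular bin, keep only the closest point.
# Bin sizes (0.2 degrees) are illustrative, not taken from the paper.
import numpy as np

def filter_occluded(points, az_res=0.2, el_res=0.2):
    """points: Nx3 array (scene + pasted points). Returns a keep mask (N,)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    az = np.degrees(np.arctan2(y, x))                                  # azimuth
    el = np.degrees(np.arcsin(np.clip(z / np.maximum(r, 1e-6), -1, 1)))  # elevation
    az_bin = np.round(az / az_res).astype(np.int64)
    el_bin = np.round(el / el_res).astype(np.int64)
    key = az_bin * 100000 + el_bin                                     # hash 2D bin
    keep = np.ones(len(points), dtype=bool)
    seen = set()
    for i in np.argsort(r):                                            # nearest first
        if key[i] in seen:
            keep[i] = False                                            # occluded
        else:
            seen.add(key[i])
    return keep
```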

Fig. 8. Qualitative results of the model. (a) and (d) visualize the false positives of the baseline model (the first row of the ablation experiment); (b) and (e) visualize the false positives of the final model proposed in this paper (the fourth row of the ablation experiment); (c) and (f) show the ground truth
Table 1. Performance comparison with other algorithms
Table 2. Ablation experiment
Table 3. Comparison of voxel feature extraction network effects
Table 4. Comparison of object detection results
