• Infrared and Laser Engineering
  • Vol. 53, Issue 5, 20240026 (2024)
Qi Chao, Yandong Zhao, and Shengbo Liu
Author Affiliations
  • School of Engineering, Beijing Forestry University, Beijing 100080, China
    DOI: 10.3788/IRLA20240026
    Qi Chao, Yandong Zhao, Shengbo Liu. Multi-modal-fusion-based 3D semantic segmentation algorithm[J]. Infrared and Laser Engineering, 2024, 53(5): 20240026
    Fig. 1. Multi-modal network
    Fig. 2. Image feature generation network
    Fig. 3. Information loss during voxel downsampling
    Fig. 4. Point cloud feature generation network
    Fig. 5. Dynamic feature fusion module
    Fig. 6. Visualization of the data augmentation strategy. (a) Original point cloud data; (b) complete point cloud of the augmented instance object (tree); (c) viewpoint of the acquisition device after the point cloud is pasted; (d) original image data, where green dots mark the projection of the instance tree point cloud onto the image; (e) tree foreground image; (f) pasting effect of the tree foreground image (shown at the paste position for ease of observation); (g) points (green dots) whose projections from the pasted tree point cloud match the image mask; (h) points of the pasted tree point cloud whose projections do not match the image mask (green dots); (i) points of the tree point cloud that match the image after mapping correction
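    The point-to-mask consistency check of Fig. 6(g)-(i) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the camera intrinsics `K`, the LiDAR-to-camera extrinsics `T_cam_lidar`, and the binary `instance_mask` of the pasted tree foreground are assumed inputs obtained from calibration and the pasting step.

```python
import numpy as np

def project_to_image(points_lidar, T_cam_lidar, K):
    """Project LiDAR points (N, 3) into pixel coordinates.
    T_cam_lidar: 4x4 LiDAR-to-camera extrinsics; K: 3x3 camera intrinsics (assumed known)."""
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])  # homogeneous coords
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]                          # camera frame
    in_front = pts_cam[:, 2] > 1e-6                                     # positive depth only
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)                    # perspective divide
    return uv, in_front

def split_by_mask(points_lidar, instance_mask, T_cam_lidar, K):
    """Split pasted-instance points into those whose projection lands inside the
    pasted foreground mask (Fig. 6(g)) and those that miss it (Fig. 6(h)); the
    latter are candidates for removal or mapping correction (Fig. 6(i))."""
    h, w = instance_mask.shape
    uv, in_front = project_to_image(points_lidar, T_cam_lidar, K)
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    matched = np.zeros(len(points_lidar), dtype=bool)
    matched[valid] = instance_mask[v[valid], u[valid]] > 0
    return points_lidar[matched], points_lidar[~matched]
```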
    Fig. 7. GT-Paste[11] data augmentation. (a) Original point cloud scene; (b) pasted point cloud scene, where purple and red mark the points that must be filtered out due to occlusion; (c) point cloud scene after filtering; (d) original image scene; (e) pasted image scene; (f) image scene after occlusion handling
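    One common way to realize the occlusion filtering of Fig. 7(b)-(c) is a range-image depth test: for each viewing direction only the return closest to the sensor is kept, so scene points hidden behind the pasted instance (and pasted points hidden behind the scene) are removed. The sketch below uses assumed spherical-projection parameters (64 x 2048 range image, 0.3 m tolerance) and is not necessarily the exact filtering rule of GT-Paste[11].

```python
import numpy as np

def range_image_coords(points, h=64, w=2048, fov_up=3.0, fov_down=-25.0):
    """Map points (N, 3) to range-image pixels; the vertical FOV values are
    placeholders for a typical spinning LiDAR."""
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(points[:, 2] / np.clip(r, 1e-6, None))
    up, down = np.deg2rad(fov_up), np.deg2rad(fov_down)
    col = ((0.5 * (1.0 - yaw / np.pi)) * w).astype(int) % w
    row = np.clip((up - pitch) / (up - down) * h, 0, h - 1).astype(int)
    return row, col, r

def filter_occlusions(scene_pts, pasted_pts, tol=0.3):
    """Keep, per range-image pixel, only points within `tol` metres of the nearest
    return, dropping scene points occluded by the pasted instance and vice versa."""
    all_pts = np.vstack([scene_pts, pasted_pts])
    row, col, r = range_image_coords(all_pts)
    nearest = np.full((64, 2048), np.inf)
    np.minimum.at(nearest, (row, col), r)        # per-pixel closest range
    keep = r <= nearest[row, col] + tol
    n = len(scene_pts)
    return scene_pts[keep[:n]], pasted_pts[keep[n:]]
```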
    Fig. 8. Qualitative results of the model. (a) and (d) visualize false positives of the baseline (the first row of the ablation experiment); (b) and (e) visualize false positives of the final model in this paper (the fourth row of the ablation experiment); (c) and (f) show the ground truth
    Method         | mIoU | Car  | Truck | Pedestrian | Bicycle | Road | Motorcycle | Barrier | Vegetation | Speed/ms
    SquSegv3[24]   | 53.8 | 92.8 | 36.8  | 63.4       | 25.7    | 91.1 | 21.1       | 14.2    | 85.1       | 97
    KPconv[4]      | 58.2 | 93.5 | 37.7  | 71.9       | 39.4    | 89.7 | 23.5       | 25.1    | 84.8       | -
    (AF)2S3Net[14] | 62.0 | 93.2 | 41.6  | 73.1       | 45.5    | 90.6 | 39.9       | 26.0    | 86.7       | 270
    SPVCNN[8]      | 63.3 | 95.8 | 44.8  | 74.4       | 42.1    | 91.3 | 46.4       | 28.6    | 87.5       | 63
    Fus3DSeg[13]   | 64.3 | 96.1 | 48.1  | 67.3       | 43.7    | 93.0 | 48.1       | 30.2    | 88.3       | -
    Ours           | 66.7 | 94.1 | 49.6  | 79.3       | 47.8    | 90.9 | 52.6       | 31.2    | 88.4       | 88
    Table 1. Performance comparison with other algorithms
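    The mIoU reported in Table 1 is the class-wise intersection-over-union averaged over all classes. A minimal reference computation from flat prediction and ground-truth label arrays (the class indices in the toy example are placeholders) is:

```python
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """Per-class IoU and their mean (mIoU) from flat integer label arrays."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        ious.append(inter / union if union > 0 else np.nan)  # skip classes absent from both
    ious = np.asarray(ious)
    return ious, np.nanmean(ious)

# toy example with 3 classes (0: car, 1: pedestrian, 2: vegetation)
pred = np.array([0, 0, 1, 2, 2, 1])
gt   = np.array([0, 1, 1, 2, 2, 2])
ious, miou = per_class_iou(pred, gt, num_classes=3)
```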
    Depth estimate | VPS network | DFM | PointAugment | mIoU | Car <25 m | Car >25 m | Pedestrian <25 m | Pedestrian >25 m | Vegetation <25 m | Vegetation >25 m
                   |             |     |              | 62.8 | 95.2      | 86.4      | 79.3             | 62.7             | 90.3             | 79.8
    √              | √           |     |              | 64.4 | 97.1      | 89.8      | 81.2             | 67.4             | 91.1             | 82.4
                   |             | √   | √            | 64.6 | 97.3      | 89.9      | 82.8             | 69.6             | 91.2             | 82.1
    √              | √           | √   | √            | 66.7 | 97.6      | 90.2      | 85.3             | 73.3             | 92.3             | 83.5
    Table 2. Ablation experiment
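    Table 2 additionally splits each class IoU by distance from the sensor (within and beyond 25 m). One way to evaluate such a split, sketched here as an assumption rather than the paper's exact protocol, is to mask points by range before accumulating the IoU statistics:

```python
import numpy as np

def class_iou_by_range(points, pred, gt, class_id, threshold=25.0):
    """IoU of one class computed separately for points nearer and farther
    than `threshold` metres from the sensor origin."""
    dist = np.linalg.norm(points[:, :3], axis=1)
    out = {}
    for name, m in (("<25 m", dist < threshold), (">25 m", dist >= threshold)):
        inter = np.sum((pred[m] == class_id) & (gt[m] == class_id))
        union = np.sum((pred[m] == class_id) | (gt[m] == class_id))
        out[name] = inter / union if union > 0 else float("nan")
    return out
```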
    Network | Car  | Pedestrian | Vegetation
    CN      | 93.2 | 75.7       | 87.1
    VPS     | 94.1 | 79.3       | 88.4
    Table 3. Comparison of voxel feature extraction network effects
    Method                  | mAP
    Baseline                | 55.4%
    Baseline + GT-Paste[11] | 57.0%
    Baseline + PointAugment | 57.2%
    Table 4. Comparison of object detection results