Author Affiliations
1. School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
2. Shanghai Intelligent and Connected Vehicle R&D Center Co., Ltd., Shanghai 201499, China
Fig. 1. Structure of proposed 3D object detection algorithm
Fig. 2. Depth completion network
Fig. 3. Dense point cloud and sparse point cloud projected onto the image
Fig. 4. Point cloud image generated from depth map
Fig. 5. 3D object detection network based on key point feature pyramid
Fig. 6. Dense depth map generated from depth completion network
Fig. 7. Result of sparse point cloud projection on the image
Fig. 8. Dense depth map generated from depth completion network
Fig. 9. BEV map generated by bird's-eye-view projection of the dense point cloud
Fig. 10. Detection result on BEV map
Fig. 11. 3D bounding boxes of detected objects displayed on camera RGB images
| Algorithm | Car (IoU = 0.7) | | | Person (IoU = 0.5) | | | Bicycle (IoU = 0.5) | | |
|---|---|---|---|---|---|---|---|---|---|
| | Easy | Moderate | Difficult | Easy | Moderate | Difficult | Easy | Moderate | Difficult |
| Proposed algorithm | 87.98 | 77.14 | 73.33 | 45.97 | 38.94 | 35.81 | 68.12 | 55.25 | 53.55 |
Table 1. Target detection accuracy of proposed algorithm on KITTI dataset
| Input | Car (IoU = 0.7) | | | Person (IoU = 0.5) | | | Bicycle (IoU = 0.5) | | |
|---|---|---|---|---|---|---|---|---|---|
| | Easy | Moderate | Difficult | Easy | Moderate | Difficult | Easy | Moderate | Difficult |
| Sparse point cloud | 4.50 | 3.15 | 2.88 | 0.96 | 0.94 | 0.9 | 0.82 | 0.78 | 0.70 |
| Proposed algorithm | 87.98 | 77.14 | 73.33 | 45.97 | 38.94 | 35.81 | 68.12 | 55.25 | 53.55 |
Table 2. Target detection accuracy with the sparse point cloud BEV as the input of the key point feature pyramid network
| Input | Car (IoU = 0.7) | | | Person (IoU = 0.5) | | | Bicycle (IoU = 0.5) | | |
|---|---|---|---|---|---|---|---|---|---|
| | Easy | Moderate | Difficult | Easy | Moderate | Difficult | Easy | Moderate | Difficult |
| Point cloud organized as (x, y, z) | 60.25 | 52.80 | 45.60 | 35.86 | 31.38 | 29.72 | 41.82 | 39.56 | 36.09 |
| Proposed algorithm | 87.98 | 77.14 | 73.33 | 45.97 | 38.94 | 35.81 | 68.12 | 55.25 | 53.55 |
Table 3. Target detection accuracy with the point cloud encoded in front-view form
| Input | Car (IoU = 0.7) | | | Person (IoU = 0.5) | | | Bicycle (IoU = 0.5) | | |
|---|---|---|---|---|---|---|---|---|---|
| | Easy | Moderate | Difficult | Easy | Moderate | Difficult | Easy | Moderate | Difficult |
| Image | 34.52 | 21.04 | 19.03 | 22.24 | 13.56 | 12.26 | 5.63 | 3.43 | 3.10 |
| Proposed algorithm | 87.98 | 77.14 | 73.33 | 45.97 | 38.94 | 35.81 | 68.12 | 55.25 | 53.55 |
Table 4. Target detection accuracy with only the image as the input of the depth completion network
| Downsampling rate | Car (IoU = 0.7) | | | Person (IoU = 0.5) | | | Bicycle (IoU = 0.5) | | |
|---|---|---|---|---|---|---|---|---|---|
| | Easy | Moderate | Difficult | Easy | Moderate | Difficult | Easy | Moderate | Difficult |
| 1% | 44.30 | 32.03 | 28.32 | 31.56 | 22.36 | 20.79 | 16.27 | 12.85 | 11.50 |
| 6% | 73.54 | 60.17 | 55.80 | 38.91 | 33.40 | 29.95 | 48.2 | 40.76 | 39.80 |
| 8% | 81.85 | 69.84 | 67.80 | 41.20 | 35.21 | 31.90 | 53.52 | 47.92 | 43.77 |
| 10% | 87.98 | 77.14 | 73.33 | 45.97 | 38.94 | 35.81 | 68.12 | 55.25 | 53.55 |
| 12% | 88.20 | 78.26 | 73.65 | 46.01 | 39.70 | 36.11 | 68.80 | 55.73 | 54.02 |
Table 5. Target detection accuracy at different point cloud downsampling rates
| Algorithm | Modality | Car (IoU = 0.7) | | | Person (IoU = 0.5) | | | Bicycle (IoU = 0.5) | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | | Easy | Moderate | Difficult | Easy | Moderate | Difficult | Easy | Moderate | Difficult |
| VoxelNet | 64-line LiDAR | 77.47 | 65.11 | 57.73 | 39.48 | 33.69 | 31.51 | 61.22 | 48.36 | 44.37 |
| SECOND | 64-line LiDAR | 83.13 | 73.66 | 66.20 | 51.07 | 42.56 | 37.29 | 70.51 | 53.85 | 46.90 |
| PointRCNN | 64-line LiDAR | 85.94 | 75.76 | 68.32 | 49.43 | 41.78 | 38.63 | 73.93 | 59.60 | 53.59 |
| SS3D[30] | Camera | 10.78 | 7.68 | 6.51 | 2.31 | 1.78 | 1.48 | 2.80 | 1.45 | 1.35 |
| D4LCN[31] | Camera | 16.65 | 11.72 | 9.51 | 4.55 | 3.42 | 2.83 | 2.45 | 1.67 | 1.36 |
| AVOD | 64-line LiDAR + camera | 76.39 | 66.47 | 60.23 | 36.10 | 27.86 | 25.76 | 57.19 | 42.08 | 38.29 |
| Frustum PointNets | 64-line LiDAR + camera | 82.19 | 69.79 | 60.59 | 50.53 | 42.15 | 38.08 | 72.27 | 56.12 | 49.01 |
| Proposed algorithm | Sparse point cloud + camera | 87.98 | 77.14 | 73.33 | 45.97 | 38.94 | 35.81 | 68.12 | 55.25 | 53.55 |
Table 6. Comparison of 3D object detection algorithms on KITTI dataset
| Parameter | VoxelNet | SECOND | PointRCNN | SS3D | D4LCN | Proposed algorithm |
|---|---|---|---|---|---|---|
| Running time (s) | 0.23 | 0.05 | 0.1 | 0.05 | 0.2 | 0.08 |
Table 7. Running time comparison of 3D object detection algorithms on KITTI dataset