Fig. 1. Object detection algorithms that use convolutional features to complete the prediction tasks. (a) CenterNet detection model; (b) detection model that predicts all tasks from multi-scale features; (c) proposed MFT detection model
Fig. 2. The proposed MFT network structure
Fig. 3. MSH module architecture
Fig. 4. MRFH module architecture
Fig. 5. Architectures of different feature fusion methods
Fig. 6. Visual comparison of the CenterNet and MFT detectors on the COCO dataset
Fig. 7. Visual comparison of the CenterNet and MFT detectors on the COCO dataset
| Condition | Method | Backbone | Size | V /(frame·s⁻¹) | mAP /% | AP_S /% | AP_M /% | AP_L /% |
|---|---|---|---|---|---|---|---|---|
| V>60 frame/s | SSD[8] | VGG16 | 300×300 | 60.6 | 23.2 | 5.3 | 23.2 | 39.6 |
| | SSD[8] | MobileNetV2 | 512×512 | 110.7 | 22.1 | 5.8 | 16.9 | 43.6 |
| | CenterNet[14] | Res18 | 512×512 | 128.5 | 28.1 | 10.1 | 31.5 | 42.6 |
| | TTFNet[30] | Res18 | 512×512 | 112.3 | 28.1 | 11.8 | 29.5 | 41.5 |
| | MFT | Res18 | 512×512 | 94.5 | 31.5 | 14.9 | 35.3 | 44.3 |
| V<60 frame/s | FCOS[15] | Res18 | 1330×800 | 20.8 | 26.9 | 13.9 | 28.9 | 36 |
| | CenterNet[14] | Res101 | 512×512 | 45.1 | 34.6 | 10.1 | 31.5 | 42.6 |
| | SSD[8] | VGG16 | 512×512 | 23.4 | 26.8 | 9.0 | 28.9 | 41.9 |
| | YOLOv3[29] | D53 | 608×608 | 30.3 | 33.0 | 18.3 | 25.4 | 41.9 |
| | EfficientDet[28] | EfficientNet | 512×512 | 47.1 | 33.8 | 12.4 | 34.7 | 54.4 |
| | MFT | Res50 | 512×512 | 54.9 | 35.3 | 12.9 | 34.3 | 44.3 |
Table 1. Comparison of different object detection algorithms on the COCO dataset
| Module | ECF | MSH | MRFH | LWS | mAP /% |
|---|---|---|---|---|---|
| CenterNet (baseline) | | | | | 70.64 |
| +ECF | √ | | | | 72.28 |
| +MSH | √ | √ | | | 73.16 |
| +MRFH | √ | √ | √ | | 73.86 |
| MFT | √ | √ | √ | √ | 74.09 |
Table 2. Ablation results of different proposed modules on the PASCAL VOC dataset
| Module | MSH | MRFH | MSH-mismatch | MRFH-mismatch | mAP /% |
|---|---|---|---|---|---|
| All-mismatch | √ | √ | √ | √ | 73.01 |
| MSH-mismatch | √ | √ | √ | | 73.18 |
| MFT | √ | √ | | | 73.86 |
Table 3. Validation of the MSH and MRFH module designs
| MSH | MRFH | mAP /% | ΔmAP /% |
|---|---|---|---|
| × | × | 72.28 | 0 |
| √ | × | 73.16 | +0.88 |
| × | √ | 73.44 | +1.16 |
| √ | √ | 73.86 | +1.58 |
Table 4. Complementarity experiment for the MRFH and MSH modules
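The ΔmAP column in Table 4 can be reproduced with simple arithmetic. The sketch below assumes that ΔmAP is measured against the row with neither module enabled (72.28 %); the variable names are illustrative and do not come from the paper.

```python
# mAP values copied from Table 4; keys are (MSH enabled, MRFH enabled).
map_by_config = {
    (False, False): 72.28,  # neither module: assumed reference row
    (True, False): 73.16,
    (False, True): 73.44,
    (True, True): 73.86,
}

baseline = map_by_config[(False, False)]

# Gain of each configuration over the reference row, rounded to 2 decimals.
delta_map = {
    config: round(value - baseline, 2)
    for config, value in map_by_config.items()
}

# Sum of the two single-module gains, for comparison with the joint gain.
sum_of_single_gains = round(delta_map[(True, False)] + delta_map[(False, True)], 2)

for config, delta in delta_map.items():
    print(config, f"{delta:+.2f}")
print("sum of single-module gains:", sum_of_single_gains)
```

Under this assumption the computed gains match the table, and the joint gain (+1.58) is smaller than the sum of the two single-module gains.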
| Module | Large | Medium | Small | mAP /% |
|---|---|---|---|---|
| MSH-L | √ | | | 73.44 |
| MSH-LM | √ | √ | | 73.51 |
| MSH | √ | √ | √ | 73.86 |
Table 5. Ablation results of different scale features reused by MSH module
| Fusion method | w1 | w2 | w3 | w4 | mAP /% |
|---|---|---|---|---|---|
| Simple average | 1/4 | 1/4 | 1/4 | 1/4 | 73.86 |
| Learned weight | 0.2894 | 0.2551 | 0.3269 | 0.3307 | 74.09 |
Table 6. Comparison of different feature fusion methods
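The two fusion settings in Table 6 can be illustrated with a minimal sketch. It assumes that fusion is an element-wise weighted sum of four same-shaped feature maps, which the table itself does not confirm. The function `fuse` and the toy feature values are hypothetical; only the weights are taken from the table.

```python
from typing import List, Sequence


def fuse(features: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Element-wise weighted sum of same-length (flattened) feature maps."""
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature map")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all feature maps must have the same length")
    return [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(length)
    ]


# Four toy feature maps flattened to vectors (made-up values, not from the paper).
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

simple_average = [0.25, 0.25, 0.25, 0.25]          # "Simple average" row
learned_weight = [0.2894, 0.2551, 0.3269, 0.3307]  # "Learned weight" row

avg_fused = fuse(features, simple_average)
learned_fused = fuse(features, learned_weight)

print(avg_fused)
print(learned_fused)
```

The learned weights as printed in the table sum to about 1.2021 rather than 1, so the sketch applies them as given and does not normalize them.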