Author Affiliations
College of Missile Engineering, Rocket Force University of Engineering, Xi'an, Shaanxi 710025, China
Fig. 1. Structure of RetinaNet
Fig. 2. Internal structure of dual attention SE-ResNeXt module
Fig. 3. Bottom-up short connection
Fig. 4. Structure of pixel-wise addition module
Fig. 5. Structure of GCU module
Fig. 6. Structure of object detection subnet
Fig. 7. Structure of DRF module
Fig. 8. Sample images from the VISDrone-g dataset
Fig. 9. Statistics of the VISDrone-g dataset. (a) Distribution of object scales; (b) distribution of object bounding-box aspect ratios (length to width)
Fig. 10. Detailed explanation of COCO object detection evaluation indexes [26]
Fig. 11. Visual comparison between DRF-RetinaNet and RetinaNet*. (a)(c)(e) Detection results of DRF-RetinaNet; (b)(d)(f) detection results of RetinaNet*
Fig. 12. Detection results in dim light
Fig. 13. Detection results for dense objects
Fig. 14. Detection results from an oblique view
Fig. 15. Detection results from a downward view
αt | γ | AP /% | AP50 /% | F1-score |
---|---|---|---|---|
0.20 | 2.0 | 23.93 | 38.18 | 47.24 |
0.25 | 2.0 | 24.37 | 39.95 | 48.43 |
0.25 | 3.0 | 25.14 | 42.62 | 52.47 |
0.30 | 3.0 | 24.82 | 41.23 | 50.72 |
Table 1. Focal Loss parameter tuning
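For readers unfamiliar with the two parameters tuned in Table 1, the following is a minimal sketch of the binary focal loss as it is commonly defined for RetinaNet, FL(p_t) = -αt (1 - p_t)^γ log(p_t). It assumes that standard definition; the paper's own formula is not reproduced in this section, and the function name `focal_loss` and its default values (taken from the best row of Table 1) are illustrative choices.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=3.0):
    """Binary focal loss for a single prediction (illustrative sketch).

    p     : predicted probability of the positive class, 0 < p < 1
    y     : ground-truth label, 1 (object) or 0 (background)
    alpha : class-balancing weight applied to positives (1 - alpha for negatives)
    gamma : focusing parameter; larger values down-weight easy examples more
    """
    p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-dependent weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = focal_loss(0.9, 1)  # well-classified positive
    hard = focal_loss(0.1, 1)  # misclassified positive
    print(f"easy example loss: {easy:.6f}")
    print(f"hard example loss: {hard:.6f}")
```

With γ = 0 and a constant weight the expression reduces to weighted cross-entropy; raising γ shrinks the contribution of well-classified examples, which is why αt and γ are tuned jointly.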
Module | Included in model (√) | | | | | |
---|---|---|---|---|---|---|
RetinaNet* | √ | | | | | |
SE-ResNeXt | | √ | √ | √ | √ | √ |
Bottom-up | | | √ | | √ | √ |
GCU | | | | √ | √ | √ |
DRF detection subnet | | | | | | √ |
AP /% | 18.97 | 20.22 | 21.33 | 22.05 | 23.17 | 25.14 |
AP50 /% | 28.65 | 30.64 | 32.78 | 34.33 | 37.83 | 42.62 |

Note: * indicates that anchor parameters have been adjusted according to Section 4.2.
Table 2. Performance comparison of model components
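The gain contributed by each configuration in Table 2 can be read off by subtracting the RetinaNet* baseline from each later column. The short sketch below does that arithmetic. The AP and AP50 values are copied from Table 2, while the configuration labels and the helper name `gains_over_baseline` are illustrative assumptions, not names used in the paper.

```python
# Configuration labels are illustrative; values are the six columns of Table 2.
configs = [
    "RetinaNet*",
    "SE-ResNeXt",
    "SE-ResNeXt + Bottom-up",
    "SE-ResNeXt + GCU",
    "SE-ResNeXt + Bottom-up + GCU",
    "SE-ResNeXt + Bottom-up + GCU + DRF subnet",
]
ap = [18.97, 20.22, 21.33, 22.05, 23.17, 25.14]
ap50 = [28.65, 30.64, 32.78, 34.33, 37.83, 42.62]


def gains_over_baseline(values):
    """Return each value minus the first (baseline) value, in percentage points."""
    baseline = values[0]
    return [round(v - baseline, 2) for v in values[1:]]


if __name__ == "__main__":
    ap_gain = gains_over_baseline(ap)
    ap50_gain = gains_over_baseline(ap50)
    for name, d_ap, d_ap50 in zip(configs[1:], ap_gain, ap50_gain):
        print(f"{name}: AP +{d_ap:.2f}, AP50 +{d_ap50:.2f}")
```

The differences are in absolute percentage points, not relative improvements.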
Method | Input size | Backbone network | AP /% | AP50 /% | AP75 /% | AR1 /% | AR10 /% | AR100 /% | Time /ms |
---|---|---|---|---|---|---|---|---|---|
Faster R-CNN | 600 | ResNet-50 | 16.72 | 24.32 | 14.15 | 4.21 | 12.47 | 16.65 | 137 |
R-FCN | 600 | ResNet-101 | 19.35 | 30.18 | 19.52 | 5.65 | 18.73 | 22.56 | 178 |
SSD | 512 | VGG-16 | 12.23 | 17.29 | 11.54 | 3.71 | 11.22 | 15.41 | 54 |
RFB-Net | 512 | ResNet-50 | 14.87 | 22.17 | 12.06 | 4.34 | 13.15 | 17.38 | 75 |
YOLO v3 | 416 | Darknet-53 | 14.75 | 21.86 | 12.17 | 4.12 | 12.93 | 17.41 | 67 |
RetinaNet | 608 | ResNet-50 | 16.35 | 23.18 | 13.92 | 4.85 | 14.75 | 18.36 | 85 |
RetinaNet* | 608 | ResNet-50 | 18.97 | 28.65 | 17.42 | 4.92 | 17.25 | 20.52 | 88 |
DRF-RetinaNet | 608 | SE-ResNeXt-50 | 25.14 | 42.62 | 24.71 | 7.82 | 24.22 | 31.24 | 103 |

Note: * indicates that anchor parameters have been adjusted according to Section 4.2.
Table 3. Performance comparison of each algorithm
Method | APsmall /% | APmedium /% | APlarge /% | ARsmall /% | ARmedium /% | ARlarge /% | F1-score |
---|---|---|---|---|---|---|---|
Faster R-CNN | 7.14 | 24.42 | 36.73 | 10.62 | 26.75 | 41.41 | 33.15 |
R-FCN | 9.85 | 26.13 | 40.25 | 14.57 | 32.71 | 47.79 | 40.67 |
SSD | 5.85 | 20.03 | 34.07 | 7.63 | 24.97 | 38.68 | 26.41 |
RFB-Net | 6.62 | 22.18 | 34.28 | 9.55 | 25.77 | 40.82 | 33.13 |
YOLO v3 | 6.25 | 22.26 | 36.17 | 9.72 | 25.72 | 40.27 | 32.73 |
RetinaNet | 7.27 | 23.95 | 36.72 | 10.31 | 26.63 | 42.23 | 32.69 |
RetinaNet* | 9.82 | 25.35 | 38.31 | 14.93 | 31.91 | 44.82 | 37.92 |
DRF-RetinaNet | 13.62 | 40.34 | 55.95 | 17.42 | 49.97 | 61.53 | 52.47 |

Note: * indicates that anchor parameters have been adjusted according to Section 4.2.
Table 4. Performance comparison of algorithms for object detection at different scales