• Opto-Electronic Engineering
  • Vol. 49, Issue 3, 210372-1 (2022)
Xu Chen, Dongliang Peng, and Yu Gu*
Author Affiliations
  • School of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China
    DOI: 10.12086/oee.2022.210372
    Xu Chen, Dongliang Peng, Yu Gu. Real-time object detection for UAV images based on improved YOLOv5s[J]. Opto-Electronic Engineering, 2022, 49(3): 210372-1
    Fig. 1. YOLOv5 backbone network architecture diagram
    Fig. 2. Structure diagram of feature fusion module
    Fig. 3. (a) Res-DConv module; (b) Receptive field mapping
    Fig. 4. Improved module structure
    Fig. 5. YOLOv5sm+ model architecture
    Fig. 6. (a) Total number of category instances on the VisDrone dataset; (b) Class confusion matrix of the YOLOv5m algorithm
    Fig. 7. Detection examples of different algorithms in VisDrone UAV scenes. (a) YOLOv5m model; (b) YOLOv5sm+ model; (c) YOLOv5s model
    Fig. 8. Comparison of the detection effects of three algorithms in dense vehicle scenes. (a) YOLOv5m; (b) YOLOv5s; (c) YOLOv5sm+
    Fig. 9. Detection comparison of the improved algorithm on the DIOR dataset. (a) YOLOv5s; (b) YOLOv5sm+
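    | YOLOv5s | Receptive field | Channels | YOLOv5sm | Receptive field | Channels |
    |---|---|---|---|---|---|
    | Focus | 6 | 32 | Conv 3×3 (stride: 2) | 3 | 24 |
    | | | | Conv 3×3 (dilation: 2) | 15 | 48 |
    | Down-sampling | 10 | 64 | Conv 3×3 (stride: 2) | 19 | 96 |
    | | | | Res-Block | 27 | 96 |
    | C3_x1 | 18 | 64 | Res-Dconv | 51 | 96 |
    | Down-sampling | 26 | 128 | Conv 3×3 (stride: 2) | 59 | 192 |
    | C3_x3 | 74 | 128 | C3_x3 | 107 | 192 |
    | Down-sampling | 90 | 256 | Conv 3×3 (stride: 2) | 123 | 384 |
    | C3_x3 | 186 | 256 | C3_x3 | 219 | 384 |
    | Down-sampling | 218 | 512 | Conv 3×3 (stride: 2) | 251 | 768 |
    | SPP | 218~634 | 512 | SPP | 251~667 | 768 |
    | C3_x1 | 282~698 | 512 | C3_x1 | 315~731 | 768 |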
    Table 1. Receptive field analysis table
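The YOLOv5s column of Table 1 appears to follow the usual receptive-field recurrence r_out = r_in + (k_eff - 1) * j, where k_eff = d * (k - 1) + 1 is the effective kernel size under dilation d and j is the product of all earlier strides. The Python sketch below reproduces that column up to the last down-sampling layer. It rests on modeling assumptions that the source does not state: Focus is treated as a 2×2 stride-2 slice followed by a 3×3 convolution, and each C3 repeat is counted as a single 3×3 convolution. The names `stage_receptive_fields` and `YOLOV5S_STAGES` are illustrative only.

```python
from typing import List, Tuple

# (kernel, stride, dilation) for one convolution-like layer.
Layer = Tuple[int, int, int]


def stage_receptive_fields(
    stages: List[Tuple[str, List[Layer]]]
) -> List[Tuple[str, int]]:
    """Return the receptive field (in input pixels) at the end of each stage."""
    rf, jump = 1, 1  # receptive field and cumulative stride at the input
    result = []
    for name, layers in stages:
        for kernel, stride, dilation in layers:
            k_eff = dilation * (kernel - 1) + 1  # effective kernel size
            rf += (k_eff - 1) * jump             # growth uses the stride before this layer
            jump *= stride
        result.append((name, rf))
    return result


# Assumed layer-level model of the YOLOv5s backbone (not taken from the source).
YOLOV5S_STAGES = [
    ("Focus", [(2, 2, 1), (3, 1, 1)]),      # slice modeled as 2x2 stride 2, then 3x3 conv
    ("Down-sampling", [(3, 2, 1)]),
    ("C3_x1", [(3, 1, 1)]),                 # one 3x3 conv per C3 repeat (assumption)
    ("Down-sampling", [(3, 2, 1)]),
    ("C3_x3", [(3, 1, 1)] * 3),
    ("Down-sampling", [(3, 2, 1)]),
    ("C3_x3", [(3, 1, 1)] * 3),
    ("Down-sampling", [(3, 2, 1)]),
]

if __name__ == "__main__":
    for name, rf in stage_receptive_fields(YOLOV5S_STAGES):
        print(f"{name:14s} {rf}")
```

Under these assumptions the eight stages come out as 6, 10, 18, 26, 74, 90, 186 and 218 pixels, which agrees with the YOLOv5s entries in Table 1. The SPP rows and the YOLOv5sm column are left out because the page does not give enough layer detail to model them.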
    | Down-sampling factor | 3 | 4 | 5 |
    |---|---|---|---|
    | Max receptive field/pixel | 111 | 255 | 731 |
    | Anchor range | 8×8~37×37 | 32×32~85×85 | 96×96~365×365 |
    Table 2. Pre-setting anchors in response to the receptive field and down-sampling
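Table 2 pairs each detection level with a maximum receptive field and an anchor size range. The short Python sketch below records those three levels and checks that the largest anchor at each level does not exceed the maximum receptive field. Reading the down-sampling factor n as a stride of 2**n is an assumption, and the class `DetectionLevel` and its fields are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionLevel:
    downsample_factor: int    # exponent n; assumed to correspond to stride 2**n
    max_receptive_field: int  # pixels, from Table 2
    anchor_min: int           # side length of the smallest anchor, pixels
    anchor_max: int           # side length of the largest anchor, pixels

    @property
    def stride(self) -> int:
        return 2 ** self.downsample_factor

    def anchors_fit(self) -> bool:
        """True if the largest anchor is no larger than the receptive field."""
        return self.anchor_max <= self.max_receptive_field


# Values copied from Table 2.
LEVELS = [
    DetectionLevel(3, 111, 8, 37),
    DetectionLevel(4, 255, 32, 85),
    DetectionLevel(5, 731, 96, 365),
]

if __name__ == "__main__":
    for level in LEVELS:
        print(level.stride, level.anchors_fit())
```

For the values in Table 2, every level passes the check, so each anchor range sits inside the region the corresponding feature map can see.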
    | Object type | Small (0×0~32×32) | Mid (32×32~96×96) | Large (96×96~) |
    |---|---|---|---|
    | Count | 44.44 | 18.63 | 1.704 |
    Table 3. Statistics of different types of objects
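The size classes in Table 3 can be illustrated with a small Python helper. This is a sketch that assumes objects are binned by bounding-box area against the 32×32 and 96×96 thresholds, in the style of COCO evaluation. The handling of boxes that fall exactly on a threshold and the function name `size_bucket` are assumptions, not details from the source.

```python
SMALL_MAX_AREA = 32 * 32  # upper bound of the "Small" class, in pixels squared
MID_MAX_AREA = 96 * 96    # upper bound of the "Mid" class, in pixels squared


def size_bucket(width: float, height: float) -> str:
    """Classify a bounding box as 'small', 'mid' or 'large' by its area."""
    if width < 0 or height < 0:
        raise ValueError("width and height must be non-negative")
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MID_MAX_AREA:  # boundary convention is an assumption
        return "mid"
    return "large"


if __name__ == "__main__":
    for box in [(20, 20), (50, 50), (100, 120)]:
        print(box, size_bucket(*box))
```

Binning by area rather than by side length means a thin 10×200 box counts as "mid", which is one reason the boundary convention is worth stating explicitly when reproducing the statistics.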
    | Depth | Width | mAP50 | mAP | BFLOPs |
    |---|---|---|---|---|
    | 0.33 | 0.5 | 0.502 | 0.288 | 16.5 |
    | 0.33 | 0.75 | 0.540 | 0.319 | 36.3 |
    | 1.33 | 0.5 | 0.525 | 0.311 | 35.4 |
    Table 4. Performance comparison experiment results of depth and width models
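Table 4 varies a depth multiple and a width multiple. The Python sketch below shows how such multiples are commonly applied, assuming the convention of the public YOLOv5 implementation: channel counts are rounded up to a multiple of 8 and repeat counts are rounded and kept at least 1. The base channel and repeat lists are assumed values chosen to match the module names and channel counts in Table 1; they are not given on this page, and the helper names are illustrative.

```python
import math

BASE_CHANNELS = [64, 128, 256, 512, 1024]  # assumed unscaled stage widths
BASE_REPEATS = [3, 9, 9, 3]                # assumed unscaled C3 repeat counts


def scale_channels(channels: int, width_multiple: float, divisor: int = 8) -> int:
    """Scale a channel count and round it up to a multiple of `divisor`."""
    return math.ceil(channels * width_multiple / divisor) * divisor


def scale_repeats(repeats: int, depth_multiple: float) -> int:
    """Scale a block repeat count, never dropping below one repeat."""
    return max(round(repeats * depth_multiple), 1) if repeats > 1 else repeats


if __name__ == "__main__":
    for depth, width in [(0.33, 0.5), (0.33, 0.75), (1.33, 0.5)]:
        channels = [scale_channels(c, width) for c in BASE_CHANNELS]
        repeats = [scale_repeats(n, depth) for n in BASE_REPEATS]
        print(depth, width, channels, repeats)
```

With these assumed base values, widths 0.5 and 0.75 give 32 to 512 and 48 to 768 channels, and depth 0.33 gives one or three C3 repeats per stage, in line with the C3_x1 and C3_x3 entries of Table 1.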
    | Baseline | Res-Dconv | mAP50 | mAP | BFLOPs |
    |---|---|---|---|---|
    | ✓ | | 0.502 | 0.288 | 16.5 |
    | ✓ | ✓ | 0.516 | 0.299 | 19.8 |
    Table 5. Verification experiment results on Res-Dconv module
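    | Baseline | SM | SCAM | SDCM | mAP | mAP50 | BFLOPs | Infer/ms | AP-small | AP-medium | AP-large |
    |---|---|---|---|---|---|---|---|---|---|---|
    | YOLOv5s | | | | 0.319 | 0.548 | 16.5 | 4.8 | 0.220 | 0.437 | 0.495 |
    | | | | | 0.358 | 0.589 | 30.1 | 8.3 | 0.280 | 0.476 | 0.495 |
    | | | | | 0.324 | 0.555 | 14.7 | 3.8 | 0.225 | 0.446 | 0.511 |
    | | | | | 0.333 | 0.555 | 19.5 | 4.9 | 0.250 | 0.448 | 0.482 |
    | | | | | 0.356 | 0.593 | 38.0 | 9.0 | 0.278 | 0.475 | 0.512 |
    | | | | | 0.360 | 0.596 | 30.8 | 7.7 | 0.281 | 0.479 | 0.505 |
    Note: bold font marks the best value in each column.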
    Table 6. The ablation experiment results of our algorithm modules on the VisDrone dataset
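    | Algorithm | mAP50 | mAP | mAP75 | AP-small | AP-mid | AP-large | BFLOPs | Infer/ms |
    |---|---|---|---|---|---|---|---|---|
    | YOLOv3 | 0.609 | 0.389 | 0.417 | 0.297 | 0.496 | 0.545 | 154.9 | 27.8 |
    | Scaled-YOLOv4 | 0.620 | 0.400 | 0.428 | 0.305 | 0.514 | 0.626 | 119.4 | 27.1 |
    | ClusDet[1] | 0.562 | 0.324 | 0.316 | - | - | - | - | - |
    | HRDNet[1] | 0.620 | 0.3551 | 0.351 | - | - | - | - | - |
    | YOLOv5s | 0.548 | 0.319 | 0.317 | 0.220 | 0.437 | 0.495 | 16.5 | 4.8 |
    | YOLOv5m | 0.595 | 0.365 | 0.372 | 0.285 | 0.482 | 0.525 | 50.4 | 9.8 |
    | YOLOX-s | 0.535 | 0.314 | 0.317 | 0.225 | 0.415 | 0.485 | 41.6 | 55.1 |
    | MobileNetv3 | 0.554 | 0.329 | 0.329 | 0.245 | 0.443 | 0.495 | 23.8 | 8.0 |
    | MobileViT | 0.555 | 0.333 | 0.337 | 0.249 | 0.442 | 0.418 | - | 13.7 |
    | YOLOv5sm+ | 0.596 | 0.360 | 0.369 | 0.281 | 0.479 | 0.505 | 30.8 | 7.7 |
    | YOLOv5sm+* | 0.606 | 0.367 | 0.378 | 0.295 | 0.478 | 0.439 | - | - |
    Note: + marks a model with the improved modules added; * marks multi-scale test results. The table includes experimental results reported in the cited literature.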
    Table 7. Detection performance of different algorithms on VisDrone dataset
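    | Model | Backbone | mAP50 |
    |---|---|---|
    | Faster R-CNN[33] | VGG16 | 0.541 |
    | PANet[20] | ResNet50 | 0.638 |
    | Retina-Net[24] | ResNet50 | 0.685 |
    | Ref. [32] | ResNet50 | 0.732 |
    | CAT-Net[34] | ResNet50 | 0.763 |
    | YOLOv5sm+ (ours) | - | 0.667 |
    Note: bold font marks the best value in the column. The table includes comparison results from other literature.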
    Table 8. Detection performance of different algorithms on DIOR dataset