• Acta Optica Sinica
  • Vol. 40, Issue 10, 1015002 (2020)
Fang Liu, Zhiwei Wu*, Anzhe Yang, and Xiao Han
Author Affiliations
  • Information Department, Beijing University of Technology, Beijing 100022, China
    DOI: 10.3788/AOS202040.1015002
    Fang Liu, Zhiwei Wu, Anzhe Yang, Xiao Han. Multi-Scale Feature Fusion Based Adaptive Object Detection for UAV[J]. Acta Optica Sinica, 2020, 40(10): 1015002
    Fig. 1. Framework of our algorithm
    Fig. 2. Schematic diagram of convolution decomposition. (a) Standard convolution process; (b) convolution process after decomposition
    Fig. 3. Structure diagram of the convolutional neural network residual module
    Fig. 4. Deconvolution cascaded structure
    Fig. 5. Adaptive candidate region generation
    Fig. 6. Visualized detection results of the proposed algorithm in different situations. (a) Small target detection results; (b) dense target detection results; (c) detection results for targets under different illumination conditions
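    | Layer  | Type        | Kernel                                         | Output size | Number of output channels |
    |--------|-------------|------------------------------------------------|-------------|---------------------------|
    | X      | Input       | -                                              | 224×224     | 3                         |
    | Conv_1 | Convolution | 3×3, 64, stride 2                              | 112×112     | 32                        |
    | Conv_2 | Convolution | [3×3, 1; 1×1, 64; 3×3, 1; 1×1, 64] × 3         | 56×56       | 64                        |
    | Conv_3 | Convolution | [3×3, 1; 1×1, 128; 3×3, 1; 1×1, 128] × 4       | 28×28       | 128                       |
    | Conv_4 | Convolution | [3×3, 1; 1×1, 256; 3×3, 1; 1×1, 256] × 6       | 14×14       | 256                       |
    | Conv_5 | Convolution | [3×3, 1; 1×1, 512; 3×3, 1; 1×1, 512] × 3       | 7×7         | 512                       |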
    Table 1. Lightweight deep residual network model
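A rough parameter count illustrates why the decomposed blocks in Table 1 are lighter than standard convolutions. The sketch below assumes that the decomposition in Fig. 2 is a 3×3 depthwise convolution followed by a 1×1 pointwise convolution, which is how the "3×3, 1" and "1×1, 64" kernel rows read. Biases and normalization layers are ignored, and the function names are illustrative only, not taken from the paper.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k×k convolution (bias ignored)."""
    return k * k * c_in * c_out


def decomposed_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k×k depthwise convolution followed by a 1×1 pointwise
    convolution (assumed form of the decomposition; bias ignored)."""
    depthwise = k * k * c_in   # one k×k filter per input channel
    pointwise = c_in * c_out   # 1×1 convolution that mixes channels
    return depthwise + pointwise


# Example: one 3×3 layer with 64 input and 64 output channels, as in Conv_2.
std = standard_conv_params(3, 64, 64)
dec = decomposed_conv_params(3, 64, 64)
ratio = dec / std
print(std, dec, round(ratio, 3))
```

Under these assumptions the decomposed layer needs roughly one eighth of the weights of the standard layer for this channel width.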
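    | Layer | Type          | Kernel | Stride | Output size |
    |-------|---------------|--------|--------|-------------|
    | h1    | Deconvolution | 3×3    | 1      | 14×14×256   |
    | h2    | Deconvolution | 3×3    | 1      | 28×28×256   |
    | h3    | Deconvolution | 3×3    | 1      | 56×56×256   |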
    Table 2. Deconvolution layer parameters
    | Model   | Size /MB | Ratio /% | Accuracy /% |
    |---------|----------|----------|-------------|
    | Resnet  | 97.7     | -        | 81.3        |
    | LResnet | 10.2     | 10.4     | 80.6        |
    Table 3. Feature extraction network comparison
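The Ratio column of Table 3 can be checked with simple arithmetic. The snippet below assumes that the ratio is the LResnet model size divided by the Resnet model size; the variable names are illustrative.

```python
# Values copied from Table 3.
resnet_size_mb = 97.7
lresnet_size_mb = 10.2
resnet_acc = 81.3
lresnet_acc = 80.6

# Assumed definition: size of the lightweight model relative to the baseline.
size_ratio_pct = round(lresnet_size_mb / resnet_size_mb * 100, 1)

# Accuracy given up in exchange for the smaller model, in percentage points.
acc_drop = round(resnet_acc - lresnet_acc, 1)

print(size_ratio_pct, acc_drop)
```

With that definition the computed ratio agrees with the tabulated 10.4%, and the accuracy difference is 0.7 percentage points.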
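    | Method                        | mAP   | AP50  | AP75  |
    |-------------------------------|-------|-------|-------|
    | ① Faster-RCNN (Resnet50+RPN)  | 18.63 | 35.87 | 17.86 |
    | ② LResnet+RPN                 | 18.52 | 35.75 | 17.44 |
    | ③ LResnet+DC+RPN              | 21.03 | 38.46 | 18.03 |
    | ④ LResnet+DC+GA-RPN (ours)    | 22.12 | 38.76 | 21.53 |
    Table 4. Effectiveness test of each module for different methods (unit: %)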
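    | Method      | Pedestrian | Person | Bicycle | Car   | Van   | Truck | Tricycle | Awn  | Bus   | Motor |
    |-------------|------------|--------|---------|-------|-------|-------|----------|------|-------|-------|
    | Faster-RCNN | 18.34      | 7.62   | 6.76    | 43.31 | 27.53 | 19.95 | 10.13    | 7.65 | 36.87 | 8.79  |
    | Ours        | 22.43      | 7.61   | 8.56    | 50.18 | 34.63 | 24.34 | 14.11    | 9.08 | 36.25 | 14.88 |
    Table 5. Comparison of the results for ten categories between our model and Faster-RCNN on the VisDrone dataset (unit: %)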
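    | Method    | mAP /% | AP50 /% | AP75 /% | Frame rate /(frame·s⁻¹) |
    |-----------|--------|---------|---------|-------------------------|
    | FPN       | 16.51  | 32.20   | 14.91   | 6                       |
    | YOLOv3    | 20.30  | 44.12   | 15.80   | 44                      |
    | RetinaNet | 11.81  | 21.37   | 11.62   | 11                      |
    | CornerNet | 17.41  | 34.12   | 15.78   | 13                      |
    | Ours      | 22.12  | 38.76   | 21.53   | 24                      |
    Table 6. Comparison with mainstream object detection algorithms on UAV aerial data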