• Laser & Optoelectronics Progress
  • Vol. 60, Issue 10, 1010027 (2023)
Qiangqiang Fan1, Zaifeng Shi1,3,*, Fanning Kong1, Shaoxiong Li1, and Jun Xiao2
Author Affiliations
  • 1School of Microelectronics, Tianjin University, Tianjin 300072, China
  • 2Phytium Technology Co., Ltd., Tianjin 300459, China
  • 3Tianjin Key Laboratory of Imaging and Sensing Microelectronic Technology, Tianjin 300072, China
    DOI: 10.3788/LOP220859
    Qiangqiang Fan, Zaifeng Shi, Fanning Kong, Shaoxiong Li, Jun Xiao. Lightweight Feature Fusion Network for Object Detection in Aerial Photography Images[J]. Laser & Optoelectronics Progress, 2023, 60(10): 1010027
    Fig. 1. Overall network architecture
    Fig. 2. Deformable receptive field block
    Fig. 3. Ghost bottleneck module
    Fig. 4. Comparison of P-R curves. (a) P-R curves of the proposed model for ten classes of objects; (b) P-R curve for the bus; (c) P-R curve for the car
    Fig. 5. Comparison of detection results of different models in different scenarios. (a) Multi-scale, occluded scene; (b) small object, dense scene; (c) illumination change scene
    Fig. 6. Detection results of the proposed model on NWPU VHR-10 dataset
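    | Input size | Operator | Exp size | Output size | SE | NL | s |
    |---|---|---|---|---|---|---|
    | 640×640×3 | Conv2d | - | 320×320×16 | - | HS | 2 |
    | 320×320×16 | Bneck, 3×3 | 16 | 320×320×16 | - | RE | 1 |
    | 320×320×16 | Bneck, 3×3 | 64 | 160×160×24 | - | RE | 2 |
    | 160×160×24 | Bneck, 3×3 | 72 | 160×160×24 | - | RE | 1 |
    | 160×160×24 | Bneck, 5×5 | 72 | 80×80×40 | 1 | RE | 2 |
    | 80×80×40 | Bneck, 5×5 | 120 | 80×80×40 | 1 | RE | 1 |
    | 80×80×40 | Bneck, 5×5 | 120 | 80×80×40 | 1 | RE | 1 |
    | 80×80×40 | Bneck, 3×3 | 240 | 40×40×80 | - | HS | 2 |
    | 40×40×80 | Bneck, 3×3 | 200 | 40×40×80 | - | HS | 1 |
    | 40×40×80 | Bneck, 3×3 | 184 | 40×40×80 | - | HS | 1 |
    | 40×40×80 | Bneck, 3×3 | 184 | 40×40×80 | - | HS | 1 |
    | 40×40×80 | Bneck, 3×3 | 480 | 40×40×112 | 1 | HS | 1 |
    | 40×40×112 | Bneck, 3×3 | 672 | 40×40×112 | 1 | HS | 1 |
    | 40×40×112 | Bneck, 5×5 | 672 | 20×20×160 | 1 | HS | 2 |
    | 20×20×160 | Bneck, 5×5 | 960 | 20×20×160 | 1 | HS | 1 |
    | 20×20×160 | Bneck, 5×5 | 960 | 20×20×160 | 1 | HS | 1 |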
    Table 1. Detailed structure of backbone network
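
    The spatial sizes in Table 1 follow from the stride column (s): each stride-2 layer halves the feature map, so a 640×640 input ends at 20×20. The following is a minimal sketch of that arithmetic, not the authors' code; the function name `feature_map_sizes` and the constant `TABLE1_STRIDES` are hypothetical, and exact division by the stride (suitable padding) is assumed.

```python
def feature_map_sizes(input_size, strides):
    """Return the spatial size after each layer for the given per-layer strides."""
    sizes = []
    size = input_size
    for s in strides:
        # Assumption: padding is chosen so a stride-s layer divides the size exactly by s.
        size //= s
        sizes.append(size)
    return sizes


# Stride column (s) of Table 1, read top to bottom (16 layers).
TABLE1_STRIDES = [2, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1]

sizes = feature_map_sizes(640, TABLE1_STRIDES)
print(sizes)              # spatial size after each layer
print(640 // sizes[-1])   # overall downsampling factor of the backbone
```

    The three distinct resolutions reached after the stride-2 layers in the lower half of the table (80, 40, and 20) correspond to downsampling factors of 8, 16, and 32 relative to the input.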

    Algorithm 1: Simplified optimal transport assignment (SimOTA)

    Input: $n$ is the number of initially selected candidate boxes $C$; $m$ is the number of ground truth objects in image $Y$; $P_j^{\mathrm{class}}$ is the predicted class score for candidate box $a_j$; $P_j^{\mathrm{box}}$ is the predicted bounding box for $a_j$ ($j = 1, 2, \ldots, n$); $G_i^{\mathrm{class}}$ is the ground truth class for ground truth $g_i$; $G_i^{\mathrm{box}}$ is the bounding box for $g_i$ ($i = 1, 2, \ldots, m$); $\varepsilon = 3$

    Output: $k$ candidate boxes as positive samples of $g_i$

    1 calculate class loss: $L_{ij}^{\mathrm{class}} = \mathrm{BCELoss}(P_j^{\mathrm{class}}, G_i^{\mathrm{class}})$

    2 calculate regression loss: $L_{ij}^{\mathrm{reg}} = \mathrm{GIoULoss}(P_j^{\mathrm{box}}, G_i^{\mathrm{box}})$

    3 calculate cost: $c_{ij} = L_{ij}^{\mathrm{class}} + \varepsilon L_{ij}^{\mathrm{reg}}$

    4 select the top 10 candidate boxes with the highest IoU for each $g_i$

    5 sum these 10 IoU values and convert the sum to an integer to obtain $k$ for each $g_i$

    6 for $i = 1$ to $m$ do

    7     select the top $k$ candidate boxes with the least cost within a fixed center region for $g_i$

    8     if a candidate box $a_j$ matches multiple ground truths then assign $a_j$ to the ground truth with the least cost

    9     else $a_j$ is selected as a positive sample of $g_i$

    Table 2. Implementation flow of SimOTA label assignment
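
    The steps of Algorithm 1 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, boxes are assumed to be (x1, y1, x2, y2) tuples, class scores are assumed to be per-class probabilities, the "fixed center region" is assumed to be a square of half-width `center_radius` pixels around the ground truth center, and k is truncated with `int` and clamped to at least 1.

```python
import math


def _inter_union(a, b):
    """Intersection and union areas of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter, union


def box_iou(a, b):
    inter, union = _inter_union(a, b)
    return inter / union if union > 0 else 0.0


def giou_loss(a, b):
    """1 - GIoU, where GIoU = IoU - (hull - union) / hull."""
    inter, union = _inter_union(a, b)
    hull = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    iou = inter / union if union > 0 else 0.0
    giou = iou - (hull - union) / hull if hull > 0 else iou
    return 1.0 - giou


def bce_loss(scores, gt_class):
    """Binary cross-entropy summed over classes against a one-hot target."""
    loss = 0.0
    for c, p in enumerate(scores):
        p = min(max(p, 1e-7), 1.0 - 1e-7)  # avoid log(0)
        loss -= math.log(p) if c == gt_class else math.log(1.0 - p)
    return loss


def sim_ota(pred_scores, pred_boxes, gt_classes, gt_boxes,
            eps=3.0, top_q=10, center_radius=5.0):
    """Return {ground truth index: [positive candidate indices]}."""
    best = {}  # candidate index -> (cost, ground truth index)
    for i, g in enumerate(gt_boxes):
        gx, gy = (g[0] + g[2]) / 2, (g[1] + g[3]) / 2
        # Steps 4-5: dynamic k from the sum of the top-10 IoUs (assumed clamp to >= 1).
        ious = sorted((box_iou(p, g) for p in pred_boxes), reverse=True)
        k = max(1, int(sum(ious[:top_q])))
        # Steps 1-3: cost c_ij, computed only for candidates in the center region.
        costs = []
        for j, p in enumerate(pred_boxes):
            px, py = (p[0] + p[2]) / 2, (p[1] + p[3]) / 2
            if abs(px - gx) <= center_radius and abs(py - gy) <= center_radius:
                cost = bce_loss(pred_scores[j], gt_classes[i]) + eps * giou_loss(p, g)
                costs.append((cost, j))
        # Step 7: keep the k lowest-cost candidates for this ground truth.
        for cost, j in sorted(costs)[:k]:
            # Step 8: a candidate claimed by several ground truths keeps the cheapest.
            if j not in best or cost < best[j][0]:
                best[j] = (cost, i)
    positives = {i: [] for i in range(len(gt_boxes))}
    for j, (_, i) in sorted(best.items()):
        positives[i].append(j)
    return positives


# Toy example: two ground truths, four candidates.
gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
gt_classes = [0, 1]
pred_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (50, 50, 60, 60)]
pred_scores = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.5, 0.5]]
assignment = sim_ota(pred_scores, pred_boxes, gt_classes, gt_boxes)
print(assignment)
```

    In the toy example each ground truth has one near-perfect candidate, so the IoU sum gives k = 1 and each ground truth receives a single positive sample; the far-away fourth candidate is never considered because it lies outside every center region.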
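    | Model | Backbone | AP /% | AP50 /% | AP75 /% | Parameters /10⁶ | BFLOPs | Speed /(frame·s⁻¹) |
    |---|---|---|---|---|---|---|---|
    | Faster R-CNN [7] | VGG16 | - | 15.2 | - | - | - | 20.4 |
    | CenterNet [22] | ResNet50 | 12.4 | 22.7 | 12.4 | 32.67 | 246.01 | 45.2 |
    | YOLOv4 [21] | CSPDarknet53 | 16.8 | 31.2 | 16.7 | 64.36 | 321.30 | 28.8 |
    | YOLOv4-tiny [21] | Tiny Darknet | 10.6 | 19.8 | 10.4 | 6.06 | 36.99 | 65.2 |
    | Proposed model | MobileNetV3 | 15.1 | 26.6 | 15.5 | 7.79 | 37.35 | 59.9 |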
    Table 3. Comparison of evaluation results of different models on VisDrone dataset

    Components: MobileNetV3 + Decoupled Head, D-RFB, RFB, Ghost-PAN, PAN, SimOTA, Focal loss

    | AP50 /% | Speed /(frame·s⁻¹) |
    |---|---|
    | 18.6 | 72.3 |
    | 20.1 | 68.2 |
    | 22.6 | 61.2 |
    | 23.6 | 54.3 |
    | 23.4 | 59.9 |
    | 25.2 | 59.7 |
    | 26.6 | 59.9 |
    Table 4. Results of ablation study
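    | Model | Backbone | AP /% | AP50 /% | AP75 /% | Parameters /10⁶ | BFLOPs | Speed /(frame·s⁻¹) |
    |---|---|---|---|---|---|---|---|
    | Faster R-CNN [7] | VGG16 | - | 81.8 | - | - | - | 20.9 |
    | CenterNet [22] | ResNet50 | 45.4 | 84.1 | 40.7 | 32.67 | 109.34 | 55.3 |
    | YOLOv4 [21] | CSPDarknet53 | 58.0 | 96.2 | 62.7 | 64.36 | 142.80 | 44.2 |
    | YOLOv4-tiny [21] | Tiny Darknet | 29.8 | 72.9 | 18.0 | 6.06 | 16.44 | 84.3 |
    | Proposed model | MobileNetV3 | 59.2 | 94.4 | 64.9 | 7.79 | 16.60 | 79.6 |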
    Table 5. Comparison of evaluation results of different models on NWPU VHR-10 dataset
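    | Target category | Faster R-CNN | CenterNet | YOLOv4 | YOLOv4-tiny | Proposed model |
    |---|---|---|---|---|---|
    | mAP | 81.82 | 84.10 | 96.23 | 72.89 | 94.36 |
    | Airplane | 97.71 | 99.81 | 99.79 | 99.30 | 99.95 |
    | Baseball diamond | 94.14 | 91.87 | 95.81 | 90.71 | 98.63 |
    | Basketball court | 78.38 | 78.45 | 98.39 | 65.47 | 89.49 |
    | Bridge | 72.56 | 81.52 | 87.13 | 34.85 | 89.07 |
    | Ground track field | 96.98 | 71.77 | 97.18 | 73.85 | 99.21 |
    | Harbor | 84.01 | 75.08 | 96.75 | 49.18 | 89.68 |
    | Ship | 72.80 | 87.67 | 96.52 | 90.37 | 93.02 |
    | Storage tank | 81.83 | 92.01 | 96.91 | 86.49 | 96.65 |
    | Tennis court | 83.44 | 85.33 | 99.95 | 77.12 | 96.40 |
    | Vehicle | 56.17 | 77.51 | 93.89 | 61.54 | 91.47 |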
    Table 6. Evaluation result of different models on NWPU VHR-10 dataset for 10 classes of objects
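
    The mAP row of Table 6 can be cross-checked against the per-class rows. The sketch below assumes mAP is the unweighted arithmetic mean of the ten per-class AP values, which is the usual definition but is not stated on this page; the names `PROPOSED_AP`, `YOLOV4_AP`, and `mean_ap` are hypothetical, and the numbers are copied from the Proposed model and YOLOv4 columns.

```python
# Per-class AP values (%) copied from Table 6.
PROPOSED_AP = {
    "Airplane": 99.95, "Baseball diamond": 98.63, "Basketball court": 89.49,
    "Bridge": 89.07, "Ground track field": 99.21, "Harbor": 89.68,
    "Ship": 93.02, "Storage tank": 96.65, "Tennis court": 96.40, "Vehicle": 91.47,
}
YOLOV4_AP = {
    "Airplane": 99.79, "Baseball diamond": 95.81, "Basketball court": 98.39,
    "Bridge": 87.13, "Ground track field": 97.18, "Harbor": 96.75,
    "Ship": 96.52, "Storage tank": 96.91, "Tennis court": 99.95, "Vehicle": 93.89,
}


def mean_ap(per_class_ap):
    """Unweighted mean of per-class AP values (assumed definition of mAP)."""
    return sum(per_class_ap.values()) / len(per_class_ap)


print(round(mean_ap(PROPOSED_AP), 2))  # compare with the mAP row: 94.36
print(round(mean_ap(YOLOV4_AP), 2))    # compare with the mAP row: 96.23
```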