• Optics and Precision Engineering
  • Vol. 32, Issue 5, 727 (2024)
Daxiang LI, Jiani XIN*, and Ying LIU
Author Affiliations
  • College of Communication and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, China
    DOI: 10.37188/OPE.20243205.0727
    Daxiang LI, Jiani XIN, Ying LIU. Position-sensitive Transformer aerial image object detection model[J]. Optics and Precision Engineering, 2024, 32(5): 727
    Fig. 1. Schematic diagram of the PS-TOD model
    Fig. 2. Fusion scheme of PCE3DA cross-layer feature maps
    Fig. 3. Flow chart of position channel embedding 3D attention (PCE3DA)
    Fig. 4. Position-sensitive self-attention (PSSA) mechanism (a code sketch of the general idea follows this figure list)
    Fig. 5. Encoder-decoder structure
    Fig. 6. Partial detection results of PS-TOD on the VisDrone test set
    Fig. 7. Comparison of small-object detection results
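    Fig. 4 names the position-sensitive self-attention (PSSA) mechanism that Tables 1 and 3 ablate. Since the figure itself is not reproduced here, the snippet below is only a minimal sketch of the general technique of making self-attention position-sensitive: a learned relative-position bias (Swin-Transformer-style table) added to the attention logits before the softmax. The class name, the `window_size` parameter and the bias table are illustrative assumptions, not the authors' exact PSSA formulation.

```python
# Hedged sketch: self-attention with a learned relative-position bias.
# Illustrates the *general* idea of a position-sensitive attention term;
# this is NOT the PS-TOD authors' exact PSSA design.
import torch
import torch.nn as nn

class PositionSensitiveSelfAttention(nn.Module):           # name is illustrative
    def __init__(self, dim, num_heads=8, window_size=7):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # One learnable bias per relative offset and per head (Swin-style table).
        self.rel_bias = nn.Parameter(
            torch.zeros((2 * window_size - 1) ** 2, num_heads))
        # Precompute the relative-position index for every token pair.
        coords = torch.stack(torch.meshgrid(
            torch.arange(window_size), torch.arange(window_size), indexing="ij"))
        coords = coords.flatten(1)                          # 2 x N
        rel = coords[:, :, None] - coords[:, None, :]       # 2 x N x N
        rel = rel.permute(1, 2, 0) + window_size - 1        # shift offsets to >= 0
        idx = rel[..., 0] * (2 * window_size - 1) + rel[..., 1]
        self.register_buffer("rel_index", idx)              # N x N

    def forward(self, x):                                   # x: (B, N, C), N = window_size**2
        B, N, C = x.shape
        assert N == self.rel_index.shape[0], "expects window_size**2 tokens"
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)                # each (B, heads, N, C/heads)
        attn = (q @ k.transpose(-2, -1)) * self.scale       # content term
        bias = self.rel_bias[self.rel_index.view(-1)].view(N, N, -1)
        attn = attn + bias.permute(2, 0, 1).unsqueeze(0)    # position-sensitive term
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

    The position term here is purely additive before the softmax; the paper's PSSA and the relative-position variants compared in Table 3 may combine content and position differently.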
    Method    MSFF  PSSA  Loss  AP_S  AP_M  AP_L  AP    Param/M
    Baseline  -     -     -     13.8  36.8  47.5  24.7  41.30
              √     -     -     16.4  38.9  49.4  26.4  42.36
              -     √     -     15.0  37.6  48.7  25.8  41.45
              -     -     √     15.6  39.1  48.9  26.0  41.30
              √     √     -     17.1  39.7  49.8  27.2  42.51
              -     √     √     16.5  40.0  49.1  26.9  41.45
              √     -     √     18.5  39.6  50.1  28.1  42.36
    Ours      √     √     √     19.4  40.1  50.9  28.8  42.51
    Table 1. Ablation experiment results on VisDrone test set
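    Tables 1-3 break AP down into AP_S, AP_M and AP_L by object size. Assuming the COCO convention for the size buckets (an assumption; the paper may define VisDrone-specific thresholds), the split is by ground-truth box area in pixels:

```python
# Assumed COCO-style size buckets behind AP_S / AP_M / AP_L
# (small < 32*32 px, medium < 96*96 px, large otherwise).
def size_bucket(box_w: float, box_h: float) -> str:
    """Return which AP bucket a ground-truth box of w x h pixels falls into."""
    area = box_w * box_h
    if area < 32 ** 2:
        return "small"   # contributes to AP_S
    if area < 96 ** 2:
        return "medium"  # contributes to AP_M
    return "large"       # contributes to AP_L

print(size_bucket(20, 20))   # -> "small"
print(size_bucket(100, 50))  # -> "medium"
```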
    Group  Method           AP_S  AP_M  AP_L  AP
    A      Baseline         13.8  36.8  47.5  24.7
    B      Baseline-SE      13.9  37.0  47.5  24.9
    C      Baseline-SA      14.5  38.1  47.7  25.2
    D      Baseline-CA      14.3  37.7  48.3  25.4
    E      Baseline-CBAM    14.6  37.5  48.1  25.2
    F      Baseline-PCE3DA  15.2  38.4  48.7  25.7
    G      F+MSFF           16.4  38.9  49.4  26.4
    Table 2. Experimental results for different attention mechanisms and multi-scale features
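    Group B in Table 2 swaps an SE (squeeze-and-excitation) channel-attention block into the baseline. For orientation, a standard SE block is sketched below; this is the well-known SE formulation, not the paper's PCE3DA module, and the reduction ratio r=16 is the common default rather than a value reported here.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard squeeze-and-excitation channel attention (Hu et al., 2018)."""
    def __init__(self, channels: int, r: int = 16):    # r = reduction ratio (assumed default)
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))           # squeeze: global average pooling -> (B, C)
        w = self.fc(w).view(b, c, 1, 1)  # excitation: per-channel gate in (0, 1)
        return x * w                     # reweight feature maps channel-wise
```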
    Method          AP_S  AP_M  AP_L  AP
    Baseline model  13.8  36.8  47.5  24.7
    Ref. [27]       14.3  37.0  48.3  25.0
    Ref. [28]       14.6  37.4  48.1  25.1
    PSSA            15.0  37.6  48.7  25.8
    Table 3. Experimental results of different relative position calculation methods
    Method                 AP_50  AP_75  AP    FPS
    Faster R-CNN [3]       21.7   /      /     15.9
    Cascade R-CNN [4]      38.6   25.0   23.5  9.0
    YOLOv4 [6]             31.2   16.7   16.8  28.8
    QueryDet [7]           48.1   28.8   28.3  2.8
    CornerNet [10]         34.1   15.8   17.4  15.5
    RetinaNet [20]         28.4   12.3   11.3  16
    Double-Head RCNN [29]  38.3   24.8   23.8  6.5
    IterDet [30]           36.8   20.3   20.4  11.4
    RSOD [31]              43.3   27.1   25.4  28
    YOLOv8 [32]            46.4   27.5   26.5  30.1
    PVTv2 [33]             34.1   21.4   20.6  10.9
    PS-TOD (Ours)          51.8   28.3   28.8  22.7
    Table 4. Performance comparison of different algorithms on VisDrone test set
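    Table 4 reports COCO-style AP, AP_50 and AP_75 alongside FPS. Assuming the VisDrone ground truth and the model's detections are exported to COCO JSON format (an assumption about the evaluation pipeline; the conversion is not shown here), the AP columns can be reproduced with pycocotools roughly as follows; file names are placeholders.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder paths -- the VisDrone-to-COCO conversion step is not shown here.
coco_gt = COCO("visdrone_test_coco.json")            # ground-truth annotations
coco_dt = coco_gt.loadRes("ps_tod_detections.json")  # model detections

ev = COCOeval(coco_gt, coco_dt, iouType="bbox")
ev.evaluate()
ev.accumulate()
ev.summarize()   # prints AP (IoU .50:.95), AP_50, AP_75, AP_S/M/L, etc.
```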
    Category        Pedestrian  People  Car   Bus   Bicycle  Truck  Tricycle  Awning-tricycle  Van   Motor
    Baseline model  24.8        18.7    61.6  35.2  12.1     23.3   15.2      4.6              28.6  24.9
    PS-TOD          29.0        22.4    64.3  45.9  14.7     27.1   21.4      9.0              31.7  28.4
    Table 5. Experimental results of different categories on VisDrone test set