• Optics and Precision Engineering
  • Vol. 32, Issue 5, 727 (2024)
Daxiang LI, Jiani XIN*, and Ying LIU
Author Affiliations
  • College of Communication and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, China
    DOI: 10.37188/OPE.20243205.0727
    Daxiang LI, Jiani XIN, Ying LIU. Position-sensitive Transformer aerial image object detection model[J]. Optics and Precision Engineering, 2024, 32(5): 727

    Abstract

    To address the difficulty of detecting the many small objects in UAV-captured aerial images, this paper proposes the Position-Sensitive Transformer Target Detection (PS-TOD) model. First, it designs a multi-scale feature fusion (MSFF) module built on a Positional Channel Embedded 3D Attention (PCE3DA) mechanism. PCE3DA exploits the interaction between spatial and channel information to generate 3D attention that strengthens the feature representation of regions of interest. On this basis, a bottom-up, cross-layer MSFF scheme enriches the semantic content of the fused features. Second, it proposes a Position-Sensitive Self-Attention (PSSA) mechanism and uses it to construct a position-sensitive Transformer encoder-decoder, which increases the model's sensitivity to object position while capturing long-range dependencies in the global image context. Comparative experiments on the VisDrone dataset show that PS-TOD reaches an Average Precision (AP) of 28.8%, a 4.1% improvement over the baseline model (DETR). The model also detects objects accurately in UAV aerial images with complex backgrounds and markedly improves the detection accuracy of small targets.
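
    The abstract does not give the formulation of PCE3DA, so the following is only a minimal sketch of the general idea of a 3D (channel × height × width) attention map formed from channel and spatial statistics. The function name `attention_3d`, the pooling choices, and the sigmoid gating are all assumptions made for illustration and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def attention_3d(feat):
    """Hypothetical 3D attention over a feature map (NOT the paper's PCE3DA).

    feat: nested list with shape [C][H][W].
    Returns (attn, out), both with shape [C][H][W].
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])

    # Channel descriptor (assumed): global average pool per channel, then sigmoid.
    chan = [
        sigmoid(sum(sum(row) for row in feat[c]) / (H * W))
        for c in range(C)
    ]

    # Spatial descriptor (assumed): mean over channels at each pixel, then sigmoid.
    spat = [
        [sigmoid(sum(feat[c][i][j] for c in range(C)) / C) for j in range(W)]
        for i in range(H)
    ]

    # 3D attention: product of the channel weight and the spatial weight,
    # giving one weight per (channel, row, column) position.
    attn = [
        [[chan[c] * spat[i][j] for j in range(W)] for i in range(H)]
        for c in range(C)
    ]

    # Re-weight the input features element-wise.
    out = [
        [[feat[c][i][j] * attn[c][i][j] for j in range(W)] for i in range(H)]
        for c in range(C)
    ]
    return attn, out


if __name__ == "__main__":
    # Tiny 2-channel, 2x2 example feature map (illustrative values only).
    features = [
        [[1.0, 0.0], [0.0, 0.0]],
        [[2.0, 0.0], [0.0, 0.0]],
    ]
    attention, weighted = attention_3d(features)
    print(attention[0][0][0], attention[0][1][1])
```

    In this sketch the position with strong responses in both channels receives a larger weight than the empty positions, which is the kind of emphasis on regions of interest that the abstract describes. A real implementation would learn the gating with convolutional or fully connected layers rather than use fixed pooling.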