Position-sensitive Transformer aerial image object detection model

Daxiang LI; Jiani XIN; Ying LIU

doi:10.37188/OPE.20243205.0727

Abstract

To address the difficulty of detecting the many small objects in UAV-captured aerial images, this paper proposes the Position-Sensitive Transformer Target Detection (PS-TOD) model. First, a multi-scale feature fusion (MSFF) module is designed around a Positional Channel Embedded 3D Attention (PCE3DA) mechanism. PCE3DA exploits the interaction between spatial and channel information to generate 3D attention, strengthening the feature representation of regions of interest. On this basis, a bottom-up, cross-layer MSFF scheme enriches the semantic content of the fused features. Second, a novel Position-Sensitive Self-Attention (PSSA) mechanism is proposed and used to build a position-sensitive Transformer encoder-decoder. This makes the model more sensitive to object position and helps it capture long-range dependencies in the global context of the image. Comparative experiments on the VisDrone dataset show that PS-TOD attains an Average Precision (AP) of 28.8%, 4.1% higher than the baseline model (DETR). The model also detects objects accurately in UAV aerial images with complex backgrounds and significantly improves detection accuracy for small objects.
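The abstract does not say how PCE3DA computes its weights, so the following is only an illustrative sketch of the general idea of "3D attention": a per-channel weight and a per-position weight are combined into one weight for every (channel, row, column) element of a feature map. The function name `attention_3d`, the use of average pooling, and the sigmoid gating are all assumptions made for illustration, not the authors' design.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_3d(feat):
    """Toy 3D attention over a feature map given as nested lists [C][H][W].

    Hypothetical stand-in for the idea described in the abstract, NOT PCE3DA
    itself: channel and spatial descriptors are pooled separately, gated with
    a sigmoid, and multiplied into a full C x H x W weight tensor.
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])

    # Channel descriptor: average each channel over all spatial positions.
    chan = [
        sigmoid(sum(sum(row) for row in feat[c]) / (H * W))
        for c in range(C)
    ]

    # Spatial descriptor: average each position over all channels.
    spat = [
        [sigmoid(sum(feat[c][i][j] for c in range(C)) / C) for j in range(W)]
        for i in range(H)
    ]

    # 3D attention: one weight per (channel, row, column) element.
    weights = [
        [[chan[c] * spat[i][j] for j in range(W)] for i in range(H)]
        for c in range(C)
    ]

    # Reweight the input features element-wise.
    out = [
        [[feat[c][i][j] * weights[c][i][j] for j in range(W)] for i in range(H)]
        for c in range(C)
    ]
    return weights, out


# Usage: a 2-channel, 1 x 2 feature map.
weights, out = attention_3d([[[1.0, -1.0]], [[2.0, 0.0]]])
```

Because both factors pass through a sigmoid, every weight in this sketch lies strictly between 0 and 1, so the reweighting can only attenuate features; a learned mechanism such as the one the paper proposes would typically use trainable layers in place of the fixed pooling used here.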
