• Optics and Precision Engineering
  • Vol. 32, Issue 5, 727 (2024)
Daxiang LI, Jiani XIN*, and Ying LIU
Author Affiliations
  • College of Communication and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, China
    DOI: 10.37188/OPE.20243205.0727
    Daxiang LI, Jiani XIN, Ying LIU. Position-sensitive Transformer aerial image object detection model[J]. Optics and Precision Engineering, 2024, 32(5): 727
    Fig. 1. Schematic diagram of the PS-TOD model
    Fig. 2. Fusion scheme of PCE3DA cross-layer feature maps
    Fig. 3. Flow chart of position channel embedding 3D attention (PCE3DA)
    Fig. 4. Position-sensitive self-attention (PSSA) mechanism (a code sketch of the general idea follows this figure list)
    Fig. 5. Encoder-decoder structure
    Fig. 6. Partial detection results of PS-TOD on the VisDrone test set
    Fig. 7. Comparison of small-object detection results
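    Fig. 4 names the position-sensitive self-attention (PSSA) mechanism that Tables 1 and 3 ablate. Since the figure itself is not reproduced here, the snippet below is only a minimal sketch of the general technique of making self-attention position-sensitive: a learned relative-position bias (Swin-Transformer-style table) added to the attention logits before the softmax. The class name, the `window_size` parameter and the bias table are illustrative assumptions, not the authors' exact PSSA formulation.

```python
# Hedged sketch: self-attention with a learned relative-position bias.
# Illustrates the *general* idea of a position-sensitive attention term;
# this is NOT the PS-TOD authors' exact PSSA design.
import torch
import torch.nn as nn

class PositionSensitiveSelfAttention(nn.Module):           # name is illustrative
    def __init__(self, dim, num_heads=8, window_size=7):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # One learnable bias per relative offset and per head (Swin-style table).
        self.rel_bias = nn.Parameter(
            torch.zeros((2 * window_size - 1) ** 2, num_heads))
        # Precompute the relative-position index for every token pair.
        coords = torch.stack(torch.meshgrid(
            torch.arange(window_size), torch.arange(window_size), indexing="ij"))
        coords = coords.flatten(1)                          # 2 x N
        rel = coords[:, :, None] - coords[:, None, :]       # 2 x N x N
        rel = rel.permute(1, 2, 0) + window_size - 1        # shift offsets to >= 0
        idx = rel[..., 0] * (2 * window_size - 1) + rel[..., 1]
        self.register_buffer("rel_index", idx)              # N x N

    def forward(self, x):                                   # x: (B, N, C), N = window_size**2
        B, N, C = x.shape
        assert N == self.rel_index.shape[0], "expects window_size**2 tokens"
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)                # each (B, heads, N, C/heads)
        attn = (q @ k.transpose(-2, -1)) * self.scale       # content term
        bias = self.rel_bias[self.rel_index.view(-1)].view(N, N, -1)
        attn = attn + bias.permute(2, 0, 1).unsqueeze(0)    # position-sensitive term
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

    The position term here is purely additive before the softmax; the paper's PSSA and the relative-position variants compared in Table 3 may combine content and position differently.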
    Method    MSFF  PSSA  Loss  AP_S  AP_M  AP_L  AP    Param/M
    Baseline  -     -     -     13.8  36.8  47.5  24.7  41.30
              √     -     -     16.4  38.9  49.4  26.4  42.36
              -     √     -     15.0  37.6  48.7  25.8  41.45
              -     -     √     15.6  39.1  48.9  26.0  41.30
              √     √     -     17.1  39.7  49.8  27.2  42.51
              -     √     √     16.5  40.0  49.1  26.9  41.45
              √     -     √     18.5  39.6  50.1  28.1  42.36
    Ours      √     √     √     19.4  40.1  50.9  28.8  42.51
    Table 1. Ablation experiment results on VisDrone test set
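    Tables 1-3 break AP down into AP_S, AP_M and AP_L by object size. Assuming the COCO convention for the size buckets (an assumption; the paper may define VisDrone-specific thresholds), the split is by ground-truth box area in pixels:

```python
# Assumed COCO-style size buckets behind AP_S / AP_M / AP_L
# (small < 32*32 px, medium < 96*96 px, large otherwise).
def size_bucket(box_w: float, box_h: float) -> str:
    """Return which AP bucket a ground-truth box of w x h pixels falls into."""
    area = box_w * box_h
    if area < 32 ** 2:
        return "small"   # contributes to AP_S
    if area < 96 ** 2:
        return "medium"  # contributes to AP_M
    return "large"       # contributes to AP_L

print(size_bucket(20, 20))   # -> "small"
print(size_bucket(100, 50))  # -> "medium"
```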
    Group  Method           AP_S  AP_M  AP_L  AP
    A      Baseline         13.8  36.8  47.5  24.7
    B      Baseline-SE      13.9  37.0  47.5  24.9
    C      Baseline-SA      14.5  38.1  47.7  25.2
    D      Baseline-CA      14.3  37.7  48.3  25.4
    E      Baseline-CBAM    14.6  37.5  48.1  25.2
    F      Baseline-PCE3DA  15.2  38.4  48.7  25.7
    G      F+MSFF           16.4  38.9  49.4  26.4
    Table 2. Experimental results for different attention mechanisms and multi-scale features
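    Group B in Table 2 swaps an SE (squeeze-and-excitation) channel-attention block into the baseline. For orientation, a standard SE block is sketched below; this is the well-known SE formulation, not the paper's PCE3DA module, and the reduction ratio r=16 is the common default rather than a value reported here.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard squeeze-and-excitation channel attention (Hu et al., 2018)."""
    def __init__(self, channels: int, r: int = 16):    # r = reduction ratio (assumed default)
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))           # squeeze: global average pooling -> (B, C)
        w = self.fc(w).view(b, c, 1, 1)  # excitation: per-channel gate in (0, 1)
        return x * w                     # reweight feature maps channel-wise
```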
    Method          AP_S  AP_M  AP_L  AP
    Baseline model  13.8  36.8  47.5  24.7
    Ref. [27]       14.3  37.0  48.3  25.0
    Ref. [28]       14.6  37.4  48.1  25.1
    PSSA            15.0  37.6  48.7  25.8
    Table 3. Experimental results of different relative position calculation methods
    Method                 AP_50  AP_75  AP    FPS
    Faster R-CNN [3]       21.7   /      /     15.9
    Cascade R-CNN [4]      38.6   25.0   23.5  9.0
    YOLOv4 [6]             31.2   16.7   16.8  28.8
    QueryDet [7]           48.1   28.8   28.3  2.8
    CornerNet [10]         34.1   15.8   17.4  15.5
    RetinaNet [20]         28.4   12.3   11.3  16
    Double-Head RCNN [29]  38.3   24.8   23.8  6.5
    IterDet [30]           36.8   20.3   20.4  11.4
    RSOD [31]              43.3   27.1   25.4  28
    YOLOv8 [32]            46.4   27.5   26.5  30.1
    PVTv2 [33]             34.1   21.4   20.6  10.9
    PS-TOD (Ours)          51.8   28.3   28.8  22.7
    Table 4. Performance comparison of different algorithms on VisDrone test set
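    Table 4 reports COCO-style AP, AP_50 and AP_75 alongside FPS. Assuming the VisDrone ground truth and the model's detections are exported to COCO JSON format (an assumption about the evaluation pipeline; the conversion is not shown here), the AP columns can be reproduced with pycocotools roughly as follows; file names are placeholders.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder paths -- the VisDrone-to-COCO conversion step is not shown here.
coco_gt = COCO("visdrone_test_coco.json")            # ground-truth annotations
coco_dt = coco_gt.loadRes("ps_tod_detections.json")  # model detections

ev = COCOeval(coco_gt, coco_dt, iouType="bbox")
ev.evaluate()
ev.accumulate()
ev.summarize()   # prints AP (IoU .50:.95), AP_50, AP_75, AP_S/M/L, etc.
```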
    Category        Pedestrian  People  Car   Bus   Bicycle  Truck  Tricycle  Awning-tricycle  Van   Motor
    Baseline model  24.8        18.7    61.6  35.2  12.1     23.3   15.2      4.6              28.6  24.9
    PS-TOD          29.0        22.4    64.3  45.9  14.7     27.1   21.4      9.0              31.7  28.4
    Table 5. Experimental results of different categories on VisDrone test set