Lightweight Feature Fusion Network for Object Detection in Aerial Photography Images

Qiangqiang Fan; Zaifeng Shi; Fanning Kong; Shaoxiong Li; Jun Xiao

doi:10.3788/LOP220859

Laser & Optoelectronics Progress
Vol. 60, Issue 10, 1010027 (2023)

Qiangqiang Fan¹, Zaifeng Shi¹,³,*, Fanning Kong¹, Shaoxiong Li¹, and Jun Xiao²

Author Affiliations

¹School of Microelectronics, Tianjin University, Tianjin 300072, China

²Phytium Technology Co., Ltd., Tianjin 300459, China

³Tianjin Key Laboratory of Imaging and Sensing Microelectronic Technology, Tianjin 300072, China

Qiangqiang Fan, Zaifeng Shi, Fanning Kong, Shaoxiong Li, Jun Xiao. Lightweight Feature Fusion Network for Object Detection in Aerial Photography Images[J]. Laser & Optoelectronics Progress, 2023, 60(10): 1010027

Fig. 1. Overall network architecture

Fig. 2. Deformable receptive field block

Fig. 3. Ghost bottleneck module

Fig. 4. Comparison of P-R curves. (a) P-R curves of the proposed model for ten classes of objects; (b) P-R curve for the bus; (c) P-R curve for the car

Fig. 5. Comparison of detection results of different models in different scenarios. (a) Multi-scale, occluded scene; (b) small object, dense scene; (c) illumination change scene

Fig. 6. Detection results of the proposed model on NWPU VHR-10 dataset

Input size	Operator	Expansion size	Output size	SE	NL	Stride (s)
640×640×3	Conv2d		320×320×16		HS	2
320×320×16	Bneck, 3×3	16	320×320×16		RE	1
320×320×16	Bneck, 3×3	64	160×160×24		RE	2
160×160×24	Bneck, 3×3	72	160×160×24		RE	1
160×160×24	Bneck, 5×5	72	80×80×40	1	RE	2
80×80×40	Bneck, 5×5	120	80×80×40	1	RE	1
80×80×40	Bneck, 5×5	120	80×80×40	1	RE	1
80×80×40	Bneck, 3×3	240	40×40×80		HS	2
40×40×80	Bneck, 3×3	200	40×40×80		HS	1
40×40×80	Bneck, 3×3	184	40×40×80		HS	1
40×40×80	Bneck, 3×3	184	40×40×80		HS	1
40×40×80	Bneck, 3×3	480	40×40×112	1	HS	1
40×40×112	Bneck, 3×3	672	40×40×112	1	HS	1
40×40×112	Bneck, 5×5	672	20×20×160	1	HS	2
20×20×160	Bneck, 5×5	960	20×20×160	1	HS	1
20×20×160	Bneck, 5×5	960	20×20×160	1	HS	1

Table 1. Detailed structure of backbone network
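
The shape bookkeeping in Table 1 can be checked mechanically. The sketch below is illustrative only: it copies the input size, output size, and stride of each row into a list (the names `ROWS`, `check_backbone`, and `total_stride` are hypothetical, not from the paper) and verifies that every layer divides the spatial size by its stride and that each row's input matches the previous row's output.

```python
# Rows copied from Table 1 as (input H=W, input channels, output H=W, output channels, stride).
ROWS = [
    (640, 3, 320, 16, 2),
    (320, 16, 320, 16, 1),
    (320, 16, 160, 24, 2),
    (160, 24, 160, 24, 1),
    (160, 24, 80, 40, 2),
    (80, 40, 80, 40, 1),
    (80, 40, 80, 40, 1),
    (80, 40, 40, 80, 2),
    (40, 80, 40, 80, 1),
    (40, 80, 40, 80, 1),
    (40, 80, 40, 80, 1),
    (40, 80, 40, 112, 1),
    (40, 112, 40, 112, 1),
    (40, 112, 20, 160, 2),
    (20, 160, 20, 160, 1),
    (20, 160, 20, 160, 1),
]


def check_backbone(rows):
    """Return the final (size, channels) after checking every row for consistency."""
    prev = None
    for in_hw, in_c, out_hw, out_c, stride in rows:
        # A stride-s layer shrinks the feature map by a factor of s.
        if out_hw != in_hw // stride:
            raise ValueError(f"size mismatch at input {in_hw}x{in_hw}x{in_c}")
        # Each layer must consume exactly what the previous layer produced.
        if prev is not None and (in_hw, in_c) != prev:
            raise ValueError(f"chain break at input {in_hw}x{in_hw}x{in_c}")
        prev = (out_hw, out_c)
    return prev


def total_stride(rows):
    """Product of all strides, i.e. the overall downsampling factor."""
    factor = 1
    for *_, stride in rows:
        factor *= stride
    return factor


if __name__ == "__main__":
    print(check_backbone(ROWS), total_stride(ROWS))
```

The five stride-2 layers give an overall downsampling factor of 32, which is why a 640×640 input ends at 20×20.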

Algorithm 1: simplified optimal transport assignment (SimOTA)
Input: n is the number of initially selected candidate boxes C; m is the number of ground truth objects in image Y; P_j^class is the predicted class score for candidate box a_j; P_j^box is the predicted bounding box for a_j (j = 1, 2, …, n); G_i^class is the ground truth class for ground truth g_i; G_i^box is the bounding box for g_i (i = 1, 2, …, m); ε = 3
Output: k candidate boxes as positive samples of each g_i
1 calculate class loss: L_ij^class = BCELoss(P_j^class, G_i^class)
2 calculate regression loss: L_ij^reg = GIoULoss(P_j^box, G_i^box)
3 calculate cost: c_ij = L_ij^class + ε L_ij^reg
4 select the top 10 candidate boxes with the highest IoU for each g_i
5 sum these 10 IoU values and take the integer part to get k for each g_i
6 for i = 1 to m do
7 select the top k candidate boxes with the least cost within a fixed center region for g_i
8 if a candidate box a_j matches multiple ground truths then assign a_j to the matching ground truth with the least cost
9 else a_j is selected as a positive sample of g_i

Table 2. Implementation flow of SimOTA label assignment
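
The steps of Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `simota_assign` and its arguments are hypothetical, the loss matrices are assumed to be precomputed with shape (m, n), the "fixed center region" is passed in as a boolean mask, and k is clamped to at least 1 (a common practical choice that the algorithm text does not state).

```python
import numpy as np


def simota_assign(cls_loss, reg_loss, ious, in_center, eps=3.0, top_q=10):
    """Assign candidates to ground truths following the steps of Algorithm 1.

    cls_loss, reg_loss, ious: (m, n) arrays for m ground truths and n candidates.
    in_center: (m, n) boolean mask, True where candidate j lies in the center
               region of ground truth i.
    Returns an (n,) integer array: the matched ground-truth index, or -1 for negatives.
    """
    m, n = ious.shape

    # Step 3: cost c_ij = L_cls + eps * L_reg; candidates outside the center
    # region are excluded by giving them infinite cost.
    cost = np.where(in_center, cls_loss + eps * reg_loss, np.inf)

    # Steps 4-5: sum the top-q IoUs per ground truth and keep the integer part.
    q = min(top_q, n)
    top_ious = np.sort(ious, axis=1)[:, -q:]
    dynamic_k = np.maximum(top_ious.sum(axis=1).astype(int), 1)  # clamp is an assumption

    # Steps 6-7: for each ground truth, pick the k lowest-cost candidates.
    matching = np.zeros((m, n), dtype=bool)
    for i in range(m):
        order = np.argsort(cost[i])[: dynamic_k[i]]
        order = order[np.isfinite(cost[i, order])]  # drop out-of-region picks
        matching[i, order] = True

    # Step 8: a candidate matched to several ground truths keeps only the
    # lowest-cost one among those that selected it.
    conflict = np.where(matching.sum(axis=0) > 1)[0]
    if conflict.size:
        masked = np.where(matching[:, conflict], cost[:, conflict], np.inf)
        best_gt = masked.argmin(axis=0)
        matching[:, conflict] = False
        matching[best_gt, conflict] = True

    # Step 9: remaining matches are the positive samples.
    return np.where(matching.any(axis=0), matching.argmax(axis=0), -1)


if __name__ == "__main__":
    ious = np.array([[0.9, 0.8, 0.5, 0.0],
                     [0.0, 0.7, 0.6, 0.8]])
    cls_loss = np.zeros_like(ious)   # placeholder class loss
    reg_loss = 1.0 - ious            # stand-in for GIoU loss, for illustration only
    in_center = np.ones_like(ious, dtype=bool)
    print(simota_assign(cls_loss, reg_loss, ious, in_center))
```

In the toy input, both ground truths initially select the second candidate; the conflict rule in step 8 leaves it with the ground truth whose cost is lower.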

Model	Backbone	AP /%	AP⁵⁰ /%	AP⁷⁵ /%	Parameters /10⁶	BFLOPs	Speed /(frame·s⁻¹)
Faster R-CNN [7]	VGG16		15.2				20.4
CenterNet [22]	ResNet50	12.4	22.7	12.4	32.67	246.01	45.2
YOLOv4 [21]	CSPDarknet53	16.8	31.2	16.7	64.36	321.30	28.8
YOLOv4-tiny [21]	Tiny Darknet	10.6	19.8	10.4	6.06	36.99	65.2
Proposed model	MobileNetV3	15.1	26.6	15.5	7.79	37.35	59.9

Table 3. Comparison of evaluation results of different models on VisDrone dataset
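
The accuracy/cost trade-off in Table 3 is easier to read as ratios. The short sketch below is only arithmetic on the table values (the names `TABLE3` and `relative_to` are hypothetical): relative to YOLOv4, the proposed model uses roughly 12% of the parameters and BFLOPs and runs at about twice the frame rate, with AP⁵⁰ 4.6 points lower.

```python
# Values copied from Table 3 (VisDrone). Faster R-CNN is omitted because
# its parameter and BFLOPs cells are empty in the table.
TABLE3 = {
    "CenterNet": {"ap50": 22.7, "params_m": 32.67, "bflops": 246.01, "fps": 45.2},
    "YOLOv4": {"ap50": 31.2, "params_m": 64.36, "bflops": 321.30, "fps": 28.8},
    "YOLOv4-tiny": {"ap50": 19.8, "params_m": 6.06, "bflops": 36.99, "fps": 65.2},
    "Proposed model": {"ap50": 26.6, "params_m": 7.79, "bflops": 37.35, "fps": 59.9},
}


def relative_to(baseline, model, table=TABLE3):
    """Compare `model` against `baseline` using ratios and the AP50 difference."""
    b, m = table[baseline], table[model]
    return {
        "params_ratio": m["params_m"] / b["params_m"],
        "bflops_ratio": m["bflops"] / b["bflops"],
        "speedup": m["fps"] / b["fps"],
        "ap50_gap": m["ap50"] - b["ap50"],  # percentage points
    }


if __name__ == "__main__":
    for baseline in ("YOLOv4", "YOLOv4-tiny", "CenterNet"):
        stats = relative_to(baseline, "Proposed model")
        print(baseline, {k: round(v, 3) for k, v in stats.items()})
```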

MobileNetV3 + Decoupled Head	D-RFB	RFB	Ghost-PAN	PAN	SimOTA	Focal loss	AP⁵⁰ /%	Speed /(frame·s⁻¹)
√							18.6	72.3
√	√						20.1	68.2
√		√	√				22.6	61.2
√	√			√			23.6	54.3
√	√		√				23.4	59.9
√	√		√		√		25.2	59.7
√	√		√		√	√	26.6	59.9

Table 4. Results of ablation study

Model	Backbone	AP /%	AP⁵⁰ /%	AP⁷⁵ /%	Parameters /10⁶	BFLOPs	Speed /(frame·s⁻¹)
Faster R-CNN [7]	VGG16		81.8				20.9
CenterNet [22]	ResNet50	45.4	84.1	40.7	32.67	109.34	55.3
YOLOv4 [21]	CSPDarknet53	58.0	96.2	62.7	64.36	142.80	44.2
YOLOv4-tiny [21]	Tiny Darknet	29.8	72.9	18.0	6.06	16.44	84.3
Proposed model	MobileNetV3	59.2	94.4	64.9	7.79	16.60	79.6

Table 5. Comparison of evaluation results of different models on NWPU VHR-10 dataset

Target category	Faster R-CNN	CenterNet	YOLOv4	YOLOv4-tiny	Proposed model
mAP	81.82	84.10	96.23	72.89	94.36
Airplane	97.71	99.81	99.79	99.30	99.95
Baseball diamond	94.14	91.87	95.81	90.71	98.63
Basketball court	78.38	78.45	98.39	65.47	89.49
Bridge	72.56	81.52	87.13	34.85	89.07
Ground track field	96.98	71.77	97.18	73.85	99.21
Harbor	84.01	75.08	96.75	49.18	89.68
Ship	72.80	87.67	96.52	90.37	93.02
Storage tank	81.83	92.01	96.91	86.49	96.65
Tennis court	83.44	85.33	99.95	77.12	96.40
Vehicle	56.17	77.51	93.89	61.54	91.47

Table 6. Evaluation results of different models on NWPU VHR-10 dataset for 10 classes of objects
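
The mAP row of Table 6 is the arithmetic mean of the ten per-class values in each column. The sketch below, using hypothetical names (`PER_CLASS_AP`, `REPORTED_MAP`, `mean_ap`) and values copied from the table, recomputes that mean; it agrees with the reported mAP to within about 0.02 for every model, a gap consistent with rounding of the per-class entries.

```python
# Per-class values copied from Table 6, in the table's row order
# (Airplane ... Vehicle).
PER_CLASS_AP = {
    "Faster R-CNN": [97.71, 94.14, 78.38, 72.56, 96.98, 84.01, 72.80, 81.83, 83.44, 56.17],
    "CenterNet": [99.81, 91.87, 78.45, 81.52, 71.77, 75.08, 87.67, 92.01, 85.33, 77.51],
    "YOLOv4": [99.79, 95.81, 98.39, 87.13, 97.18, 96.75, 96.52, 96.91, 99.95, 93.89],
    "YOLOv4-tiny": [99.30, 90.71, 65.47, 34.85, 73.85, 49.18, 90.37, 86.49, 77.12, 61.54],
    "Proposed model": [99.95, 98.63, 89.49, 89.07, 99.21, 89.68, 93.02, 96.65, 96.40, 91.47],
}

# mAP row of Table 6.
REPORTED_MAP = {
    "Faster R-CNN": 81.82,
    "CenterNet": 84.10,
    "YOLOv4": 96.23,
    "YOLOv4-tiny": 72.89,
    "Proposed model": 94.36,
}


def mean_ap(per_class):
    """Mean average precision: the unweighted mean over classes."""
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    for model, aps in PER_CLASS_AP.items():
        print(f"{model}: computed {mean_ap(aps):.2f}, reported {REPORTED_MAP[model]:.2f}")
```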
