Multi-Scale Feature Fusion Based Adaptive Object Detection for UAV

Fang Liu; Zhiwei Wu; Anzhe Yang; Xiao Han

doi:10.3788/AOS202040.1015002


Acta Optica Sinica
Vol. 40, Issue 10, 1015002 (2020)


Fang Liu, Zhiwei Wu^*, Anzhe Yang, and Xiao Han

Author Affiliations

Information Department, Beijing University of Technology, Beijing 100022, China


Fang Liu, Zhiwei Wu, Anzhe Yang, Xiao Han. Multi-Scale Feature Fusion Based Adaptive Object Detection for UAV[J]. Acta Optica Sinica, 2020, 40(10): 1015002


Fig. 1. Framework of our algorithm


Fig. 2. Schematic diagram of convolution decomposition. (a) Standard convolution process; (b) convolution process after decomposition


Fig. 3. Structure diagram of the convolutional neural network residual module


Fig. 4. Deconvolution cascaded structure


Fig. 5. Adaptive candidate region generation


Fig. 6. Visualized detection results of the proposed algorithm in different situations. (a) Small target detection results; (b) dense target detection results; (c) detection results of targets under different illumination


Layer	Type	Kernel	Output size	Number of output channels
X	input		224×224	3
Conv_1	Convolution	3×3,64 stride 2	112×112	32
Conv_2	Convolution	$\begin{bmatrix} 3 \times 3, 1 \\ 1 \times 1, 64 \\ 3 \times 3, 1 \\ 1 \times 1, 64 \end{bmatrix} \times 3$	56×56	64
Conv_3	Convolution	$\begin{bmatrix} 3 \times 3, 1 \\ 1 \times 1, 128 \\ 3 \times 3, 1 \\ 1 \times 1, 128 \end{bmatrix} \times 4$	28×28	128
Conv_4	Convolution	$\begin{bmatrix} 3 \times 3, 1 \\ 1 \times 1, 256 \\ 3 \times 3, 1 \\ 1 \times 1, 256 \end{bmatrix} \times 6$	14×14	256
Conv_5	Convolution	$\begin{bmatrix} 3 \times 3, 1 \\ 1 \times 1, 512 \\ 3 \times 3, 1 \\ 1 \times 1, 512 \end{bmatrix} \times 3$	7×7	512

Table 1. Lightweight deep residual network model
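The kernel column of Table 1 pairs a 3×3 layer with a 1×1 layer, which matches the convolution decomposition of Fig. 2. The following is a minimal sketch of the resulting parameter saving. It assumes that "3×3, 1" denotes a depthwise 3×3 convolution (one filter per input channel) followed by a pointwise 1×1 convolution, and it ignores biases and normalization layers. The function names are illustrative and do not come from the paper.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def decomposed_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k layer plus a pointwise 1 x 1 layer.

    Assumption: this is the decomposition meant by Fig. 2 and Table 1.
    """
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Channel widths taken from the Conv_2 to Conv_5 rows of Table 1.
for channels in (64, 128, 256, 512):
    std = standard_conv_params(3, channels, channels)
    dec = decomposed_conv_params(3, channels, channels)
    print(channels, std, dec, round(dec / std, 3))
```

Under these assumptions the ratio equals 1/c_out + 1/k², so each decomposed layer holds roughly 11% to 13% of the weights of its standard counterpart for the widths listed in Table 1.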

Layer	Type	Kernel	Stride	Output size
h1	Deconvolution	3×3	1	14×14×256
h2	Deconvolution	3×3	1	28×28×256
h3	Deconvolution	3×3	1	56×56×256

Table 2. Deconvolution layer parameters

Model	Size /MB	Ratio /%	Accuracy /%
Resnet	97.7	—	81.3
LResnet	10.2	10.4	80.6

Table 3. Feature extraction network comparison
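The Ratio column of Table 3 can be checked with a short calculation. The sketch below assumes that the ratio is the LResnet model size divided by the Resnet model size; the variable names are illustrative.

```python
# Values copied from Table 3.
resnet_mb, lresnet_mb = 97.7, 10.2
resnet_acc, lresnet_acc = 81.3, 80.6

# Assumption: "Ratio" means LResnet size as a percentage of Resnet size.
size_ratio = round(lresnet_mb / resnet_mb * 100, 1)

# Accuracy lost by switching to the lightweight network, in percentage points.
acc_drop = round(resnet_acc - lresnet_acc, 1)

print(size_ratio, acc_drop)
```

With this reading, the computed ratio agrees with the 10.4% reported in the table, and the accuracy cost is 0.7 percentage points.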

Method	mAP	AP⁵⁰	AP⁷⁵
①Faster-RCNN(Resnet50+RPN)	18.63	35.87	17.86
②LResnet+RPN	18.52	35.75	17.44
③LResnet+DC+RPN	21.03	38.46	18.03
④LResnet+DC+GA-RPN(ours)	22.12	38.76	21.53

Table 4. Effectiveness test of each module for different methods (unit: %)
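The contribution of each module in Table 4 is easier to read as a row-to-row difference. The sketch below computes the change of each method relative to the row above it. The helper name `stepwise_gains` is illustrative, and reading DC as the deconvolution cascaded structure of Fig. 4 and GA-RPN as the adaptive candidate region generation of Fig. 5 is an assumption based on the figure captions.

```python
# Rows copied from Table 4: (method, mAP, AP50, AP75), all in percent.
rows = [
    ("Faster-RCNN (Resnet50+RPN)", 18.63, 35.87, 17.86),
    ("LResnet+RPN", 18.52, 35.75, 17.44),
    ("LResnet+DC+RPN", 21.03, 38.46, 18.03),
    ("LResnet+DC+GA-RPN (ours)", 22.12, 38.76, 21.53),
]


def stepwise_gains(table):
    """Return (method, d_mAP, d_AP50, d_AP75) relative to the previous row."""
    gains = []
    for prev, cur in zip(table, table[1:]):
        diffs = tuple(round(c - p, 2) for p, c in zip(prev[1:], cur[1:]))
        gains.append((cur[0],) + diffs)
    return gains


steps = stepwise_gains(rows)
for name, d_map, d_ap50, d_ap75 in steps:
    print(f"{name}: mAP {d_map:+.2f}, AP50 {d_ap50:+.2f}, AP75 {d_ap75:+.2f}")
```

Read this way, replacing Resnet50 with LResnet costs about 0.1 points of mAP, adding DC gains 2.51 points, and swapping RPN for GA-RPN gains a further 1.09 points of mAP and 3.50 points of AP⁷⁵.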

Method	Pedestrian	Person	Bicycle	Car	Van	Truck	Tricycle	Awn	Bus	Motor
Faster-RCNN	18.34	7.62	6.76	43.31	27.53	19.95	10.13	7.65	36.87	8.79
Ours	22.43	7.61	8.56	50.18	34.63	24.34	14.11	9.08	36.25	14.88

Table 5. Per-category comparison between our model and Faster-RCNN on the VisDrone dataset (unit: %)

Method	mAP /%	AP⁵⁰ /%	AP⁷⁵ /%	Frame rate /(frame·s⁻¹)
FPN	16.51	32.20	14.91	6
YOLOv3	20.30	44.12	15.80	44
RetinaNet	11.81	21.37	11.62	11
CornerNet	17.41	34.12	15.78	13
Ours	22.12	38.76	21.53	24

Table 6. Comparison with mainstream object detection algorithms on UAV aerial data
