Shao-Yi CHEN1,2,3,4, Xin-Yi TANG2,3,4, Jian WANG2,3,4, Jing-Si HUANG1,2,3,4, and Zheng LI2,3,4,*
Author Affiliations
1 School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
2 Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China
3 University of Chinese Academy of Sciences, Beijing 100049, China
4 Key Laboratory of Infrared System Detection and Imaging Technology, Chinese Academy of Sciences, Shanghai 200083, China
Fig. 1. Network structure of the deep-learning-based infrared target detection algorithm
Fig. 2. SkyNet object detection results on the FLIR dataset
Fig. 3. Concepts of initial interval and latency
Fig. 4. Accelerator design for balancing all pipeline stages
Fig. 5. FPGA inference accelerator architecture
Fig. 6. Datapath of pointwise convolution
Fig. 7. Datapath of depthwise convolution
Fig. 8. Using line buffer to optimize datapath
Fig. 9. Datapath of maxpool
Fig. 10. DSP48E2 slice architecture
Fig. 11. Datapath of process element array
Fig. 12. System optimization
Algorithm 1. Pseudocode for the pointwise convolution layer

Input:
    in<N_IN × BIT_IN>: feature map input
    weight<N_OUT × N_IN × BIT_WT>[N_OCH / N_OUT][N_ICH / N_IN]: weights of the neural network
    N_IN: input parallel factor
    N_OUT: output parallel factor
    N_ICH: number of input channels
    N_OCH: number of output channels
    BIT_IN: bitwidth of input
    BIT_WT: bitwidth of weights
    BIT_OUT: bitwidth of output

#pragma HLS DATAFLOW
for fo = 0; fo < N_OCH / N_OUT; ++fo do
    for fi = 0; fi < N_ICH / N_IN; ++fi do
        #pragma HLS PIPELINE II=1
        for i = 0; i < N_IN; ++i do
            #pragma HLS UNROLL
            for o = 0; o < N_OUT; ++o do
                out += in * weight[fo][fi]
            end for
        end for
    end for
end for

Output:
    out: feature map output
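As a functional reference for the loop nest in Algorithm 1, the following plain C++ sketch computes one output pixel of a pointwise (1×1) convolution. The channel counts and tile sizes are illustrative assumptions, and the explicit indexing of `in`, `weight`, and `out` is one plausible concretization of the pseudocode's accumulation; the HLS pragmas are shown as comments only.

```cpp
#include <array>
#include <cstdint>

// Illustrative sizes (assumed for the sketch, not the accelerator's config).
constexpr int N_ICH = 8;   // input channels
constexpr int N_OCH = 4;   // output channels
constexpr int N_IN  = 2;   // input parallel factor
constexpr int N_OUT = 2;   // output parallel factor

// Pointwise convolution at one pixel position: every output channel is a
// dot product over all input channels. The two outer loops walk channel
// tiles; the two inner loops (UNROLL in HLS) become N_IN x N_OUT parallel
// multiply-accumulates per cycle.
void pointwise_conv(const std::array<int32_t, N_ICH>& in,
                    const int32_t weight[N_OCH][N_ICH],
                    std::array<int32_t, N_OCH>& out) {
    out.fill(0);
    for (int fo = 0; fo < N_OCH / N_OUT; ++fo)
        for (int fi = 0; fi < N_ICH / N_IN; ++fi)   // PIPELINE II=1 in HLS
            for (int i = 0; i < N_IN; ++i)          // unrolled
                for (int o = 0; o < N_OUT; ++o)     // unrolled
                    out[fo * N_OUT + o] +=
                        in[fi * N_IN + i] * weight[fo * N_OUT + o][fi * N_IN + i];
}
```

With all-ones input and weights, each output channel accumulates N_ICH products, which makes the tiling easy to sanity-check against a naive two-loop implementation.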
Algorithm 2. Pseudocode for the depthwise convolution layer

Input:
    in: feature map input
    weight[N_CH / N_IO][9]: weights of the neural network
    N_IO: input/output parallel factor
    N_CH: number of input channels
    BIT_IN: bitwidth of input
    BIT_WT: bitwidth of weights
    BIT_OUT: bitwidth of output

#pragma HLS DATAFLOW
for f = 0; f < N_CH / N_IO; ++f do
    for k = 0; k < 9; ++k do
        #pragma HLS PIPELINE II=1
        wt_buf = weight[f][k]
        for i = 0; i < N_IO; ++i do
            #pragma HLS UNROLL
            for o = 0; o < N_IO; ++o do
                out += in * wt_buf
            end for
        end for
    end for
end for

Output:
    out: feature map output
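A matching plain C++ sketch of Algorithm 2's depthwise 3×3 convolution, again for one output pixel: unlike the pointwise case, each channel is convolved only with its own 3×3 kernel, so there is no accumulation across channels. The channel count and parallel factor are illustrative assumptions, and the 3×3 input window is passed pre-flattened for simplicity.

```cpp
#include <array>
#include <cstdint>

// Illustrative sizes (assumed for the sketch).
constexpr int N_CH = 4;  // channels
constexpr int N_IO = 2;  // channel parallel factor

// Depthwise 3x3 convolution at one pixel. in[c] holds the 3x3 input window
// of channel c flattened to 9 values. The k loop (PIPELINE II=1 in HLS)
// walks the 9 kernel positions; the inner loop (UNROLL) processes N_IO
// channels per cycle, each with its own kernel.
void depthwise_conv(const int32_t in[N_CH][9],
                    const int32_t weight[N_CH][9],
                    std::array<int32_t, N_CH>& out) {
    out.fill(0);
    for (int f = 0; f < N_CH / N_IO; ++f)
        for (int k = 0; k < 9; ++k)           // PIPELINE II=1 in HLS
            for (int i = 0; i < N_IO; ++i) {  // unrolled
                int c = f * N_IO + i;         // channel-private kernel
                out[c] += in[c][k] * weight[c][k];
            }
}
```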
| Net name | ResNet-18 | ResNet-34 | ResNet-50 | VGG-16 | SkyNet |
|---|---|---|---|---|---|
| Parameters | 11.18 M | 21.28 M | 23.51 M | 14.71 M | 0.44 M |
| IoU | 0.61 | 0.26 | 0.32 | 0.25 | 0.73 |
Table 1. SkyNet parameters and performance compared with classical networks on the DAC-SDC dataset
| Layer | Type | K | C | FM | #MAC | PF |
|---|---|---|---|---|---|---|
| 1 | DW | 3 | 3 | 160×320 | 1382400 | 3 |
| 2 | PW | 1 | 48 | 160×320 | 7372800 | 12 |
| 3 | DW | 3 | 48 | 80×160 | 5529600 | 12 |
| 4 | PW | 1 | 96 | 80×160 | 58982400 | 96 |
| 5 | DW | 3 | 96 | 40×80 | 2764800 | 6 |
| 6 | PW | 1 | 192 | 40×80 | 58982400 | 96 |
| 7 | DW | 3 | 192 | 20×40 | 2764800 | 3 |
| 8 | PW | 1 | 384 | 20×40 | 58982400 | 96 |
| 9 | DW | 3 | 384 | 20×40 | 2764800 | 6 |
| 10 | PW | 1 | 512 | 20×40 | 157286400 | 256 |
| 11 | DW | 3 | 1280 | 20×40 | 9216000 | 16 |
| 12 | PW | 1 | 96 | 20×40 | 98304000 | 160 |
| 13 | PW | 1 | 10 | 20×40 | 768000 | 2 |
| Total | | | | | 465100800 | 764 |
Table 2. SkyNet's parallelism factors for each layer
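The #MAC column of Table 2 follows the standard per-layer multiply-accumulate counts: 9·C·H·W for a 3×3 depthwise layer and C_in·C_out·H·W for a 1×1 pointwise layer. The small helpers below are an assumed re-derivation for checking a few early rows against the table (for the pointwise rows, C_in is taken from the preceding layer's channel count):

```cpp
#include <cstdint>

// MACs of a 3x3 depthwise layer: 9 multiplies per channel per output pixel.
int64_t mac_dw(int64_t ch, int64_t h, int64_t w) {
    return 9 * ch * h * w;
}

// MACs of a 1x1 pointwise layer: one multiply per (c_in, c_out) pair
// per output pixel.
int64_t mac_pw(int64_t c_in, int64_t c_out, int64_t h, int64_t w) {
    return c_in * c_out * h * w;
}
```

For example, layer 1 (DW, 3 channels, 160×320) gives 9·3·160·320 = 1382400 MACs and layer 4 (PW, 48 → 96 channels, 80×160) gives 48·96·80·160 = 58982400 MACs, matching the table.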
| | iSmart | BJUT Runner | SkrSkr | Our work |
|---|---|---|---|---|
| Model | SkyNet | UltraNet | SkyNet | SkyNet |
| # of MACs | 465 M | 272 M | 465 M | 465 M |
| # of PFs | 256 | 448 | 512 | 764 |
| Frequency (MHz) | 220 | 166 | 333 | 350 |
| BRAMs | 209 | 150.5 | 209 | 206.5 |
| DSPs | 329 | 360 | 329 | 360 |
| LUTs | 53809 | 44633 | 52875 | 50518 |
| FFs | 55833 | 58813 | 55278 | 40488 |
| Precision (W, A) | 11, 9 | 4, 4 | 6, 8 | 5, 8 |
| IoU | 73.1% | 65.6% | 73.1% | 72.3% |
| Throughput (FPS) | 25 | 213 | 52 | 551 |
| Power (W) | 13.5 | 6.66 | 6.7 | 8.4 |
| Energy (mJ/img) | 540 | 31 | 128 | 15.2 |
Table 3. Comparison with other DAC-SDC accelerator designs