• Journal of Infrared and Millimeter Waves
  • Vol. 41, Issue 5, 914 (2022)
Shao-Yi CHEN1,2,3,4, Xin-Yi TANG2,3,4, Jian WANG2,3,4, Jing-Si HUANG1,2,3,4, and Zheng LI2,3,4,*
Author Affiliations
  • 1School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
  • 2Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China
  • 3University of Chinese Academy of Sciences, Beijing 100049, China
  • 4Key Laboratory of Infrared System Detection and Imaging Technology, Chinese Academy of Sciences, Shanghai 200083, China
    DOI: 10.11972/j.issn.1001-9014.2022.05.016
    Shao-Yi CHEN, Xin-Yi TANG, Jian WANG, Jing-Si HUANG, Zheng LI. An ultra-efficient streaming-based FPGA accelerator for infrared target detection[J]. Journal of Infrared and Millimeter Waves, 2022, 41(5): 914
    Fig. 1. Network structure of the deep-learning-based infrared target detection algorithm
    Fig. 2. SkyNet object detection results on the FLIR dataset
    Fig. 3. Concepts of initiation interval and latency
    Fig. 4. Accelerator design for balancing all pipeline stages
    Fig. 5. FPGA inference accelerator architecture
    Fig. 6. Datapath of pointwise convolution
    Fig. 7. Datapath of depthwise convolution
    Fig. 8. Using a line buffer to optimize the datapath
    Fig. 9. Datapath of maxpool
    Fig. 10. DSP48E2 slice architecture
    Fig. 11. Datapath of the processing element array
    Fig. 12. System optimization
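The two quantities named in Fig. 3 are related by a standard pipelining identity: once a loop is pipelined, the first result appears after the latency, and each further result follows one initiation interval (II) later. The sketch below encodes that textbook relation. The function name `pipeline_cycles` is hypothetical, and HLS tools may report counts that differ by a cycle or two depending on how they define latency.

```cpp
#include <cstdint>

// Approximate cycle count for a pipelined loop that processes `n` items.
//   ii      : initiation interval, cycles between the starts of consecutive iterations
//   latency : cycles from the start of one iteration to its result
// Assumption: ideal pipeline with no stalls (textbook model, not a tool report).
constexpr std::uint64_t pipeline_cycles(std::uint64_t n,
                                        std::uint64_t ii,
                                        std::uint64_t latency) {
    return n == 0 ? 0 : latency + ii * (n - 1);
}
```

With II = 1 the cycle count approaches the iteration count for large `n`, which is why the pseudocode below requests `PIPELINE II=1` on its inner loops.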

    Algorithm 1. Pseudocode for Pointwise Convolution Layer

    Input: in<N_IN × BIT_IN>: feature map input
    weight<N_OUT × N_IN × BIT_WT>[N_OCH / N_OUT][N_ICH / N_IN]: weights of the neural network
    N_IN: input parallelism factor
    N_OUT: output parallelism factor
    N_ICH: number of input channels
    N_OCH: number of output channels
    BIT_IN: bit width of input
    BIT_WT: bit width of weight
    BIT_OUT: bit width of output

    #pragma HLS DATAFLOW
    for fo = 0; fo < N_OCH / N_OUT; ++fo do
      for fi = 0; fi < N_ICH / N_IN; ++fi do
      #pragma HLS PIPELINE II=1
        for i = 0; i < N_IN; ++i do
        #pragma HLS UNROLL
          for o = 0; o < N_OUT; ++o do
            out += in * weight[fo][fi];
          end for
        end for
      end for
    end for

    Output: out: feature map output
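The loop nest of Algorithm 1 can be mirrored in plain C++ as a software reference for one pixel. This is a minimal sketch, not the authors' HLS source: the function name `pointwise_pixel`, the `int8_t`/`int32_t` element types, and the unpacked `weight[och][ich]` layout are assumptions made for illustration, and the HLS pragmas appear only as comments so that the code compiles with an ordinary compiler.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Pointwise (1x1) convolution for a single pixel, folded as in Algorithm 1:
// N_IN input channels and N_OUT output channels are handled per inner step.
template <std::size_t N_ICH, std::size_t N_OCH, std::size_t N_IN, std::size_t N_OUT>
std::array<std::int32_t, N_OCH> pointwise_pixel(
    const std::array<std::int8_t, N_ICH>& in,
    const std::array<std::array<std::int8_t, N_ICH>, N_OCH>& weight) {
    static_assert(N_ICH % N_IN == 0, "N_IN must divide N_ICH");
    static_assert(N_OCH % N_OUT == 0, "N_OUT must divide N_OCH");

    std::array<std::int32_t, N_OCH> out{};  // accumulators start at zero
    for (std::size_t fo = 0; fo < N_OCH / N_OUT; ++fo) {
        for (std::size_t fi = 0; fi < N_ICH / N_IN; ++fi) {
            // HLS: PIPELINE II=1 would apply here, with the loops below unrolled.
            for (std::size_t i = 0; i < N_IN; ++i) {
                for (std::size_t o = 0; o < N_OUT; ++o) {
                    const std::size_t ich = fi * N_IN + i;   // absolute input channel
                    const std::size_t och = fo * N_OUT + o;  // absolute output channel
                    out[och] += static_cast<std::int32_t>(in[ich]) * weight[och][ich];
                }
            }
        }
    }
    return out;
}
```

For example, `pointwise_pixel<4, 2, 2, 1>(in, w)` processes 4 input and 2 output channels with two input channels per step. In this model the two inner loops perform N_IN × N_OUT multiply-accumulates per pipelined iteration.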


    Algorithm 2. Pseudocode for Depthwise Convolution Layer

    Input: in: feature map input
    weight[N_CH / N_IO][9]: weights of the neural network
    N_IO: input parallelism factor
    N_CH: number of input channels
    BIT_IN: bit width of input
    BIT_WT: bit width of weight
    BIT_OUT: bit width of output

    #pragma HLS DATAFLOW
    for f = 0; f < N_CH / N_IO; ++f do
      for k = 0; k < 9; ++k do
        #pragma HLS PIPELINE II=1
        wt_buf = weight[f][k]
        for i = 0; i < N_IO; ++i do
          #pragma HLS UNROLL
          for o = 0; o < …; ++o do
            out += in * wt_buf
          end for
        end for
      end for
    end for

    Output: out: feature map output
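A plain C++ reference for the depthwise case, again for a single output pixel, may help when checking the fold structure of Algorithm 2. It is a sketch under stated assumptions: the name `depthwise_pixel`, the element types, and the `window[k][c]` / `weight[c][k]` layouts are hypothetical. It also uses one inner loop over the N_IO channels of a group, because a depthwise filter maps each channel only to itself.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Depthwise 3x3 convolution for one output pixel.
//   window[k][c] : the k-th of the 9 input samples under the kernel, channel c
//   weight[c][k] : the k-th kernel coefficient of channel c
template <std::size_t N_CH, std::size_t N_IO>
std::array<std::int32_t, N_CH> depthwise_pixel(
    const std::array<std::array<std::int8_t, N_CH>, 9>& window,
    const std::array<std::array<std::int8_t, 9>, N_CH>& weight) {
    static_assert(N_CH % N_IO == 0, "N_IO must divide N_CH");

    std::array<std::int32_t, N_CH> out{};  // accumulators start at zero
    for (std::size_t f = 0; f < N_CH / N_IO; ++f) {
        for (std::size_t k = 0; k < 9; ++k) {
            // HLS: PIPELINE II=1 would apply here, with the loop below unrolled.
            for (std::size_t i = 0; i < N_IO; ++i) {
                const std::size_t c = f * N_IO + i;  // absolute channel index
                out[c] += static_cast<std::int32_t>(window[k][c]) * weight[c][k];
            }
        }
    }
    return out;
}
```

In this model each pipelined iteration performs N_IO multiply-accumulates, one per channel in the group.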

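    | Net name   | ResNet-18 | ResNet-34 | ResNet-50 | VGG-16  | SkyNet |
    |------------|-----------|-----------|-----------|---------|--------|
    | Parameters | 11.18 M   | 21.28 M   | 23.51 M   | 14.71 M | 0.44 M |
    | IoU        | 0.61      | 0.26      | 0.32      | 0.25    | 0.73   |
    Table 1. Parameter count and IoU of SkyNet compared with classical networks on the DAC-SDC dataset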
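    | Layer | Type | K | C    | FM      | #MAC      | PF  |
    |-------|------|---|------|---------|-----------|-----|
    | Total |      |   |      |         | 465100800 | 764 |
    | 1     | DW   | 3 | 3    | 160×320 | 1382400   | 3   |
    | 2     | PW   | 1 | 48   | 160×320 | 7372800   | 12  |
    | 3     | DW   | 3 | 48   | 80×160  | 5529600   | 12  |
    | 4     | PW   | 1 | 96   | 80×160  | 58982400  | 96  |
    | 5     | DW   | 3 | 96   | 40×80   | 2764800   | 6   |
    | 6     | PW   | 1 | 192  | 40×80   | 58982400  | 96  |
    | 7     | DW   | 3 | 192  | 20×40   | 2764800   | 3   |
    | 8     | PW   | 1 | 384  | 20×40   | 58982400  | 96  |
    | 9     | DW   | 3 | 384  | 20×40   | 2764800   | 6   |
    | 10    | PW   | 1 | 512  | 20×40   | 157286400 | 256 |
    | 11    | DW   | 3 | 1280 | 20×40   | 9216000   | 16  |
    | 12    | PW   | 1 | 96   | 20×40   | 98304000  | 160 |
    | 13    | PW   | 1 | 10   | 20×40   | 768000    | 2   |
    Table 2. Parallelism factors of each SkyNet layer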
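The Total row of Table 2 can be reproduced by summing the per-layer #MAC and PF columns. The sketch below copies the thirteen rows as listed. The names `Layer`, `kLayers`, `total_macs`, and `total_pf` are hypothetical helpers for this check and do not come from the paper.

```cpp
#include <array>
#include <cstdint>

// One row of Table 2 (only the columns needed for the totals).
struct Layer {
    const char* type;    // "DW" (depthwise) or "PW" (pointwise)
    std::uint64_t macs;  // #MAC column
    std::uint32_t pf;    // PF column
};

// Rows 1-13 exactly as listed in Table 2.
const std::array<Layer, 13> kLayers = {{
    {"DW", 1382400ULL, 3},     {"PW", 7372800ULL, 12},
    {"DW", 5529600ULL, 12},    {"PW", 58982400ULL, 96},
    {"DW", 2764800ULL, 6},     {"PW", 58982400ULL, 96},
    {"DW", 2764800ULL, 3},     {"PW", 58982400ULL, 96},
    {"DW", 2764800ULL, 6},     {"PW", 157286400ULL, 256},
    {"DW", 9216000ULL, 16},    {"PW", 98304000ULL, 160},
    {"PW", 768000ULL, 2},
}};

// Sum of the #MAC column.
inline std::uint64_t total_macs() {
    std::uint64_t sum = 0;
    for (const Layer& l : kLayers) sum += l.macs;
    return sum;
}

// Sum of the PF column.
inline std::uint32_t total_pf() {
    std::uint32_t sum = 0;
    for (const Layer& l : kLayers) sum += l.pf;
    return sum;
}
```

Both sums agree with the Total row: 465100800 MACs and 764 parallel multiply-accumulate units.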
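    |                  | iSmart | BJUT Runner | SkrSkr | Our work |
    |------------------|--------|-------------|--------|----------|
    | Model            | SkyNet | UltraNet    | SkyNet | SkyNet   |
    | # of MACs        | 465M   | 272M        | 465M   | 465M     |
    | # of PFs         | 256    | 448         | 512    | 764      |
    | Frequency (MHz)  | 220    | 166         | 333    | 350      |
    | BRAMs            | 209    | 150.5       | 209    | 206.5    |
    | DSPs             | 329    | 360         | 329    | 360      |
    | LUTs             | 53809  | 44633       | 52875  | 50518    |
    | FFs              | 55833  | 58813       | 55278  | 40488    |
    | Precision (W, A) | 11, 9  | 4, 4        | 6, 8   | 5, 8     |
    | IoU              | 73.1%  | 65.6%       | 73.1%  | 72.3%    |
    | Throughput (FPS) | 25     | 213         | 52     | 551      |
    | Power (W)        | 13.5   | 6.66        | 6.7    | 8.4      |
    | Energy (mJ/img)  | 540    | 31          | 128    | 15.2     |
    Table 3. Comparison with DAC-SDC accelerator designs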