Design of a Buffer Optimization Architecture for ZynqNet Hardware Accelerator

CHEN Zhuo; CHEN Yiduo; TIAN Chunsheng; QIU Peiyi; DI Zhixiong

doi:10.13911/j.cnki.1004-3365.230098

Abstract

The convolutional neural network ZynqNet is widely used in edge devices. However, existing FPGA hardware acceleration schemes cannot meet the real-time requirements of demanding scenarios, as their frame rates are limited to less than 30 FPS. This paper improves ZynqNet's FPGA acceleration performance by designing a parallel computing structure based on multiple feature blocks, optimizing support for the Expand layer to enhance feature reuse, and optimizing the output cache to reduce the number of memory accesses. Furthermore, a depth-first feature and weight cache mechanism is proposed, which uses a multi-bank cache mode to enable feature and weight reads in a single cycle. The accelerator is implemented and tested on a Xilinx XC7Z045 FPGA chip, where it operates at 166 MHz and reaches 49 FPS. Compared to the traditional scheme of deploying the entire network to the FPGA, the proposed approach delivers a threefold speedup and a fivefold improvement in energy efficiency ratio.
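
The multi-bank idea mentioned above can be illustrated with a minimal software sketch. It is not the bank mapping used in this paper, which the abstract does not specify. It assumes a 3x3 convolution window and a hypothetical interleaving in which a pixel's bank is chosen by its row and column modulo 3, so that the nine pixels of any window fall into nine different banks and could in principle be fetched in the same cycle. The names `bank_of`, `addr_in_bank`, and `is_conflict_free` are illustrative only.

```python
# Illustrative model of a multi-bank feature cache (assumed mapping, not the paper's).
K = 3               # assumed convolution window size (3x3)
NUM_BANKS = K * K   # one bank per position inside a KxK tile


def bank_of(row, col):
    """Bank index for a pixel: interleave rows and columns modulo K."""
    return (row % K) * K + (col % K)


def addr_in_bank(row, col, width):
    """Address inside the selected bank: index of the KxK tile holding the pixel."""
    tiles_per_row = (width + K - 1) // K
    return (row // K) * tiles_per_row + (col // K)


def window_banks(top, left):
    """Banks touched by the KxK window whose top-left pixel is (top, left)."""
    return [bank_of(top + dr, left + dc) for dr in range(K) for dc in range(K)]


def is_conflict_free(top, left):
    """True if every pixel of the window lives in a different bank."""
    return len(set(window_banks(top, left))) == NUM_BANKS
```

Under this assumed mapping, any K consecutive rows have distinct residues modulo K, and the same holds for columns, so every window position is conflict-free regardless of alignment. A real design would also have to interleave channels and weights, which this sketch leaves out.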