Towards efficient deep neural network training by FPGA-based batch-level parallelism

  • Journal of Semiconductors
  • Vol. 41, Issue 2, 022403 (2020)

Cheng Luo1, Man-Kit Sit2, Hongxiang Fan2, Shuanglong Liu2, Wayne Luk2 and Ce Guo2

Author Affiliations
  • 1 State Key Laboratory of ASIC and System, Fudan University, Shanghai 200050, China
  • 2 Department of Computing, Imperial College London, London, United Kingdom

    DOI: 10.1088/1674-4926/41/2/022403
    Citation: Cheng Luo, Man-Kit Sit, Hongxiang Fan, Shuanglong Liu, Wayne Luk, Ce Guo. Towards efficient deep neural network training by FPGA-based batch-level parallelism[J]. Journal of Semiconductors, 2020, 41(2): 022403

    Abstract

    Training deep neural networks (DNNs) requires significant time and resources to obtain acceptable results, which severely limits its deployment on resource-limited platforms. This paper proposes DarkFPGA, a novel customizable framework to efficiently accelerate the entire DNN training on a single FPGA platform. First, we explore batch-level parallelism to enable efficient FPGA-based DNN training. Second, we devise a novel hardware architecture optimised by a batch-oriented data pattern and tiling techniques to effectively exploit parallelism. Moreover, an analytical model is developed to determine the optimal design parameters for the DarkFPGA accelerator for a given network specification and FPGA resource constraints. Our results show that the accelerator performs about 10 times faster than CPU training and consumes about a third of the energy of GPU training when using 8-bit integers to train VGG-like networks on the CIFAR dataset on the Maxeler MAX5 platform.
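
    For intuition only (this sketch is not part of the article), the following Python fragment mimics the batch-level parallelism described above: a tile of TB samples and TI outputs is updated in lockstep at each step, standing in for the parallel multiply-accumulate array on the FPGA. The tile sizes and function names are illustrative assumptions.

        import numpy as np

        # Illustrative tile sizes (assumed values, not taken from the paper).
        TB = 4    # batch-level parallelism: samples processed in lockstep
        TI = 8    # output-channel parallelism within each sample

        def tiled_forward(x, w):
            """Batch-parallel tiled matrix multiply: y[b, f] = sum_c x[b, c] * w[c, f].
            The innermost update touches a TB x TI block at once, modelling the
            parallel multiply-accumulate array; all other loops are sequential."""
            B, C = x.shape
            _, F = w.shape
            y = np.zeros((B, F), dtype=np.int64)
            for b0 in range(0, B, TB):           # batch tiles
                for f0 in range(0, F, TI):       # output tiles
                    for c in range(C):           # sequential reduction
                        # one "cycle": TB x TI multiply-accumulates in parallel
                        y[b0:b0 + TB, f0:f0 + TI] += np.outer(x[b0:b0 + TB, c],
                                                              w[c, f0:f0 + TI])
            return y

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            x = rng.integers(-8, 8, size=(16, 32))
            w = rng.integers(-8, 8, size=(32, 24))
            assert np.array_equal(tiled_forward(x, w), x @ w)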
    $ \sigma(k)=2^{1-k},\ k\in\mathbb{N}^{+},\quad Q(x,k)=\mathrm{Clip}\left\{\sigma(k)\times\mathrm{round}\left[\frac{x}{\sigma(k)}\right],\,-1+\sigma(k),\,1-\sigma(k)\right\}. $

    $ \sigma(k)=2^{k-1},\ k\in\mathbb{N}^{+},\quad Q(x,k,\mathrm{shift})=\mathrm{Clip}\left\{(x+\mathrm{round\_value})\gg\mathrm{shift},\,-1+\sigma(k),\,1-\sigma(k)\right\},\quad \mathrm{round\_value}=1\ll(\mathrm{shift}-1). $
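
    As a minimal Python sketch (not from the article) of the two quantisation forms above: the floating-point form rounds to multiples of sigma(k) = 2^(1-k), while the integer form adds a rounding bias and right-shifts. We assume the integer variant clips to the symmetric k-bit range; the helper names quantize_float and quantize_int are ours.

        import numpy as np

        def quantize_float(x, k):
            """Floating-point form: round x to the nearest multiple of
            sigma(k) = 2**(1 - k), then clip to [-1 + sigma, 1 - sigma]."""
            sigma = 2.0 ** (1 - k)
            return np.clip(sigma * np.round(x / sigma), -1 + sigma, 1 - sigma)

        def quantize_int(x, k, shift):
            """Integer-only form: add a rounding bias of 2**(shift - 1), then
            arithmetic right-shift; x is assumed to be a wide integer value
            (e.g. a 32-bit accumulator) being requantised to k bits."""
            sigma = 2 ** (k - 1)
            round_value = (1 << (shift - 1)) if shift > 0 else 0
            y = (x + round_value) >> shift
            return np.clip(y, -(sigma - 1), sigma - 1)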

    $ \mathrm{shift}(x)=\mathrm{round}(\log_{2}x). $

    $ \mathrm{shift}(x)=\mathrm{ceil}(\log_{2}x). $

    $ \mathrm{shift}(x)=\mathrm{leading1}(x)+1. $
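
    In hardware the shift amount reduces to locating the leading one bit. The Python sketch below (function names are ours) shows the three variants side by side.

        import math

        def shift_round(x):
            """shift(x) = round(log2 x): exact, but needs a logarithm."""
            return round(math.log2(x))

        def shift_ceil(x):
            """shift(x) = ceil(log2 x): always errs on the large side."""
            return math.ceil(math.log2(x))

        def shift_leading_one(x):
            """Leading-one variant: the index of the most significant set bit
            plus one.  For positive integers this equals ceil(log2 x) except
            at exact powers of two, making it a cheap hardware substitute."""
            return max(int(x).bit_length(), 1)   # guard x == 0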

    $ W\sim U(-L,+L),\quad L=\max\left\{\sqrt{6/n_{\mathrm{in}}},\,L_{\min}\right\},\quad L_{\min}=1, $

    $ a_{\mathrm{shift}}=\log_{2}\left(\max\left\{\mathrm{shift}(L_{\min}/L),\,0\right\}\right). $

    $ a_{q}=Q(a,k_{A},a_{\mathrm{shift}}). $
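
    A hedged sketch of the initialisation and activation scaling above (variable names are ours; quantize_int is the helper from the earlier sketch, and a_shift is assumed to be a precomputed per-layer constant):

        import math
        import numpy as np

        def init_weights(n_in, n_out, L_min=1.0, rng=None):
            """W ~ U(-L, +L) with L = max(sqrt(6 / n_in), L_min)."""
            rng = rng or np.random.default_rng()
            L = max(math.sqrt(6.0 / n_in), L_min)
            return rng.uniform(-L, L, size=(n_in, n_out))

        def quantize_activations(a, k_A, a_shift):
            """a_q = Q(a, k_A, a_shift): requantise wide accumulator values back
            to k_A-bit activations with a per-layer power-of-two scale."""
            return quantize_int(a, k_A, a_shift)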

    $ e_{q}=Q(e,k_{E},\mathrm{shift}(\max|e|)), $

    $ e_{q}=Q(e,k_{E},\mathrm{shift}(\mathrm{or}\,|e|)), $
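
    Replacing max|e| with a bitwise OR over all error magnitudes yields the same leading-one position at much lower hardware cost, which is what the second form above exploits. A Python sketch follows (names are ours; quantize_int is the helper from the earlier sketch).

        import numpy as np

        def shift_via_or(values):
            """Estimate shift(max|v|) without a full max reduction: OR all
            magnitudes together; the OR has exactly the same most significant
            bit as the true maximum, so the leading-one shift is identical."""
            combined = int(np.bitwise_or.reduce(np.abs(values).astype(np.int64).ravel()))
            return max(combined.bit_length(), 1)   # guard the all-zero case

        def quantize_errors(e, k_E):
            """e_q = Q(e, k_E, shift(or|e|)), with e held as integers."""
            return quantize_int(e, k_E, shift_via_or(e))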

    $ g_{q}=\mathrm{Bernoulli}\left\{(\eta\times g)\gg g_{\mathrm{shift}}\right\},\quad g_{\mathrm{shift}}=\mathrm{shift}(\mathrm{or}\,|g|), $

    $ g_{q}=\mathrm{Clip}\left\{(\eta\times g+\mathrm{round\_value})\gg g_{\mathrm{shift}},\,-1+\sigma(k),\,1-\sigma(k)\right\},\quad g_{\mathrm{shift}}=\mathrm{shift}(\mathrm{or}\,|g|),\quad \mathrm{round\_value}=\mathrm{random\_int}\bmod(1\ll g_{\mathrm{shift}}), $
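
    The second form replaces Bernoulli sampling with a hardware-friendly equivalent: adding a uniform random integer below 2^g_shift before the right shift rounds each value up with probability equal to its discarded fraction. A sketch under those assumptions (names are ours; the learning-rate-scaled gradient is assumed to be held as integers, and shift_via_or is the helper from the sketch above):

        import numpy as np

        def quantize_gradients(scaled_g, k_G, rng=None):
            """Stochastic rounding of the (already learning-rate-scaled, integer)
            gradient: g_q = Clip{(g + r) >> g_shift, ...} with r drawn uniformly
            from [0, 2**g_shift)."""
            rng = rng or np.random.default_rng()
            g = np.asarray(scaled_g, dtype=np.int64)
            g_shift = shift_via_or(g)
            sigma = 2 ** (k_G - 1)
            r = rng.integers(0, 1 << g_shift, size=g.shape, dtype=np.int64)
            g_q = (g + r) >> g_shift
            return np.clip(g_q, -(sigma - 1), sigma - 1)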

    $ \mathrm{BRAM}=\dfrac{4\times T_{B}\times T_{I}^{2}\times(2\times L_{I})+4\times T_{I}^{2}\times L_{W}}{\mathrm{BRAM}_{\mathrm{SIZE}}}, $

    $ \mathrm{DSP}=T_{B}\times T_{I}\times D_{\mathrm{mul}}+T_{B}\times A_{l}\times D_{\mathrm{add}}+T_{B}\times D_{\mathrm{add}}, $

    $ \mathrm{LUT}=T_{B}\times T_{I}\times\beta+T_{B}\times\delta, $

    $ \mathrm{BW}_{\mathrm{CONV}}=\left(T_{B}\times L_{I}+\dfrac{T_{B}\times T_{I}}{N}\times L_{O}+L_{W}\right)\times f,\quad \mathrm{BW}_{\mathrm{FC}}=\left(T_{B}\times L_{I}+\dfrac{T_{B}\times T_{I}}{N}\times L_{O}+T_{I}\times L_{W}\right)\times f, $

    $ \mathrm{BW}_{\mathrm{auxiliary}}=(2\times T_{B}\times L_{I}+2\times T_{B}\times L_{O})\times f. $
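
    The resource and bandwidth expressions above can be evaluated directly when sizing the accelerator. The sketch below plugs in made-up per-unit costs (word lengths, DSPs per operator, LUT coefficients, BRAM block size), since the concrete constants depend on the board and the synthesis results.

        def resource_model(TB, TI,
                           L_I=8, L_W=8,                  # data/weight word lengths (assumed)
                           D_mul=1, D_add=1, A_l=4,       # DSP costs and adder count (assumed)
                           beta=150, delta=300,           # LUT coefficients (assumed)
                           bram_bits=36 * 1024):          # bits per BRAM block (assumed)
            """Evaluate the analytical resource model for one (TB, TI) design point."""
            bram = (4 * TB * TI**2 * (2 * L_I) + 4 * TI**2 * L_W) / bram_bits
            dsp = TB * TI * D_mul + TB * A_l * D_add + TB * D_add
            lut = TB * TI * beta + TB * delta
            return bram, dsp, lut

        def bandwidth_model(TB, TI, f, N, L_I=8, L_W=8, L_O=32):
            """Off-chip bandwidth (bits/s) for convolution, fully-connected and
            auxiliary layers; N is the reduction length over which each output
            is accumulated (assumed meaning)."""
            bw_conv = (TB * L_I + TB * TI / N * L_O + L_W) * f
            bw_fc = (TB * L_I + TB * TI / N * L_O + TI * L_W) * f
            bw_aux = (2 * TB * L_I + 2 * TB * L_O) * f
            return bw_conv, bw_fc, bw_aux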

    $ T_{\mathrm{CONV}}=\dfrac{B\times C\times K\times F\times H\times W}{T_{I}\times T_{B}\times f},\quad T_{\mathrm{FC}}=\dfrac{B\times C\times F}{T_{I}\times T_{B}\times f}. $

    $ T_{\mathrm{CONV}}=\dfrac{B_{T_{B}}\times C\times K_{T_{I}}\times F_{T_{I}}\times H\times W_{T_{I}}}{T_{B}\times T_{I}\times f},\quad T_{\mathrm{FC}}=\dfrac{B_{T_{B}}\times C_{T_{I}}\times F_{T_{I}}}{T_{B}\times T_{I}\times f},\quad X_{T}=\mathrm{ceil}(X/T)\times T, $

    $ T_{\mathrm{auxiliary}}=\dfrac{B_{T_{B}}\times C\times H\times W}{T_{B}\times f}, $
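
    Combining the timing expressions with the resource model gives the kind of analytical design-space search mentioned in the abstract. The sketch below is a simplified variant of ours, not the paper's: it pads only the batch and output dimensions to tile multiples and exhaustively scores candidate (TB, TI) pairs; resource_model is the helper from the sketch above, and the layer shapes and budgets are illustrative.

        def _pad(X, T):
            """X_T = ceil(X / T) * T."""
            return -(-X // T) * T

        def conv_cycles(B, C, K, F, H, W, TB, TI):
            return _pad(B, TB) * C * K * _pad(F, TI) * H * W // (TB * TI)

        def fc_cycles(B, C, F, TB, TI):
            return _pad(B, TB) * C * _pad(F, TI) // (TB * TI)

        def explore(layers, f_hz, dsp_budget, lut_budget, bram_budget):
            """Pick the (TB, TI) pair minimising time per training batch while
            fitting the FPGA budgets."""
            best = None
            for TB in (2, 4, 8, 16, 32):
                for TI in (8, 16, 32, 64):
                    bram, dsp, lut = resource_model(TB, TI)
                    if bram > bram_budget or dsp > dsp_budget or lut > lut_budget:
                        continue
                    cycles = sum(conv_cycles(*shape, TB, TI) if kind == "conv"
                                 else fc_cycles(*shape, TB, TI)
                                 for kind, shape in layers)
                    seconds = cycles / f_hz
                    if best is None or seconds < best[0]:
                        best = (seconds, TB, TI)
            return best

    For a VGG-like network, layers would be a list such as [("conv", (B, C, K, F, H, W)), ..., ("fc", (B, C, F))].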

