Towards efficient deep neural network training by FPGA-based batch-level parallelism

Cheng Luo; Man-Kit Sit; Hongxiang Fan; Shuanglong Liu; Wayne Luk; Ce Guo

doi:10.1088/1674-4926/41/2/022403

