• Journal of Semiconductors
  • Vol. 41, Issue 2, 022403 (2020)
Cheng Luo¹, Man-Kit Sit², Hongxiang Fan², Shuanglong Liu², Wayne Luk², and Ce Guo²
Author Affiliations
  • ¹ State Key Laboratory of ASIC and System, Fudan University, Shanghai 200050, China
  • ² Department of Computing, Imperial College London, London, United Kingdom
DOI: 10.1088/1674-4926/41/2/022403
Cheng Luo, Man-Kit Sit, Hongxiang Fan, Shuanglong Liu, Wayne Luk, Ce Guo. Towards efficient deep neural network training by FPGA-based batch-level parallelism[J]. Journal of Semiconductors, 2020, 41(2): 022403
