Journal of Semiconductors, Vol. 41, Issue 2, 021403 (2020)
Jin Song 1,2,3, Xuemeng Wang 3,4, Zhipeng Zhao 3,4, Wei Li 1, and Tian Zhi 1
Author Affiliations
  • 1SKL of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • 2University of Chinese Academy of Sciences, Beijing 100049, China
  • 3Cambricon Tech. Ltd, Beijing 100191, China
  • 4University of Science and Technology of China, Hefei 230026, China
    DOI: 10.1088/1674-4926/41/2/021403
    Jin Song, Xuemeng Wang, Zhipeng Zhao, Wei Li, Tian Zhi. A survey of neural network accelerator with software development environments[J]. Journal of Semiconductors, 2020, 41(2): 021403
    References

    [1] W Huang, Z Jing. Multi-focus image fusion using pulse coupled neural network. Pattern Recogn Lett, 28, 1123(2007).

    [2] J K Paik, A K Katsaggelos. Image restoration using a modified Hopfield network. IEEE Trans Image Process, 1, 49(1992).

    [3] X Li, L Zhao, L Wei et al. DeepSaliency: multi-task deep neural network model for salient object detection. IEEE Trans Image Process, 25, 3919(2016).

    [4] Y Zhu, R Urtasun, R Salakhutdinov et al. segDeepM: exploiting segmentation and context in deep neural networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4703(2015).

    [5] A Graves, A R Mohamed, G Hinton. Speech recognition with deep recurrent neural networks. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 6645(2013).

    [6] O Abdel-Hamid, A R Mohamed, H Jiang et al. Convolutional neural networks for speech recognition. IEEE/ACM Trans Audio Speech Language Process, 22, 1533(2014).

    [7] R Collobert, J Weston. A unified architecture for natural language processing. International Conference on Machine Learning(2008).

    [8] R Sarikaya, G E Hinton, A Deoras. Application of deep belief networks for natural language understanding. IEEE/ACM Trans Audio Speech Language Process, 22, 778(2014).

    [9]

    [10] W S McCulloch, W Pitts. A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys, 5, 115(1943).

    [11] F Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev, 65, 386(1958).

    [12]

    [13] G E Hinton, S Osindero, Y Teh. A fast learning algorithm for deep belief nets. Neur Comput, 18, 1527(2006).

    [14]

    [15] K He, X Zhang, S Ren et al. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770(2016).

    [16] C Szegedy, W Liu, Y Jia et al. Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1(2015).

    [17] C Szegedy, S Ioffe, V Vanhoucke et al. Inception-v4, Inception-ResNet and the impact of residual connections on learning. National Conference on Artificial Intelligence, 4278(2016).

    [18]

    [19] F Mamalet, C Garcia. Simplifying ConvNets for fast learning. International Conference on Artificial Neural Networks, 58(2012).

    [20]

    [21]

    [22] S Hochreiter, J Schmidhuber. Long short-term memory. Neur Comput, 9, 1735(1997).

    [23] A Vaswani, N Shazeer, N Parmar et al. Attention is all you need. Advances in Neural Information Processing Systems, 5998(2017).

    [24]

    [25]

    [26]

    [27] D D Lin, S S Talathi, V S Annapureddy. Fixed point quantization of deep convolutional networks. International Conference on Machine Learning, 2849(2016).

    [28] J Xue, J Li, D Yu et al. Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network. IEEE International Conference on Acoustics, Speech and Signal Processing(2014).

    [29] E Park, J Ahn, S Yoo. Weighted-entropy-based quantization for deep neural networks. IEEE Conference on Computer Vision and Pattern Recognition(2017).

    [30] L Song, Y Wang, Y Han et al. C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization. Design Automation Conference(2016).

    [31] R J Kuo, Y L An, H S Wang et al. Integration of self-organizing feature maps neural network and genetic K-means algorithm for market segmentation. Expert Syst Appl, 30, 313(2006).

    [32] T Roska, G Bártfai, P Szolgay et al. A digital multiprocessor hardware accelerator board for cellular neural networks: CNN-HAC. Int J Circuit Theory Appl, 20, 589(1992).

    [33]

    [34] A Page, A Jafari, C Shea et al. SPARCNet: a hardware accelerator for efficient deployment of sparse convolutional networks. ACM J Emerg Technol Comput Syst, 13, 1(2017).

    [35] T Chen, Y Chen, M Duranton et al. BenchNN: On the broad potential application scope of hardware neural network accelerators. 2012 IEEE International Symposium on Workload Characterization (IISWC), 36(2012).

    [36] C Farabet, C Poulet, J Y Han et al. CNP: An FPGA-based processor for convolutional networks. International Conference on Field Programmable Logic and Applications(2009).

    [37] S Zhang, Z Du, L Zhang et al. Cambricon-X: An accelerator for sparse neural networks. The 49th Annual IEEE/ACM International Symposium on Microarchitecture, 20(2016).

    [38] Y Yu, T Zhi, X Zhou et al. BSHIFT: a low cost deep neural networks accelerator. Int J Parallel Program, 47, 360(2019).

    [39] A Shafiee, A Nag, N Muralimanohar et al. ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars. 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)(2016).

    [40] Y H Chen, T Krishna, J S Emer et al. Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE J Solid-State Circuits, 52, 127(2017).

    [41] N P Jouppi, C Young, N Patil et al. In-datacenter performance analysis of a tensor processing unit. 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), 1(2017).

    [42] Y Chen, T Chen, Z Xu et al. DianNao family: energy-efficient hardware accelerators for machine learning. Commun ACM, 59, 105(2016).

    [43] T Chen, Z Du, N Sun et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning. Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems(2014).

    [44] Y Chen, T Luo, S Liu et al. DaDianNao: a machine-learning supercomputer. Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 609(2014).

    [45] Z Du, R Fasthuber, T Chen et al. ShiDianNao: shifting vision processing closer to the sensor. ACM/IEEE International Symposium on Computer Architecture(2015).

    [46] D Liu, T Chen, S Liu et al. PuDianNao: a polyvalent machine learning accelerator. ACM SIGARCH Comput Architect News, 43, 369(2015).

    [47] Z Du, K Palem, A Lingamneni et al. Leveraging the error resilience of machine-learning applications for designing highly energy efficient accelerators. 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC), 201(2014).

    [48] G Estrin. Organization of computer systems: the fixed plus variable structure computer. Western Joint IRE-AIEE-ACM Computer Conference, 33(1960).

    [49]

    [50] A Majumdar, S Cadambi, M Becchi et al. A massively parallel, energy efficient programmable accelerator for learning and classification. ACM Trans Architect Code Optim, 9, 1(2012).

    [51]

    [52] K Ando, K Ueyoshi, K Orimo et al. BRein memory: a single-chip binary/ternary reconfigurable in-memory deep neural network accelerator achieving 1.4 TOPS at 0.6 W. IEEE J Solid-State Circuits, 53, 983(2017).

    [53] J Lee, C Kim, S H Kang et al. UNPU: a 50.6 TOPS/W unified deep neural network accelerator with 1b-to-16b fully-variable weight bit-precision. International Solid-State Circuits Conference, 218(2018).

    [54] W You, C Wu. A reconfigurable accelerator for sparse convolutional neural networks. Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 119(2019).

    [55] S Liu, Z Du, J Tao et al. Cambricon: An instruction set architecture for neural networks. ACM SIGARCH Comput Architect News, 44, 393(2016).

    [56] Y Zhao, Z Du, Q Guo et al. Cambricon-F: machine learning computers with fractal von Neumann architecture. International Symposium on Computer Architecture, 788(2019).

    [57] M Abadi, P Barham, J Chen et al. TensorFlow: a system for large-scale machine learning. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 265(2016).

    [58] Y Jia, E Shelhamer, J Donahue et al. Caffe: Convolutional architecture for fast feature embedding. Proceedings of the 22nd ACM International Conference on Multimedia, 675(2014).

    [59]

    [60]

    [61] H Lan, Z Du. DLIR: an intermediate representation for deep learning processors. IFIP International Conference on Network and Parallel Computing, 169(2018).

    [62] W Du, L Wu, X Chen et al. ZhuQue: a neural network programming model based on labeled data layout. International Symposium on Advanced Parallel Processing Technologies, 27(2019).

    [63]

    [64]

    [65] C Mendis, J Bosboom, K Wu et al. Helium: lifting high-performance stencil kernels from stripped x86 binaries to Halide DSL code. Program Language Des Implem, 50, 391(2015).

    [66] J Song, Y Zhuang, X Chen et al. Compiling optimization for neural network accelerators. International Symposium on Advanced Parallel Processing Technologies, 15(2019).
