• Journal of Semiconductors
  • Vol. 40, Issue 5, 050202 (2019)

Abstract

AI PROCESSOR

Energy-efficient reconfigurable AI processor

IEEE J. Solid-State Circuits, 54, 1120 (2018)

High computational energy efficiency and rapid real-time response are the major concerns for applications of artificial intelligence in low-power mobile and Internet of Things devices with limited storage capacity. Because they require less memory, incur lower computation overhead and suffer negligible accuracy degradation, deep neural networks with binary/ternary weights (BTNNs) have been widely adopted to replace traditional full-precision neural networks. Previous hardware implementations accelerate BTNN inference by exploiting its multiplication-free feature, but some implicit characteristics of BTNN convolution, such as high arithmetic complexity and numerous redundant operations, have not been exploited.
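The multiplication-free feature mentioned above can be illustrated with a minimal Python sketch. It is not the processor's implementation: the function names `ternary_dot` and `ternary_conv2d`, the list-of-lists data layout, and the stride-1, no-padding sliding window are all assumptions made for illustration. With weights restricted to -1, 0 and +1, each product reduces to an addition, a subtraction, or nothing at all.

```python
def ternary_dot(features, weights):
    """Dot product with weights in {-1, 0, +1}, using only add/subtract."""
    acc = 0
    for x, w in zip(features, weights):
        if w == 1:
            acc += x          # +1 weight: add the feature
        elif w == -1:
            acc -= x          # -1 weight: subtract the feature
        elif w != 0:
            raise ValueError("weights must be -1, 0 or +1")
        # a zero weight contributes nothing, so no operation is performed
    return acc


def ternary_conv2d(feature_map, kernel):
    """Sliding-window convolution (stride 1, no padding) with a ternary kernel.

    Hypothetical helper: feature_map and kernel are lists of lists.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(feature_map) - kh + 1
    out_w = len(feature_map[0]) - kw + 1
    flat_kernel = [w for row in kernel for w in row]
    output = []
    for i in range(out_h):
        out_row = []
        for j in range(out_w):
            # Gather the window in the same row-major order as the kernel.
            window = [feature_map[i + di][j + dj]
                      for di in range(kh) for dj in range(kw)]
            out_row.append(ternary_dot(window, flat_kernel))
        output.append(out_row)
    return output


if __name__ == "__main__":
    fmap = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kern = [[1, 0], [-1, 1]]
    print(ternary_conv2d(fmap, kern))
```

Even in this form, neighbouring windows recompute many of the same partial sums, which is the kind of redundancy the text says earlier designs leave unexploited.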

This paper proposes four optimization techniques to fully exploit these implicit characteristics for higher energy efficiency in low-power devices. First, a feature-integral-based convolution (FIBC) method is proposed to reduce the arithmetic complexity of convolutional layers. Second, a kernel-transformation-feature-reconstruction (KTFR) convolution method is presented to remove redundant operations in BTNN convolution. Third, a hierarchical load-balancing mechanism (HLBM) is designed to eliminate zero-value computations and improve resource utilization. Finally, a joint optimization approach for convolutional layers is proposed to search for the optimal calculation pattern for each layer. Based on these four techniques, a reconfigurable processor is designed in a 28-nm CMOS technology to accelerate BTNN inference with flexible data bit-width. Compared with a baseline implementation in which the proposed techniques are disabled, the four techniques improve energy efficiency for BTNNs by 2.07×, 1.65×, 1.25× and 2.24×, respectively. Benchmarked with binary-weight AlexNet, the processor achieves an energy efficiency of 19.9 TOPS/W at 200 MHz and 0.9 V.

Nanjian Wu (Institute of Semiconductors, CAS, Beijing, China)

doi: 10.1088/1674-4926/40/5/050202
