• Journal of Semiconductors
  • Vol. 41, Issue 2, 022401 (2020)
Zheng Wang1, Libing Zhou2, Wenting Xie2, Weiguang Chen1, Jinyuan Su2, Wenxuan Chen2, Anhua Du2, Shanliao Li3, Minglan Liang3, Yuejin Lin2, Wei Zhao2, Yanze Wu4, Tianfu Sun1, Wenqi Fang1, and Zhibin Yu1
Author Affiliations
  • 1Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
• 2School of Microelectronics, Xidian University, Xi'an 710071, China
  • 3School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China
  • 4Changzhou Campus of Hohai University, Changzhou 213022, China
DOI: 10.1088/1674-4926/41/2/022401
Zheng Wang, Libing Zhou, Wenting Xie, Weiguang Chen, Jinyuan Su, Wenxuan Chen, Anhua Du, Shanliao Li, Minglan Liang, Yuejin Lin, Wei Zhao, Yanze Wu, Tianfu Sun, Wenqi Fang, Zhibin Yu. Accelerating hybrid and compact neural networks targeting perception and control domains with coarse-grained dataflow reconfiguration[J]. Journal of Semiconductors, 2020, 41(2): 022401
Fig. 1. (Color online) Structure of hybrid neural network targeting perception and control with layer-wise algorithmic kernels.
Fig. 2. Orientation and dimensions of compact CNN filters.
Fig. 3. (Color online) Structure and operation distribution for MobileNet.
Fig. 4. (Color online) Reconfiguration of dataflow, PE and storage functionalities for standard kernels.
Fig. 5. (Color online) Reconfiguration of dataflow for pointwise (PW) and depthwise (DW) convolution kernels.
Fig. 6. (Color online) Microarchitecture of proposed reconfigurable dataflow processor.
Fig. 7. (Color online) Instruction set architecture (ISA) and development toolchain.
Fig. 8. (Color online) Operational phases of proposed architecture.
Fig. 9. (Color online) Comparison of Q iteration time between proposed architecture and host machine (CPU)[27].
Fig. 10. (Color online) (a) ASIC layout with 16 reconfigurable PEs: logic (middle) surrounded by 18 SRAM blocks. (b) Micrograph of taped-out chip in UMC 65 nm low-leakage CMOS technology.
Fig. 11. (Color online) Views of the testing board. The front view shows the test IC in a CLCC84 package and socket; the rear view shows the FPGA that interfaces the IC with the host machine.
Fig. 12. (Color online) Testing infrastructure with measurement of both signal voltages and currents.
Fig. 13. (Color online) Runtime current measurement across different phases of operation at 30 MHz.
| NN layer | Convolution | Pooling | FC | LSTM | State-action | Shortcut |
| --- | --- | --- | --- | --- | --- | --- |
| Operands | Sparse matrix | Vector | Dense matrix | Dense matrix | Dense matrix | Vector |
| Operators | Sum of product (SoP) | Max, min, mean | SoP | SoP, vector multiply, vector sum | SoP | Vector sum |
| Nonlinear functions | ReLU, sigmoid | None | ReLU, sigmoid | Sigmoid, tangent | ReLU, sigmoid | None |
| Dataflow property | Serial in/out, thread-level parallelism | Parallel in/out | Serial in/out, thread-level parallelism | Serial in/out, shared among gates | Serial in/out, action-node iteration | Parallel in/out |
| Buffering property | Activation dominant | Activation dominant | Weight dominant | Weight, states | Weight, states, actions | Activation pointer |

Table 1. Operation characteristics among multiple standard neural network kernels.
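The operator classes in Table 1 reduce to a handful of reference computations. Below is a minimal NumPy sketch of their semantics, illustrative only: the function names are ours, and this is not the paper's hardware implementation.

```python
import numpy as np

def sop(weights, activations):
    """Sum of products (SoP): the shared core operator of the
    convolution, FC, LSTM and state-action kernels in Table 1."""
    return np.dot(weights, activations)

def pool(window, mode="max"):
    """Pooling reduces a local window with no weights (max/min/mean)."""
    return {"max": np.max, "min": np.min, "mean": np.mean}[mode](window)

def shortcut(x, y):
    """Shortcut layers need only an element-wise vector sum."""
    return x + y

def lstm_gate(w, x, u, h, b):
    """One LSTM gate: SoP over input and recurrent state, then sigmoid."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + np.dot(u, h) + b)))
```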
| Layer | Filter size | Input size | MAC amounts |
| --- | --- | --- | --- |
| Standard conv | D_K × D_K × M × N | D_F × D_F × M | D_K · D_K · M · N · D_F · D_F |
| Conv DW | D_K × D_K × M | D_F × D_F × M | D_K · D_K · M · D_F · D_F |
| Conv PW | 1 × 1 × M × N | D_F × D_F × M | M · N · D_F · D_F |

Table 2. Number of operations of standard, DW and PW convolution layers (D_K: kernel width; D_F: output feature-map width; M/N: number of input/output channels).
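The formula cells of Table 2 were lost in extraction and are restored above using the standard MobileNet-style MAC counting, which exactly reproduces the MAC column of Table 3. A small cross-check sketch (helper names are ours; layer shapes taken from Table 3):

```python
def macs_standard(dk, m, n, df):
    # Standard conv: each output pixel needs dk*dk*m MACs per output channel.
    return dk * dk * m * n * df * df

def macs_dw(dk, m, df):
    # Depthwise: one dk x dk filter per channel, no cross-channel accumulation.
    return dk * dk * m * df * df

def macs_pw(m, n, df):
    # Pointwise: 1 x 1 kernels mixing m input channels into n output channels.
    return m * n * df * df

# Conv0 (standard 3x3, 3 -> 32 channels, stride 2 gives 112 x 112 output):
print(macs_standard(3, 3, 32, 112))  # 10,838,016 ~ 10.84M (Table 3)
# Conv1 DW (3x3, 32 channels, 112 x 112 output):
print(macs_dw(3, 32, 112))           # 3,612,672  ~ 3.61M
# Conv1 PW (32 -> 64 channels, 112 x 112 output):
print(macs_pw(32, 64, 112))          # 25,690,112 ~ 25.69M
```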
| Layer type | Input size | #MACs | Max BW util. | Max PE util. | #streams | ns/stream | Latency (ms) | Single-threaded latency (ms)[24] |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Conv0 Std. | 224 × 224 × 3 | 10.84M | 3% | 6.70% | 25088 | 3340 | 83.8 | 83.8 |
| Conv1 DW | 112 × 112 × 32 | 3.61M | 10% | 6.70% | 25088 | 2080 | 17.4 | 380.5 |
| Conv1 PW | 112 × 112 × 32 | 25.69M | 100% | 100.00% | 3136 | 1835 | 5.8 | 92.1 |
| Conv2 DW | 112 × 112 × 64 | 1.81M | 10% | 6.70% | 12544 | 2080 | 8.7 | 190.2 |
| Conv2 PW | 56 × 56 × 64 | 25.69M | 100% | 100.00% | 1568 | 3520 | 5.5 | 88.3 |
| Conv3 DW | 56 × 56 × 128 | 3.61M | 10% | 6.70% | 25088 | 2080 | 17.4 | 380.5 |
| Conv3 PW | 56 × 56 × 128 | 51.38M | 100% | 100.00% | 1568 | 6890 | 10.8 | 172.9 |
| Conv4 DW | 56 × 56 × 128 | 0.90M | 10% | 6.70% | 6272 | 2080 | 4.3 | 95.1 |
| Conv4 PW | 28 × 28 × 128 | 25.69M | 100% | 100.00% | 784 | 6890 | 5.4 | 86.4 |
| Conv5 DW | 28 × 28 × 256 | 1.81M | 10% | 6.70% | 12544 | 2080 | 8.7 | 190.2 |
| Conv5 PW | 28 × 28 × 256 | 51.38M | 100% | 100.00% | 784 | 13630 | 10.7 | 171 |
| Conv6 DW | 28 × 28 × 256 | 0.45M | 10% | 6.70% | 3136 | 2080 | 2.2 | 47.6 |
| Conv6 PW | 14 × 14 × 256 | 25.69M | 100% | 100.00% | 416 | 13630 | 5.7 | 90.7 |
| Conv7–11 DW | 14 × 14 × 512 | 0.90M | 10% | 6.70% | 6272 | 2080 | 4.3 | 95.1 |
| Conv7–11 PW | 14 × 14 × 512 | 51.38M | 100% | 100.00% | 416 | 27110 | 11.3 | 180.4 |
| Conv12 DW | 14 × 14 × 512 | 0.23M | 10% | 6.70% | 1568 | 2080 | 1.1 | 23.8 |
| Conv12 PW | 7 × 7 × 512 | 25.69M | 100% | 100.00% | 256 | 27110 | 6.9 | 111 |
| Conv13 DW | 7 × 7 × 1024 | 0.45M | 10% | 6.70% | 3136 | 2080 | 2.2 | 47.6 |
| Conv13 PW | 7 × 7 × 1024 | 51.38M | 100% | 100.00% | 256 | 54070 | 13.8 | 221.5 |
| Avg Pool | 7 × 7 × 1024 | 0.05M | 10% | 6.70% | 64 | 1767 | 0.1 | 0.1 |
| FC | 1 × 1 × 1024 | 1.02M | 55% | 6.70% | 63 | 90218 | 5.7 | 5.7 |
| Total | | 569M | | | | | 294.3 | 3856.5 |

(Utilization, stream counts and latency refer to the multi-threaded streaming architecture @ 100 MHz. Conv7–11 rows give per-layer figures; the totals count these five layers individually.)

Table 3. Benchmark of performance for MobileNet with proposed architecture[25].
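Each latency entry is consistent with latency ≈ (#streams × ns/stream) / threads, where the thread count is inferred from the table's numbers rather than stated in it. A quick check under that assumption:

```python
# Latency (ms) ~= #streams * ns_per_stream / threads / 1e6.
# The thread counts below are inferred from Table 3, not stated in it.
def layer_latency_ms(n_streams, ns_per_stream, threads=1):
    return n_streams * ns_per_stream / threads / 1e6

print(layer_latency_ms(25088, 3340))     # 83.8 ms  (Conv0 Std., single thread)
print(layer_latency_ms(3136, 1835))      # ~5.8 ms  (Conv1 PW)
print(layer_latency_ms(25088, 2080, 3))  # ~17.4 ms (Conv1 DW, 3 threads inferred)
```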
Network layer specification:

| | 1st LSTM layer | 2nd LSTM layer (if needed) | 1st FC layer (if needed) | 2nd FC layer |
| --- | --- | --- | --- | --- |
| Nodes | In: 3, Out: 12, Recurrent: 48 | In: 12, Out: 12, Recurrent: 48 | In: 12, Out: 12 | In: 12, Out: 5 |

| Network | 1 LSTM + 1 FC (ms/sample) | 2 LSTM + 1 FC (ms/sample) | 2 LSTM + 2 FC (ms/sample) | Average power consumption |
| --- | --- | --- | --- | --- |
| CPU Intel i7-8700 @ 3.20 GHz | 11.981 | 22.362 | 23.962 | 60–70 W |
| CPU Intel i7 w/ GPU NVIDIA GTX 1050 | 2.87 | 4.94 | 5.74 | 50–70 W |
| Proposed design with 16 PEs @ 100 MHz* | 1.033 | 1.157 | 1.957 | 30–50 mW |

* Simulated result; does not account for data transfer between disk storage and DRAM.

Table 4. Benchmark of performance for LSTM networks among three processing architectures.
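From the table, the proposed design is roughly 12–19× faster than the CPU baseline and about 3–4× faster than the CPU+GPU pair, at three orders of magnitude lower power. The ratios, computed directly from the table entries:

```python
# Speedups of the proposed 16-PE design over the Table 4 baselines
# (ms/sample; smaller is faster).
cpu  = [11.981, 22.362, 23.962]   # 1 LSTM + 1 FC, 2 LSTM + 1 FC, 2 LSTM + 2 FC
gpu  = [2.87, 4.94, 5.74]
ours = [1.033, 1.157, 1.957]

for name, base in (("CPU", cpu), ("CPU+GPU", gpu)):
    print(name, [round(b / o, 1) for b, o in zip(base, ours)])
# CPU     [11.6, 19.3, 12.2]
# CPU+GPU [2.8, 4.3, 2.9]
```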
| Frequency (MHz) | Initialize | Conv1 | Pool1 | Conv2 | Pool2 | FC | Idle | Avg. (Conv1–FC) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 30 | 1.92 | 2.84 | 2.58 | 3.04 | 2.71 | 3.32 | 1.92 | 2.62 |
| 60 | 3.32 | 5.29 | 4.76 | 5.47 | 5.09 | 5.74 | 3.32 | 4.71 |
| 100 | 5.19 | 8.56 | 7.67 | 8.71 | 8.26 | 8.97 | 5.19 | 7.51 |

Table 5. Runtime power consumption (mW) for different phases and frequencies.
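The measurements scale nearly linearly with clock frequency, as expected when dynamic power dominates. A least-squares fit of the Idle column (our sanity check on the data, not an analysis from the paper) gives roughly 0.047 mW/MHz of dynamic slope over a ~0.5 mW static floor:

```python
import numpy as np

freq = np.array([30, 60, 100])       # MHz
idle = np.array([1.92, 3.32, 5.19])  # mW, "Idle" column of Table 5

slope, intercept = np.polyfit(freq, idle, 1)
print(slope, intercept)  # ~0.047 mW/MHz dynamic slope, ~0.52 mW static offset
```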
| Parameter | Eyeriss[28] | ENVISION[29] | Thinker[30] | This work | This work |
| --- | --- | --- | --- | --- | --- |
| Technology (nm) | 65 | 28 | 65 | 65 | 65 |
| Core area (mm²) | 12.25 | 1.87 | 19.36 | 3.24 | 3.24 |
| Bit precision (bit) | 16 | 4/8/16 | 8/16 | 8 | 8 |
| Num. of MACs | 168 | 512 | 1024 | 16 | 256 |
| Core frequency (MHz) | 200 | 200 | 200 | 100 | 100 |
| Performance (GOPS) | 67.6 | 76 | 368.4 | 3.2 | 51.2 |
| Power (mW) | 278 | 44 | 290 | 7.51 (measured) | 55.4 (estimated) |
| Energy efficiency | 166.2 GOPS/W | 1.73 TOPS/W | 1.27 TOPS/W | 426 GOPS/W | 0.92 TOPS/W |

Table 6. Comparison of physical properties with state-of-the-art designs.
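Energy efficiency in Table 6 is throughput divided by power; recomputing the two "This work" columns from the raw entries confirms the reported figures (note that 16 MACs × 2 ops × 100 MHz = 3.2 GOPS):

```python
# Energy efficiency = performance / power, from the Table 6 entries.
designs = {
    "This work (16 MACs)":  (3.2,  7.51e-3),   # GOPS, W (measured)
    "This work (256 MACs)": (51.2, 55.4e-3),   # GOPS, W (estimated)
}
for name, (gops, watts) in designs.items():
    print(name, round(gops / watts), "GOPS/W")
# This work (16 MACs)  426 GOPS/W
# This work (256 MACs) 924 GOPS/W  (~0.92 TOPS/W)
```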