1 | A Boutros et al.[14] | DSP | Low-precision computation | DSP block to support 9-bit and 4-bit multiplication | Pack 2 × as many 9-bit and 4 × as many 4-bit multiplications compared to the baseline Arria-10-like DSP |
2 | Intel[15] | DSP | Low-precision computation | Agilex supports INT8 computation | Provide 2 × the number of 9 × 9 multipliers and double the number of INT8 operations compared to the prior generation |
3 | Intel[15] | DSP | High-accuracy computation | Agilex supports FP32, FP16 and BFLOAT16 | Provide up to 40 TFLOPs FP16 or BF16, or up to 20 TFLOPs FP32 DSP performance |
4 | Xilinx[16] | DSP | Low-precision computation | DSP Engine supports INT8 computation | VC1902 of AI Core Series provides INT8 peak performance up to 13.6 TOP/s[25] |
5 | Xilinx[16] | DSP | High-accuracy computation | DSP Engine supports FP32 and FP16 | VC1902 of AI Core Series provides FP32 peak performance up to 3.2 TFLOP/s[25] |
6 | A Boutros et al.[17] | ALM | Low-precision computation | ALM with extra carry chain, or more adders, or shadow multipliers | Extra carry chain provides a 1.5 × increase in MAC density; 4-bit adder and 9-bit shadow multiplier provides a 6.1 × increase in MAC density |
7 | J H Kim et al.[18] | ALM/CLB | Support BNN | Extra carry chain which propagates sum; additional FA | The first change reduces ALM/LUT usage by 23%–44%; the second change reduces ALM/LUT usage by 39%–60%[18] |
8 | Intel[15] | Memory | Support more memory resources | Embedded memory, in-package HBM, off-chip memory interfaces | On-chip memory includes MLABs (640b), block RAM (M20K), and eSRAM (18 MB); in-package memory includes HBM2E; on-board memory includes DDR4/5, QDR/RLDRAM, Intel Optane DC Persistent Memory |
10 | Xilinx[16] | Memory | Support more memory resources | Embedded memory, off-chip memory interfaces | Distributed RAM (64-bit per CLB), block RAM (36 KB), UltraRAM (288 KB), Accelerator RAM; DDR4/LPDDR4 |
11 | Xilinx[20] | AI Engine | Artificial intelligence | An array of VLIW SIMD high-performance processors[20] | Deliver up to 8X silicon compute density at 50% the power consumption of traditional programmable logic solutions[20] |
12 | Intel[15] | Platform | For data-centric world | 10-nm Agilex; innovative chiplet architecture[28] | Deliver up to 40% higher core performance, or up to 40% lower power over previous generation FPGAs[28] |
13 | Xilinx[16] | Platform | Adaptive compute acceleration platforms | Intelligent engines (AI and DSP), adaptable engines, and scalar engines | Achieve performance improvements of up to 20X over today's fastest FPGA implementations and over 100X over today's fastest CPU implementations[19] |