Mixed-precision quantization for neural networks based on error limit (Invited)

Yiduo Li; Zibo Guo; Kai Liu; Xiaoyao Sun

doi:10.3788/IRLA20220166


Infrared and Laser Engineering
Vol. 51, Issue 4, 20220166 (2022)


Author Affiliations

School of Computer Science and Technology, Xidian University, Xi'an 710071, China


Yiduo Li, Zibo Guo, Kai Liu, Xiaoyao Sun. Mixed-precision quantization for neural networks based on error limit (Invited)[J]. Infrared and Laser Engineering, 2022, 51(4): 20220166



Fig. 1. (a) 8-bit quantization process of a deep learning convolution [6]; (b) Distribution trend of the extreme weight values in the first 20 layers of the YOLOv5s network; (c) Distribution of the activation maximum and cutoff value during quantization of the YOLOv5s network


Fig. 2. Framework of network hierarchical policy methodology


Fig. 3. Example of COCO dataset detection results


Quantization method	Operation
$q(w, b_i) = \mathrm{round}(w / s)$	Multiplication
$q(w, b_i) = \mathrm{round}(w \times 2^{fl})$	Shift

Table 1. Multiplication-based and shift-based quantization methods
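The two forms in Table 1 differ only in the step size: the multiplication form divides by an arbitrary scale s, while the shift form uses a power-of-two step 2^(-fl). The sketch below illustrates both in plain Python. The signed integer range, the clamping, and the helpers `fit_scale` and `fit_fraction_length` are assumptions made for illustration and are not taken from the article.

```python
import math

def int_range(bits):
    """Signed integer range for a bit width (assumed; Table 1 does not state it)."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def quantize_multiply(w, s, bits):
    """Multiplication form: q = round(w / s), clamped to the integer range."""
    lo, hi = int_range(bits)
    # Note: Python's round() resolves exact ties to the nearest even integer.
    return max(lo, min(hi, round(w / s)))

def quantize_shift(w, fl, bits):
    """Shift form: q = round(w * 2**fl), clamped to the integer range."""
    lo, hi = int_range(bits)
    return max(lo, min(hi, round(w * 2 ** fl)))

def fit_scale(weights, bits):
    """Hypothetical helper: map the largest magnitude onto the top integer level."""
    _, hi = int_range(bits)
    return max(abs(w) for w in weights) / hi

def fit_fraction_length(weights, bits):
    """Hypothetical helper: largest fl for which max|w| * 2**fl still fits."""
    _, hi = int_range(bits)
    return math.floor(math.log2(hi / max(abs(w) for w in weights)))

weights = [0.9, -0.31, 0.05, 0.62]  # made-up example values
bits = 8
s = fit_scale(weights, bits)
fl = fit_fraction_length(weights, bits)

q_mul = [quantize_multiply(w, s, bits) for w in weights]
q_shift = [quantize_shift(w, fl, bits) for w in weights]

# Dequantize and compare the worst-case reconstruction error of each form.
err_mul = max(abs(w - q * s) for w, q in zip(weights, q_mul))
err_shift = max(abs(w - q / 2 ** fl) for w, q in zip(weights, q_shift))
print(q_mul, q_shift, err_mul, err_shift)
```

Because the shift form restricts the step to a power of two, rescaling can be done with a bit shift, but the integer range is usually not fully used. This may be one reason Table 2 reports lower mAP for the shift method than for multiplication at 8, 7 and 6 bits.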


Network model	Dataset	bit	mAP@0.5-0.95 (shift)	mAP@0.5-0.95 (multiplication)
YOLOv5s	VOC	8	63.4%	77.9%
		7	26.5%	68.8%
		6	4.6%	39.5%
		32	81.8%

Table 2. Performance of the different quantization methods on the VOC2007 dataset


Metric	Truncation method	8-bit	7-bit	6-bit	5-bit	32-bit
mAP	MAX	78.9%	67.4%	46.7%	4.0%	82.6%
mAP	MSE	82.7%	76.0%	69.0%	31.7%	82.6%

Table 3. Network accuracy before and after quantization with different truncation methods
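The MAX and MSE rows of Table 3 correspond to two ways of choosing the clipping (truncation) threshold before quantization. The following is a minimal sketch of the idea, assuming symmetric uniform quantization and a simple grid search over candidate thresholds. The article's actual search procedure is not given here, so the function names, the grid search, and the synthetic data are all illustrative assumptions.

```python
def quantize_dequantize(xs, clip, bits):
    """Symmetric uniform quantization to `bits` bits with clipping at +/- clip."""
    qmax = 2 ** (bits - 1) - 1
    step = clip / qmax
    out = []
    for x in xs:
        q = max(-qmax, min(qmax, round(x / step)))  # clamp values beyond the threshold
        out.append(q * step)
    return out

def mean_squared_error(xs, ys):
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def max_threshold(xs):
    """MAX truncation: the threshold is the largest magnitude, so nothing is clipped."""
    return max(abs(x) for x in xs)

def mse_threshold(xs, bits, steps=100):
    """MSE truncation (assumed grid search): pick the threshold with the lowest MSE."""
    top = max_threshold(xs)
    best_clip = top
    best_err = mean_squared_error(xs, quantize_dequantize(xs, top, bits))
    for i in range(1, steps):
        clip = top * i / steps
        err = mean_squared_error(xs, quantize_dequantize(xs, clip, bits))
        if err < best_err:
            best_clip, best_err = clip, err
    return best_clip

# Synthetic data: many small values plus a single outlier.
values = [i / 1000 for i in range(-500, 501)] + [2.0]
bits = 4

clip_max = max_threshold(values)
clip_mse = mse_threshold(values, bits)
err_max = mean_squared_error(values, quantize_dequantize(values, clip_max, bits))
err_mse = mean_squared_error(values, quantize_dequantize(values, clip_mse, bits))
print(clip_max, clip_mse, err_max, err_mse)
```

On data with outliers, the MSE criterion gives up accuracy on the few large values in exchange for a finer step on the many small ones. This is consistent with Table 3, where the gap between the two truncation methods widens as the bit width drops.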


γ	Compression ratio	Average bit	mAP
0.08	4.93	6.49	79.6%
0.10	5.13	6.23	77.8%
0.125	5.74	5.57	72.3%
0.142	6.11	5.23	62.8%
0.166	6.31	5.07	63.3%
0.20	7.14	4.48	21.0%

Table 4. Comparison of values of the error limit parameter γ
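The compression ratio and average bit columns of Table 4 appear to be linked by a simple relation: compression ratio ≈ 32 / average bit, taking the 32-bit model as the baseline. The short check below reproduces the reported values to within 0.01. The `average_bit` helper, which weights each layer's bit width by its parameter count, is an assumption about how a mixed-precision average could be computed, and its layer sizes are made up.

```python
FULL_PRECISION_BITS = 32

def compression_ratio(avg_bit):
    """Ratio of full-precision storage to quantized storage."""
    return FULL_PRECISION_BITS / avg_bit

def average_bit(layer_params, layer_bits):
    """Assumed definition: parameter-weighted mean bit width over layers."""
    total_bits = sum(n * b for n, b in zip(layer_params, layer_bits))
    return total_bits / sum(layer_params)

# (gamma, reported compression ratio, reported average bit) from Table 4.
table4 = [
    (0.08, 4.93, 6.49),
    (0.10, 5.13, 6.23),
    (0.125, 5.74, 5.57),
    (0.142, 6.11, 5.23),
    (0.166, 6.31, 5.07),
    (0.20, 7.14, 4.48),
]

for gamma, reported_ratio, avg_bit in table4:
    computed = compression_ratio(avg_bit)
    print(f"gamma={gamma}: reported={reported_ratio}, computed={computed:.4f}")

# Hypothetical two-layer network: 100 parameters at 8 bits, 300 at 4 bits.
example_avg = average_bit([100, 300], [8, 4])
print(example_avg, compression_ratio(example_avg))
```

Read this way, a larger γ tolerates more quantization error per layer, which lowers the average bit width and raises the compression ratio at the cost of mAP.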


Dataset	Method	bit	γ	mAP@0.5	mAP@0.5-0.95	Model size
COCO	Unified bit	7		0.567	0.345	6.35
		6		0.503	0.301	5.45
		5		0.386	0.215	4.54
	Mixed bit	6.49	0.08	0.602	0.368	5.89
		5.57	0.125	0.546	0.322	5.05
		5.07	0.166	0.446	0.260	4.60
	Original model	32		0.636	0.411	29.07
VOC2011	Unified bit	7		0.950	0.732	6.35
		6		0.925	0.643	5.45
		5		0.533	0.295	4.54
	Mixed bit	6.49	0.08	0.950	0.706	5.89
		5.57	0.125	0.981	0.669	5.05
		5.07	0.166	0.782	0.456	4.60
	Original model	32		0.950	0.786	29.07

Table 5. Test results of the different quantization methods on the COCO and VOC2011 datasets


Dataset	Method	bit	mAP@0.5	Aeroplane	Bicycle	Bird	Boat	Bottle	Chair	Dog	Person	Sheep	Train	Tvmonitor
VOC2011	Mixed	5	0.782	0.753	0.435	0.497	0.995	0.801	0.995	0.249	0.897	0.995	0.995	0.995
VOC2011	Unified	5	0.533	0.232	0.324	0.497	0.484	0.209	0.995	0.332	0.455	0.995	0.995	0.34

Table 6. Per-category detection accuracy on the VOC2011 dataset
