Optimized Deep Learning Stereo Matching Algorithm

Jihui Huang; Rongfen Zhang; Yuhong Liu; Zhixu Chen; Zipeng Wang

doi:10.3788/LOP202158.2433002

Laser & Optoelectronics Progress
Vol. 58, Issue 24, 2433002 (2021)

Jihui Huang, Rongfen Zhang, Yuhong Liu^*, Zhixu Chen, and Zipeng Wang

Author Affiliations

College of Big Data and Information Engineering, Guizhou University, Guiyang, Guizhou 550025, China

Jihui Huang, Rongfen Zhang, Yuhong Liu, Zhixu Chen, Zipeng Wang. Optimized Deep Learning Stereo Matching Algorithm[J]. Laser & Optoelectronics Progress, 2021, 58(24): 2433002

Fig. 1. Original network structure

Fig. 2. Network structure proposed in this article

Fig. 3. Attention mechanism

Fig. 4. Cost calculation structure

Fig. 5. Visualization results on the KITTI2015 dataset. (a) Left-view images; (b) disparity maps predicted by PSMNet; (c) disparity maps predicted by the proposed network; (d) ground-truth disparity maps; (e) error maps

Network	Layer	Setting	Output
Feature extraction	Layer0_1	3×3,32	$\frac{1}{2}$ H× $\frac{1}{2}$ W×32
	Layer0_2	1×1,32	$\frac{1}{2}$ H× $\frac{1}{2}$ W×32
	Layer1_x	$\begin{array}{l} 1 \times 1,32 \\ 1 \times 1,32 \end{array}$	$\frac{1}{2}$ H× $\frac{1}{2}$ W×32
	Layer2_x	$\begin{array}{l} 3 \times 3,64 \\ 3 \times 3,64 \end{array}$ (4 pairs)	$\frac{1}{4}$ H× $\frac{1}{4}$ W×64
	Layer3_x	$\begin{array}{l} 1 \times 1,128 \\ 1 \times 1,128 \end{array}$	$\frac{1}{4}$ H× $\frac{1}{4}$ W×128
	Attention module	Channel, spatial	$\frac{1}{4}$ H× $\frac{1}{4}$ W×128
	Layer4	1×1,32	$\frac{1}{4}$ H× $\frac{1}{4}$ W×32
Cost volume	Concatenation		$\frac{1}{4}$ H× $\frac{1}{4}$ W× $\frac{1}{8}$ D×64
3DCNN	3DLayer0	$\begin{array}{l} 3 \times 3 \times 3, 32 \\ 3 \times 3 \times 3, 32 \end{array}$	$\frac{1}{4}$ H× $\frac{1}{4}$ W× $\frac{1}{8}$ D×32
	3DLayer1	$\begin{array}{l} 3 \times 3 \times 3, 32 \\ 3 \times 3 \times 3, 32 \end{array}$	$\frac{1}{4}$ H× $\frac{1}{4}$ W× $\frac{1}{8}$ D×32
	3DStack1_1	$\begin{array}{l} 3 \times 3 \times 3, 64 \\ 3 \times 3 \times 3, 64 \end{array}$	$\frac{1}{8}$ H× $\frac{1}{8}$ W× $\frac{1}{16}$ D×64
	3DStack1_2	$\begin{array}{l} 3 \times 3 \times 3, 64 \\ 3 \times 3 \times 3, 64 \end{array}$	$\frac{1}{16}$ H× $\frac{1}{16}$ W× $\frac{1}{32}$ D×64
	3DStack1_3	$\begin{matrix} 3 \times 3 \times 3, 64 (deconv) \end{matrix}$	$\frac{1}{8}$ H× $\frac{1}{8}$ W× $\frac{1}{16}$ D×64
	3DStack1_4	$\begin{matrix} 3 \times 3 \times 3, 32 (deconv) \end{matrix}$	$\frac{1}{4}$ H× $\frac{1}{4}$ W× $\frac{1}{8}$ D×32
	3DStack2_1	$\begin{array}{l} 3 \times 3 \times 3, 64 \\ 3 \times 3 \times 3, 64 \end{array}$	$\frac{1}{8}$ H× $\frac{1}{8}$ W× $\frac{1}{16}$ D×64
	3DStack2_2	$\begin{array}{l} 3 \times 3 \times 3, 64 \\ 3 \times 3 \times 3, 64 \end{array}$	$\frac{1}{16}$ H× $\frac{1}{16}$ W× $\frac{1}{32}$ D×64
	3DStack2_3	$\begin{matrix} 3 \times 3 \times 3, 64 (deconv) \end{matrix}$	$\frac{1}{8}$ H× $\frac{1}{8}$ W× $\frac{1}{16}$ D×64
	3DStack2_4	$\begin{matrix} 3 \times 3 \times 3, 32 (deconv) \end{matrix}$	$\frac{1}{4}$ H× $\frac{1}{4}$ W× $\frac{1}{8}$ D×32
	3DStack3_1	$\begin{array}{l} 3 \times 3 \times 3, 64 \\ 3 \times 3 \times 3, 64 \end{array}$	$\frac{1}{8}$ H× $\frac{1}{8}$ W× $\frac{1}{16}$ D×64
	3DStack3_2	$\begin{array}{l} 3 \times 3 \times 3, 64 \\ 3 \times 3 \times 3, 64 \end{array}$	$\frac{1}{16}$ H× $\frac{1}{16}$ W× $\frac{1}{32}$ D×64
	3DStack3_3	$\begin{matrix} 3 \times 3 \times 3, 64 (deconv) \end{matrix}$	$\frac{1}{8}$ H× $\frac{1}{8}$ W× $\frac{1}{16}$ D×64
	3DStack3_4	$\begin{matrix} 3 \times 3 \times 3, 32 (deconv) \end{matrix}$	$\frac{1}{4}$ H× $\frac{1}{4}$ W× $\frac{1}{8}$ D×32
	Classify	$\begin{array}{l} 3 \times 3 \times 3,32 \\ 3 \times 3 \times 3,2 \end{array}$	$\frac{1}{4}$ H× $\frac{1}{4}$ W× $\frac{2}{8}$ D×1
Disparity regression		Upsampling	H×W×D
Disparity regression		Regression	H×W

Table 1. Detailed structure of the proposed network
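
The last two rows of Table 1 reduce an H×W×D volume to an H×W disparity map. The page does not give the regression formula, so the sketch below assumes the soft-argmin regression used by the PSMNet baseline: each pixel's disparity is the probability-weighted mean of the candidate disparities, with probabilities taken from a softmax over negated costs. The function names and the nested-list layout are illustrative choices, not taken from the article.

```python
import math


def soft_argmin(costs):
    """Return the expected disparity for one pixel.

    costs[d] is the matching cost of candidate disparity d (lower is better).
    Assumption: soft-argmin as in PSMNet; the article's exact form is not shown.
    """
    # Softmax over negated costs; subtract the largest logit for stability.
    logits = [-c for c in costs]
    peak = max(logits)
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    # Probability-weighted sum of the candidate disparities 0 .. D-1.
    return sum(d * w for d, w in enumerate(weights)) / total


def regress_disparity(cost_volume):
    """Map an H x W x D nested list of costs to an H x W disparity map."""
    return [[soft_argmin(pixel) for pixel in row] for row in cost_volume]


if __name__ == "__main__":
    # One row, two pixels, three candidate disparities each.
    volume = [[[9.0, 0.0, 9.0], [4.0, 4.0, 4.0]]]
    print(regress_disparity(volume))
```

Because the result is a weighted mean rather than a hard argmin, it can take sub-pixel values and stays differentiable, which is the usual reason for choosing this form.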

Network	Simplified ResNet	Attention mechanism	d, q	EPE /pixel
PSMNet				1.09
Ours	√			1.13
Ours	√	√		0.98
Ours	√	√	√	0.83

Table 2. Comparison of different network structures

Network	EPE /pixel	Number of parameters /10⁶
PSMNet	1.09	5.20
MC-CNN	3.79	--
GC-Net	2.51	3.50
DispNet	1.68	42.00
CRL	1.32	78.00
Ours	0.83	2.20

Table 3. Comparison of results on the SceneFlow test set
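
The end-point error (EPE) reported in Tables 2, 3, and 5 is conventionally the mean absolute difference between predicted and ground-truth disparity, in pixels. A minimal sketch of that metric follows; the function name, the nested-list layout, and the optional validity mask are assumptions made for illustration, and the article's own evaluation code is not shown on this page.

```python
def end_point_error(pred, truth, valid=None):
    """Mean absolute disparity error over valid pixels, in pixels.

    pred, truth: H x W nested lists of disparities.
    valid: optional H x W nested list of booleans; False pixels are skipped.
    """
    total, count = 0.0, 0
    for i, (p_row, t_row) in enumerate(zip(pred, truth)):
        for j, (p, t) in enumerate(zip(p_row, t_row)):
            if valid is not None and not valid[i][j]:
                continue  # pixel excluded from the evaluation
            total += abs(p - t)
            count += 1
    if count == 0:
        raise ValueError("no valid pixels to evaluate")
    return total / count


if __name__ == "__main__":
    pred = [[1.0, 2.0], [3.0, 4.0]]
    truth = [[1.0, 3.0], [5.0, 4.0]]
    print(end_point_error(pred, truth))
```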

Network	3-pixel error /%	Running time /s
PSMNet	2.32	0.41
MC-CNN	3.89	67.00
GC-Net	2.87	0.90
DispNet	4.34	0.06
CRL	2.67	0.47
Ours	2.09	0.26

Table 4. Comparison on KITTI2015 dataset
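
The "3px" column in Table 4 is an outlier rate. The sketch below assumes the standard KITTI 2015 convention, under which a pixel counts as an outlier when its disparity error exceeds both 3 pixels and 5% of the true disparity; whether the article applies exactly this rule is not stated on this page. The function name and the convention that a non-positive ground-truth value marks a pixel without ground truth are illustrative assumptions.

```python
def three_pixel_error(pred, truth, abs_thresh=3.0, rel_thresh=0.05):
    """Percentage of valid pixels counted as outliers.

    A pixel is an outlier when its error exceeds abs_thresh pixels AND
    rel_thresh times the true disparity (assumed KITTI 2015 convention).
    Pixels with truth <= 0 are treated as having no ground truth.
    """
    outliers, count = 0, 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if t <= 0:
                continue  # sparse ground truth: skip unlabeled pixels
            err = abs(p - t)
            if err > abs_thresh and err > rel_thresh * t:
                outliers += 1
            count += 1
    if count == 0:
        raise ValueError("no valid pixels to evaluate")
    return 100.0 * outliers / count


if __name__ == "__main__":
    truth = [[10.0, 20.0, 100.0, 0.0]]
    pred = [[10.5, 24.0, 104.0, 7.0]]
    print(three_pixel_error(pred, truth))
```

In the usage example, the third pixel is off by 4 pixels but is not an outlier, because 4 is below 5% of its true disparity of 100; the fourth pixel is ignored because it has no ground truth.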

d	q	EPE /pixel	Time /s	GPU memory /GB
1	1	0.81	0.88	14.00
2	2	0.83	0.76	11.80
3	3	0.89	0.62	9.80
4	4	0.96	0.49	8.90

Table 5. Comparison of hyperparameters d and q on the SceneFlow test set
