Unsupervised Monocular Depth Estimation by Fusing Dilated Convolutional Network and SLAM

Renyue Dai; Zhijun Fang; Yongbin Gao

doi:10.3788/LOP57.061007


Laser & Optoelectronics Progress
Vol. 57, Issue 6, 061007 (2020)


Renyue Dai, Zhijun Fang^*, and Yongbin Gao

Author Affiliations

School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201600, China


Renyue Dai, Zhijun Fang, Yongbin Gao. Unsupervised Monocular Depth Estimation by Fusing Dilated Convolutional Network and SLAM[J]. Laser & Optoelectronics Progress, 2020, 57(6): 061007


Fig. 1. Illustration of the network framework


Fig. 2. Comparison of standard convolution and dilated convolution filters. (a) Standard convolution filter; (b) dilated convolution filter with dilation ratio of 2; (c) dilated convolution filter with dilation ratio of 3


Fig. 3. Visualization process comparison of dilated convolution and standard convolution. (a) Visualization process of standard convolution; (b) visualization process of dilated convolution with dilation ratio of 2; (c) visualization process of dilated convolution with dilation ratio of 3
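
The effect of the dilation ratios named in Figs. 2 and 3 can be illustrated with a minimal sketch. It assumes a 3×3 filter, which the captions do not state, and the function names are hypothetical. It only computes how far a dilated filter reaches along one axis; it is not the authors' network code.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in pixels) covered by a dilated filter along one axis."""
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be >= 1")
    # Dilation inserts (dilation - 1) gaps between neighbouring taps.
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def tap_offsets(kernel_size: int, dilation: int) -> list:
    """Sampled positions relative to the filter centre (odd kernel sizes)."""
    half = kernel_size // 2
    return [dilation * (i - half) for i in range(kernel_size)]


if __name__ == "__main__":
    # Assumed 3x3 filter; dilation 1 is the standard convolution.
    for d in (1, 2, 3):
        print(d, effective_kernel_size(3, d), tap_offsets(3, d))
```

Under this assumption the number of weights stays at nine per filter, while the covered span grows from 3 to 5 to 7 pixels per axis as the dilation ratio goes from 1 to 2 to 3.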


Fig. 4. Flow chart of optimizing global camera pose by ORB-SLAM algorithm


Fig. 5. Projection process of three-dimensional space points onto the image plane
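
The projection in Fig. 5 is commonly modeled with a pinhole camera. The sketch below assumes that model with no lens distortion; the intrinsic values in the usage line are illustrative placeholders, not the calibration used in the paper, and the function name is hypothetical.

```python
def project_point(point, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates onto the image plane.

    Assumes an ideal pinhole camera: focal lengths fx, fy (pixels) and
    principal point (cx, cy). Returns pixel coordinates (u, v).
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    u = fx * x / z + cx
    v = fy * y / z + cy
    return (u, v)


if __name__ == "__main__":
    # Placeholder intrinsics chosen only for illustration.
    print(project_point((1.0, 0.5, 10.0), fx=700.0, fy=700.0, cx=600.0, cy=180.0))
```

A point on the optical axis maps to the principal point regardless of its depth, which is a quick sanity check for any implementation of this step.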


Fig. 6. Curves for different losses. (a) Reconstruction loss; (b) smoothness loss; (c) total loss


Fig. 7. Camera pose trajectories for different sequences in the KITTI Odometry dataset. (a) 00; (b) 01; (c) 09; (d) 02; (e) 03; (f) 10


Fig. 8. Qualitative comparison of depth prediction. (a) RGB input image; (b) method of Garg et al.^[11]; (c) SfMLearner method^[4]; (d) our method; (e) ground truth


Fig. 9. Visualization comparison of depth details. (a)(c) Input images; (b)(d) output images


Method	Seq. 09 t_error /%	Seq. 09 r_error /(°/100 m)	Seq. 10 t_error /%	Seq. 10 r_error /(°/100 m)
Luo et al.^[20]	3.72	1.60	6.06	2.22
Zhou et al.^[4]	18.77	3.21	14.33	3.30
Li et al.^[21]	7.01	3.61	10.63	4.65
Zhan et al.^[13] (Tem)	11.93	3.91	12.45	3.46
Zhan et al.^[13] (New York University datasets)	11.92	3.60	12.62	3.43
Ours	1.70	0.50	1.43	0.52

Table 1. RMSE comparison on sequences 09 and 10 of the KITTI Odometry dataset

Method	Supervised	Data	Error: A	Error: S	Error: R	Error: lg R	Accuracy: δ₁ /%	Accuracy: δ₂ /%	Accuracy: δ₃ /%
Method in Ref. [5]	√	KITTI	0.214	1.605	6.563	0.292	67.3	88.4	95.7
Method in Ref. [6]	√	KITTI	0.203	1.548	6.307	0.282	70.2	89.0	95.8
Method in Ref. [7]	√	KITTI	0.202	1.614	6.523	0.275	67.8	89.5	96.5
Method in Ref. [22] (photo)	×	KITTI	0.211	1.980	6.154	0.264	73.2	89.8	95.9
Method in Ref. [22] (photo+ad)	×	KITTI	0.220	1.976	6.340	0.273	70.8	86.7	93.4
Method in Ref. [4]	×	KITTI	0.208	1.768	6.856	0.283	67.8	88.5	95.7
Method in Ref. [4] (without explainability masks)	×	KITTI	0.221	2.226	7.527	0.294	67.6	88.5	95.4
Ours	×	KITTI	0.189	1.592	6.432	0.268	71.4	91.1	96.3

Table 2. Comparison of TUM evaluation results for depth estimation model
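
The column symbols in Table 2 are not defined on this page. The sketch below assumes they follow the metrics commonly reported for monocular depth estimation: A as absolute relative error, S as squared relative error, R as RMSE, lg R as RMSE of log depth (natural logarithm assumed), and δ₁, δ₂, δ₃ as the fraction of pixels whose ratio max(pred/gt, gt/pred) is below 1.25, 1.25², and 1.25³. The function name and input layout (flat lists of positive depths) are hypothetical.

```python
import math


def depth_metrics(pred, gt):
    """Common depth-evaluation metrics over paired positive depth values.

    The mapping of these metrics to the table's column symbols is an
    assumption, not something stated in the source.
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    n = len(gt)
    pairs = list(zip(pred, gt))
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )
    ratios = [max(p / g, g / p) for p, g in pairs]
    # Threshold accuracies as fractions in [0, 1]; multiply by 100 for percent.
    deltas = [sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3)]
    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "delta1": deltas[0],
        "delta2": deltas[1],
        "delta3": deltas[2],
    }


if __name__ == "__main__":
    print(depth_metrics([11.0, 20.0], [10.0, 20.0]))
```

Under these definitions lower is better for the four error columns and higher is better for the three accuracy columns, which matches how the table groups them.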
