Monocular Indoor Depth Estimation Method Based on Neural Networks with Constraints on Two-Dimensional Images and Three-Dimensional Geometry

Hao Sha; Yue Liu; Yongtian Wang; Chenguang Lu; Mengze Zhao

doi:10.3788/AOS202242.1911001


Acta Optica Sinica
Vol. 42, Issue 19, 1911001 (2022)


Hao Sha, Yue Liu^*, Yongtian Wang, Chenguang Lu, and Mengze Zhao

Author Affiliations

Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China



Hao Sha, Yue Liu, Yongtian Wang, Chenguang Lu, Mengze Zhao. Monocular Indoor Depth Estimation Method Based on Neural Networks with Constraints on Two-Dimensional Images and Three-Dimensional Geometry[J]. Acta Optica Sinica, 2022, 42(19): 1911001


Fig. 1. Principle of normal calculation using the nearest-neighbor point sampling method


Fig. 2. Feature connection module based on depth channel attention mechanism


Fig. 3. Overall architecture of monocular depth estimation method


Fig. 4. Architecture of encoder and decoder sub-networks. (a) Sub-network structure of encoder; (b)-(d) sub-network structures of decoder


Fig. 5. Depth prediction results of different methods on NYU Depth v2 dataset


Fig. 6. 3D reconstruction results based on monocular depth


Fig. 7. Qualitative results of ablation experiments based on network architecture


Fig. 8. Qualitative results of ablation experiments based on constraints


Fig. 9. Quantitative results on the test set over different depth ranges


Fig. 10. Quantitative results on selected images over different depth ranges. (a) 10 images with worst RMSE; (b) 10 images with worst REL; (c) 10 images with worst TH1


Method	RMSE	REL	$\delta < 1.25$	$\delta < 1.25^{2}$	$\delta < 1.25^{3}$
Ref. [18]	0.907	0.215	0.611	0.887	0.971
Ref. [37]	0.824	0.230	0.614	0.883	0.971
Ref. [38]	0.620	0.149	0.806	0.883	0.987
Ref. [39]	0.635	0.143	0.788	0.958	0.991
Ref. [40]	0.819	0.232	0.646	0.892	0.968
Ref. [24]	0.641	0.158	0.769	0.950	0.988
Ref. [19]	0.573	0.127	0.811	0.953	0.988
Ref. [22]	0.586	0.121	0.811	0.954	0.987
Ref. [26]	0.600	0.144	0.791	0.960	0.991
Ref. [41]	0.572	0.139	0.815	0.963	0.991
Ref. [27]	0.599	0.159	0.772	0.942	0.984
Ref. [37]	0.555	0.126	0.843	0.968	0.991
This paper	0.552	0.164	0.768	0.940	0.984

Table 1. Quantitative comparison between the proposed method and other methods on NYU Depth v2 dataset
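The columns of Table 1 can be illustrated with a minimal sketch, assuming the standard definitions of these metrics in the monocular depth estimation literature: RMSE is the root mean square error, REL is the mean absolute relative error, and each $\delta$ column is the fraction of pixels whose ratio max(pred/gt, gt/pred) falls below the threshold. The function name `depth_metrics`, the flat-list input, and the rule of skipping non-positive depths are assumptions made for illustration; the evaluation protocol actually used in the paper (masking, depth cap, cropping) is not stated on this page.

```python
import math


def depth_metrics(pred, gt):
    """Return RMSE, REL and threshold accuracies for paired depth values.

    pred, gt: equal-length sequences of depths in the same unit.
    Pairs with a non-positive value are skipped (an assumption of this sketch).
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid depth pairs")
    n = len(pairs)

    # Root mean square error.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Mean absolute relative error.
    rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Symmetric ratio used by the threshold accuracies.
    ratios = [max(p / g, g / p) for p, g in pairs]
    acc = {k: sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3)}

    return {
        "rmse": rmse,
        "rel": rel,
        "delta1": acc[1],  # delta < 1.25
        "delta2": acc[2],  # delta < 1.25^2
        "delta3": acc[3],  # delta < 1.25^3
    }


# Toy values, not taken from the paper.
metrics = depth_metrics([1.0, 2.0, 4.0, 3.0], [1.0, 2.0, 2.0, 2.0])
print(metrics)
```

Lower is better for RMSE and REL, while higher is better for the three threshold accuracies.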

Method	Running time /ms	Frame rate /(frame·s^-1)	RMSE
Ref. [18]	23	43	0.907
Ref. [19]	237	10	0.604
Ref. [24]	96	6	0.753
This paper	58	17	0.552

Table 2. Comparison of running speeds of different methods
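As a rough illustration of how the two speed columns relate, the sketch below assumes that the frame rate is simply the reciprocal of the per-frame running time, with frames processed one at a time. The helper name `frames_per_second` is hypothetical, and how the timings were actually measured is not described on this page.

```python
def frames_per_second(running_time_ms: float) -> float:
    """Convert a per-frame running time in milliseconds to frames per second."""
    if running_time_ms <= 0:
        raise ValueError("running time must be positive")
    return 1000.0 / running_time_ms


# 58 ms per frame (the "This paper" row) corresponds to about 17 frame/s.
print(round(frames_per_second(58)))
```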

Method	RMSE	REL	$\delta < 1.25$	$\delta < 1.25^{2}$	$\delta < 1.25^{3}$
Without skip connection	0.727	0.222	0.631	0.885	0.969
Without SE_Concat_Block	0.604	0.177	0.731	0.922	0.976
Baseline	0.586	0.178	0.738	0.932	0.982
U-Net	0.647	0.202	0.681	0.915	0.978
ResNet-101	0.628	0.189	0.704	0.921	0.981

Table 3. Quantitative results of ablation experiments based on network architecture

Method	RMSE	REL	$\delta < 1.25$	$\delta < 1.25^{2}$	$\delta < 1.25^{3}$
Baseline	0.594	0.177	0.740	0.926	0.980
With $L_{2D}$	0.586	0.178	0.738	0.932	0.982
With $L_{2D}$ and $L_{G}$	0.561	0.165	0.761	0.935	0.983
With $L_{2D}$, $L_{G}$, and $L_{L}$	0.552	0.164	0.768	0.940	0.984

Table 4. Quantitative results of ablation experiments based on constraints
