Monocular Indoor Depth Estimation Method Based on Neural Networks with Constraints on Two-Dimensional Images and Three-Dimensional Geometry

Hao Sha; Yue Liu; Yongtian Wang; Chenguang Lu; Mengze Zhao

doi:10.3788/AOS202242.1911001

[1] Izadi S, Kim D, Hilliges O et al. KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera[C], 559-568(2011).

[2] Zheng T X, Huang S, Li Y F et al. Key techniques for vision based 3D reconstruction: a review[J]. Acta Automatica Sinica, 46, 631-652(2020).

[3] Ding M, Jiang X Y. Scene depth estimation based on monocular vision in advanced driving assistance system[J]. Acta Optica Sinica, 40, 1715001(2020).

[4] Guo K Y, Yang M, Zhang M et al. Real-time monocular depth estimation method based on perspective N-point model[J]. Laser & Optoelectronics Progress, 58, 0615005(2021).

[5] Meka A, Fox G, Zollhöfer M et al. Live user-guided intrinsic video for static scenes[J]. IEEE Transactions on Visualization and Computer Graphics, 23, 2447-2454(2017).

[6] Lu Y, He W J, Wu M et al. Time-correlated Kalman depth estimation of photon-counting lidar[J]. Acta Photonica Sinica, 50, 0311001(2021).

[7] Palomer A, Ridao P, Forest J et al. Underwater laser scanner: ray-based model and calibration[J]. IEEE/ASME Transactions on Mechatronics, 24, 1986-1997(2019).

[8] Gu C J, Cong Y, Sun G. Three birds, one stone: unified laser-based 3-D reconstruction across different media[J]. IEEE Transactions on Instrumentation and Measurement, 70, 1-12(2021).

[9] Zhang R, Tsai P S, Cryer J E et al. Shape-from-shading: a survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21, 690-706(1999).

[10] Asada N, Fujiwara H, Matsuyama T. Edge and depth from focus[J]. International Journal of Computer Vision, 26, 153-163(1998).

[11] Favaro P, Soatto S. A geometric approach to shape from defocus[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 406-417(2005).

[12] Liu X M, Du M Z, Ma Z B et al. Depth estimation method of light field image based on occlusion scene[J]. Acta Optica Sinica, 40, 0510002(2020).

[13] Wu X W, Sahoo D, Hoi S C H. Recent advances in deep learning for object detection[J]. Neurocomputing, 396, 39-64(2020).

[14] Taghanaki S A, Abhishek K, Cohen J P et al. Deep semantic segmentation of natural and medical images: a review[J]. Artificial Intelligence Review, 54, 137-178(2021).

[15] Ciaparrone G, Sánchez F L, Tabik S et al. Deep learning in video multi-object tracking: a survey[J]. Neurocomputing, 381, 61-88(2020).

[16] Anwar S, Khan S, Barnes N. A deep journey into super-resolution[J]. ACM Computing Surveys, 53, 1-34(2021).

[17] Zhao C Q, Sun Q Y, Zhang C Z et al. Monocular depth estimation based on deep learning: an overview[J]. Science China Technological Sciences, 63, 1612-1627(2020).

[18] Eigen D, Puhrsch C, Fergus R. Depth map prediction from a single image using a multi-scale deep network[C], 2366-2374(2014).

[19] Laina I, Rupprecht C, Belagiannis V et al. Deeper depth prediction with fully convolutional residual networks[C], 239-248(2016).

[20] Wang P, Shen X H, Lin Z et al. Towards unified depth and semantic prediction from a single image[C], 2800-2809(2015).

[21] Liu P, Zhang Z H, Meng Z Z et al. Monocular depth estimation with joint attention feature distillation and wavelet-based loss function[J]. Sensors, 21, 54(2020).

[22] Xu D, Ricci E, Ouyang W L et al. Multi-scale continuous CRFs as sequential deep networks for monocular depth estimation[C], 161-169(2017).

[23] Liu B, Gould S, Koller D. Single image depth estimation from predicted semantic labels[C], 1253-1260(2010).

[24] Eigen D, Fergus R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture[C], 2650-2658(2015).

[25] Yin W, Liu Y F, Shen C H et al. Enforcing geometric constraints of virtual normal for depth prediction[C], 5683-5692(2019).

[26] Qi X J, Liu Z Z, Liao R J et al. GeoNet: iterative geometric neural network with edge-aware refinement for joint depth and surface normal estimation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44, 969-984(2022).

[27] Yu Z H, Jin L, Gao S H. P2Net: patch-match and plane-regularization for unsupervised indoor depth estimation[M]. Vedaldi A, Bischof H, Brox T et al. Computer Vision-ECCV 2020. Lecture notes in computer science, 12369, 206-222(2020).

[28] Paszke A, Gross S, Chintala S et al. Automatic differentiation in pytorch[EB/OL]. https://openreview.net/forum?id=BJJsrmfCZ

[29] Silberman N, Fergus R. Indoor scene segmentation using a structured light sensor[C], 601-608(2011).

[30] Yang J C, Yu K, Gong Y H et al. Linear spatial pyramid matching using sparse coding for image classification[C], 1794-1801(2009).

[31] Ronneberger O, Fischer P, Brox T. U-net: convolutional networks for biomedical image segmentation[M]. Navab N, Hornegger J, Wells W M et al. Medical image computing and computer-assisted intervention-MICCAI 2015. Lecture notes in computer science, 9351, 234-241(2015).

[32] Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C], 7132-7141(2018).

[33] Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C], 448-456(2015).

[34] Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks[EB/OL]. https://arxiv.org/abs/1511.06434

[35] Levin A, Lischinski D, Weiss Y. Colorization using optimization[C], 689-694(2004).

[36] Kingma D P, Ba J. Adam: a method for stochastic optimization[EB/OL]. https://arxiv.org/abs/1412.6980

[37] Liu F Y, Shen C H, Lin G S. Deep convolutional neural fields for depth estimation from a single image[C], 5162-5170(2015).

[38] Chakrabarti A, Shao J Y, Shakhnarovich G. Depth from a single image by harmonizing overcomplete local network predictions[EB/OL]. https://arxiv.org/abs/1605.07081

[39] Jun L, Can Y C, Klein R et al. A two-streamed network for estimating fine-scaled depth maps from single RGB images[J]. Computer Vision and Image Understanding, 186, 25-36(2019).

[40] Cao Y, Wu Z F, Shen C H. Estimating depth from monocular images as classification using deep fully convolutional residual networks[J]. IEEE Transactions on Circuits and Systems for Video Technology, 28, 3174-3182(2018).

[41] Lee J H, Heo M, Kim K R et al. Single-image depth estimation based on Fourier domain analysis[C], 330-339(2018).

[42] He K M, Zhang X Y, Ren S Q et al. Deep residual learning for image recognition[C], 770-778(2016).