Progress in Deep Learning Based Monocular Image Depth Estimation

Yang Li; Xiuwan Chen; Yuan Wang; Maolin Liu

doi:10.3788/LOP56.190001

[1] Zeng A, Song S R, Nießner M, et al. 3DMatch: learning local geometric descriptors from RGB-D reconstructions. [C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 199-208(2017).

[2] Wang Z, Liu H, Wang X D et al. Segment and label indoor scene based on RGB-D for the visually impaired[M]. ∥Gurrin C, Hopfgartner F, Hurst W, et al. MultiMedia modeling. Lecture notes in computer science. Cham: Springer, 8325, 449-460(2014).

[3] Mancini M, Costante G, Valigi P et al. Fast robust monocular depth estimation for obstacle detection with fully convolutional networks. [C]∥2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 9-14, 2016, Daejeon, Korea. New York: IEEE, 4296-4303(2016).

[4] Chen Z H, Hong Y, Wang J K et al. Monocular visual odometry based on recurrent convolutional neural networks[J]. Robot, 41, 147-155(2019).

[5] Li X Z, Yang A L, Qin B L et al. Monocular camera three dimensional reconstruction based on optical flow feedback[J]. Acta Optica Sinica, 35, 0515001(2015).

[6] Zhan K F, Chen W J, Li W S et al. Line laser 3D scene reconstruction system and error analysis[J]. Chinese Journal of Lasers, 45, 1204004(2018).

[7] Bi T T, Liu Y, Weng D D et al. Survey on supervised learning based depth estimation from a single image[J]. Journal of Computer-Aided Design & Computer Graphics, 30, 3-13(2018).

[8] Žbontar J, LeCun Y. Stereo matching by training a convolutional neural network to compare image patches[J]. The Journal of Machine Learning Research, 17, 2287-2318(2016). http://dl.acm.org/citation.cfm?id=2946710

[9] Hirschmuller H. Accurate and efficient stereo processing by semi-global matching and mutual information. [C]∥2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), June 20-25, 2005, San Diego, CA, USA. New York: IEEE, 2, 807-814(2005).

[10] Zhao S Y, Zhang L, Shen Y et al. Super-resolution for monocular depth estimation with multi-scale sub-pixel convolutions and a smoothness constraint[J]. IEEE Access, 7, 16323-16335(2019). http://sse.tongji.edu.cn/linzhang/files/Super-Resolution%20for%20Monocular%20DepthSuper-Resolution%20for%20Monocular%20Depth.pdf

[11] He L, Dong Q L, Hu Z Y. The inherent ambiguity in scene depth learning from single images[J]. Scientia Sinica (Informationis), 46, 811-818(2016).

[12] Tsai Y M, Chang Y L, Chen L G. Block-based vanishing line and vanishing point detection for 3D scene reconstruction. [C]∥2006 International Symposium on Intelligent Signal Processing and Communications, December 12-15, 2006, Tottori, Japan. New York: IEEE, 586-589(2006).

[13] Tang C, Hou C P, Song Z J. Depth recovery and refinement from a single image using defocus cues[J]. Journal of Modern Optics, 62, 441-448(2015).

[14] Prados E, Faugeras O. Shape from shading[M]. ∥Paragios N, Chen Y, Faugeras O. Handbook of mathematical models in computer vision. Boston, MA: Springer, 375-388(2009).

[15] Karsch K, Liu C, Kang S B. Depth extraction from video using non-parametric sampling[M]. ∥Fitzgibbon A, Lazebnik S, Perona P, et al. Computer vision-ECCV 2012. Lecture notes in computer science. Berlin, Heidelberg: Springer, 7576, 775-788(2012).

[16] Saxena A, Sun M, Ng A Y. Make3D: learning 3D scene structure from a single still image[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 824-840(2009). http://ieeexplore.ieee.org/document/4531745/citations?tabFilter=patents

[17] Saxena A, Sun M, Ng A Y. Learning 3-D scene structure from a single still image. [C]∥2007 IEEE 11th International Conference on Computer Vision, October 14-21, 2007, Rio de Janeiro, Brazil. New York: IEEE, 9848899(2007).

[18] Liu B, Gould S, Koller D. Single image depth estimation from predicted semantic labels. [C]∥2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 13-18, 2010, San Francisco, CA, USA. New York: IEEE, 1253-1260(2010).

[19] Girshick R, Donahue J, Darrell T et al. Rich feature hierarchies for accurate object detection and semantic segmentation. [C]∥2014 IEEE Conference on Computer Vision and Pattern Recognition, June 23-28, 2014, Columbus, OH, USA. New York: IEEE, 580-587(2014).

[20] Liu F, Liu P Y, Li B et al. Deep learning model design of video target tracking based on TensorFlow platform[J]. Laser & Optoelectronics Progress, 54, 091501(2017).

[21] Hinton G E. Reducing the dimensionality of data with neural networks[J]. Science, 313, 504-507(2006). http://europepmc.org/abstract/med/16873662

[22] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. [C]∥Proceedings of the 25th International Conference on Neural Information Processing Systems, December 3-6, 2012, Lake Tahoe, Nevada, USA. Canada: NIPS(2012).

[23] Eigen D, Puhrsch C, Fergus R. Depth map prediction from a single image using a multi-scale deep network. [C]∥27th International Conference on Neural Information Processing Systems, December 8-13, 2014, Montreal, Canada. Canada: NIPS(2014).

[24] Eigen D, Fergus R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. [C]∥2015 IEEE International Conference on Computer Vision (ICCV), December 7-13, 2015, Santiago, Chile. New York: IEEE, 2650-2658(2015).

[25] Grigorev A, Jiang F, Rho S et al. Depth estimation from single monocular images using deep hybrid network[J]. Multimedia Tools and Applications, 76, 18585-18604(2017). http://link.springer.com/article/10.1007/s11042-016-4200-x

[26] Liu F Y, Shen C H, Lin G S et al. Learning depth from single monocular images using deep convolutional neural fields[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38, 2024-2039(2016). http://dl.acm.org/citation.cfm?id=3026801.3026841

[27] Laina I, Rupprecht C, Belagiannis V et al. Deeper depth prediction with fully convolutional residual networks. [C]∥2016 Fourth International Conference on 3D Vision (3DV), October 25-28, 2016, Stanford, CA, USA. New York: IEEE, 239-248(2016).

[28] Cao Y, Wu Z F, Shen C H. Estimating depth from monocular images as classification using deep fully convolutional residual networks[J]. IEEE Transactions on Circuits and Systems for Video Technology, 28, 3174-3182(2018). http://ieeexplore.ieee.org/document/8010878/

[29] Xie J Y, Girshick R, Farhadi A. Deep3D: fully automatic 2D-to-3D video conversion with deep convolutional neural networks[M]. ∥Leibe B, Matas J, Sebe N, et al. Computer vision-ECCV 2016. Lecture notes in computer science. Cham: Springer, 9908, 842-857(2016).

[30] Garg R, Kumar B G V, Carneiro G, et al. Unsupervised CNN for single view depth estimation: geometry to the rescue[M]. ∥Leibe B, Matas J, Sebe N, et al. Computer vision-ECCV 2016. Lecture notes in computer science. Cham: Springer, 9912, 740-756(2016).

[31] Godard C, Aodha O M, Brostow G J. Unsupervised monocular depth estimation with left-right consistency. [C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 6602-6611(2017).

[32] Zhou T H, Brown M, Snavely N et al. Unsupervised learning of depth and ego-motion from video. [C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 6612-6619(2017).

[33] Casser V, Pirk S, Mahjourian R et al. (2018-11-15)[2019-03-15]. https://arxiv.org/abs/1811.06152 (2018).

[34] Bao Z Q, Li A H, Cui Z G et al. Research progress of deep learning in visual localization and three-dimensional structure recovery[J]. Laser & Optoelectronics Progress, 55, 050007(2018).

[35] Saxe A M, McClelland J L. (2014-02-19)[2019-03-15]. https://arxiv.org/abs/1312.6120v1 (2014).

[36] Srivastava R K, Greff K. (2015-11-23)[2019-03-15]. https://arxiv.org/abs/1507.06228 (2015).

[37] He K M, Zhang X Y, Ren S Q et al. Deep residual learning for image recognition. [C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 770-778(2016).

[38] Roy A, Todorovic S. Monocular depth estimation using neural regression forest. [C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 5506-5514(2016).

[39] He L, Wang G H, Hu Z Y. Learning depth from single images with deep neural network embedding focal length[J]. IEEE Transactions on Image Processing, 27, 4676-4689(2018). http://europepmc.org/abstract/MED/29994526

[40] Couprie C, Farabet C, Najman L et al. (2013-03-14)[2019-03-15]. https://arxiv.org/abs/1301.3572 (2013).

[41] Chen L F, Yang Z, Ma J J et al. Driving scene perception network: real-time joint detection, depth estimation and semantic segmentation. [C]∥2018 IEEE Winter Conference on Applications of Computer Vision (WACV), March 12-15, 2018, Lake Tahoe, NV, USA. New York: IEEE, 1283-1291(2018).

[42] Jiao J B, Cao Y, Song Y B et al. Look deeper into depth: monocular depth estimation with semantic booster and attention-driven loss[M]. ∥Ferrari V, Hebert M, Sminchisescu C, et al. Computer vision-ECCV 2018. Lecture notes in computer science. Cham: Springer, 11219, 55-71(2018).

[43] Lin T Y, Goyal P, Girshick R et al. Focal loss for dense object detection. [C]∥The IEEE International Conference on Computer Vision (ICCV), October 22-29, 2017, Venice, Italy. New York: IEEE, 2980-2988(2017).

[44] Saxena A, Chung S H, Ng A Y. Learning depth from single monocular images. [C]∥Proceedings of the 18th International Conference on Neural Information Processing Systems, December 5-8, 2005, Vancouver, British Columbia, Canada. Canada: NIPS(2005).

[45] Li B, Dai Y C, He M Y. Monocular depth estimation with hierarchical fusion of dilated CNNs and soft-weighted-sum inference[J]. Pattern Recognition, 83, 328-339(2018). http://www.sciencedirect.com/science/article/pii/S0031320318302097

[46] Yu F. (2016-04-30)[2019-03-15]. https://arxiv.org/abs/1511.07122 (2016).

[47] Fu H, Gong M M, Wang C H et al. Deep ordinal regression network for monocular depth estimation. [C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-23, 2018, Salt Lake City, UT. New York: IEEE, 2002-2011(2018).

[48] Herbrich R, Graepel T, Obermayer K. Support vector learning for ordinal regression. [C]∥9th International Conference on Artificial Neural Networks: ICANN '99, September 7-10, 1999, Edinburgh, UK. New York: IEEE, 97-102(1999).

[49] Chen L C, Papandreou G, Kokkinos I et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40, 834-848(2018). http://www.ncbi.nlm.nih.gov/pubmed/28463186

[50] Mayer N, Ilg E, Hausser P et al. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. [C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 4040-4048(2016).

[51] Wang Z, Bovik A C, Sheikh H R et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 13, 600-612(2004). http://jamia.bmj.com/external-ref?access_num=10.1109/TIP.2003.819861&link_type=DOI

[52] Heise P, Klose S, Jensen B et al. PM-huber: PatchMatch with Huber regularization for stereo matching. [C]∥2013 IEEE International Conference on Computer Vision, December 1-8, 2013, Sydney, Australia. New York: IEEE, 2360-2367(2013).

[53] Kuznietsov Y, Stuckler J, Leibe B. Semi-supervised deep learning for monocular depth map prediction. [C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI. New York: IEEE, 2215-2223(2017).

[54] Nister D, Naroditsky O, Bergen J. Visual odometry. [C]∥Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA. New York: IEEE, 1315094(2004).

[55] Mur-Artal R, Tardós J D. ORB-SLAM2: an open-source SLAM system for monocular, stereo, and RGB-D cameras[J]. IEEE Transactions on Robotics, 33, 1255-1262(2017). http://ieeexplore.ieee.org/document/7946260/

[56] Yang Z H, Wang P, Xu W et al. (2017-11-10)[2019-03-15]. https://arxiv.org/abs/1711.03665 (2017).

[57] Zhou L P, Ye J M, Abello M et al. ... clip loss[J/OL]. (2018-12-08)[2019-03-15]. https://arxiv.org/abs/1812.03368 (2018).

[58] Vijayanarasimhan S, Ricco S, Schmid C et al. ... motion from video[J/OL]. (2017-04-25)[2019-03-15]. https://arxiv.org/abs/1704.07804 (2017).

[59] Yin Z C, Shi J P. GeoNet: unsupervised learning of dense depth, optical flow and camera pose. [C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-23, 2018, Salt Lake City, UT. New York: IEEE, 1983-1992(2018).

[60] Ilg E, Mayer N, Saikia T et al. FlowNet 2.0: evolution of optical flow estimation with deep networks. [C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI. New York: IEEE, 1647-1655(2017).

[61] Xu D, Ricci E, Ouyang W L et al. Multi-scale continuous CRFs as sequential deep networks for monocular depth estimation. [C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI. New York: IEEE, 161-169(2017).

[62] Guo X Y, Li H S, Yi S et al. Learning monocular depth by distilling cross-domain stereo networks[M]. ∥Ferrari V, Hebert M, Sminchisescu C, et al. Computer vision-ECCV 2018. Lecture notes in computer science. Cham: Springer, 11215, 506-523(2018).

[63] Kumar A R S, Bhandarkar S M, Prasad M. Monocular depth prediction using generative adversarial networks. [C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 18-22, 2018, Salt Lake City, UT, USA. New York: IEEE, 413-418(2018).

[64] Almalioglu Y, Saputra M R U et al. (2019-03-05)[2019-03-15]. https://arxiv.org/abs/1809.05786v2 (2019).

[65] Teng Q R, Chen Y M, Huang C. Occlusion-aware unsupervised learning of monocular depth, optical flow and camera pose with geometric constraints[J]. Future Internet, 10, 92(2018). http://www.onacademic.com/detail/journal_1000041695163399_9906.html

[66] Li S M, Lei G Q, Fan R. Depthmap super-resolution based on two-channel convolutional neural network[J]. Acta Optica Sinica, 38, 1010002(2018).
