• Laser & Optoelectronics Progress
  • Vol. 55, Issue 5, 050007 (2018)
Zhenqiang Bao*, Aihua Li, Zhigao Cui, and Meng Yuan
Author Affiliations
  • Rocket Force University of Engineering, Xian, Shaanxi 710025, China
    DOI: 10.3788/LOP55.050007
    Zhenqiang Bao, Aihua Li, Zhigao Cui, Meng Yuan. Research Progress of Deep Learning in Visual Localization and Three-Dimensional Structure Recovery[J]. Laser & Optoelectronics Progress, 2018, 55(5): 050007
    References

    [1] Roberts L G. Machine perception of three-dimensional solids[M]. Cambridge: Massachusetts Institute of Technology, 31-39(1965).

    [2] Barrow H G, Tenenbaum J M. Interpreting line drawings as three-dimensional surfaces[J]. Artificial Intelligence, 17, 75-116(1981).

    [3] Tian Y B, Bai J, Huang Z. Depth estimation with a panoramic stereo imaging system[J]. Acta Optica Sinica, 33, 0611002(2013).

    [4] Flack J, Fox S. Rapid 2D-to-3D conversion[C]. SPIE, 4660, 78-86(2002).

    [5] Chen S E, Williams L. View interpolation for image synthesis[C]. Conference on Computer Graphics and Interactive Techniques, 279-288(1993).

    [6] Fitzgibbon A, Wexler Y, Zisserman A. Image-based rendering using image-based priors[J]. International Journal of Computer Vision, 63, 141-151(2005).

    [7] Seitz S M, Dyer C R. View morphing[C]. Conference on Computer Graphics and Interactive Techniques, 21-30(1996).

    [8] Zitnick C L, Kang S B, Uyttendaele M et al. High-quality video view interpolation using a layered representation[C]. ACM Transactions on Graphics, 23, 600-608(2004).

    [9] Ladický L, Häne C, Pollefeys M. Learning the matching function[J]. arXiv, Computer Science(2015).

    [10] Zbontar J, LeCun Y. Stereo matching by training a convolutional neural network to compare image patches[J]. Journal of Machine Learning Research, 17, 1-32(2016).

    [11] Xu L, Zhao H T, Sun S Y. Monocular infrared image depth estimation based on deep convolutional neural networks[J]. Acta Optica Sinica, 36, 0715002(2016).

    [12] Wu S C, Zhao H T, Sun S Y. Depth estimation from monocular infrared video based on bi-recursive convolutional neural network[J]. Acta Optica Sinica, 37, 1215003(2017).

    [13] Mayer N, Ilg E, Hausser P et al. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation[C]. IEEE Conference on Computer Vision and Pattern Recognition, 4040-4048(2016).

    [14] Saxena A, Sun M, Ng A Y. Make3D: learning 3D scene structure from a single still image[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 824-840(2009).

    [15] Liu F, Shen C, Lin G et al. Learning depth from single monocular images using deep convolutional neural fields[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38, 2024-2039(2016).

    [16] Eigen D, Puhrsch C, Fergus R. Depth map prediction from a single image using a multi-scale deep network[C]. International Conference on Neural Information Processing Systems, 2366-2374(2014).

    [17] Shi J, Pollefeys M. Pulling things out of perspective[C]. IEEE Conference on Computer Vision and Pattern Recognition, 89-96(2014).

    [18] Li B, Shen C, Dai Y et al. Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs[C]. IEEE Conference on Computer Vision and Pattern Recognition, 1119-1127(2015).

    [19] Laina I, Rupprecht C, Belagiannis V et al. Deeper depth prediction with fully convolutional residual networks[C]. Fourth IEEE International Conference on 3D Vision, 239-248(2016).

    [20] Li B, Shen C, Dai Y et al. Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs[C]. IEEE Conference on Computer Vision and Pattern Recognition, 1119-1127(2015).

    [21] Fan X, Zheng K, Lin Y et al. Combining local appearance and holistic view: dual-source deep neural networks for human pose estimation[C]. IEEE Conference on Computer Vision and Pattern Recognition, 8753, 1347-1355(2015).

    [22] Ummenhofer B, Zhou H, Uhrig J et al. DeMoN: depth and motion network for learning monocular stereo[C]. 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5622-5631(2017).

    [23] Kuznietsov Y, Stuckler J, Leibe B. Semi-supervised deep learning for monocular depth map prediction[C]. 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2215-2223(2017).

    [24] Liu B, Gould S, Koller D. Single image depth estimation from predicted semantic labels[C]. IEEE Conference on Computer Vision and Pattern Recognition, 1253-1260(2010).

    [25] Kendall A, Martirosyan H, Dasgupta S et al. End-to-end learning of geometry and context for deep stereo regression[C]. 16th IEEE International Conference on Computer Vision, 66-75(2017).

    [26] Tulsiani S, Zhou T, Efros A A et al. Multi-view supervision for single-view reconstruction via differentiable ray consistency[C]. IEEE Conference on Computer Vision and Pattern Recognition, 209-217(2017).

    [27] Bell A J, Sejnowski T J. The "independent components" of natural scenes are edge filters[J]. Vision Research, 37, 3327-3338(1997).

    [28] Bourlard H, Kamp Y. Auto-association by multilayer perceptrons and singular value decomposition[J]. Biological Cybernetics, 59, 291-294(1988).

    [29] Olshausen B A, Field D J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images[J]. Nature, 381, 607-609(1996).

    [30] Salakhutdinov R, Hinton G. Deep Boltzmann machines[J]. Journal of Machine Learning Research, 5, 1967-2006(2009).

    [31] Gadelha M, Maji S, Wang R. Shape generation using spatially partitioned point clouds[J]. arXiv:1707.06267(2017).

    [32] Rezende D J, Eslami S M A, Mohamed S et al. Unsupervised learning of 3D structure from images[J]. Advances in Neural Information Processing Systems, 4997-5005(2016).

    [33] Yan X, Yang J, Yumer E et al. Perspective transformer nets: learning single-view 3D object reconstruction without 3D supervision[J]. Advances in Neural Information Processing Systems, 1696-1704(2016).

    [34] Jayaraman D, Grauman K. Learning image representations tied to ego-motion[C]. IEEE International Conference on Computer Vision, 1413-1421(2015).

    [35] Kendall A, Grimes M, Cipolla R. PoseNet: a convolutional network for real-time 6-DOF camera relocalization[C]. IEEE International Conference on Computer Vision, 2938-2946(2015).

    [36] Agrawal P, Carreira J, Malik J. Learning to see by moving[C]. IEEE International Conference on Computer Vision, 37-45(2015).

    [37] Garg R, Vijay K B G, Carneiro G et al. Unsupervised CNN for single view depth estimation: geometry to the rescue[C]. 14th European Conference on Computer Vision, 9912, 740-756(2016).

    [38] Kendall A, Cipolla R. Geometric loss functions for camera pose regression with deep learning[C]. IEEE Conference on Computer Vision and Pattern Recognition, 6555-6564(2017).

    [39] Flynn J, Snavely K, Neulander I et al. -03-13(2018).

    [40] Xie J, Girshick R, Farhadi A. Deep3D: fully automatic 2D-to-3D video conversion with deep convolutional neural networks[C]. 14th European Conference on Computer Vision, 9908, 842-857(2016).

    [41] Godard C, Aodha O M, Brostow G J. Unsupervised monocular depth estimation with left-right consistency[C]. IEEE Conference on Computer Vision and Pattern Recognition, 6602-6611(2017).

    [42] Konda K, Memisevic R. Learning visual odometry with a convolutional network[C]. International Conference on Computer Vision Theory and Applications, 486-490(2015).

    [43] Handa A, Bloesch M, Pătrăucean V et al. gvnn: neural network library for geometric computer vision[C]. 14th European Conference on Computer Vision, 9915, 67-82(2016).

    [44] Zhao Y, Liu G L, Tian G H et al. A survey of visual SLAM based on deep learning[J]. Robot, 39, 889-896(2017).

    [45] Wang S, Clark R, Wen H et al. DeepVO: towards end-to-end visual odometry with deep recurrent convolutional neural networks[C]. IEEE International Conference on Robotics and Automation, 2043-2050(2017).

    [46] Li R, Wang S, Long Z et al. UnDeepVO: monocular visual odometry through unsupervised deep learning[J]. arXiv:1709.06841(2017).

    [47] Vijayanarasimhan S, Ricco S, Schmid C et al. SfM-Net: learning of structure and motion from video[J]. arXiv:1704.07804(2017).

    [48] Gadelha M, Maji S, Wang R. 3D shape induction from 2D views of multiple objects[J]. arXiv:1612.05872(2016).

    [49] Arora R, Livescu K. Multi-view learning with supervision for transformed bottleneck features[C]. IEEE International Conference on Acoustics, Speech and Signal Processing, 2499-2503(2014).

    [50] Shotton J, Glocker B, Zach C et al. Scene coordinate regression forests for camera relocalization in RGB-D images[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2930-2937(2013).

    [51] Zhou T, Brown M, Snavely N et al. Unsupervised learning of depth and ego-motion from video[C]. IEEE Conference on Computer Vision and Pattern Recognition, 6612-6619(2017).
