• Laser & Optoelectronics Progress
  • Vol. 56, Issue 19, 190001 (2019)
Yang Li*, Xiuwan Chen, Yuan Wang, and Maolin Liu
Author Affiliations
  • Institute of Remote Sensing and Geographic Information System, School of Earth and Space Sciences, Peking University, Beijing 100871, China
    DOI: 10.3788/LOP56.190001
    Yang Li, Xiuwan Chen, Yuan Wang, Maolin Liu. Progress in Deep Learning Based Monocular Image Depth Estimation[J]. Laser & Optoelectronics Progress, 2019, 56(19): 190001
    Fig. 1. Network architecture proposed by Laina et al.[27]
    Fig. 2. Schematics of up-projections. (a) Up-projection; (b) fast up-projection
    Fig. 3. Replacing the traditional 5×5 convolution kernel with four small convolution kernels
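Figs. 2 and 3 rest on an equivalence: unpooling a feature map by 2× (inserting zeros) and then applying a 5×5 convolution gives the same result as applying four smaller kernels (3×3, 3×2, 2×3 and 2×2) to the original map and interleaving their outputs, which skips the multiplications by zero. The NumPy sketch below checks this numerically for a single channel without bias. It assumes the unpooling places each input value in the top-left corner of its 2×2 block; the helper names are hypothetical and not taken from the authors' code.

```python
import numpy as np


def correlate_same(x, k, pad_top, pad_left):
    """Zero-padded 2-D cross-correlation (what CNN libraries call convolution).

    The output has the same shape as x; pad_top/pad_left set how many rows/cols
    the kernel window reaches above/left of each output pixel.
    """
    kh, kw = k.shape
    h, w = x.shape
    padded = np.zeros((h + kh - 1, w + kw - 1))
    padded[pad_top:pad_top + h, pad_left:pad_left + w] = x
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(k * padded[i:i + kh, j:j + kw])
    return out


def unpool2x(x):
    """Hypothetical 2x unpooling: value in the top-left of each 2x2 block, zeros elsewhere."""
    up = np.zeros((2 * x.shape[0], 2 * x.shape[1]))
    up[::2, ::2] = x
    return up


def slow_up_conv(x, k5):
    """Unpool first, then apply a centred 5x5 kernel (padding 2)."""
    return correlate_same(unpool2x(x), k5, 2, 2)


def fast_up_conv(x, k5):
    """Same result from four small kernels applied to the un-upsampled input."""
    h, w = x.shape
    out = np.zeros((2 * h, 2 * w))
    for a in (0, 1):          # row parity of the output pixel
        for b in (0, 1):      # column parity of the output pixel
            sub = k5[a::2, b::2]  # 3x3, 3x2, 2x3 or 2x2 sub-kernel
            out[a::2, b::2] = correlate_same(x, sub, 1 - a, 1 - b)
    return out
```

For an input with H×W pixels, the slow path spends 4HW × 25 = 100HW multiply-adds, while the fast path spends HW × (9 + 6 + 6 + 4) = 25HW, a 4× reduction for identical output.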
    Fig. 4. Long-tail distributions on depth and semantic labels. (a) Pixel-depth distribution of NYU Depth V2 dataset; (b) pixel-depth distribution of KITTI dataset; (c)(d) pixel-semantic label distributions of NYU Depth V2 dataset (all categories/40 categories)
    Fig. 5. Schematic of network architecture proposed by Jiao et al.[42]
    Fig. 6. Schematics of proposed LSU and SUC connections. (a) LSU; (b) SUC
    Fig. 7. Schematic of network architecture proposed by Fu et al.[47]
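The ordinal regression approach of Fu et al.[47] discretizes depth into bins and then predicts, for each bin threshold, whether the depth exceeds it. Fu et al. describe a spacing-increasing discretization (SID) that places the thresholds uniformly in log-depth, so that bins grow wider with distance. The pure-Python sketch below shows SID and a simple decoding step; the depth range (1 m to 80 m), the bin count, the 0.5 voting rule and the function names are illustrative assumptions, and other details of the original implementation are omitted.

```python
import math


def sid_thresholds(alpha, beta, num_bins):
    """Spacing-increasing discretization: num_bins + 1 thresholds from alpha to beta,
    evenly spaced in log-depth (so consecutive thresholds have a constant ratio)."""
    return [math.exp(math.log(alpha) + math.log(beta / alpha) * i / num_bins)
            for i in range(num_bins + 1)]


def decode_ordinal(probs, thresholds):
    """Turn per-threshold probabilities P(depth > t_k) into one depth value.

    Assumed rule: the ordinal label is the number of classifiers with p > 0.5,
    clamped to the last bin; the depth is the midpoint of that bin.
    """
    label = sum(1 for p in probs if p > 0.5)
    label = min(label, len(thresholds) - 2)
    return 0.5 * (thresholds[label] + thresholds[label + 1])


# Hypothetical example: 4 bins between 1 m and 80 m.
t = sid_thresholds(1.0, 80.0, 4)
depth = decode_ordinal([0.9, 0.8, 0.2, 0.1], t)  # two "yes" votes -> bin [t[2], t[3]]
```

Because the thresholds grow geometrically, a fixed number of bins gives fine resolution close to the camera and coarser resolution far away, where depth estimates are less reliable anyway.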
    Fig. 8. Structural schematic of full-image encoders
    Fig. 9. Overall flow chart of algorithm proposed by Garg et al.[30]
    Fig. 10. Schematics of network architecture proposed by Godard et al.[31]. (a) Sampling strategy with left-right consistency; (b) loss function
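The left-right consistency term in Fig. 10 penalizes disagreement between the disparity predicted for the left view and the right-view disparity sampled at the location that the left disparity points to. A minimal NumPy sketch of the left-to-right term follows. It assumes disparities are in pixels and that a left-image pixel at column x matches column x - d in the right image (sign conventions differ between implementations), and it uses 1-D linear interpolation along each row in place of a differentiable bilinear sampler; the function name is hypothetical.

```python
import numpy as np


def lr_consistency_loss(disp_left, disp_right):
    """Mean |d_l(x) - d_r(x - d_l(x))| over all pixels (left-to-right term only).

    Assumed convention: a left pixel at column x corresponds to column
    x - d_l(x) in the right image. np.interp clamps sample positions that fall
    outside the row to the border values.
    """
    h, w = disp_left.shape
    cols = np.arange(w, dtype=float)
    total = 0.0
    for row in range(h):
        sample_at = cols - disp_left[row]
        projected = np.interp(sample_at, cols, disp_right[row])
        total += np.abs(disp_left[row] - projected).sum()
    return total / (h * w)
```

A full objective would typically add the symmetric right-to-left term as well; this sketch keeps one direction for clarity.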
    Fig. 11. Schematic of semi-supervised network architecture and loss function proposed by Kuznietsov et al.[53]
    Fig. 12. Schematic of the network framework for video depth estimation based on view synthesis[32]
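View-synthesis methods such as the one in Fig. 12 supervise depth by warping a neighbouring (source) frame into the target frame and comparing the result with the real target image. The geometric core is the projection p_s ~ K (R·D(p_t)·K⁻¹·p_t + t): back-project a target pixel p_t using its predicted depth D(p_t), move it by the relative camera pose (R, t), and re-project it with the intrinsics K. The NumPy sketch below covers only this per-pixel projection under a pinhole-camera assumption; the photometric comparison and the bilinear sampling of the source image are left out, and the function name is hypothetical.

```python
import numpy as np


def project_to_source(u, v, depth, K, R, t):
    """Map target pixel (u, v) with predicted depth into the source view.

    Implements p_s ~ K (R * depth * K^-1 [u, v, 1]^T + t) for a pinhole camera.
    """
    pixel = np.array([u, v, 1.0])
    point = depth * (np.linalg.inv(K) @ pixel)  # 3-D point in the target camera frame
    point_src = R @ point + t                   # same point in the source camera frame
    proj = K @ point_src                        # homogeneous source pixel
    return proj[0] / proj[2], proj[1] / proj[2]


# Hypothetical intrinsics: focal length 100 px, principal point (50, 40).
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 40.0],
              [0.0, 0.0, 1.0]])
```

With an identity pose every pixel maps to itself, which is a convenient sanity check; a sideways camera translation shifts near pixels more than far ones, which is exactly the signal that lets the network learn depth.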
    Fig. 13. Schematic of GeoNet network architecture proposed by Yin et al.[59]
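    Method | Year | Type | e_Abs | e_RMS | e_lg | δ<1.25 | δ<1.25² | δ<1.25³
    --- | --- | --- | --- | --- | --- | --- | --- | ---
    Method in Ref. [23] | 2014 | Sup. | 0.215 | 0.907 | - | 0.611 | 0.887 | 0.971
    Method in Ref. [26] | 2016 | Sup. | 0.213 | 0.759 | 0.087 | 0.650 | 0.906 | 0.976
    Method in Ref. [27] | 2016 | Sup. | 0.127 | 0.573 | 0.055 | 0.811 | 0.953 | 0.988
    Method in Ref. [61] | 2017 | Sup. | 0.121 | 0.586 | 0.052 | 0.811 | 0.954 | 0.987
    Method in Ref. [47] | 2018 | Sup. | 0.115 | 0.509 | 0.051 | 0.828 | 0.965 | 0.992
    Method in Ref. [42] | 2018 | Sup. | 0.098 | 0.329 | 0.040 | 0.917 | 0.983 | 0.996
    Method in Ref. [45] | 2018 | Sup. | 0.139 | 0.505 | 0.058 | 0.820 | 0.960 | 0.989
    Method in Ref. [10] | 2019 | Sup. | 0.128 | 0.523 | 0.059 | 0.813 | 0.964 | 0.992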
    Table 1. Quantitative evaluation of selected algorithms on NYU Depth V2 dataset
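Tables 1 and 2 report three error metrics (lower is better) and three threshold accuracies (higher is better). The tables do not restate the definitions, so the sketch below assumes the ones commonly used in this literature, with d the predicted and d* the ground-truth depth: e_Abs is the mean absolute relative error |d - d*|/d*, e_RMS the root-mean-square error, e_lg the mean absolute log10 error, and δ<1.25^k the fraction of pixels whose ratio max(d/d*, d*/d) is below 1.25^k. The function name and the flat-list input format are hypothetical.

```python
import math


def depth_metrics(pred, gt):
    """Common depth-estimation metrics for flat lists of valid, strictly positive depths.

    Assumed definitions: mean absolute relative error, RMSE, mean |log10 error|,
    and threshold accuracies delta < 1.25, 1.25**2, 1.25**3.
    """
    n = len(gt)
    pairs = list(zip(pred, gt))
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    log10_err = sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n
    ratios = [max(p / g, g / p) for p, g in pairs]
    deltas = [sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3)]
    return {"e_Abs": abs_rel, "e_RMS": rmse, "e_lg": log10_err,
            "delta1": deltas[0], "delta2": deltas[1], "delta3": deltas[2]}


# Hypothetical four-pixel example (depths in metres).
m = depth_metrics(pred=[1.0, 2.5, 4.0, 20.0], gt=[1.0, 2.0, 4.0, 10.0])
```

Note that the threshold test is strict: a pixel whose ratio is exactly 1.25 (2.5 m predicted for 2.0 m) does not count towards δ<1.25.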
    Method | Year | Type | e_Abs | e_RMS | e_lg | δ<1.25 | δ<1.25² | δ<1.25³
    --- | --- | --- | --- | --- | --- | --- | --- | ---
    Method in Ref. [23] | 2014 | Sup. | 0.190 | 7.156 | 0.270 | 0.692 | 0.899 | 0.967
    Method in Ref. [26] | 2016 | Sup. | 0.217 | 7.046 | - | 0.656 | 0.881 | 0.958
    Method in Ref. [53] | 2017 | Semi. (stereo) | 0.113 | 4.621 | 0.189 | 0.862 | 0.960 | 0.986
    Method in Ref. [31] | 2017 | Unsup. (stereo) | 0.114 | 4.935 | 0.206 | 0.861 | 0.949 | 0.976
    Method in Ref. [47] | 2018 | Sup. | 0.072 | 2.727 | 0.120 | 0.932 | 0.984 | 0.994
    Method in Ref. [42] | 2018 | Sup. | - | 5.110 | 0.215 | 0.843 | 0.950 | 0.981
    Method in Ref. [45] | 2018 | Sup. | 0.113 | 4.687 | - | 0.856 | 0.962 | 0.988
    Method in Ref. [33] | 2018 | Unsup. (video) | 0.109 | 4.750 | 0.187 | 0.874 | 0.958 | 0.982
    Method in Ref. [62] | 2018 | Unsup. (stereo) | 0.095 | 4.316 | 0.177 | 0.892 | 0.966 | 0.984
    Method in Ref. [59] | 2018 | Unsup. (video) | 0.153 | 5.737 | 0.232 | 0.802 | 0.934 | 0.972
    Method in Ref. [57] | 2019 | Unsup. (video) | 0.139 | 5.160 | 0.215 | 0.833 | 0.939 | 0.975
    Table 2. Quantitative evaluation of selected algorithms on KITTI dataset
    Method | Year | Type | Data type | Loss | Main contributions
    --- | --- | --- | --- | --- | ---
    Method in Ref. [23] | 2014 | Sup. | RGB+depth | Inference error (original) | First to use deep learning for monocular depth estimation (MDE)
    Method in Ref. [27] | 2016 | Sup. | RGB+depth | Inference error (berHu loss) | Introduction of residual learning to MDE with optimized up-convolutions
    Method in Ref. [61] | 2017 | Sup. | RGB+depth | Inference error (square loss) | End-to-end MDE with CNN layers fused within a CRF
    Method in Ref. [53] | 2017 | Semi. | Binocular RGB + sparse depth | berHu loss (supervised loss), image alignment error (unsupervised loss), regularization loss | Introduction of a semi-supervised deep learning approach to MDE
    Method in Ref. [47] | 2018 | Sup. | RGB+depth | Ordinal regression loss | Ordinal regression method for MDE with dilated convolutions
    Method in Ref. [59] | 2018 | Unsup. | Video | Warping loss, depth smoothness loss, geometric consistency loss | Cascaded architecture that resolves rigid flow and object motion separately in depth estimation from monocular video
    Method in Ref. [33] | 2018 | Unsup. | Video | Reconstruction loss, warping loss, depth smoothness loss, object size constraints | MDE in highly dynamic scenes with explicit modeling of the 3D motion of moving objects and of the camera itself
    Table 3. Summary of selected representative algorithms
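Table 3 lists the berHu (reverse Huber) loss for Refs. [27] and [53]. It behaves like L1 for small residuals and like a scaled L2 for large ones, with the two pieces meeting at a threshold c. The pure-Python sketch below sets c to 20% of the largest absolute residual in the batch, the choice usually attributed to Laina et al.[27]; treat the exact constant and the function name as assumptions if you are reproducing a specific paper.

```python
def berhu_loss(pred, gt):
    """Reverse Huber (berHu) loss averaged over valid pixels.

    |r| for |r| <= c, (r^2 + c^2) / (2c) otherwise. Assumed threshold:
    c = 0.2 * max |r| over the batch.
    """
    residuals = [abs(p - g) for p, g in zip(pred, gt)]
    c = 0.2 * max(residuals)
    if c == 0.0:  # perfect prediction: every residual is zero
        return 0.0
    total = 0.0
    for r in residuals:
        total += r if r <= c else (r * r + c * c) / (2 * c)
    return total / len(residuals)


# Hypothetical residuals of 1 and 10: c = 2, so the second pixel uses the L2 branch.
loss = berhu_loss([0.0, 0.0], [1.0, 10.0])
```

At |r| = c both branches evaluate to c, so the loss is continuous; the L1 part keeps gradients alive for small residuals, while the L2 part weights large errors more heavily.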