Laser & Optoelectronics Progress, Vol. 56, Issue 19, 190001 (2019)

Progress in Deep Learning Based Monocular Image Depth Estimation

Yang Li*, Xiuwan Chen, Yuan Wang, and Maolin Liu

Institute of Remote Sensing and Geographic Information System, School of Earth and Space Sciences, Peking University, Beijing 100871, China

DOI: 10.3788/LOP56.190001

Citation: Yang Li, Xiuwan Chen, Yuan Wang, Maolin Liu. Progress in Deep Learning Based Monocular Image Depth Estimation[J]. Laser & Optoelectronics Progress, 2019, 56(19): 190001
Fig. 1. Network architecture proposed by Laina et al.[27]
Fig. 2. Schematics of up-projections. (a) Up-projection; (b) fast up-projection
Fig. 3. Replacing traditional 5×5 convolution kernels with four small convolution kernels
Fig. 4. Long-tail distributions on depth and semantic labels. (a) Pixel-depth distribution of NYU Depth V2 dataset; (b) pixel-depth distribution of KITTI dataset; (c)(d) pixel-semantic label distributions of NYU Depth V2 dataset (all categories/40 categories)
Fig. 5. Schematic of network architecture proposed by Jiao et al.[42]
Fig. 6. Schematics of proposed LSU and SUC connections. (a) LSU; (b) SUC
Fig. 7. Schematic of network architecture proposed by Fu et al.[47]
Fig. 8. Structural schematic of full-image encoders
Fig. 9. Overall flow chart of algorithm proposed by Garg et al.[30]
Fig. 10. Schematics of network architecture proposed by Godard et al.[31] (a) Sampling strategy with left-right consistency; (b) loss function
Fig. 11. Schematic of semi-supervised network architecture and loss function proposed by Kuznietsov et al.[53]
Fig. 12. Schematic of network framework for video depth estimation based on view synthesis[32]
Fig. 13. Schematic of GeoNet network architecture proposed by Yin et al.[59]
| Method | Year | Type | e_Abs | e_RMS | e_lg | δ<1.25 | δ<1.25² | δ<1.25³ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Method in Ref. [23] | 2014 | Sup. | 0.215 | 0.907 | - | 0.611 | 0.887 | 0.971 |
| Method in Ref. [26] | 2016 | Sup. | 0.213 | 0.759 | 0.087 | 0.650 | 0.906 | 0.976 |
| Method in Ref. [27] | 2016 | Sup. | 0.127 | 0.573 | 0.055 | 0.811 | 0.953 | 0.988 |
| Method in Ref. [61] | 2017 | Sup. | 0.121 | 0.586 | 0.052 | 0.811 | 0.954 | 0.987 |
| Method in Ref. [47] | 2018 | Sup. | 0.115 | 0.509 | 0.051 | 0.828 | 0.965 | 0.992 |
| Method in Ref. [42] | 2018 | Sup. | 0.098 | 0.329 | 0.040 | 0.917 | 0.983 | 0.996 |
| Method in Ref. [45] | 2018 | Sup. | 0.139 | 0.505 | 0.058 | 0.820 | 0.960 | 0.989 |
| Method in Ref. [10] | 2019 | Sup. | 0.128 | 0.523 | 0.059 | 0.813 | 0.964 | 0.992 |

Table 1. Quantitative evaluation of selected algorithms on the NYU Depth V2 dataset
| Method | Year | Type | e_Abs | e_RMS | e_lg | δ<1.25 | δ<1.25² | δ<1.25³ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Method in Ref. [23] | 2014 | Sup. | 0.190 | 7.156 | 0.270 | 0.692 | 0.899 | 0.967 |
| Method in Ref. [26] | 2016 | Sup. | 0.217 | 7.046 | - | 0.656 | 0.881 | 0.958 |
| Method in Ref. [53] | 2017 | Semi. (stereo) | 0.113 | 4.621 | 0.189 | 0.862 | 0.960 | 0.986 |
| Method in Ref. [31] | 2017 | Unsup. (stereo) | 0.114 | 4.935 | 0.206 | 0.861 | 0.949 | 0.976 |
| Method in Ref. [47] | 2018 | Sup. | 0.072 | 2.727 | 0.120 | 0.932 | 0.984 | 0.994 |
| Method in Ref. [42] | 2018 | Sup. | - | 5.110 | 0.215 | 0.843 | 0.950 | 0.981 |
| Method in Ref. [45] | 2018 | Sup. | 0.113 | 4.687 | - | 0.856 | 0.962 | 0.988 |
| Method in Ref. [33] | 2018 | Unsup. (video) | 0.109 | 4.750 | 0.187 | 0.874 | 0.958 | 0.982 |
| Method in Ref. [62] | 2018 | Unsup. (stereo) | 0.095 | 4.316 | 0.177 | 0.892 | 0.966 | 0.984 |
| Method in Ref. [59] | 2018 | Unsup. (video) | 0.153 | 5.737 | 0.232 | 0.802 | 0.934 | 0.972 |
| Method in Ref. [57] | 2019 | Unsup. (video) | 0.139 | 5.160 | 0.215 | 0.833 | 0.939 | 0.975 |

Table 2. Quantitative evaluation of selected algorithms on the KITTI dataset
| Method | Year | Type | Data type | Loss | Main contributions |
| --- | --- | --- | --- | --- | --- |
| Method in Ref. [23] | 2014 | Sup. | RGB + depth | Inference error (original) | First to use deep learning for monocular depth estimation (MDE) |
| Method in Ref. [27] | 2016 | Sup. | RGB + depth | Inference error (berHu loss) | Introduction of residual learning to MDE with optimized up-convolutions |
| Method in Ref. [61] | 2017 | Sup. | RGB + depth | Inference error (square loss) | Achievement of end-to-end MDE with CNN layers fused within CRF |
| Method in Ref. [53] | 2017 | Semi. | Binocular RGB + sparse depth | berHu loss (supervised loss), image alignment error (unsupervised loss), regularization loss | Introduction of a semi-supervised deep learning approach for MDE |
| Method in Ref. [47] | 2018 | Sup. | RGB + depth | Ordinal regression loss | Ordinal regression method for MDE with dilated convolutions |
| Method in Ref. [59] | 2018 | Unsup. | Video | Warping loss, depth smoothness loss, geometric consistency loss | Cascaded architecture to resolve rigid flow and object motion separately in depth estimation from monocular video |
| Method in Ref. [33] | 2018 | Unsup. | Video | Reconstruction loss, warping loss, depth smoothness loss, object size constraints | MDE in highly dynamic scenes with explicit modeling of 3D motions of moving objects and the camera itself |

Table 3. Summary of selected representative algorithms