Efficient Monocular Image Depth Estimation Based on Transfer Learning

Jiatao Liu; Yaping Zhang; Yuwei Yang

doi:10.3788/LOP202259.1611002

Abstract

Recovering depth information in three-dimensional space from two-dimensional images is a fundamental step in computer vision tasks such as three-dimensional reconstruction and scene understanding. Deep learning methods that achieve high accuracy on this task often require a huge amount of training data, and acquiring these data is usually complicated and expensive. To address this problem, this paper proposes a transfer-learning-based encoder-decoder network that uses global self-attention. The network takes a single image as input and has a global receptive field at every stage of encoding. After decoding, the depth regression task is transformed into a classification task, which greatly reduces the amount of training data required while preserving the accuracy of the model. Experimental results show that, compared with the current state-of-the-art depth estimation networks AdaBins and DPT-Hybrid, the designed model reduces the root mean square error by about 2.2% and 0.3%, respectively, and reduces the amount of training data by about 80% and 99.6%, respectively.
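As a rough illustration of what recasting depth regression as classification can look like, the following NumPy sketch discretizes a depth range into bins, maps ground-truth depth to class indices, and recovers a continuous depth value as the probability-weighted average of bin centers. It is not the authors' implementation: the abstract does not specify the discretization scheme, so the uniform bins, the depth range, the bin count, and all function names here are assumptions made for illustration. The `rmse` helper computes the root mean square error metric mentioned above.

```python
import numpy as np


def bin_centers(d_min, d_max, n_bins):
    """Centers of n_bins uniform depth bins covering [d_min, d_max].

    Uniform bins are an assumption; the paper may discretize differently.
    """
    edges = np.linspace(d_min, d_max, n_bins + 1)
    return 0.5 * (edges[:-1] + edges[1:])


def depth_to_class(depth, d_min, d_max, n_bins):
    """Map metric depth values to bin indices (classification targets)."""
    width = (d_max - d_min) / n_bins
    idx = np.floor((depth - d_min) / width).astype(np.int64)
    # Values on or beyond the range limits fall into the first/last bin.
    return np.clip(idx, 0, n_bins - 1)


def logits_to_depth(logits, centers):
    """Turn per-pixel class logits (last axis = bins) into depth.

    Applies a numerically stable softmax, then takes the expected
    value of the bin centers under the predicted distribution.
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ centers


def rmse(pred, target):
    """Root mean square error between predicted and true depth."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))


if __name__ == "__main__":
    # Hypothetical setup: depths between 0 m and 10 m, 10 bins.
    centers = bin_centers(0.0, 10.0, 10)
    true_depth = np.array([0.3, 4.6, 9.9])
    targets = depth_to_class(true_depth, 0.0, 10.0, 10)

    # Stand-in for network output: confident logits at the target bins.
    logits = np.full((3, 10), -20.0)
    logits[np.arange(3), targets] = 20.0

    pred_depth = logits_to_depth(logits, centers)
    print("targets:", targets)
    print("predicted depth:", pred_depth)
    print("RMSE:", rmse(pred_depth, true_depth))
```

Taking the expectation over bin centers, rather than the center of the most likely bin, keeps the output continuous, so the quantization error of a hard class decision is reduced when the predicted distribution spreads over neighboring bins.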