A Stereo-Matching Neural Network Based on Attention Mechanism

Mingyang Cheng; Shaoyan Gai; Feipeng Da

doi:10.3788/AOS202040.1415001

Abstract

To improve the accuracy of stereo matching based on binocular vision applied to weak texture scenes, this study proposes a 3D reconstruction algorithm based on feature extraction using an attention mechanism. The proposed model uses convolutional neural network (CNN) to train feature representation of left and right images and calculates the matching cost of stereo matching. First, during the CNN feature extraction stage, attention mechanism module and channel attention mechanism module are summed to obtain the connection of each pixel in the feature image, enabling the network to capture the context information better and reconstruct weak texture areas more accurately in the reconstruction process. Second, we integrate the semantic coding loss in our neural network. The final loss function is defined as the weighted sum of the semantic coding loss and the reconstruction loss, which can effectively improve the reconstruction accuracy of a region with weak texture. We use KITTI and Sceneflow datasets to validate the algorithm. Experimental results show that the proposed method yields good improvements in accuracy, particularly in areas with weak textures.