• Acta Optica Sinica
  • Vol. 44, Issue 9, 0911003 (2024)
Yue Zhang, Huaiyu Cai*, Jing Sheng, Yi Wang, and Xiaodong Chen
Author Affiliations
  • Key Laboratory of Opto-Electronics Information Technology of Ministry of Education, College of Precision Instrument and Opto-Electronic Engineering, Tianjin University, Tianjin 300072, China
    DOI: 10.3788/AOS231957
    Yue Zhang, Huaiyu Cai, Jing Sheng, Yi Wang, Xiaodong Chen. Monocular Three-Dimensional Coding Imaging Based on Double Helix Phase Mask[J]. Acta Optica Sinica, 2024, 44(9): 0911003

    Abstract

    Objective

    As information technology develops rapidly, cameras are used not only as photography tools to meet users’ artistic creation needs but also as visual sensing hardware, serving as the “eyes” of machines. They are now widely applied in 2D computer vision tasks such as image classification, semantic segmentation, and object recognition. However, traditional cameras have two inherent limitations. First, depth of field must be sacrificed to meet resolution requirements; beyond the depth of field, defocus blur can disrupt the normal operation of subsequent algorithms. Second, because traditional cameras map the 3D world onto a 2D plane, they lose the depth information of the scene, making them difficult to apply to rapidly developing 3D computer vision tasks. Existing depth acquisition methods, such as structured light, time-of-flight, and multi-view geometry, are inferior to single-lens cameras in terms of power consumption, cost, and size. Therefore, we propose a single-camera 3D imaging method based on a double helix phase mask, which achieves depth estimation and extended-depth-of-field imaging simultaneously with only simple hardware modifications.

    Methods

    We propose an imaging method based on a double helix phase mask that simultaneously acquires scene depth information and extends the depth of field. A designed double helix phase mask inserted at the aperture stop of the camera modulates the imaging beam into a double helix shape. On the one hand, depth information is encoded into the image by exploiting the sensitive rotation of the double helix point spread function with defocus. On the other hand, the longer depth of focus of the double helix beam allows object points to be encoded as double helix point spread functions over a larger depth range, so that the depth of each object point appears in the image as local ghosting. We then use convolutional neural networks to decode and reconstruct the encoded image end to end, obtaining the depth map and the extended-depth-of-field image of the scene while jointly optimizing the phase mask parameters. We analyze the influence of the phase mask parameters and the object distance on imaging performance and discuss how to select the phase mask parameters reasonably within a given depth range.
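
    The sketch below is a minimal, hypothetical illustration of the encoding idea described above; it is not the authors' code. A simplified double helix point spread function is modeled as two Gaussian lobes whose orientation rotates linearly with defocus, so the depth of an object point is written into the image as a rotating local ghost. The lobe separation, lobe width, and rotation rate are illustrative assumptions; the depth range matches the 1.1-1.32 m test range reported below.

        # Hedged sketch of depth encoding with a simplified double-helix PSF.
        import numpy as np
        from scipy.ndimage import convolve

        def dh_psf(depth_m, size=31, sigma=1.5, separation=6.0,
                   z_min=1.1, z_max=1.32, max_rotation=np.pi):
            """Two-lobe PSF whose lobe-axis angle encodes depth within [z_min, z_max]."""
            # Assumed linear rotation of the lobe axis with defocus (illustrative only).
            theta = max_rotation * (depth_m - z_min) / (z_max - z_min)
            y, x = np.mgrid[-(size // 2):size // 2 + 1, -(size // 2):size // 2 + 1]
            cx, cy = 0.5 * separation * np.cos(theta), 0.5 * separation * np.sin(theta)
            lobes = (np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2)) +
                     np.exp(-((x + cx) ** 2 + (y + cy) ** 2) / (2 * sigma ** 2)))
            return lobes / lobes.sum()

        def encode(scene, depth_m):
            """Blur a constant-depth scene patch with the depth-dependent DH-PSF."""
            return convolve(scene, dh_psf(depth_m), mode='nearest')

        # The same texture imaged at two depths yields differently rotated ghosts,
        # which a decoding network can learn to map back to depth and a sharp image.
        rng = np.random.default_rng(0)
        texture = rng.random((64, 64))
        near, far = encode(texture, 1.12), encode(texture, 1.30)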

    Results and Discussions

    To validate the method, we train it on the FlyingThings3D dataset and test the trained model on the NYU Depth V2 dataset. The relative error of depth estimation on the NYU Depth V2 dataset reaches as low as 0.083 (Table 2), and the extended-depth-of-field images reach a peak PSNR of 35.254 dB and SSIM of 0.960 (Table 3). Compared to traditional optical systems, the depth of field is extended by several tens of times. Using a phase mask with more rings yields a larger depth-of-field extension, but the increased side lobes of the double helix point spread function slightly reduce the depth estimation accuracy and the quality of the extended-depth-of-field images; nevertheless, the overall performance remains within an acceptable range. The depth estimation accuracy of the method depends on the depth range to be measured: reducing the detection range or increasing the object distance improves the average depth estimation accuracy (Fig. 13). For potential application scenes such as face recognition at access gates, a physical system is built with a test range of 1.1-1.32 m. The relative depth estimation error in real scenes is 2.2%, and the depth of field is extended by about 10 times (Fig. 17), demonstrating the effectiveness and practicality of the proposed method in real scenes.
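
    The metrics quoted above can be computed with standard definitions; the sketch below assumes the common mean relative error for depth and the scikit-image implementations of PSNR and SSIM, which may differ in minor details from the paper's evaluation protocol.

        # Hedged sketch of the evaluation metrics (relative depth error, PSNR, SSIM).
        import numpy as np
        from skimage.metrics import peak_signal_noise_ratio, structural_similarity

        def mean_relative_error(depth_pred, depth_gt):
            """Mean |d_pred - d_gt| / d_gt, the relative error commonly reported for depth."""
            return float(np.mean(np.abs(depth_pred - depth_gt) / depth_gt))

        def image_quality(img_pred, img_gt):
            """PSNR (dB) and SSIM of a reconstructed image, assuming RGB data in [0, 1]
            with channels last; drop channel_axis for grayscale images."""
            psnr = peak_signal_noise_ratio(img_gt, img_pred, data_range=1.0)
            ssim = structural_similarity(img_gt, img_pred, data_range=1.0, channel_axis=-1)
            return psnr, ssim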

    Conclusions

    We introduce a three-dimensional imaging method based on a double helix phase mask, which only requires adding a phase mask to an existing lens to estimate scene depth from a single captured frame and achieve extended-depth-of-field imaging simultaneously. The method does not rely on built-in light sources or additional lenses, allowing further reductions in size and power consumption. Compared to depth estimation algorithms based solely on deep learning, our method generalizes well because it estimates depth from optically introduced features rather than relying on high-level semantic information about the scene. Overall, the method shows potential for low-cost 3D imaging and detection applications. However, the proposed method has limitations. It relies on texture: it works effectively even in weakly textured scenes, but may fail when texture is almost entirely absent, for example due to overexposure (Fig. 14). In addition, noise in real scenes can introduce errors in some depth values, reduce the average depth estimation accuracy of the system, and cause slight artifacts in the reconstructed images. Future research could incorporate noise suppression into the algorithm to address this problem.
