Autonomous Underwater Vehicles (AUVs) are pivotal tools for ocean exploration, resource utilization, and environmental monitoring. The underwater docking process, which enables AUVs to physically connect with recovery stations for energy replenishment and data transmission, is critical for enhancing operational efficiency and mission continuity. Traditional guidance methods, such as acoustic and electromagnetic systems, face limitations in precision, robustness, and adaptability to complex underwater environments. Acoustic guidance suffers from low resolution and susceptibility to multipath interference, while electromagnetic signals degrade rapidly in water. Optical guidance, leveraging high-resolution visual or photoelectric detection, has emerged as a promising solution for close-range docking due to its superior accuracy, real-time performance, and stealth advantages. This review highlights advancements in optical guidance technologies, focusing on monocular vision, binocular vision, and position detectors, and outlines their transformative potential in enabling reliable AUV underwater recovery.
1) Monocular vision guidance. Monocular vision systems use a single camera to detect active or passive optical markers on the docking station. Active markers, such as LED arrays, offer long visibility ranges but require precise geometric configurations to avoid ambiguity in pose estimation. Passive markers (e.g., ArUco codes or geometric patterns) provide unique identification but are limited to shorter detection distances. Recent studies have improved robustness through multi-marker fusion and deep learning. For instance, irregularly arranged four-light beacons [Figs. 5(a)-(c)] and hybrid markers combining LEDs with black-and-white codes [Fig. 5(d)] enhance feature-matching accuracy. Deep learning frameworks such as YOLOv5 and CNN-based models [Fig. 6(a)] further improve marker recognition in turbid water. Deep learning-enhanced monocular visual guidance currently achieves sub-3 cm localization accuracy by combining beacon recognition with PnP algorithms or by fusing visual data with multi-sensor inputs, but it still faces challenges from underwater optical attenuation, high computational demands, and limited real-time pose-estimation frequency.
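The geometric principle underlying monocular range estimation can be sketched with the pinhole camera model: the apparent pixel separation of two beacons of known physical spacing determines the range. This is a minimal illustration only, not the reviewed systems' full PnP pipeline; the focal length, beacon spacing, and pixel values below are assumed for the example.

```python
# Minimal sketch of monocular range estimation from a two-light beacon,
# assuming an ideal pinhole camera; all numeric values are illustrative.

def range_from_beacons(focal_px: float, baseline_m: float, pixel_sep: float) -> float:
    """Estimate camera-to-beacon range Z from the apparent separation of
    two beacons with known physical spacing: Z = f * B / d (pinhole model)."""
    if pixel_sep <= 0:
        raise ValueError("beacon separation in pixels must be positive")
    return focal_px * baseline_m / pixel_sep

# Example: 800 px focal length, beacons 0.5 m apart, observed 100 px apart.
z = range_from_beacons(800.0, 0.5, 100.0)
print(f"estimated range: {z:.2f} m")  # 800 * 0.5 / 100 = 4.00 m
```

A full PnP solution generalizes this idea to four or more beacon points and recovers the complete 6-DOF relative pose rather than range alone.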
2) Binocular vision guidance. Binocular vision systems leverage stereo cameras to resolve depth through disparity analysis [Fig. 7(b)]. By correlating the pixel coordinates of guide lights in the two images, 3D coordinates are derived via triangulation. Key advancements include camera calibration and distortion correction: traditional calibration methods (e.g., Zhang's checkerboard approach) ensure sub-pixel accuracy, while neural networks compensate for the nonlinear distortions introduced by underwater optical windows [Fig. 7(a)]. Binocular vision guidance extends beacon recognition range and accuracy, enabling AUVs to achieve sub-centimeter localization precision (~10 mm error) over a docking range of up to 30 m with modest hardware demands, although its computational cycle time (milliseconds to hundreds of milliseconds) still requires optimization.
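The disparity-based triangulation described above can be sketched as follows, assuming rectified cameras and an ideal pinhole model; the focal length, baseline, and principal point are illustrative values, not parameters from the reviewed systems.

```python
# Minimal sketch of stereo triangulation for a guide light, assuming
# rectified cameras; all camera parameters below are illustrative.

def triangulate(focal_px, baseline_m, xl, yl, xr, cx, cy):
    """Recover camera-frame 3D coordinates of a light from its pixel
    positions in the left (xl, yl) and right (xr) rectified images."""
    disparity = xl - xr                      # horizontal pixel offset
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front")
    z = focal_px * baseline_m / disparity    # depth from disparity
    x = (xl - cx) * z / focal_px             # lateral offset
    y = (yl - cy) * z / focal_px             # vertical offset
    return x, y, z

# Example: f = 700 px, baseline 0.2 m, principal point (320, 240).
x, y, z = triangulate(700.0, 0.2, 390.0, 240.0, 320.0, 320.0, 240.0)
print(x, y, z)  # the light sits 2.0 m ahead, 0.2 m to the right
```

In practice the pixel coordinates come from beacon detection after calibration and distortion correction, which is why those steps dominate the attainable accuracy.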
3) Position detector-based guidance. Position detectors, such as quadrant photodetectors (QPDs), track light spots projected by docking-station beacons. These systems excel in high-speed tracking and angular resolution but require precise optical alignment. Experimental validations demonstrate their robustness in turbulent flows, with angular accuracies within 0.1°. A sea trial showed that a multi-branch-network optical guidance method, based on multi-quadrant photoelectric detection and real-time angle-data processing, achieved an AUV position resolution speed of 5.650 ms/cycle and a mean coordinate error of 58.292 mm (best 7.107 mm at 2-3 m), meeting the precision and efficiency requirements of terminal docking at lower computational power and energy consumption than existing methods.
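The basic QPD readout can be sketched with the standard sum/difference normalization of the four quadrant photocurrents, followed by a small-angle conversion to bearing angles. The quadrant labeling convention, detector half-size, and effective focal length below are assumptions for illustration, not specifications of the systems reviewed.

```python
# Minimal sketch of spot-position readout from a quadrant photodetector
# (QPD); quadrant layout and optics parameters are illustrative.
import math

def qpd_offsets(i_a, i_b, i_c, i_d):
    """Quadrants: A upper-right, B upper-left, C lower-left, D lower-right.
    Returns normalized (x, y) spot offsets in [-1, 1]."""
    total = i_a + i_b + i_c + i_d
    x = ((i_a + i_d) - (i_b + i_c)) / total   # right minus left
    y = ((i_a + i_b) - (i_c + i_d)) / total   # top minus bottom
    return x, y

def bearing_angles(x, y, half_size_mm=5.0, focal_mm=50.0):
    """Convert normalized offsets to bearing angles (degrees) for a lens
    with the assumed focal length (small-angle geometry)."""
    ax = math.degrees(math.atan2(x * half_size_mm, focal_mm))
    ay = math.degrees(math.atan2(y * half_size_mm, focal_mm))
    return ax, ay

# Spot shifted toward the right half, centered vertically.
x, y = qpd_offsets(1.2, 0.8, 0.8, 1.2)
ax, ay = bearing_angles(x, y)
print(x, y, ax, ay)
```

Because this readout is a handful of arithmetic operations per cycle, it explains the millisecond-scale resolution speeds and low power draw reported for QPD-based guidance.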
Optical guidance for AUV underwater docking, a cornerstone technology enabling safe, continuous, and efficient marine operations, has garnered significant attention from researchers globally. This study systematically reviews two primary optical guidance paradigms: image sensor-based methods and position detector-based methods. Image sensor-based approaches, characterized by intuitive data acquisition and high positioning accuracy, dominate current practice by leveraging visual or photoelectric sensing to extract beacon features and resolve relative pose. Position detector-based methods, exemplified by multi-quadrant photoelectric detectors, offer advantages in detection speed and potential for integrated communication.
Despite this progress, critical challenges persist. Benchmark datasets remain limited: while some datasets have been developed, acquiring high-fidelity ground-truth data is still arduous owing to dynamic underwater environments and system-induced noise. Image sensor-based methods suffer from low frame rates, which exacerbate latency and computational burden during real-time processing. Position detectors, though faster, lack sufficient modulation bandwidth for high-speed communication.
To address these gaps, future advancements should focus on three synergistic directions:
1) High-speed, stable, and intelligent guidance systems. The integration of deep learning architectures, particularly large-scale models, with edge-computing frameworks will enhance real-time decision-making capabilities. Model quantization and lightweight design will facilitate deployment on embedded devices, ensuring adaptive navigation in dynamic underwater scenarios.
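The quantization step mentioned above can be illustrated with a minimal symmetric int8 scheme: float weights are mapped to 8-bit integers with a single scale factor, shrinking storage roughly 4x at a small reconstruction cost. The weight values here are made-up numbers, not taken from any published docking model.

```python
# Minimal sketch of symmetric int8 weight quantization; the weights
# below are illustrative, not from a real network.

def quantize_int8(weights):
    """Map float weights to int8 using one symmetric scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return [v * scale for v in q]

w = [0.254, -0.127, 0.0635, -0.3175]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, round(err, 4))
```

Production toolchains add per-channel scales, calibration, and quantization-aware training, but the storage/accuracy trade-off is the same in principle.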
2) Integrated optical-acoustic communication guidance. The development of multi-quadrant photodetectors with high-frequency modulation capabilities would make it possible to unify positioning and communication functions. Exploiting optical communication's short-range, high-bandwidth advantages while compensating for acoustic latency bridges the gap between near-field precision and long-range connectivity.
3) Multi-sensor fusion perception. The fusion of heterogeneous sensor data (e.g., GPS, INS, DVL) with optical guidance through advanced communication protocols and collaborative control algorithms enhances system performance. The incorporation of deep learning enables robust feature extraction and target perception, achieving centimeter-level accuracy and cross-domain sensor synergy. By synergizing these innovations, AUV underwater docking systems will evolve toward autonomous, resilient, and intelligent operation, unlocking new frontiers in marine exploration, infrastructure maintenance, and underwater robotics.
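The benefit of fusing heterogeneous position estimates can be sketched with a one-dimensional inverse-variance weighting, the building block behind Kalman-style fusion; the sensor variances below are illustrative assumptions, not reported specifications of optical, INS, or DVL hardware.

```python
# Minimal sketch of inverse-variance fusion of two 1D position estimates
# (e.g., an optical fix and an INS/DVL dead-reckoning fix); variances
# are illustrative assumptions.

def fuse(est_a, var_a, est_b, var_b):
    """Weight each estimate by the inverse of its variance. The fused
    variance is smaller than either input, quantifying the fusion gain."""
    w_a = 1.0 / var_a
    w_b = 1.0 / var_b
    fused = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    fused_var = 1.0 / (w_a + w_b)
    return fused, fused_var

# Optical fix: 2.00 m, variance 0.01 m^2; dead reckoning: 2.40 m, 0.04 m^2.
pos, var = fuse(2.00, 0.01, 2.40, 0.04)
print(round(pos, 3), round(var, 4))  # fused estimate leans toward the optical fix
```

Full multi-sensor pipelines extend this to vector states with cross-correlations and time propagation, but the core intuition, trusting each sensor in proportion to its certainty, is the same.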