
- Advanced Imaging
- Vol. 2, Issue 1, 012001 (2025)
Abstract
1. Introduction
Underwater imaging is challenging because visibility is reduced by the scattering and absorption of light by suspended particles in water. As light propagates in water, it scatters in different directions, degrading the captured images. Light propagation in water is also wavelength-dependent: different wavelengths experience different absorption levels, which limits the propagation distance and the depth at which objects can be visualized. To relate the effects of attenuation to distance, Beer's law may be used to model the attenuation of light in a scattering medium, that is, $I(z) = I_0 e^{-\alpha z}$, where $I_0$ is the initial intensity, $\alpha$ is the attenuation coefficient (Beer's coefficient) of the medium, and $z$ is the propagation distance.
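As a quick numerical illustration of Beer's law, the following minimal Python sketch evaluates the intensity decay over distance. The values of $I_0$ and $\alpha$ are hypothetical; real coefficients depend on wavelength and turbidity.

```python
import numpy as np

# Beer's law: I(z) = I0 * exp(-alpha * z), with alpha the attenuation
# (Beer's) coefficient in m^-1 and z the propagation distance in m.
I0 = 1.0                    # initial intensity (arbitrary units; illustrative)
alpha = 0.5                 # hypothetical attenuation coefficient (m^-1)
z = np.linspace(0, 10, 6)   # propagation distances (m)

I = I0 * np.exp(-alpha * z)
for zi, Ii in zip(z, I):
    print(f"z = {zi:4.1f} m -> I/I0 = {Ii:.3f}")
```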
One popular underwater sensing technology is light detection and ranging (LiDAR), an active device that uses laser pulses to measure the distance, or range, of an object[3]. In underwater applications, bathymetric LiDAR is the most common LiDAR sensing technology[3–7]. Bathymetric LiDAR is usually mounted on an aircraft or a boat and uses a green laser to penetrate deep into the water. However, as the laser propagates deeper underwater, it loses energy due to scattering and absorption. Thus, active sensors can benefit from powerful light sources for longer-range operation. However, active sensors such as LiDAR may become complex and expensive, as they require lasers and time-gated detectors.
One promising multi-view imaging technique is three-dimensional (3D) integral imaging (InIm)[8], which captures images of a scene from different perspectives to reconstruct the scene at a particular depth plane[9]. The technique is passive and well suited to imaging under degradations. 3D InIm has been used in underwater imaging applications for optical signal detection[10–16], object visualization[17–21], object classification[22], and object detection[16]. These works have implemented 3D InIm systems in various modalities, such as lens-based and lensless configurations, systems with diffusers, and polarimetric systems. In addition, computational algorithms such as the convolutional neural network (CNN)[23], generative adversarial network (GAN)[24], recurrent neural network (RNN)[25], statistical image processing[26], statistical optics[27], active polarization descattering[28], and correlation-based filters[29–31] have been used to overcome the effects of turbidity and/or partial occlusion. These systems have potential in many underwater applications, including automated detection, classification of objects in turbidity, and marine research.
This paper presents an overview of recent advances in underwater optical imaging and sensing systems using 3D InIm in turbid conditions. The paper is organized into five sections. Section 1 describes underwater imaging and 3D InIm in underwater imaging applications, and briefly discusses LiDAR. Section 2 describes the basics of 3D InIm and reconstruction algorithms, lensless imaging, and polarimetric InIm. Section 3 summarizes various applications of 3D InIm, such as underwater object visualization[18,19,21], underwater signal detection[12–16], underwater object classification[22], and underwater object detection[16]. Section 4 compares the published works discussed in Sec. 3. Section 5 concludes the paper.
2. Methodology
2.1. Three-Dimensional Integral Imaging
3D InIm is a multidimensional imaging technique that can reconstruct a 3D scene at a user-defined depth using the angular and intensity information of the captured perspective images, called two-dimensional (2D) elemental images[8–13,16–22]. Conventional 2D imaging, which only collects the intensity of a scene, superimposes the background and the object. In 3D InIm, however, the angular information captured in the perspective images allows 3D reconstruction at different depths in the scene, giving the observer a 3D view of the scene. By reconstructing the object at the depth of interest, 3D InIm can segment the object out of the background. The technique is optimal in the maximum-likelihood sense under Gaussian noise for read-noise-dominant images[32,33], providing a higher signal-to-noise ratio (SNR), and has depth-sectioning capabilities to segment in-focus planes from out-of-focus planes[9]. Additionally, 3D InIm can visualize through partial obscurations by using parallax to capture non-occluded perspective images, and it may remedy moderate scattering conditions[13,16,20–22]. During the image acquisition stage [see Fig. 1(a)], the optical rays are projected onto 2D elemental images and recorded using a camera array or a single camera on a moving platform. InIm systems can be implemented using a one-dimensional (1D) array of cameras[12,14,15] or a 2D array of cameras[10,11,13,16–22]. With a 1D array, we only have parallax in one direction (i.e., horizontal parallax), whereas a 2D array provides parallax in both the horizontal and vertical directions.
Figure 1. 3D integral imaging. (a) Pickup stage.
To digitally reconstruct at a given range (depth) $z$, each elemental image is back-projected with a shift determined by the camera geometry, and the overlapping back-projections are averaged:

$$I(x, y; z) = \frac{1}{O(x, y)} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} E_{m,n}\!\left(x - m\,\frac{N_x\, p_x\, f}{c_x\, z},\; y - n\,\frac{N_y\, p_y\, f}{c_y\, z}\right), \tag{1}$$

where $E_{m,n}$ is the elemental image in the $m$th column and $n$th row of the $M \times N$ array, $N_x \times N_y$ is the pixel count of each elemental image, $p_x$ and $p_y$ are the camera pitches, $f$ is the focal length, $c_x \times c_y$ is the physical sensor size, and $O(x, y)$ is the number of elemental images overlapping at pixel $(x, y)$. For temporally varying scenes, such as the encoded optical signals discussed in Sec. 3.2, the same reconstruction is applied frame by frame to give a reconstructed video,

$$I(x, y; z, t) = \frac{1}{O(x, y)} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} E_{m,n}\!\left(x - m\,\frac{N_x\, p_x\, f}{c_x\, z},\; y - n\,\frac{N_y\, p_y\, f}{c_y\, z};\, t\right), \tag{2}$$

where $t$ denotes the video frame index.
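A minimal numpy sketch of the shift-and-average reconstruction in Eq. (1) is given below. It is an illustrative simplification (integer-pixel shifts, grayscale images, hypothetical array and parameter names), not the exact published implementation:

```python
import numpy as np

def inim_reconstruct(elemental, pitch, f, sensor_size, z):
    """Shift-and-average 3D InIm reconstruction at depth z [Eq. (1)].
    elemental   : (M, N, H, W) grayscale elemental images
    pitch       : (px, py) camera spacing (same units as f, sensor_size, z)
    f           : focal length
    sensor_size : (cx, cy) physical sensor dimensions
    z           : reconstruction depth
    """
    M, N, H, W = elemental.shape
    px, py = pitch
    cx, cy = sensor_size
    acc = np.zeros((H, W))
    cnt = np.zeros((H, W))
    for m in range(M):
        for n in range(N):
            # pixel shift m * Nx * px * f / (cx * z), rounded to integers
            sx = int(round(m * W * px * f / (cx * z)))
            sy = int(round(n * H * py * f / (cy * z)))
            h, w = H - sy, W - sx
            if h <= 0 or w <= 0:
                continue  # shift exceeds the image; no overlap at this depth
            acc[sy:, sx:] += elemental[m, n, :h, :w]
            cnt[sy:, sx:] += 1
    return acc / np.maximum(cnt, 1)  # O(x, y): per-pixel overlap count

# hypothetical 3x3 camera array of 64x64 elemental images
ei = np.random.rand(3, 3, 64, 64)
recon = inim_reconstruct(ei, pitch=(10.0, 10.0), f=25.0,
                         sensor_size=(5.0, 5.0), z=500.0)
```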
2.2. Lensless Imaging
Advances in optical instruments have given users access to better camera sensors for capturing high-quality images. However, despite these improvements, camera sensors have a limited field of view due to their sensor size and focal length. Capturing more photons requires a bigger, bulkier lens, which is expensive. An alternative is a lensless imaging system, which is inexpensive, more compact, and offers a larger field of view than its lens-based counterparts[35]. In Ref. [15], a 1D lensless camera-array underwater sensing system using diffusers was proposed to classify pseudorandom patterns in turbidity and partial occlusion. This approach was shown to perform well compared with its lens-based counterpart[14] for underwater optical signal detection.
2.3. Underwater Polarimetric Integral Imaging
Polarimetric imaging captures the polarimetric information of a scene, introducing an additional degree of information that enriches the data for analysis and enables a more comprehensive understanding of the observed scene or object. The polarimetric information of a scene contains the surface features of an object as well as the scattered light, which can be used to enhance the contrast of an object in scattering media[36]. The polarimetric information can be captured either with a polarization camera or by placing a polarization filter in front of a camera sensor. Polarimetric imaging can also be performed under active (i.e., light source) or passive (i.e., natural light) illumination, depending on the environment. In Ref. [12], an active polarimetric signal detection system using polarization-difference imaging[37,38] was proposed to transmit a polarized optical signal in turbid water and reduce the effects of turbidity. Polarization-difference imaging was applied to the captured videos, which were then processed using 3D InIm reconstruction [see Eq. (2)]. In Ref. [20], polarization-based image recovery using active polarization descattering and 3D InIm reconstruction [see Eq. (1)] was proposed to reduce the effects of turbidity for underwater scenes with partial occlusion.
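At its core, polarization-difference imaging subtracts two orthogonally polarized captures so that the largely unpolarized scattered background cancels while the polarization-preserving signal survives. A minimal sketch follows; the function and variable names are ours, and the display normalization is an illustrative choice:

```python
import numpy as np

def polarization_difference(i_par, i_perp):
    """Polarization-difference image: I_PD = I_parallel - I_perpendicular.
    Scattered light is nearly common to both orthogonal channels and largely
    cancels in the subtraction, improving contrast in scattering media."""
    i_pd = i_par.astype(np.float64) - i_perp.astype(np.float64)
    i_pd -= i_pd.min()                      # normalize to [0, 1] for display
    return i_pd / max(i_pd.max(), 1e-12)

# hypothetical orthogonally polarized captures of the same scene
i_par = np.random.rand(64, 64)
i_perp = np.random.rand(64, 64)
i_pd = polarization_difference(i_par, i_perp)
```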
3. Applications
In Sec. 3.1, we briefly review object visualization using statistical image processing (Sec. 3.1.1), peplography (Sec. 3.1.2), and deep learning using GAN (Sec. 3.1.3). In Sec. 3.2, we briefly review signal detection using correlation-based filtering (Sec. 3.2.1) and deep learning using (1) CNN combined with RNN (Sec. 3.2.2) and (2) CNN (Secs. 3.2.3 and 3.2.4). In Sec. 3.3, we briefly review object classification using a neural network (Sec. 3.3.1) and a dual-purpose system using deep learning (Sec. 3.3.2) with (1) CNN and (2) CNN combined with RNN.
3.1. Underwater Object Visualization
Object visualization under degradation was one of the first underwater applications reported using 3D InIm[17]. Because objects are difficult to visualize under high turbidity, image restoration techniques have been used alongside 3D InIm to mitigate the effects of turbidity. We briefly review InIm underwater object visualization using statistical image processing[18], peplography[19], and physics-based deep learning[21]. For visualization tasks, images are recorded and 3D InIm reconstruction is performed using Eq. (1).
3.1.1. Statistical image processing
The first report on 3D InIm reconstruction of underwater objects in turbidity was presented in Ref. [18]. The multi-class underwater scene in Fig. 2(a) was degraded by turbidity [Fig. 2(b)], and the objects could not be visualized; thus, the authors used statistical image processing algorithms[26] along with 3D InIm reconstruction for improved visualization. The authors assumed that the turbidity was caused by light scattering, which they statistically modeled as a Gaussian distribution. While this statistical model of turbidity may not be very precise, it allows statistical approaches to be applied to reduce the effects of turbidity. The authors used maximum-likelihood estimation[39], gamma correction, histogram matching, and histogram equalization to enhance the visibility of the objects. Finally, the preprocessed grayscale perspective images were reconstructed using 3D InIm [Eq. (1)]. Figure 2 shows an example of this approach, where one object is reconstructed in focus at a particular depth plane and the other planes are blurred out.
Figure 2. (a) Clear-water 3D scene used in the experiments. (b) The same scene in turbid water. (c) Diagram of the 3D underwater imaging system in turbid water. (d)–(g) 3D integral imaging reconstructions, with each object brought into focus at its respective depth plane.
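The pointwise enhancement steps in this pipeline are standard operations. A minimal numpy sketch of gamma correction and histogram equalization is given below, with illustrative parameters rather than the authors' exact settings:

```python
import numpy as np

def gamma_correction(img, gamma=0.7):
    """Pointwise gamma correction for a grayscale image scaled to [0, 1]."""
    return np.clip(img, 0.0, 1.0) ** gamma

def histogram_equalization(img, bins=256):
    """Histogram equalization: remap intensities through the empirical CDF."""
    hist, edges = np.histogram(img.ravel(), bins=bins, range=(0.0, 1.0))
    cdf = np.cumsum(hist).astype(np.float64)
    cdf /= cdf[-1]
    return np.interp(img.ravel(), edges[:-1], cdf).reshape(img.shape)

# hypothetical pipeline: enhance each perspective image, then apply Eq. (1)
perspectives = [np.random.rand(64, 64) * 0.4 for _ in range(9)]  # dim, turbid
enhanced = [histogram_equalization(gamma_correction(p)) for p in perspectives]
```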
3.1.2. Peplography
As described in Sec. 3.1.1, the authors of Ref. [18] used statistical image processing algorithms[26], including gamma correction, histogram equalization, and histogram matching, that are subject to estimation errors and can therefore produce artificial color pixel intensities. To alleviate this issue, the authors in Ref. [19] proposed peplography (see Fig. 3), a passive technique that relies on detecting ballistic photons (photons that travel through scattering media without significant scattering) and filtering out scattered photons using statistical algorithms[27]. This technique combines photon counting with statistical algorithms to estimate the turbid medium and extract the ballistic photons for image reconstruction. This helps to distinguish ballistic photons from scattered photons, allowing for a less noisy visualization of objects. The detected photons are then used to reconstruct a 3D image of the scene using InIm, which captures multiple perspectives of the scene and processes them to create a 3D representation of the object. However, the expected number of photons for ballistic photon counting is chosen manually because of fluctuations in photon intensity. If the expected value is set too low, too few ballistic photons are captured to recover the 3D object information and restore the scene in scattering media. 3D InIm reconstruction also uses averaging, as shown in Eq. (1), to reduce noise.
Figure 3. Flow chart of peplography for imaging in scattering media. Reprinted with permission from Ref. [19].
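To make the photon-counting step concrete, here is a heavily simplified sketch. The scattering medium is estimated and subtracted (here, crudely, as the global mean rather than the windowed statistical estimate used in peplography), and photon counting is simulated as pixel-wise Poisson draws whose total expectation is the manually chosen photon number discussed above:

```python
import numpy as np

rng = np.random.default_rng(0)

def peplography_sketch(img, n_photons=10_000):
    """Simplified ballistic-photon detection: estimate and remove the
    scattering background, then photon-count the residual irradiance."""
    residual = np.clip(img - img.mean(), 0.0, None)  # crude medium removal
    rate = residual / max(residual.sum(), 1e-12)     # normalized irradiance
    # Poisson photon counting; if n_photons is too low, too few ballistic
    # photons are detected to recover the scene (see the caveat above)
    return rng.poisson(n_photons * rate)

photons = peplography_sketch(np.random.rand(64, 64), n_photons=5000)
```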
3.1.3. Physics-based deep learning
Deep learning is a computational technique that uses multilayer neural networks to learn feature representations directly from data[40]. Conventional deep learning approaches are purely data-driven; however, to improve the performance and interpretability/reliability of deep learning models, it is beneficial to incorporate physical laws into the models, as proposed in Ref. [21]. In that work, 3D InIm with a physics-informed cycle generative adversarial network (CycleGAN)[41,42] was proposed for image recovery. The network was trained on unpaired underwater datasets recorded in turbid water with external lighting and tested at various turbidity levels under external lighting, with and without partial occlusion.
Figure 4. (a) Sample clean image of a light source.
Figure 5. (a) Sample clean image of a light source.
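One plausible way to couple a physics prior with unpaired CycleGAN training is to add a physics-based penalty, for instance a dark channel prior term[43], to the cycle-consistency objective. The sketch below is our illustration of that idea, not the published network; the generator names, loss weights, and choice of prior are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def dark_channel(img, patch=15):
    """Dark channel prior[43]: per-pixel min over channels and a local patch;
    it tends toward zero for haze/turbidity-free natural images."""
    dc = img.min(dim=1, keepdim=True).values           # min over channels
    return -F.max_pool2d(-dc, patch, stride=1, padding=patch // 2)

def physics_informed_cycle_loss(turbid, clean, G, Finv,
                                lam_cyc=10.0, lam_phy=1.0):
    """Unpaired cycle-consistency loss with a physics term (adversarial
    terms omitted for brevity). G: turbid -> clean, Finv: clean -> turbid."""
    fake_clean = G(turbid)
    fake_turbid = Finv(clean)
    cyc = F.l1_loss(Finv(fake_clean), turbid) + F.l1_loss(G(fake_turbid), clean)
    phy = F.relu(dark_channel(fake_clean)).mean()      # penalize residual haze
    return lam_cyc * cyc + lam_phy * phy

# stand-in generators for illustration only
G = nn.Conv2d(3, 3, 3, padding=1)
Finv = nn.Conv2d(3, 3, 3, padding=1)
loss = physics_informed_cycle_loss(torch.rand(2, 3, 64, 64),
                                   torch.rand(2, 3, 64, 64), G, Finv)
```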
3.2. Underwater Optical Signal Detection
Another application of 3D InIm is underwater optical signal detection, which focuses on transmitting, receiving, and detecting bits in the presence of turbidity and ambient light. Thanks to improvements in computational algorithms, the deteriorated signal can be enhanced to improve detection performance. In this section, we review signal detection papers covering correlation-based filters[12] and deep learning-based signal detection methods[13–15]. The optical signals are temporally encoded, so multi-perspective videos of the transmitted signal are recorded using a camera array. The video frames are then converted to image frames for 3D InIm reconstruction [see Eq. (2)][12,13,16] or fed directly into a deep learning model[14,15]. On the transmitter side, a binary optical signal generated by a light source such as an LED is encoded with a gold sequence[45]: a bit of "1" is transmitted as the gold code and a bit of "0" as the flipped gold code, with the sequence chips driving the LED on ("1") and off ("0") states. At the receiver end, the transmitted signal is decoded by comparing the received video sequence with the coded sequence, outputting class-conditional probabilities over three classes: "1," "0," and "idle" (i.e., neither "1" nor "0"). The "idle" class is included in Refs. [13–16].
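The following sketch illustrates this encoding/decoding idea with a correlation receiver. The 31-chip stand-in code, the normalization, and the decision thresholds are our assumptions; real systems derive gold codes from preferred pairs of maximal-length sequences:

```python
import numpy as np

rng = np.random.default_rng(1)
gold = rng.integers(0, 2, 31)          # stand-in for a 31-chip gold code

def encode_bit(bit, code):
    """Bit '1' -> gold code; bit '0' -> flipped gold code (LED on/off chips)."""
    return code if bit == 1 else 1 - code

def decode_window(window, code, thresh=0.5):
    """Normalized correlation of one received window against the code:
    strongly positive -> '1', strongly negative -> '0', otherwise 'idle'."""
    g = 2.0 * code - 1.0               # map {0,1} chips to {-1,+1}
    w = window - window.mean()
    score = np.dot(w, g) / (np.linalg.norm(w) * np.linalg.norm(g) + 1e-12)
    return "1" if score > thresh else "0" if score < -thresh else "idle"

rx = encode_bit(1, gold) + 0.3 * rng.normal(size=gold.size)  # noisy received '1'
print(decode_window(rx, gold))         # expected output: "1"
```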
3.2.1. Active polarimetric InIm using correlation-based filtering
In Sec. 2.3, we briefly introduced underwater polarimetric InIm. An example is Ref. [12], where the authors proposed an underwater single-shot polarization-difference InIm system for signal detection using four-dimensional (4D) nonlinear correlation[29,30]. The underwater scene was illuminated by an active polarized light source, and a beam splitter divided the transmitted signal into two beams, which were then captured through two orthogonally polarized channels. Polarization-difference imaging was applied to the captured videos, followed by 3D InIm reconstruction [Eq. (2)] and 4D nonlinear correlation-based detection (see Fig. 6).
Figure 6. Flow chart of the underwater optical signal detection pipeline using 4D nonlinear correlation. Reprinted with permission from Ref. [12].
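Nonlinear (kth-law) correlation applies a power-law nonlinearity to the Fourier magnitudes before correlating, which sharpens the correlation peak relative to linear matched filtering. A minimal n-dimensional sketch follows; the exponent value and array shapes are illustrative:

```python
import numpy as np

def nonlinear_correlation(a, b, k=0.3):
    """kth-law nonlinear correlation of two equally shaped arrays. k = 1 is
    linear matched filtering; smaller k emphasizes phase and sharpens peaks.
    With 4D inputs (x, y, depth, time) this is the 4D case in the text."""
    A, B = np.fft.fftn(a), np.fft.fftn(b)
    S = (np.abs(A) ** k) * np.exp(1j * np.angle(A)) * \
        (np.abs(B) ** k) * np.exp(-1j * np.angle(B))
    return np.abs(np.fft.ifftn(S))

# hypothetical use: correlate a received 4D reconstruction with a reference
ref = np.random.rand(4, 8, 32, 32)
corr = nonlinear_correlation(ref, ref)   # autocorrelation peaks at the origin
print(corr.argmax() == 0)                # expected: True
```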
3.2.2. InIm with deep learning for underwater applications
In this section, we consider a deep learning-based approach for optical signal detection. Partial occlusion was not considered in Ref. [12], but it was considered in Ref. [13] in addition to turbidity. In Ref. [13], the authors proposed a deep learning-based signal detection system using a convolutional neural network with bidirectional long short-term memory (CNN-BiLSTM)[46–48]. The classifier learns the spatial information of the data using a pretrained CNN, GoogLeNet[49], and learns the temporal information using the BiLSTM. The training data were recorded in clear water without occlusion, and the model was evaluated on testing data in clear and turbid water with occlusion over a range of Beer's coefficients.
Figure 7. Flow chart of CNN-BiLSTM-based signal detection using sliding window-based classification. Reprinted with permission from Ref. [13].
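A compact PyTorch sketch of this architecture family is shown below. For self-containment, a small CNN stands in for the pretrained GoogLeNet feature extractor; the layer sizes, window length, and use of the final time step for classification are our assumptions:

```python
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    """CNN-BiLSTM video classifier: per-frame CNN features feed a
    bidirectional LSTM; three output classes: '1', '0', 'idle'."""
    def __init__(self, n_classes=3, feat=64, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat))
        self.lstm = nn.LSTM(feat, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                        # x: (batch, time, 1, H, W)
        b, t = x.shape[:2]
        f = self.cnn(x.flatten(0, 1)).view(b, t, -1)  # per-frame features
        out, _ = self.lstm(f)
        return self.head(out[:, -1])             # classify from last step

# sliding-window use: classify each window of reconstructed video frames
logits = CNNBiLSTM()(torch.randn(2, 16, 1, 64, 64))  # -> (2, 3)
```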
3.2.3. End-to-end integrated 1D InIm convolutional neural network for underwater applications
In contrast to Ref. [13], the approach in Ref. [14] is an end-to-end integrated architecture that trains on the spatial and temporal features of the different camera perspectives without intermediate steps such as camera calibration, depth estimation, and 3D InIm reconstruction of the light source. Thus, several improvements were made at the receiver end to increase speed and minimize hardware complexity without compromising signal detection performance (see Fig. 8). The authors used a 1D camera array to record the multi-perspective videos, which were fed directly into the 1DInImCNN model.
Figure 8. Flow chart of the end-to-end integrated 1DInImCNN-based signal detection model. Reprinted with permission from Refs. [14,15].
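An illustrative (not the published) way to build such an end-to-end model is to treat the camera perspectives as input channels of a 3D CNN over time and space, so that multi-view geometry and temporal structure are learned jointly from the raw video:

```python
import torch
import torch.nn as nn

class OneDInImCNN(nn.Module):
    """Sketch of an end-to-end multi-perspective video classifier: the views
    of a 1D camera array enter as channels, and 3D convolutions learn
    spatio-temporal features with no calibration, depth estimation, or
    Eq. (1)/(2) reconstruction in the loop. Layer sizes are assumptions."""
    def __init__(self, n_views=3, n_classes=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(n_views, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(32, n_classes))

    def forward(self, x):                  # x: (batch, views, time, H, W)
        return self.net(x)

logits = OneDInImCNN()(torch.randn(2, 3, 16, 32, 32))  # -> (2, 3)
```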
3.2.4. Lensless imaging using diffusers
In Sec. 3.2.3, we introduced an end-to-end integrated underwater optical signal detection methodology and compared its improvements with the CNN-BiLSTM-based signal detection system of Sec. 3.2.2. In this section, we extend the discussion of Sec. 3.2.3 to a lensless sensing system[15] (see Sec. 2.2) in which the camera lenses are replaced with diffusers. The experiments were performed with a 1D lensless camera array using diffusers in clear and turbid water (see Fig. 9).
Figure 9. (a)–(c) Sample training data without occlusion. (a) Central perspective video of the encoded optical signal in clear water. (b), (c) Sample training data in turbid water from a lensless 1D camera array with diffusers at two different turbidity levels.
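Diffuser measurements are high-dimensional coded patterns rather than focused images, so reducing their dimensionality can shrink the classifier input and cut computational time (a point Sec. 4 returns to). Below is a sketch using PCA[56] on flattened frames; the component count and data shapes are illustrative:

```python
import numpy as np

def pca_reduce(frames, k=32):
    """Project flattened diffuser frames onto the top-k principal components.
    frames: (n_frames, H*W) raw sensor measurements; the diffuser spreads
    scene information across the sensor, so a low-dimensional projection can
    retain much of it while shrinking the model input."""
    mean = frames.mean(axis=0)
    X = frames - mean
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # rows of Vt: PCA axes
    return X @ Vt[:k].T, Vt[:k], mean

coded = np.random.default_rng(2).normal(size=(100, 64 * 64))
reduced, axes, mean = pca_reduce(coded, k=32)         # reduced: (100, 32)
```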
3.3. Underwater Object Classification and Object Detection
Another application we present in this review is 3D InIm underwater object classification and object detection. In object classification, an image is assigned a single label, whereas object detection localizes the objects present in an image using bounding boxes. A bounding box is assigned to a target only when its confidence score exceeds a predefined threshold and it has the maximum score among duplicate predictions according to non-maximum suppression[53]. In this section, we review underwater reports using distortion-tolerant object classification[22] and You Only Look Once version 4 (YOLOv4) object detection[54]; more advanced versions of YOLO[55] and other deep learning algorithms may also be used. A minimal sketch of the non-maximum suppression step is given below.
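This is the standard greedy algorithm; the IoU threshold and box format are common conventions, not specifics from the reviewed papers:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    box and discard remaining boxes whose IoU with it exceeds the threshold.
    boxes: (N, 4) as [x1, y1, x2, y2]; returns indices of kept boxes."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + areas - inter + 1e-12)
        order = rest[iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
print(nms(boxes, np.array([0.9, 0.8, 0.7])))  # -> [0, 2]
```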
3.3.1. Object classification
Underwater object classification in turbidity using 3D InIm was investigated in Ref. [22]. This was the first report on using InIm for underwater 3D object classification, and as such it used the older neural network algorithms available at the time. A microlens array was inserted in front of a stationary single camera sensor to capture the perspective images of the 3D scene, which were then used for distortion-tolerant classification of the underwater objects.
3.3.2. Object detection
Unlike object classification, which ignores target locations, object detection is a suitable approach for localizing multiple objects. An example is Ref. [16], in which a dual-purpose underwater system performing both object detection and optical (temporal) signal detection was explored using 3D InIm. The underwater objects were detected using YOLOv4, and the temporally encoded optical signals were classified using the CNN-BiLSTM model of Sec. 3.2.2. Both models were trained on clear and turbid water data without occlusion over a range of Beer's coefficients.
Figure 10. Sample underwater testing scenes with occlusion, applying object detection and signal detection. Reprinted with permission from Ref. [16].
4. Results and Discussion
In summary, we compare the previously published InIm-based approaches in underwater imaging scenarios. In Sec. 3.1.1, the turbid images were processed using statistical image processing algorithms, namely histogram matching, histogram equalization, and gamma correction, to remedy the effects of turbidity. However, these algorithms generated color pixel errors. Peplography was introduced in Sec. 3.1.2 to solve the artificial color pixel intensity issue of Sec. 3.1.1 using scattering media estimation and ballistic photon detection. This method works in both homogeneous and inhomogeneous media. However, the optimal number of ballistic photons must be chosen manually; if this parameter is not chosen correctly, the ballistic photons may not be recovered, or saturation can occur in the reconstructed 3D image.
In Sec. 3.2.1, correlation-based filtering was integrated into an active polarimetric InIm system for underwater signal detection. However, correlation-based filters handle only small object variations and are not as effective as deep learning approaches. The earliest underwater InIm report using a neural network was highlighted in Sec. 3.3.1, which used object classification to classify underwater objects. However, the system in Sec. 3.3.1 acquired the perspective images using a microlens array, which limited the depth of field and reduced the resolution, resulting in poor-quality reconstructed 3D images. This approach can be improved using a camera array or a single camera sensor, together with more advanced deep learning algorithms. To capture high-quality perspective images, the systems in Secs. 3.1.1–3.1.3, 3.2.1–3.2.4, and 3.3.2 do not use a microlens array. Additionally, deep learning algorithms were applied to object visualization (Sec. 3.1.3), signal detection (Secs. 3.2.2–3.2.4), and object detection (Sec. 3.3.2).
Section 3.1.3 incorporated physical laws into a deep learning model trained on an unpaired dataset, eliminating the need to collect large, paired datasets, whereas Secs. 3.2.2–3.2.4 and 3.3.2 discussed purely data-driven deep learning approaches. The work in Sec. 3.1.3 can also be extended to other scattering media with any other physical degradation model. Sections 3.2.2 and 3.3.2 implemented a 3D InIm-based system using a CNN-BiLSTM model. The reconstruction of the image frames, along with camera array calibration, depth estimation, and spatial and temporal feature extraction, increased the computational time and introduced errors, thereby influencing detection performance. Section 3.2.3 eliminated these intermediate steps by feeding the captured multi-perspective images directly to a deep learning model (1DInImCNN) to lower the computational complexity. Section 3.2.4 replaced the lens-based imaging system of Sec. 3.2.3 with a lensless imaging system using diffusers and showed that reducing the dimensionality of the diffuser data decreased the computational time of the 1DInImCNN model while substantially improving detection performance. Sections 3.2.1–3.2.4 and 3.3.1 reviewed signal detection-based and object classification-based systems, respectively. Section 3.3.2 reviewed a dual-purpose underwater system that performs object detection and temporal signal detection simultaneously.
5. Conclusion
We reviewed the published reports on underwater imaging and sensing using 3D InIm for object visualization[18,19,21], temporal signal detection[12–16], object classification[22], and object detection[16]. 3D InIm is shown to be effective at reducing the effects of turbidity, offers depth-sectioning capabilities, and works in challenging environments such as those with partial occlusion. Visualization of objects degraded by scattering and absorption may be remedied using statistical image processing algorithms, peplography, and physics-based deep learning. For signal detection, we reviewed correlation-based filtering using a 4D nonlinear correlation filter, deep learning using a CNN-BiLSTM model, and an end-to-end integrated 1DInImCNN model. We also reviewed YOLOv4 and a distortion-tolerant neural network for object detection and object classification, respectively.
Although LiDAR was briefly discussed, detailed descriptions and reviews of this approach are outside the scope of this manuscript; therefore, we limit our discussion to 3D InIm-based approaches. Most of our research efforts have focused on 3D InIm in turbidity without occlusion or with partial occlusion, but other degraded environments that we have not considered, such as turbulence and photon-sparse environments, are important in underwater optical imaging and could be considered in future work. All underwater 3D InIm systems covered in this review are lens-based, except for the work in Ref. [15]. To the best of our knowledge, Ref. [15] is the earliest report implementing a lensless 3D InIm system using diffusers in underwater imaging. Lensless systems are gaining interest in the research community due to their low cost, compact size, large field of view, and low weight compared with lens-based imaging systems. However, there is limited research on this subject in underwater imaging. Polarimetric underwater imaging with active illumination was covered in signal detection using 3D InIm and polarization-difference imaging, in object visualization using 3D InIm and active polarization descattering, and in other published reports in turbid environments[57–60]. More advanced polarimetric imaging techniques in the underwater literature, such as Stokes-based imaging[61] and Mueller matrix imaging[62,63], are outside the scope of this paper; these techniques could be potential future work for underwater 3D InIm-based systems. As in any review paper of this nature, we may have omitted discussions reported in published papers or works by other research groups, for which we apologize in advance.
References
[8] G. Lippmann. La photographie intégrale. C. R. Acad. Sci., 146, 446 (1908).
[20] R. Joshi, B. Javidi. Three-dimensional integral imaging visualization in scattering medium with active polarization descattering, JTu4A.39 (2023).
[23] K. O'Shea, R. Nash. An introduction to convolutional neural networks (2015).
[24] I. Goodfellow et al. Generative adversarial nets (2014).
[25] R. M. Schmidt. Recurrent neural networks (RNNs): a gentle introduction and overview (2019).
[26] R. C. Gonzalez, R. E. Woods. Digital Image Processing (2008).
[27] J. W. Goodman. Statistical Optics (2015).
[34] G. Bradski, A. Kaehler. Learning OpenCV: Computer Vision with the OpenCV Library (2008).
[35] V. Boominathan et al. Recent advances in lensless imaging. Optica, 9, 1 (2022).
[39] J. O. Berger. Statistical Decision Theory and Bayesian Analysis (1985).
[40] Y. LeCun, Y. Bengio, G. Hinton. Deep learning. Nature, 521, 436 (2015).
[41] J.-Y. Zhu et al. Unpaired image-to-image translation using cycle-consistent adversarial networks, 2242 (2017).
[42] X. Chen et al. InfoGAN: interpretable representation learning by information maximizing generative adversarial nets (2016).
[43] K. He, J. Sun, X. Tang. Single image haze removal using dark channel prior, 1956 (2009).
[44] A. Candelieri. A gentle introduction to Bayesian optimization, 1 (2021).
[46] S. Hochreiter, J. Schmidhuber. Long short-term memory. Neural Comput., 9, 1735 (1997).
[48] J. Y.-H. Ng et al. Beyond short snippets: deep networks for video classification, 4694 (2015).
[49] C. Szegedy et al. Going deeper with convolutions, 1 (2015).
[54] A. Bochkovskiy, C.-Y. Wang, H.-Y. M. Liao. YOLOv4: optimal speed and accuracy of object detection (2020).
[56] I. T. Jolliffe. Principal Component Analysis (2002).
