Cross air–water interface hydroacoustic signal detection based on laser speckle imaging

Lize Deng; Jiajia Liang; Xueyuan Huang; Wei Guo; Xiaozhong Wang; Hongyan Fu; Zhengqian Luo

doi:10.3788/COL202523.071201


Abstract

A cross air–water interface hydroacoustic signal detection method based on microvibration detection at the air–water interface is proposed. Laser speckles modulated by water surface acoustic waves are recorded and used as the information carrier. Phase correlation and multichannel fusion algorithms are used to extract and enhance the hydroacoustic signals. Underwater acoustic signals over a wide frequency range (200 Hz to 20 kHz) are detected with a relative frequency error smaller than 0.5%. Several common artificial and natural hydroacoustic signals are used as source signals and are correctly reconstructed. The average intelligibility of the recovered humpback whale signals, evaluated with the normalized subband envelope correlation algorithm, is 0.52 ± 0.02.

Keywords

cross air–water interface; hydroacoustic signal detection; laser speckle; linear array CMOS; vibration detection

1. Introduction

Detecting underwater acoustic sources from the air would allow aircraft and surface vessels to monitor the underwater acoustic environment and has potential commercial and military applications. Such cross-medium detection exploits the low transmission loss of light in air and of sound in water, and it has therefore attracted the attention of researchers around the world. As early as 2004, researchers at the U.S. Naval Undersea Warfare Center pioneered the use of a laser Doppler vibrometer (LDV) to measure water surface vibrations modulated by an underwater acoustic signal; a surface normal glint tracker was employed to overcome the data dropout caused by the moving water surface^[1]. The same system was used for uplink communication across the water–air interface in 2006^[2]. In 2007, a scanning LDV was used to visualize underwater acoustic wavefields, and the wavefronts scattered by an obstruction with a circular cross section were correctly mapped^[4]. In 2015, researchers at the U.S. Naval Air Warfare Center used an LDV to detect water surface vibrations excited by an underwater speaker and successfully detected audio signals from 50 Hz to 5 kHz^[3]. In 2020, researchers at Donghua University, China, used a five-channel fiber-based LDV to measure an underwater acoustic field; in a field test in Qiandao Lake, the signal probing probability of the whole sensing system reached 59.77%^[5].

Besides LDV, many other cross air–water detection methods have been reported. In 2010, Farrant et al. introduced the concept of optical sonar^[6], in which a $\mathrm{CO}_{2}$ laser illuminates the water and generates acoustic waves in it as the probe signal. The acoustic waves reflected by underwater objects travel to the water surface and modulate the air–water interface, which is probed by a 532 nm laser; from this modulation, the location and orientation of the underwater objects can be deduced. The optical sonar scheme has been followed by many researchers, who usually use the water surface acoustic wave (WSAW) generated by hydroacoustic sources as the signal carrier^[7–11]. In 2015, researchers from the U.S. National Oceanic and Atmospheric Administration (NOAA) used the relationship between sound pressure and bubble void to deduce the frequency of an underwater acoustic signal from the lidar returns of a collection of insonified bubbles modulated by that signal^[12].

In addition to optical methods, millimeter-wave radar and airborne sonar systems have also been used to detect underwater acoustic sources. In 2018, researchers from the Massachusetts Institute of Technology, USA, showed that a water surface modulated by acoustic waves can be sensed with a millimeter-wave frequency-modulated continuous-wave (FMCW) radar placed 30 cm above the water surface. Communication across the water–air interface at bit rates up to 400 bps was achieved, and the system operated correctly in the presence of surface waves with peak-to-peak amplitudes of up to 16 cm^[13]. In 2020, researchers from Stanford University, USA, reported an airborne sonar system for underwater remote sensing and imaging. A 1070 nm ytterbium laser with kilowatt-level peak power generated the probing acoustic waves, and a specially designed capacitive micromachined ultrasonic transducer detected the resulting deformation of the water surface at a working distance of 10 cm. Underwater objects at a depth of 13 cm were correctly detected^[14].


These reports show that LDV is usually chosen for cross-medium audio sensing because of its high sensitivity and technological maturity. However, LDV is a single-point method with a small receiving angle, and it suffers from return-signal dropout when the water surface is moving. The millimeter-wave radar and airborne sonar methods are robust in severe weather, but their working mechanisms limit their sensing resolution. Because the amplitudes of water surface acoustic waves excited by underwater acoustic sources are usually on the order of tens of nanometers^[8], practical applications need a method that combines high sensing resolution with a wide viewing angle.

Besides LDV, other optical vibration detection methods, including laser self-mixing^[15], holographic vibration measurement^[16], photography^[17,18], single-pixel imaging^[19], and laser speckle imaging^[20], have been developed in recent years. Among them, laser speckle imaging vibration measurement offers a simple setup and a flexible system configuration and has been investigated by many researchers. Accordingly, many measurement schemes have been proposed, including area array camera imaging^[21–24], photodiode flux detection^[25,26], linear array camera imaging^[27–30], and rolling-shutter camera imaging^[18,31–35]. In this paper, the linear array camera speckle imaging scheme is selected for its wide sensing frequency range, high sensitivity, large viewing angle, fast processing, and low data storage and transmission requirements.

In this paper, a cross air–water interface hydroacoustic signal sensing system based on linear array camera imaging is constructed. Phase correlation and multichannel fusion algorithms in the phase domain are introduced to extract and enhance the vibration signals. Underwater acoustic signals with a frequency range from 200 Hz to 20 kHz are successfully detected with a relative frequency error of less than 0.5%. Several common artificial and natural underwater acoustic signals are correctly detected. To the authors’ knowledge, this is the first report of cross air–water interface underwater acoustic signal detection based on laser speckle imaging. The arrangement of this paper is as follows. The experimental setup and signal extraction algorithm are introduced in Sec. 2. The experimental results and analysis are presented in Sec. 3, and a brief conclusion is given in Sec. 4.

2. Experiments

The experimental setup is shown in Fig. 1. It consists of three modules: the detection module, the signal generation module, and the amplitude calibration module. A 532 nm laser beam (JDSU CDPS532M, 50 mW) is reflected by a mirror onto the water surface of a glass tank (length 60 cm, width 45 cm, height 45 cm) at approximately normal incidence. An underwater speaker (Chinese kHz Electronics Technology LTD, model UWS-015A) is placed at the bottom of the tank, which is filled with drinking water to a depth of 40 cm; the distance from the upper surface of the speaker to the water surface is about 30 cm. The speaker is driven by an audio amplifier (Yamaha RX-V490) fed by either a signal generator or a computer. The sound emitted by the speaker propagates to the air–water interface and generates water surface acoustic waves, which modulate the incident laser beam. The modulated light is diffusely reflected by the interface and collected by an imaging system composed of an imaging lens (Nikon Sigma Zoom Master, 35–70 mm) and a linear array CMOS camera (Basler raL2048-48gm). The camera is controlled with the Basler Pylon Viewer software, and the recorded speckle patterns are stored on a computer in .bmp format. The maximum line rate of the camera is 51,000 lines/s, and the line rate can be configured in software. Unless otherwise noted, the line rate is set to 16,000 lines/s and each frame contains 8000 lines, corresponding to a recording time of 0.5 s. In the experiment, the probe beam must not fall directly on the underwater speaker, and the beamlet transmitted into the water is deflected by another mirror so that it cannot enter the imaging system.
To increase the received light flux, a piece of 10-µm-thick aluminum foil is placed on the water surface. A typical speckle pattern is shown in the inset of Fig. 1.

Figure 1. Experimental setup of the underwater acoustic signal detection system.

To record the absolute sound level, a commercial hydrophone (CSSC 715th Institute of China, SAH2014-11) is used. Its receiving sensitivity is −175.7 dB re 1 V/µPa at 10 kHz. The sensor head of the hydrophone is immersed about 10 cm below the water surface, and the vertical and horizontal distances between the hydrophone and the center of the underwater speaker are 15 and 9 cm, respectively. The hydrophone output is digitized by a data acquisition card and stored on a computer. The sound pressure level (SPL) at the water surface is approximated by the SPL at the hydrophone sensor head, which is calculated from the RMS output voltage $V_{\mathrm{rms}}$ of the hydrophone using Eq. (1):
$$\mathrm{SPL} = 20 \log_{10} (V_{\mathrm{rms}}) - \mathrm{RVS} \quad (\text{dB re } \mu\text{Pa}), \qquad (1)$$
where RVS is the receiving sensitivity of the hydrophone (in dB re 1 V/µPa) at the respective frequency.
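As a quick numerical check of Eq. (1), the minimal Python sketch below recomputes the 10 kHz entry of Table 1 from the hydrophone output $V_{\mathrm{rms}} = 1.26$ V reported later in the text and RVS = −175.7 dB re 1 V/µPa. The function name is illustrative and not taken from the paper.

```python
import math

def spl_from_hydrophone(v_rms, rvs_db):
    """Eq. (1): SPL in dB re 1 uPa from the hydrophone output.

    v_rms  : RMS output voltage of the hydrophone, in volts
    rvs_db : receiving sensitivity in dB re 1 V/uPa (negative, e.g. -175.7 at 10 kHz)
    """
    return 20.0 * math.log10(v_rms) - rvs_db

# 10 kHz case from the text: V_rms = 1.26 V, RVS = -175.7 dB re 1 V/uPa
print(round(spl_from_hydrophone(1.26, -175.7), 1))  # 177.7
```

Because RVS is negative, subtracting it adds about 175.7 dB to the voltage level, which is why a volt-level output corresponds to an SPL near 178 dB re µPa.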

The signal extraction algorithm is a modified version of the phase correlation algorithm reported in Ref. [29]. Phase correlation is regarded as a high-performance image alignment method^[36]. In this paper, it is combined with a multichannel fusion algorithm to improve vibration signal extraction. The algorithms are briefly described below.

Because the speckle pattern changes slowly, its movement between adjacent rows can be treated as a rigid-body translation. The images recorded in the $i$th row, $I_{i}(x)$, and the $(i+1)$th row, $I_{i+1}(x)$, are then related by a spatial shift $\Delta x$:
$$I_{i}(x) = I_{i+1}(x - \Delta x). \qquad (2)$$

Let $F_{i}(u)$ and $F_{i+1}(u)$ be the Fourier transforms of $I_{i}(x)$ and $I_{i+1}(x)$, respectively. Their cross-power spectrum can be written as
$$\Gamma_{i,i+1}(u) = F_{i}^{*}(u) F_{i+1}(u) = A_{i}(u) A_{i+1}(u) \exp[\mathrm{i}(\Phi_{i+1} - \Phi_{i})], \qquad (3)$$
where $u$ is the spatial frequency, $A_{i}$ is the amplitude (modulus) of $F_{i}(u)$, $\Phi_{i}$ is its phase, and $\mathrm{i}$ in the exponent is the imaginary unit. By the Fourier shift theorem, Eq. (2) implies
$$\Delta\Phi = \Phi_{i+1} - \Phi_{i} = p\,\Delta x, \qquad (4)$$
where $p$ is a coefficient that is constant at a given spatial frequency ($p = 2\pi u$ for the transform convention $F(u) = \int I(x)\, e^{-\mathrm{i} 2\pi u x}\, dx$). Because different rows correspond to different sampling instants, $\Delta\Phi$ describes how the phase varies with time. After normalization,
$$S(t) = \frac{F_{i}^{*} F_{i+1}}{|F_{i}^{*}|\,|F_{i+1}|} = \frac{F_{i}^{*} F_{i+1}}{|F_{i}^{*} F_{i+1}|} = \exp(\mathrm{i}\, p\, \Delta x). \qquad (5)$$
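The relations in Eqs. (2)–(5) can be checked numerically. The NumPy sketch below is illustrative only: it assumes a noise-free row, generates the shifted row with the Fourier shift theorem, and recovers $\Delta x$ with a least-squares fit of the phase of $S$ against $p = 2\pi u$ over a few low-frequency bins. The helper names and the fitting step are our assumptions, not the authors' MATLAB implementation.

```python
import numpy as np

def shift_row(row, dx):
    """Return out with out(x) = row(x + dx), so that row(x) = out(x - dx) as in Eq. (2)."""
    u = np.fft.rfftfreq(row.size)                       # spatial frequency, cycles per pixel
    return np.fft.irfft(np.fft.rfft(row) * np.exp(2j * np.pi * u * dx), row.size)

def normalized_cross_spectrum(row_a, row_b, eps=1e-12):
    """Eq. (5): conj(F_a) * F_b normalized to unit modulus at every spatial frequency."""
    cross = np.conj(np.fft.rfft(row_a)) * np.fft.rfft(row_b)
    return cross / (np.abs(cross) + eps)

def estimate_shift(row_a, row_b, n_bins=8):
    """Fit angle(S) = p * dx with p = 2*pi*u (Eq. (4)) over the lowest non-DC bins."""
    s = normalized_cross_spectrum(row_a, row_b)[1:n_bins + 1]
    p = 2 * np.pi * np.fft.rfftfreq(row_a.size)[1:n_bins + 1]
    return float(np.dot(p, np.angle(s)) / np.dot(p, p))

rng = np.random.default_rng(0)
row_i = rng.random(2048)               # stand-in for one speckle row I_i(x)
row_next = shift_row(row_i, 0.3)       # I_{i+1}(x) with a 0.3-pixel shift
print(round(estimate_shift(row_i, row_next), 6))  # 0.3
```

Only low-frequency bins are used so that $p\,\Delta x$ stays well inside $(-\pi, \pi]$ and the phase does not wrap.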

Equation (5) shows that $S(t)$ carries the object vibration. Because the Fourier transform $F_{i}(u)$ of a row image $I_{i}(x)$ is itself a row vector, the normalized cross-power spectrum of each pair of adjacent rows is also a row vector; stacking these rows over time gives the matrix $S(t)$ shown in Fig. 2, in which each column is a time series at one spatial frequency. Different columns, or column groups, can therefore be regarded as different sensing channels, and multichannel fusion can be used to improve signal reconstruction. In this paper, the phase-error-based filtering (PBF) method is selected as the multichannel data fusion algorithm^[37].
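A minimal sketch of how the adjacent-row spectra can be stacked into $S(t)$ and read out column by column as channels, assuming the frame is a 2-D array with time running down the rows. The function name, the choice of columns, and the small `eps` guard against division by zero are hypothetical; the gray-value threshold mirrors step 2 of the processing procedure described in this section.

```python
import numpy as np

def phase_channels(image, columns, threshold=6.0, eps=1e-12):
    """Compute S(t) for every pair of adjacent rows and return the phase of selected columns.

    image     : 2-D array of one line-scan frame, one row per sampling instant
    columns   : spatial-frequency bin indices treated as sensing channels (illustrative choice)
    threshold : gray values below this are set to 0 before the FFT
    """
    img = np.asarray(image, dtype=float)
    img = np.where(img < threshold, 0.0, img)           # suppress dark-pixel noise
    spec = np.fft.rfft(img, axis=1)                     # F_i(u) for every row i
    cross = np.conj(spec[:-1]) * spec[1:]               # F_i^* F_{i+1}, Eq. (3)
    s = cross / (np.abs(cross) + eps)                   # unit-modulus S(t), Eq. (5)
    return np.angle(s[:, columns])                      # shape (rows - 1, channels): p * dx per row step
```

Each returned column is $p\,\Delta x$ for the displacement between consecutive rows at one spatial frequency, i.e. one channel time series of the kind passed to the fusion step.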

Figure 2. Flowchart of the signal extraction algorithm.

Two columns of $S(t)$, $s_{\tau}(t)$ and $s_{k}(t)$, are selected, segmented into $M$ frames (frame length 64 ms) with a Hanning window, and Fourier transformed to obtain $S_{\tau}(\omega)$ and $S_{k}(\omega)$. The phase difference between $S_{\tau}(\omega)$ and $S_{k}(\omega)$ at the $m$th frame, $\theta_{\tau,k,m}$, is
$$\theta_{\tau,k,m}(\omega) = X_{\tau,m}(\omega) - X_{k,m}(\omega), \qquad (6)$$
where $X_{\tau,m}(\omega)$ and $X_{k,m}(\omega)$ are the phases of $S_{\tau}(\omega)$ and $S_{k}(\omega)$ in the $m$th frame, respectively.

The signal decay coefficient of an individual channel $\tau$ is
$$\psi_{\tau,m}(\omega) = \Big[\prod_{k=1,\,k\neq\tau}^{C} \eta_{\tau,k,m}(\omega)\Big]^{1/r}, \qquad (7)$$
where $C$ is the total number of input channels, $\eta_{\tau,k,m}(\omega) = 1/[1 + \gamma\, \theta_{\tau,k,m}^{2}(\omega)]$ is the decaying function, and $r$ is a weighting factor set to the total number of frames $M$. The constant $\gamma$ is empirically set to 5, which gives good performance over a wide range of signal-to-noise ratios (SNRs). The filtered channels are then fused in the frequency domain to improve the demodulated signal:
$$S_{m}'(\omega) = \sum_{\tau=1}^{C} \psi_{\tau,m}(\omega)\, S_{\tau,m}(\omega). \qquad (8)$$
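The NumPy sketch below applies Eqs. (6)–(8) to a single frame, assuming the $C$ channel spectra of frame $m$ are stacked into a $(C, F)$ complex array. Wrapping the phase difference into $(-\pi, \pi]$ is our addition (the text does not mention it), and the function name is hypothetical.

```python
import numpy as np

def pbf_fuse_frame(spectra, gamma=5.0, r=1.0):
    """Phase-error-based filtering and fusion of one frame m, following Eqs. (6)-(8).

    spectra : complex array (C, F); row tau is S_{tau,m}(omega) for channel tau
    gamma   : constant in eta = 1 / (1 + gamma * theta**2); the text uses 5
    r       : root taken in Eq. (7); the text sets it to the number of frames M
    """
    spectra = np.asarray(spectra, dtype=complex)
    phase = np.angle(spectra)                              # X_{tau,m}(omega)
    theta = phase[:, None, :] - phase[None, :, :]          # theta_{tau,k,m}(omega), Eq. (6)
    theta = np.angle(np.exp(1j * theta))                   # wrap to (-pi, pi]; our assumption
    eta = 1.0 / (1.0 + gamma * theta ** 2)                 # decaying function
    # The k == tau term has theta = 0 and eta = 1, so keeping it leaves the product unchanged.
    psi = np.prod(eta, axis=1) ** (1.0 / r)                # psi_{tau,m}(omega), Eq. (7)
    return np.sum(psi * spectra, axis=0)                   # S'_m(omega), Eq. (8)
```

Frequency bins where all channels agree in phase pass with full weight, while bins with large inter-channel phase disagreement, which are more likely to be dominated by noise, are attenuated.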

The inverse Fourier transform of $S_{m}'(\omega)$ gives the enhanced frame signal $s_{m}'(t)$ in the time domain, and the final enhanced signal is obtained as $s'(t) = \sum_{m=1}^{M} s_{m}'(t)$. The data processing flow is summarized in Fig. 2, and the algorithms are implemented as follows:
1) read each image into a data matrix with the MATLAB function imread;
2) set pixels with gray values smaller than 6 to 0 to suppress noise;
3) apply a fast Fourier transform (FFT) to each row and calculate the phase difference matrix of adjacent rows according to Eq. (5);
4) set the Hanning window frame length to 64 ms, the step length to 1 ms, and $\gamma = 5$;
5) apply an FFT to each of the 10 selected channels;
6) apply phase-error-based filtering (PBF) according to Eqs. (6), (7), and (8);
7) apply an inverse FFT (IFFT) to the fused spectrum to obtain $s_{m}'(t)$;
8) align and add all frame signals to obtain $s'(t)$.
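Steps 4–8 can be sketched as a frame-by-frame loop with overlap-add. The sketch below assumes that the per-frame fusion is passed in as a function (for example an implementation of Eqs. (6)–(8)) and that the overlapped frames are normalized by the summed Hanning window; the text only says that the frame signals are aligned and added, so this normalization, like the function name, is an assumption.

```python
import numpy as np

def frame_fuse_overlap_add(channels, fs, fuse, frame_s=0.064, hop_s=0.001):
    """Frame, fuse, and overlap-add multichannel signals (a sketch of steps 4-8).

    channels : real array (C, N); each row is one channel time series taken from S(t)
    fs       : sampling rate in Hz, i.e. the camera line rate
    fuse     : callable mapping a (C, L) complex spectrum array to one length-L spectrum
    """
    channels = np.atleast_2d(np.asarray(channels, dtype=float))
    n = channels.shape[1]
    frame = int(round(frame_s * fs))                     # 64 ms frame, as in the text
    hop = max(1, int(round(hop_s * fs)))                 # 1 ms step, as in the text
    window = np.hanning(frame)
    out = np.zeros(n)
    gain = np.zeros(n)
    for start in range(0, n - frame + 1, hop):
        seg = channels[:, start:start + frame] * window           # Hanning-windowed frame m
        fused = fuse(np.fft.rfft(seg, axis=1))                    # fusion in the frequency domain
        out[start:start + frame] += np.fft.irfft(fused, frame)    # s'_m(t), aligned at its start
        gain[start:start + frame] += window
    covered = gain > 1e-8
    out[covered] /= gain[covered]        # divide by the summed window (our assumption)
    return out
```

With a 64 ms frame and a 1 ms step at 16,000 lines/s, each frame holds 1024 samples and consecutive frames advance by 16 samples. Passing `lambda spec: spec.mean(axis=0)` as `fuse` with identical channels returns the input signal away from the record edges, which is a convenient sanity check before plugging in a real fusion rule.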


To satisfy the Nyquist sampling criterion, the camera line rate is set to 16,000 lines/s for 200–8000 Hz signals, 32,000 lines/s for 9–16 kHz signals, and 44,000 lines/s for 17–20 kHz signals. In addition to the temporal sampling criterion, the spatial sampling criterion must also be met: the average speckle size should be larger than 2 pixels. This condition can be satisfied by adjusting the imaging parameters.
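For the three configured line rates $f_{\mathrm{line}}$, the Nyquist limit $f_{\mathrm{line}}/2$ on the highest recoverable signal frequency works out as follows:
$$\frac{16{,}000}{2} = 8\ \text{kHz}, \qquad \frac{32{,}000}{2} = 16\ \text{kHz}, \qquad \frac{44{,}000}{2} = 22\ \text{kHz}.$$
The upper ends of the first two bands (8 and 16 kHz) therefore sit exactly at the Nyquist frequency, where a sampled sinusoid's apparent amplitude depends on its sampling phase, whereas the 20 kHz tone at 44,000 lines/s leaves a 2 kHz margin.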

In the experiment, the path length from the laser source to the water surface is 1.44 m, and the distance from the image sensor to the water surface is about 1.90 m. The imaging lens is defocused: it is focused on a plane 0.70 m in front of the lens. The focal length of the lens is set to 50 mm, and the f-number is F/5.6. The beam waist radius of the 532 nm laser is about 0.3 mm, so the laser spot size $D$ on the object, calculated from Gaussian beam propagation, is 1.732 mm. Accordingly, the average objective speckle size on the focusing plane is $\lambda L_{1}/D = 0.368$ mm, where $\lambda$ is the laser wavelength and $L_{1}$ is the distance between the water surface and the lens focus point. The subjective speckle size is then $\lambda L_{1} f/(L_{2} D) \approx 26.3$ µm, where $f$ is the focal length of the imaging lens and $L_{2}$ is the distance between the lens and the lens focus point. With a camera pixel size of 7.0 µm, an average speckle spans about 3.8 pixels in every direction, which satisfies the spatial sampling requirement.
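The speckle-size figures above can be reproduced with a few lines of Python. This sketch assumes that the laser waist sits at the laser output, that $L_{2} = 0.70$ m, and that $L_{1}$ is the 1.90 m sensor-to-water distance minus $L_{2}$ (neglecting the lens-to-sensor spacing). These readings of the geometry are ours, although they reproduce the stated values.

```python
import math

def gaussian_spot_diameter(w0, wavelength, z):
    """Diameter 2*w(z) of a Gaussian beam with waist radius w0 after propagating z (SI units)."""
    z_r = math.pi * w0 ** 2 / wavelength                 # Rayleigh range
    return 2.0 * w0 * math.sqrt(1.0 + (z / z_r) ** 2)

wavelength = 532e-9          # m
w0 = 0.3e-3                  # beam waist radius, m (assumed to be at the laser output)
path = 1.44                  # laser-to-water path length, m
sensor_to_water = 1.90       # m
L2 = 0.70                    # lens to its focus point, m
L1 = sensor_to_water - L2    # water surface to lens focus point, m (our reading of the geometry)
f = 50e-3                    # focal length, m
pixel = 7.0e-6               # pixel pitch, m

D = gaussian_spot_diameter(w0, wavelength, path)
objective = wavelength * L1 / D                          # objective speckle on the focusing plane
subjective = objective * f / L2                          # speckle size on the sensor
print(f"D = {D*1e3:.3f} mm, objective = {objective*1e3:.3f} mm, "
      f"subjective = {subjective*1e6:.1f} um, {subjective/pixel:.2f} pixels")
# prints: D = 1.733 mm, objective = 0.368 mm, subjective = 26.3 um, 3.76 pixels
```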

3. Results

First, the detection of single-frequency sinusoidal signals is tested using the setup in Fig. 1 and the parameters given in Sec. 2. The frequency of the source audio signal is reconstructed with the algorithm described in Sec. 2. Typical results, normalized, are shown in Fig. 3.

Figure 3. Typical signal extraction results. (a) 200 Hz; (b) 4 kHz; (c) 14 kHz; (d) 20 kHz.

Fig. 3 shows that every signal is correctly reconstructed, with a relative frequency error smaller than 0.2%. Evident harmonics appear in Fig. 3(a), most likely because the speaker itself reproduces the 200 Hz signal with harmonic distortion. For the 4, 14, and 20 kHz signals, the harmonics are filtered out during processing. Typical reconstruction results are summarized in Table 1; all relative frequency errors are smaller than 0.3%. Comparing the relative frequency errors before and after multichannel fusion shows no obvious improvement.
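For reference, the relative error in Table 1 is presumably $\delta = |f_{\mathrm{det}} - f_{0}|/f_{0} \times 100\%$, where $f_{0}$ is the original and $f_{\mathrm{det}}$ the detected frequency (our reading; the text does not state the formula). For the 10 kHz row, for example:
$$\delta = \frac{|10{,}017.5 - 10{,}000.0|}{10{,}000.0} \times 100\% = 0.175\% \approx 0.18\%.$$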

Original signal frequency (Hz)	Detected average frequency (Hz)	Relative error (%)	SPL (dB re µPa)
200.0	200.1	0.05	173.8
500.0	500.6	0.12	165.9
1000.0	999.4	0.01	153.9
4000.0	3998.9	0.28	172.6
7000.0	6998.0	0.03	180.6
10,000.0	10,017.5	0.18	177.7
14,000.0	14,024.3	0.17	177.6
20,000.0	19,989.8	0.05	168.8

Table 1. Typical Recovered Signal Frequency Characteristics


The sensitivity of the vibration detection system is calibrated with a commercial LDV (Polytec OFV-3001/OFV-303). For a 300 Hz underwater audio signal, the minimum measurable amplitude is about 69 nm; at 1 kHz, it is about 5 nm, although in this case the vibration waveform is unstable and the amplitude can only be roughly estimated.

The absolute SPLs at the different frequencies, measured with the hydrophone, are listed in Table 1 and range from 153.9 to 180.6 dB re µPa. For example, the SPL of the 10 kHz in-water acoustic signal at the water surface, calculated from the reference hydrophone output voltage ($V_{\mathrm{rms}} = 1.26$ V) using Eq. (1), is 177.7 dB re µPa, consistent with the results of other researchers^[1]. Note that the SPLs listed in Table 1 are not the minimum detectable SPLs of the sensing system presented in this paper.

Next, the detection of burst audio signals from marine animals and artificial vehicles is evaluated. First, a humpback whale audio signal is used to drive the underwater speaker. Typical waveforms and spectrograms of the original and reconstructed signals are shown in Fig. 4, where PC denotes phase correlation and PC_PBF denotes phase correlation combined with phase-error-based filtering. The spectrograms show that the characteristic frequencies are reconstructed appropriately. The SNR of the reconstructed signals is evaluated with the segmental SNR (SegSNR) algorithm^[38]; the average SegSNR of 20 reconstructed signals is $(-3.89 \pm 0.32)$ dB. The intelligibility of the reconstructed signals is evaluated with the normalized subband envelope correlation (NSEC) algorithm^[39]. The NSEC score lies between 0 and 1, where 1 means the intelligibility is as good as that of the original audio signal; an NSEC above 0.30 usually means that the audio is intelligible. The average NSEC score of 20 reconstructed signals is $0.48 \pm 0.03$. The log-likelihood ratio (LLR) algorithm is used to estimate the similarity between the original and recovered signals^[35]. The LLR score ranges from 0 to 2, with 0 indicating the best match. The average LLR score of 20 reconstructed signals is $1.71 \pm 0.11$. These results show that the reconstructed signals are intelligible, with NSEC scores well above the 0.30 threshold.
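SegSNR is commonly defined as the average of per-frame SNRs between a time-aligned clean reference and the processed signal, with each frame SNR clipped to [−10, 35] dB. The sketch below implements that textbook form; the frame length, hop, and clipping limits are common defaults assumed here, since the paper does not report the settings it used, and the function name is hypothetical.

```python
import numpy as np

def seg_snr(clean, processed, frame_len=256, hop=128, lo=-10.0, hi=35.0, eps=1e-12):
    """Segmental SNR in dB: mean of per-frame SNRs, each clipped to [lo, hi].

    clean and processed must be time-aligned and equally scaled for the result to be meaningful.
    """
    clean = np.asarray(clean, dtype=float)
    processed = np.asarray(processed, dtype=float)
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, hop):
        s = clean[start:start + frame_len]
        e = s - processed[start:start + frame_len]               # residual error in this frame
        snr = 10 * np.log10((np.sum(s ** 2) + eps) / (np.sum(e ** 2) + eps))
        snrs.append(np.clip(snr, lo, hi))                        # limit the influence of silent frames
    return float(np.mean(snrs))
```

Clipping keeps near-silent frames (very negative SNR) and near-perfect frames (very large SNR) from dominating the average, which is why negative SegSNR values such as those reported here are common for heavily distorted reconstructions.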

Figure 4. Waveforms and spectrograms of the humpback whale audio signal. (a) Original signal; (b) PC reconstructed signal; (c) PC_PBF enhanced signal.

To further enhance the recovered signals, the PBF multichannel fusion algorithm is employed: the ten channels with the highest scales are selected and fused using the algorithm described in Sec. 2. Typical fusion results for the humpback whale audio signal are shown in Fig. 4(c), and the evaluation results are summarized in Table 2. Compared with the PC method, the PC_PBF method increases the SegSNR and NSEC score for the humpback whale signals by 1.61 dB and 0.04 (8.3%), respectively, and decreases (improves) the LLR score by 0.20 (11.7%). The standard deviations of the PC_PBF method are smaller than or equal to those of the PC method in all cases. These results show that multichannel fusion improves both the reconstruction performance and its stability. Typical evaluation results for killer whale and sonar signals are also shown in Table 2; the improvement in SegSNR and NSEC is evident.

Method	SegSNR (dB)	NSEC score	LLR score
1 PC	−3.89 ± 0.32	0.48 ± 0.03	1.71 ± 0.11
1 PC_PBF	−2.28 ± 0.13	0.52 ± 0.02	1.51 ± 0.06
2 PC	−10.56 ± 0.42	0.33 ± 0.02	1.64 ± 0.10
2 PC_PBF	−7.87 ± 0.31	0.36 ± 0.02	1.52 ± 0.07
3 PC	−6.51 ± 0.72	0.68 ± 0.04	1.97 ± 0.02
3 PC_PBF	−2.37 ± 0.32	0.71 ± 0.03	1.91 ± 0.02

Table 2. Typical Evaluation Results for the PC Method and PC_PBF Method^a


Second, artificial audio signals (sonar and torpedo) and other natural audio signals (dolphin, sea turtle, and killer whale) are used as signal sources. All of these signals are correctly reconstructed.

To further test the performance, humpback whale and killer whale audio signals are mixed and used to drive the underwater speaker. The NSEC and SegSNR values of the reconstructed mixed signal are $0.50 \pm 0.02$ and $(-3.44 \pm 0.46)$ dB, respectively, whereas those of the hydrophone-recorded signal are $0.65 \pm 0.03$ and $(-2.38 \pm 0.41)$ dB. The waveforms of both the recovered signal and the hydrophone-recorded signal are similar to that of the original mixed signal, but by both NSEC and SegSNR the hydrophone recording is better than the signal reconstructed from speckle. The NSEC scores of the reconstructed mixed signal are $0.37 \pm 0.02$ for the killer whale and $0.34 \pm 0.02$ for the humpback whale. To human ears, the reconstructed audio is clearly recognizable as a mixture of humpback whale and killer whale calls.

The processing platform is a personal computer with a 3.20 GHz CPU (AMD Ryzen 5800H), 16 GB of RAM, and a 4 GB graphics card (NVIDIA GeForce GTX 1650). Processing 20 signals takes 2.05 s in total, which indicates that the method is fast enough to identify underwater sources in real time.

In the experiment, aluminum foil is placed on the water surface to increase the scattered light flux, which restricts the sensing system to the laboratory environment. The system performance remains acceptable as long as the angle between the incident laser and the normal of the aluminum foil is smaller than 20 deg. Without the aluminum foil, the refraction and reflection of light at the water surface must be taken into account.

The minimum detectable SPL at the water surface determines the applications of the system. For the commercial Polytec OFV 353 sensor, the minimum detectable SPL is 120 dB re µPa^[1]. For the system presented in this paper, the measured minimum detectable SPLs are 131.5 and 132.1 dB re µPa for 250 Hz and 2.5 kHz sinusoidal signals, respectively. In a real marine environment, sea turtle vocalizations typically have SPLs of around 120 dB re µPa^[40], which is below the detection limit of the present system. For cross air–water communication, the SPL of the transducers is around 167 dB re µPa^[1], which lies within the detectable range. Because the defocused image magnifies the vibration amplitude more as the distance between the sensor and the vibrating object increases, the sensitivity can be increased by increasing the measuring distance.

Water turbidity, water surface fluctuations, and ambient light all affect the measurement performance of the sensor. For the present setup with aluminum foil, the effect of water turbidity can be ignored. A multibeamlet scheme can be used to suppress the data dropout caused by water fluctuations. Interference from environmental factors such as wind and passing ships consists of low-frequency vibrations (below 3 Hz^[13]) and can be removed with a high-pass filter in the frequency domain. Ambient light can be suppressed with an optical bandpass filter that transmits only the signal light to the image sensor.
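A frequency-domain high-pass of the kind mentioned above can be as simple as zeroing FFT bins below a cutoff. In this sketch the cutoff is a free parameter (50 Hz in the test is an illustrative value between the sub-3 Hz disturbance and the 200 Hz lower edge of the measured band), and the function name is hypothetical. A brick-wall mask like this can ring on signals that are not periodic within the record, so a tapered transition may be preferable in practice.

```python
import numpy as np

def fft_highpass(signal, fs, cutoff_hz):
    """Brick-wall high-pass: zero all spectral components below cutoff_hz and transform back."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)     # bin frequencies in Hz
    spec[freqs < cutoff_hz] = 0.0                        # remove DC and slow drift
    return np.fft.irfft(spec, len(signal))
```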

The sensing performances are compared in Table 3. Compared with the LDV experiment, the proposed method has a much wider viewing angle (> 20 deg versus 0.13 deg) and a lower experimentally measured detectable SPL (131.5 versus 150 dB re µPa), and it works at a longer distance than the millimeter-wave radar and airborne sonar systems. Furthermore, the proposed setup consists only of a laser source and a camera, which makes it compact and robust, and simultaneous multipoint detection can be realized with a multibeamlet scheme, whereas LDV is in principle a single-point method. Compared with a hydrophone, the proposed method is easy to deploy and suitable for large-area surveillance. Its weaknesses are that its present performance is inferior to that of a hydrophone, the technique is not yet mature enough for practical application, and environmental noise must be considered and suppressed.

Performance	LDV^[1]	Millimeter-wave radar^[13]	Airborne sonar^[14]	This paper
Frequency range	Single frequency 10 kHz	100–200 Hz	Single frequency 71 kHz	200 Hz–20 kHz
Viewing angle	0.13 deg	No data	No data	> 20 deg
Measuring distance	1.981 m	0.3 m	0.1 m	1.9 m
SPL	150 dB re µPa (experimental)	No data	80 dB re µPa (theoretical)	131.5 dB re µPa (experimental)
Depth of detection	2 m	0.9–3.6 m	0.13 m	0.3 m
Detectable amplitude	nm	µm	No data	nm
Working principle	Doppler frequency shift	Reflection phase variation	Piezoelectric transducer	Laser speckle imaging
Cost	Expensive	Low cost	Expensive	Low cost

Table 3. Performance Comparison of Various Methods


4. Conclusion

In conclusion, a cross air–water interface underwater acoustic sensing system based on linear array camera imaging is constructed, and phase correlation and a PBF multichannel fusion algorithm are introduced to extract and enhance the vibration signals. 1) Underwater acoustic signals from 200 Hz to 20 kHz are successfully detected with a relative frequency error of less than 0.5%, and the minimum detectable SPL is 131.5 dB re µPa at 250 Hz. 2) Natural audio signals from dolphins, sea turtles, killer whales, and humpback whales and artificial audio signals from sonar and torpedoes are tested and correctly reconstructed. For the reconstructed humpback whale audio signal, the NSEC and SegSNR are $0.52 \pm 0.02$ and $(-2.28 \pm 0.13)$ dB, respectively. 3) Mixed humpback whale and killer whale signals are correctly reconstructed with an NSEC of $0.50 \pm 0.02$.

References

[1] L. Antonelli, F. Blackmon. Experimental demonstration of remote, passive acousto-optic sensing. J. Acoust. Soc. Am., 116, 3393(2004).

[2] F. A. Blackmon, L. T. Antonelli. Experimental detection and reception performance for uplink underwater acoustic communication using a remote, in-air, acousto-optic sensor. IEEE J. Oceanic Eng., 31, 179(2006).

[3] P. Land, J. Roeder, D. Robinson, A. Majumdar. Application of a laser Doppler vibrometer for air-water to subsurface signature detection. Proc. SPIE, 9461, 94611H(2015).

[4] A. R. Harland, J. N. Petzing, J. R. Tyrer. Visualising scattering underwater acoustic fields using laser Doppler vibrometry. J. Sound. Vib., 305, 659(2007).

[5] J. Shang, Y. Liu, J. Sun et al. Five-channel fiber-based laser Doppler vibrometer for underwater acoustic field measurement. Appl. Opt., 59, 676(2020).

[6] D. Farrant, J. Burke, L. Dickinson et al. Opto-acoustic underwater remote sensing (OAURS)-an optical sonar? OCEANS’ 10 IEEE Sydney, 1(2010).

[7] R. Miao, Y. Wang, F. Meng et al. Optical measurement of the liquid surface wave amplitude with different intensities of underwater acoustic signal. Opt. Commun., 313, 285(2014).

[8] L. Zhang, X. Zhang, W. Tang. Amplitude measurement of weak sinusoidal water surface acoustic wave using laser interferometer. Chin. Opt. Lett., 13, 091202(2015).

[9] L. Zhao, J. Zhang. Expanding research on laser coherent detection of underwater sound source. Optik, 139, 145(2017).

[10] M. C. Tsai, Y. H. Chang, C. W. Chow. Water-to-air unmanned aerial vehicle (UAV) based rolling shutter optical camera communication (OCC) system with gated recurrent unit neural network (GRU-NN). Opt. Express, 32, 41014(2024).

[11] K. Tanaka, A. Kariya, K. Kuwahara et al. Real-time full-duplex transmission experiment for an air and underwater invisible light communication system. Opt. Continuum, 3, 2260(2024).

[12] J. H. Churnside, K. Naugolnykh, R. D. Marchbanks. Optical remote sensing of sound in the ocean. J. Appl. Rem. Sens., 9, 096038(2015).

[13] F. Tonolini, F. Adib. Networking across boundaries: enabling wireless communication through the water-air interface. Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication, 117(2018).

[14] A. Fitzpatrick, A. Singhvi, A. Arbabian. An airborne sonar system for underwater remote sensing and imaging. IEEE Access, 8, 189945(2020).

[15] S. Donati. Developing self-mixing interferometry for instrumentation and measurements: self-mixing interferometry. Laser Photonics Rev., 6, 393(2012).

[16] T. Kakue, Y. Endo, T. Nishitsuji et al. Digital holographic high-speed 3D imaging for the vibrometry of fast-occurring phenomena. Sci. Rep., 7, 10413(2017).

[17] M. Raffel, C. E. Willert, F. Scarano et al. Particle Image Velocimetry: A Practical Guide (2018).

[18] A. Davis, M. Rubinstein, N. Wadhwa et al. The visual microphone: passive recovery of sound from video. ACM Trans. Graph., 33, 1(2014).

[19] W. Zhang, Y. Tao, Y. Wu et al. Vibration measurement with frequency modulation single-pixel imaging. Chin. Opt. Lett., 21, 011102(2023).

[20] Z. Zalevsky, Y. Beiderman, I. Margalit et al. Simultaneous remote extraction of multiple speech sources and heart beats from secondary speckles pattern. Opt. Express, 17, 21566(2009).

[21] Z. Chen, C. Wang, C. Huang et al. Audio signal reconstruction based on adaptively selected seed points from laser speckle images. Opt. Commun., 331, 6(2014).

[22] N. Ozana, I. Margalith, Y. Beiderman et al. Demonstration of a remote optical measurement configuration that correlates with breathing, heart rate, pulse pressure, blood coagulation, and blood oxygenation. Proc. IEEE, 103, 248(2015).

[23] N. Wu, S. Haruyama. Real-time audio detection and regeneration of moving sound source based on optical flow algorithm of laser speckle images. Opt. Express, 28, 4475(2020).

[24] J. Heikkinen, G. S. Schajer. Self-calibrated defocused speckle imaging for remote surface motion measurements. Opt. Lasers Eng., 173, 107914(2024).

[25] A. A. Veber, A. Lyashedko, E. Sholokhov et al. Laser vibrometry based on analysis of the speckle pattern from a remote object. Appl. Phys. B, 105, 613(2011).

[26] S. Bianchi. Vibration detection by observation of speckle patterns. Appl. Opt., 53, 931(2014).

[27] S. Bianchi, E. Giacomozzi. Long-range detection of acoustic vibrations by speckle tracking. Appl. Opt., 58, 7805(2019).

[28] C. Dai, C. Liu, Y. Wu et al. Audio signal detection and enhancement based on linear CMOS array and multi-channel data fusion. IEEE Access, 8, 133463(2020).

[29] X. Huang, W. Guo, R. Yu et al. Real-time high sensibility vibration detection based on phase correlation of line speckle patterns. Opt. Laser Technol., 148, 107759(2022).

[30] C. Liu, L. Li, X. Huang et al. Audio signal extraction and enhancement based on CNN from laser speckles. IEEE Photonics J., 14, 1(2022).

[31] M. Zhou. Vibration extraction using rolling shutter cameras. Ottawa University(2016).

[32] Y. Zhao, J. Liu, S. Guo et al. Measuring frequency of one-dimensional vibration with video camera using electronic rolling shutter. Opt. Eng., 57, 43104(2018).

[33] M. Sheinin, D. Chan, M. O’Toole et al. Dual-shutter optical vibration sensing. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 16303(2022).

[34] H. Hong, J. Liang, L. Deng et al. Extreme detectable vibration frequency limited by rolling shutter camera imaging of laser speckles. Opt. Lett., 48, 3837(2023).

[35] N. H. Xia, C. F. Xie, Y. S. Liu et al. Two-dimensional displacement estimation of one-dimensional laser speckle images for detection of acoustic vibration. Appl. Opt., 62, 1785(2023).

[36] X. Tong, Z. Ye, Y. Xu et al. Image registration with Fourier-based image correlation: a comprehensive review of developments and applications. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 12, 4062(2019).

[37] P. Aarabi, G. Shi. Phase-based dual-microphone robust speech enhancement. IEEE Trans. Syst. Man Cybern. B, 34, 1763(2004).

[38] P. C. Loizou. Speech Enhancement: Theory and Practice (2013).

[39] J. B. Boldt, D. P. W. Ellis. A simple correlation-based model of intelligibility for nonlinear speech enhancement and separation. 2009 17th European Signal Processing Conference, 1849(2009).

[40] I. Charrier, L. Jeantet, L. Maucourt et al. First evidence of underwater vocalizations in green sea turtles Chelonia mydas. Endang. Species. Res., 48, 31(2022).
