
- Chinese Optics Letters
- Vol. 23, Issue 7, 071201 (2025)
1. Introduction
Detecting underwater acoustic sources from the air allows aircraft and surface vessels to monitor the underwater acoustic environment, with potential uses in many commercial and military applications. This cross-medium detection scheme exploits the low transmission loss of light in air and of sound in water. It has therefore attracted the attention of researchers around the world. As early as 2004, researchers from the Naval Undersea Warfare Center in the United States took the lead in using a laser Doppler vibrometer (LDV) to measure the vibration of the water surface modulated by an underwater acoustic signal. To solve the data dropout problem caused by the hydrodynamic water surface, a surface normal glint tracker was employed[1]. The system was also successfully used for uplink communication across the water–air interface in 2006[2]. In 2015, researchers from the Naval Air Warfare Center in the United States used an LDV to detect the vibrations of the water surface caused by an underwater speaker, and audio signals ranging from 50 Hz to 5 kHz were successfully detected[3]. In 2007, a scanning LDV was used to analyze underwater acoustic wavefields, and the wavefronts scattered by an obstruction of circular cross section were correctly established[4]. In 2020, researchers from Donghua University, China, used a five-channel fiber-based LDV for underwater acoustic field measurement. The system performance was tested in Qiandao Lake, and the signal probing probability of the whole sensing system was up to 59.77%[5].
Besides LDV, many other cross air–water detection methods have been reported. In 2010, Farrant introduced the concept of optical sonar[6], which used a
In addition to optical detection methods, millimeter-wave radar and airborne sonar systems have also been used to detect underwater acoustic sources. In 2018, researchers from the Massachusetts Institute of Technology, United States, showed that a water surface modulated by acoustic waves can be detected with a millimeter-wave frequency-modulated continuous-wave (FMCW) radar. The distance between the radar and the water surface was 30 cm. Standard cross water–air interface communication with bit rates up to 400 bps was achieved, and the system could operate correctly in the presence of surface waves with peak-to-peak amplitudes of up to 16 cm[13]. In 2020, researchers from Stanford University, United States, reported an airborne sonar system for underwater remote sensing and imaging. A 1070 nm ytterbium laser with a peak power on the order of kilowatts was used to generate the probing acoustic waves, and a specially designed capacitive micromachined ultrasonic transducer was used to detect deformation of the water surface at a 10 cm working distance. Underwater objects at a depth of 13 cm were correctly detected[14].
From the existing reports, we can see that the LDV method is usually selected for cross-medium audio sensing because of its high sensitivity and technological maturity. However, LDV is a single-point detection method, which has a small signal acceptance angle and suffers from return signal dropout in hydrodynamic situations. The millimeter-wave radar and airborne sonar methods are robust to severe weather conditions, but have low sensing resolution owing to their working mechanisms. The amplitudes of water surface acoustic waves caused by underwater acoustic sources are usually on the order of tens of nanometers[8]. A method with high sensing resolution and a wide viewing angle is therefore needed in practical applications.
Besides the LDV method, other optical vibration detection methods, including laser self-mixing[15], holographic vibration measurement[16], photography[17,18], single-pixel imaging[19], and laser speckle imaging vibration measurement[20], have also been developed in recent years. Among these methods, laser speckle imaging vibration measurement has the advantages of a simple setup and a flexible system configuration, and has been investigated by many researchers. Correspondingly, many measuring schemes have been presented, including area array camera imaging[21–24], photodiode flux detection[25,26], linear array camera imaging[27–30], and rolling-shutter camera imaging[18,31–35]. Among the existing schemes, the linear array camera speckle imaging scheme is selected in this paper for its wide sensing frequency range, high sensitivity, large viewing angle, fast processing speed, and low data storage and transmission requirements.
In this paper, a cross air–water interface hydroacoustic signal sensing system based on linear array camera imaging is constructed. Phase correlation and multichannel fusion algorithms in the phase domain are introduced to extract and enhance the vibration signals. Underwater acoustic signals with a frequency range from 200 Hz to 20 kHz are successfully detected with a relative frequency error of less than 0.5%. Several common artificial and natural underwater acoustic signals are correctly detected. To the authors’ knowledge, this is the first report of cross air–water interface underwater acoustic signal detection based on laser speckle imaging. The arrangement of this paper is as follows. The experimental setup and signal extraction algorithm are introduced in Sec. 2. The experimental results and analysis are presented in Sec. 3, and a brief conclusion is given in Sec. 4.
2. Experiments
The experimental setup is shown in Fig. 1. It can be divided into three modules: the detection module, the signal generation module, and the amplitude calibration module. A 532 nm laser beam (JDSU CDPS532 M 50 mW) is reflected by a mirror and illuminates the water surface of a glass tank (length 60 cm, width 45 cm, height 45 cm) at approximately normal incidence. An underwater speaker (Chinese kHz Electronics Technology LTD, model UWS-015A) is placed on the bottom of the tank, which is filled with drinking water to a height of 40 cm; the distance from the speaker’s upper surface to the water surface is about 30 cm. The speaker is driven by an audio amplifier (Yamaha RX-V490), and the signal source is either a signal generator or a computer. The acoustic signals from the speaker propagate to the air–water interface and generate water surface acoustic waves, which modulate the incident laser beam. The modulated laser beam is diffusely reflected by the air–water interface into the air and then collected by an imaging system. The imaging system is composed of an imaging lens (Nikon Sigma Zoom Master: 35–70 mm) and a linear array CMOS camera (Basler raL2048-48gm). The camera is controlled by the Pylon Viewer software (Basler), and the recorded speckle images are stored on the computer in .bmp format. The maximum line rate of the camera is 51,000 lines per second, and the line rate can be configured in software as required. Note that the line rate is configured as 16,000 lines per second and each frame contains 8000 lines (corresponding to a temporal period of 0.5 s). In the experiment, the probe laser beam should not be incident directly on the underwater speaker, and the transmitted beamlet should be reflected by another mirror in a different direction so that it does not enter the imaging system.
To increase the received light flux, a piece of 10 µm thickness aluminum foil is placed on the water surface in the experiment. A typical speckle pattern is shown as the inset in Fig. 1.
Figure 1. Experimental setup of an underwater acoustic signal detection system.
To record the absolute sound level, a commercially available hydrophone (CSSC 715th Institute of China: SAH2014-11) is used. The receiving sensitivity of the hydrophone is
The signal extraction algorithm is a modified version of the phase correlation algorithm that is reported in Ref. [29]. The phase correlation algorithm is considered a high-performance image alignment method[36]. In this paper, phase correlation is combined with multichannel fusion algorithms to enhance the performance of the vibration signal extraction. The algorithms are briefly introduced in the following section.
As the speckle pattern does not change quickly, the movement of the speckle pattern can be considered a rigid body movement. The images recorded in the
Assume that
Thus,
Figure 2. Flowchart of signal extraction algorithm.
Two columns,
The signal decay coefficient of individual channel
The inverse Fourier transform of
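As a rough illustration of the generic phase correlation step used for speckle alignment (not the modified version of Ref. [29], and without the multichannel fusion), the following Python sketch estimates the integer pixel shift between two speckle lines. It assumes NumPy is available; the function name `phase_correlation_shift` and the regularization constant `eps` are hypothetical choices made for this example only.

```python
import numpy as np

def phase_correlation_shift(row_a, row_b, eps=1e-12):
    """Estimate the integer shift d such that row_b is approximately np.roll(row_a, d)."""
    fa = np.fft.fft(row_a)
    fb = np.fft.fft(row_b)
    # Normalized cross-power spectrum: only the phase difference is kept.
    cross = fb * np.conj(fa)
    cross = cross / (np.abs(cross) + eps)
    # The inverse transform peaks at the relative displacement.
    corr = np.fft.ifft(cross).real
    peak = int(np.argmax(corr))
    n = len(row_a)
    # Map peak positions in the upper half to negative shifts.
    return peak if peak <= n // 2 else peak - n

# Synthetic speckle-like line of 2048 pixels, shifted by a known amount.
rng = np.random.default_rng(0)
line = rng.random(2048)
shifted = np.roll(line, 5)
print(phase_correlation_shift(line, shifted))
```

Applying such an estimator to each pair of consecutive lines would give a shift sequence sampled at the line rate, which is the kind of vibration signal the text describes; subpixel refinement, which a practical system would likely need, is omitted here.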
To fulfill the Nyquist sampling criterion, the line rate of the camera is set to 16,000 lines/s for 200–8000 Hz signals, 32,000 lines/s for 9–16 kHz signals, and 44,000 lines/s for 17–20 kHz signals. In addition to the temporal sampling criterion, the spatial sampling criterion must also be fulfilled: the average speckle size should be larger than 2 pixels. This condition can be met by adjusting the imaging parameters.
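The temporal sampling settings above can be checked with a few lines of arithmetic. The sketch below is illustrative only; the helper names `sampling_ratio` and `frame_duration_s` are hypothetical, and the numbers are the band limits and line rates quoted in the text.

```python
def sampling_ratio(line_rate_hz, max_signal_hz):
    """Ratio of the line rate to the highest signal frequency (Nyquist requires at least 2)."""
    return line_rate_hz / max_signal_hz

def frame_duration_s(lines_per_frame, line_rate_hz):
    """Time span covered by one recorded frame."""
    return lines_per_frame / line_rate_hz

# (highest signal frequency in Hz, configured line rate in lines/s)
settings = [(8_000, 16_000), (16_000, 32_000), (20_000, 44_000)]
for f_max, rate in settings:
    print(f"f_max = {f_max} Hz, line rate = {rate} lines/s, "
          f"ratio = {sampling_ratio(rate, f_max):.2f}")

# A frame of 8000 lines recorded at 16,000 lines/s.
print(frame_duration_s(8000, 16_000), "s per frame")
```

For the two lower bands the ratio is exactly 2 at the upper band edge, which is the limiting case of the criterion, so signals at the very top of those bands have no sampling margin.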
In the experiment, the path length from the laser source to the water surface is 1.44 m. The distance from the image sensor to the water surface is about 1.90 m. The imaging lens is defocused, being focused at 0.70 m. The focal length of the imaging lens is set to 50 mm and the f-number is F/5.6. The beam waist radius of the 532 nm laser is about 0.3 mm. The laser spot size D on the illuminated object can be calculated from the propagation of the laser beam and is 1.732 mm. Accordingly, the average object speckle size on the focusing plane is
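The quoted spot size can be approximately reproduced with the standard Gaussian beam propagation formula, w(z) = w0·sqrt(1 + (z/zR)²) with Rayleigh range zR = π·w0²/λ, taking D as the beam diameter 2w(z). The sketch below is a check under stated assumptions rather than the authors' calculation: it assumes an ideal Gaussian beam with its waist at the laser output, and the function name is hypothetical.

```python
import math

def gaussian_beam_diameter(waist_radius_m, wavelength_m, distance_m):
    """Beam diameter 2*w(z) of an ideal Gaussian beam at distance z from its waist."""
    rayleigh_range = math.pi * waist_radius_m ** 2 / wavelength_m
    radius = waist_radius_m * math.sqrt(1.0 + (distance_m / rayleigh_range) ** 2)
    return 2.0 * radius

# Values from the text: 0.3 mm waist radius, 532 nm wavelength, 1.44 m path length.
spot = gaussian_beam_diameter(0.3e-3, 532e-9, 1.44)
print(f"Spot diameter on the water surface: {spot * 1e3:.3f} mm")
```

Under these assumptions the result is about 1.73 mm, which agrees with the 1.732 mm stated above to within rounding.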
3. Results
First, the vibration detection performance of single-frequency sinusoid signals is tested. The experimental setup is the same as that shown in Fig. 1, and the experimental parameters are as introduced in Sec. 2. According to the algorithm in Sec. 2, the frequency of the source audio signal can be reconstructed. Typical experimental results are shown in Fig. 3. The data in Fig. 3 are normalized.
Figure 3. Typical signal extraction results. (a) 200 Hz; (b) 4 kHz; (c) 14 kHz; (d) 20 kHz.
From Fig. 3, we can see that every signal is correctly reconstructed, and the relative frequency error is smaller than 0.2%. There are evident harmonics in Fig. 3(a); a likely cause is that the speaker itself emits the signal with harmonics. The harmonics of the 4, 14, and 20 kHz signals are filtered out in the processing procedure. Typical signal reconstruction results are summarized in Table 1, and the relative frequency errors are all smaller than 0.3%. The relative frequency errors before and after the multichannel fusion are compared; the results show no obvious improvement.
Original signal frequency (Hz) | Detected average frequency (Hz) | Relative error (%) | SPL (dB re µPa) |
---|---|---|---|
200.0 | 200.1 | 0.05 | 173.8 |
500.0 | 500.6 | 0.12 | 165.9 |
1000.0 | 999.4 | 0.01 | 153.9 |
4000.0 | 3998.9 | 0.28 | 172.6 |
7000.0 | 6998.0 | 0.03 | 180.6 |
10,000.0 | 10,017.5 | 0.18 | 177.7 |
14,000.0 | 14,024.3 | 0.17 | 177.6 |
20,000.0 | 19,989.8 | 0.05 | 168.8 |
Table 1. Typical Recovered Signal Frequency Characteristics
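One common way to obtain a detected frequency from an extracted vibration signal is to locate the peak of its amplitude spectrum and compare it with the source frequency. The sketch below illustrates this under stated assumptions: it uses NumPy, a Hann window, and a synthetic 1 kHz test tone sampled at 16,000 lines/s over one 8000-line frame; the function names are hypothetical, and the source does not state that this is how the values in Table 1 were computed.

```python
import numpy as np

def dominant_frequency(signal, sample_rate_hz):
    """Frequency (Hz) of the largest peak in the windowed amplitude spectrum."""
    signal = np.asarray(signal, dtype=float)
    windowed = (signal - signal.mean()) * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    return float(freqs[np.argmax(spectrum)])

def relative_error_percent(true_hz, measured_hz):
    """Relative frequency error in percent."""
    return abs(measured_hz - true_hz) / true_hz * 100.0

# Synthetic example: one frame of 8000 samples at 16,000 samples/s.
rate = 16_000
t = np.arange(8000) / rate
tone = np.sin(2 * np.pi * 1000.0 * t)
f_detected = dominant_frequency(tone, rate)
print(f_detected, relative_error_percent(1000.0, f_detected))
```

With a 0.5 s frame the spectral bin spacing is 2 Hz, so a single-frame peak search of this kind cannot resolve frequency differences finer than that without interpolation or averaging over several frames.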
The sensitivity of the vibration detection system is verified against a commercial LDV (Polytec OFV-3001 OFV-303). For a 300 Hz underwater audio signal, the minimum measurable amplitude is about 69 nm, while at 1 kHz it is about 5 nm. In the latter case, the vibration waveform is not stable and the amplitude can only be roughly estimated.
The absolute SPLs of different frequencies are measured with the hydrophone and are listed in Table 1. The values range from 153.9 to 180.6 dB re µPa. The peak SPL of the 10 kHz in-water acoustic signal was calculated to be 177.7 dB re µPa at the water surface from the reference hydrophone output voltage using Eq. (1) with
In this section, the signal detection performance for burst audio signals of underwater marine animals and artificial vehicles is evaluated. First, a humpback whale audio signal is selected as a signal source to drive the underwater speaker. The typical waveforms and spectrograms of the original signal and the reconstructed signal are shown in Fig. 4, where PC means phase correlation and PC_PBF means phase correlation combined with phase-error-based filtering. It can be seen from the spectrograms that the characteristic frequencies are reconstructed appropriately. The SNR of the reconstructed signals is evaluated with the segmental SNR (SegSNR) algorithm[38] and the average SegSNR of 20 reconstructed signals is
Figure 4. Waveforms and spectrograms of humpback whale’s audio signal. (a) Original signal; (b) PC reconstructed signal; (c) PC_PBF enhanced signal.
To further enhance the recovered signals, the PBF multichannel fusion algorithm is employed. Ten channels with the highest scales are selected and fused together using the algorithm introduced in Sec. 2. The typical fusion results of the humpback whale audio signal are shown in Fig. 4(c), and the evaluation results are summarized in Table 2. Compared with the PC method, the SegSNR and NSEC score of the PC_PBF method for humpback whale signals are increased by 1.61 dB and 0.04 (8.0%), respectively, and the LLR score is reduced by 0.20 (12.0%), where a lower LLR indicates a better reconstruction. The standard deviations of the PC_PBF method are smaller than those of the PC algorithm in all situations. These results show that multichannel fusion can enhance the reconstruction performance and the computational stability. The typical evaluation results for killer whale and sonar are also shown in Table 2. The enhancement in SegSNR and NSEC is obvious.
Method | SegSNR (dB) | NSEC score | LLR score |
---|---|---|---|
1 PC | −3.89 | 0.48 | 1.71 |
1 PC_PBF | −2.28 | 0.52 | 1.51 |
2 PC | −10.56 | 0.33 | 1.64 |
2 PC_PBF | −7.87 | 0.36 | 1.52 |
3 PC | −6.51 | 0.68 | 1.97 |
3 PC_PBF | −2.37 | 0.71 | 1.91 |
Table 2. Typical Evaluation Results for the PC Method and PC_PBF Method
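For readers unfamiliar with the SegSNR metric, a minimal sketch of a frame-based segmental SNR is given below. It follows the common definition (per-frame SNR in dB, averaged over frames) and is not necessarily identical to the implementation of Ref. [38] used in the paper: the frame length of 256 samples is an assumption, no per-frame clamping is applied, and the function name is hypothetical.

```python
import numpy as np

def segmental_snr_db(reference, estimate, frame_len=256, eps=1e-12):
    """Mean of per-frame SNRs (dB) between a reference and a reconstructed signal."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    n_frames = min(len(reference), len(estimate)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    frame_snrs = []
    for i in range(n_frames):
        seg = slice(i * frame_len, (i + 1) * frame_len)
        signal_energy = np.sum(reference[seg] ** 2)
        noise_energy = np.sum((reference[seg] - estimate[seg]) ** 2)
        frame_snrs.append(10.0 * np.log10((signal_energy + eps) / (noise_energy + eps)))
    return float(np.mean(frame_snrs))

# Synthetic check: a reconstruction with a uniform 10% amplitude error.
rng = np.random.default_rng(0)
ref = rng.standard_normal(4096)
print(segmental_snr_db(ref, 0.9 * ref))
```

In practice the reference and reconstructed signals would need to be time-aligned and amplitude-normalized before such a comparison, since the metric penalizes any sample-wise difference.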
Second, artificial audio signals from a sonar and a torpedo and natural audio signals from a dolphin, a sea turtle, and a killer whale are used as signal sources and tested. The results show that all these signals can be correctly reconstructed.
To further test the performance, audio signals of the humpback whale and killer whale are mixed together to drive the underwater speaker. The NSEC and segmental SNR value of the reconstructed mixed signal is
The processing platform is a personal computer with a 3.20 GHz CPU (AMD Ryzen 5800H), 16 GB of RAM, and a 4 GB graphics card (NVIDIA GeForce GTX 1650). Identifying 20 signals takes 2.05 s, which indicates that this method can identify underwater objects in real time.
In the experiment, aluminum foil is placed on the water surface to increase the scattering flux. This foil restricts the application of the sensing system to the laboratory environment. As long as the angle between the laser incident direction and the normal of the aluminum foil is smaller than 20 deg, the system performance is acceptable. Without the aluminum foil, the refraction and reflection of light should be considered.
The minimum detectable SPL at the water surface determines the applications of the presented system. For the commercial Polytec OFV 353 sensor, the minimum detectable SPL is 120 dB re µPa[1]. For the sensing system presented in this paper, experimental results show that the minimum detectable SPLs are 131.5 and 132.1 dB re µPa for 250 Hz and 2.5 kHz sinusoid signals, respectively. In a real marine environment, the SPL of sea turtles usually lies around 120 dB re µPa[40], which is below the minimum detectable SPL of the system presented in this paper. For cross air–water communication, the SPL of transducers is around 167 dB re µPa[1], which lies within the detectable range of the system. As the distance between the sensor and the vibrating object increases, the apparent vibration amplitude is magnified in the defocused image. Therefore, the sensitivity can be increased by increasing the measuring distance.
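To give a feel for these levels, the sketch below converts an SPL in dB re 1 µPa to sound pressure using the standard relation p = p_ref·10^(SPL/20) with p_ref = 1 µPa, and compares the source levels quoted above with the 131.5 dB threshold. The function names are hypothetical, and the source does not specify here whether the levels are peak or rms, so the pressures are indicative only.

```python
def spl_to_pressure_pa(spl_db_re_upa):
    """Sound pressure in Pa for an SPL given in dB re 1 µPa."""
    return 1e-6 * 10.0 ** (spl_db_re_upa / 20.0)

def is_detectable(source_spl_db, min_detectable_spl_db=131.5):
    """True if the source level reaches the minimum detectable SPL."""
    return source_spl_db >= min_detectable_spl_db

# Levels quoted in the text (dB re µPa).
for label, spl in [("sea turtle", 120.0), ("system threshold", 131.5),
                   ("communication transducer", 167.0)]:
    print(f"{label}: {spl} dB -> {spl_to_pressure_pa(spl):.3g} Pa, "
          f"detectable: {is_detectable(spl)}")
```

The 11.5 dB gap between the threshold and the sea turtle level corresponds to a pressure ratio of a little under 4, which gives a sense of the sensitivity improvement that would be needed.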
Parameters such as water turbidity, water fluctuations, and ambient light all affect the measurement performance of the sensor. For the present setup with aluminum foil, the effect of water turbidity can be ignored. A multibeamlet scheme can be used to suppress the data dropout problem caused by water fluctuation. The interference caused by environmental factors (such as wind and passing ships) is low-frequency vibration (below 3 Hz[13]) and can be removed with a high-pass filter in the frequency domain. The effect of ambient light can be suppressed with an optical bandpass filter, which transmits only the signal light to the image sensor.
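The frequency-domain high-pass filtering mentioned above can be sketched as follows. This is a simple brick-wall filter written with NumPy for illustration, with a hypothetical function name and a default cutoff equal to the 3 Hz figure from the text; it is not taken from the paper, and a practical system might prefer a smoother filter to avoid ringing.

```python
import numpy as np

def highpass_fft(signal, sample_rate_hz, cutoff_hz=3.0):
    """Zero all spectral components below cutoff_hz and transform back."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    spectrum[freqs < cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

# Synthetic example: a large 1 Hz surface-wave term plus a small 500 Hz acoustic term.
rate = 16_000
t = np.arange(rate) / rate              # 1 s of data
drift = 5.0 * np.sin(2 * np.pi * 1.0 * t)
acoustic = 0.1 * np.sin(2 * np.pi * 500.0 * t)
cleaned = highpass_fft(drift + acoustic, rate)
print(np.max(np.abs(cleaned - acoustic)))
```

Note that the record must be long enough to resolve the cutoff: a 0.5 s frame has a 2 Hz bin spacing, so separating components below 3 Hz in this way benefits from processing longer records.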
The sensing performances are compared in Table 3. From Table 3, we can see that the proposed method has a wider viewing angle and a lower experimentally detectable SPL than the LDV, and a longer sensing distance than the millimeter-wave radar and airborne sonar. Furthermore, the system setup proposed in this paper consists only of a laser source and a camera, which makes it compact and robust. With the multibeamlet scheme, simultaneous multipoint detection can be realized. By comparison, the LDV scheme is in principle a single-point method. Compared with the hydrophone method, the advantages of the proposed method are ease of deployment and suitability for large-range surveillance. The weaknesses of the proposed method are that its present performance is inferior to that of a hydrophone, that the technique is not yet mature enough for practical application, and that the effect of environmental noise must be considered and suppressed.
Performance | LDV[ | Millimeter-wave radar[ | Airborne sonar[ | This paper |
---|---|---|---|---|
Frequency range | Single frequency 10 kHz | 100–200 Hz | Single frequency 71 kHz | 200 Hz–20 kHz |
Viewing angle | 0.13 deg | No data | No data | > 20 deg |
Measuring distance | 1.981 m | 0.3 m | 0.1 m | 1.9 m |
SPL | 150 dB re µPa (experimental) | No data | 80 dB re µPa (theoretical) | 131.5 dB re µPa (experimental) |
Depth of detection | 2 m | 0.9–3.6 m | 0.13 m | 0.3 m |
Detectable amplitude | nm | µm | No data | nm |
Working principle | Doppler frequency shift | Reflection phase variation | Piezoelectric transducer | Laser speckle imaging |
Cost | Expensive | Low cost | Expensive | Low cost |
Table 3. Performance Comparison of Various Methods
4. Conclusion
In conclusion, a cross air–water interface underwater acoustic sensing system based on linear array camera imaging is constructed. Phase correlation and a PBF multichannel fusion algorithm are introduced to extract and enhance the vibration signals. 1) Underwater acoustic signals with a frequency range from 200 Hz to 20 kHz are successfully detected with a relative frequency error of less than 0.5%. The minimum detectable SPL is 131.5 dB re µPa at 250 Hz. 2) Natural audio signals, including those of dolphins, turtles, killer whales, and humpback whales, and artificial audio signals, including those of sonar and torpedoes, are tested and correctly reconstructed. For the reconstructed humpback whale audio signal, the NSEC and segmental SNR are
References
[6] D. Farrant, J. Burke, L. Dickinson et al. Opto-acoustic underwater remote sensing (OAURS)-an optical sonar?. OCEANS’ 10 IEEE Sydney, 1(2010).
[13] F. Tonolini, F. Adib. Networking across boundaries: enabling wireless communication through the water-air interface. Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication, 117(2018).
[17] M. Raffel, C. E. Willert, F. Scarano et al. Particle Image Velocimetry: A Practical Guide(2018).
[26] S. Bianchi. Vibration detection by observation of speckle patterns. Appl. Opt., 53, 931(2014).
[31] M. Zhou. Vibration extraction using rolling shutter cameras. Ottawa University(2016).
[33] M. Sheinin, D. Chan, M. O’Toole et al. Dual-shutter optical vibration sensing. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 16303(2022).
[38] P. C. Loizou. Speech Enhancement: Theory and Practice(2013).
[39] J. B. Boldt, D. P. W. Ellis. A simple correlation-based model of intelligibility for nonlinear speech enhancement and separation. 2009 17th European Signal Processing Conference, 1849(2009).
