• Infrared and Laser Engineering
  • Vol. 52, Issue 10, 20230051 (2023)
Xinxue Dai1,2, Songtao Fan1, and Yan Zhou1,2
Author Affiliations
  • 1Optoelectronics System Laboratory, Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China
  • 2University of Chinese Academy of Sciences, Beijing 100049, China
    DOI: 10.3788/IRLA20230051
    Xinxue Dai, Songtao Fan, Yan Zhou. Speech enhancement method of laser microphone based on ResUnet and TFGAN network[J]. Infrared and Laser Engineering, 2023, 52(10): 20230051

    Abstract

    Objective: A laser microphone is a device that uses the optical Doppler effect to acquire acoustic vibration information (speech). Compared with conventional microphones, laser microphones offer long range, high precision, and non-contact operation. They can collect distant sound-field information directionally while avoiding interference from the sound field close to the equipment. However, when a laser microphone is used to collect speech from a remote sound field, the quality of the captured speech is affected by many factors and degrades severely. Research on speech enhancement algorithms for laser microphone speech is still at a preliminary stage. Traditional single-channel speech enhancement methods require the signal and noise to satisfy stationarity or correlation conditions, and their performance drops significantly under complex conditions such as low signal-to-noise ratio and non-stationary noise. Methods based on deep neural networks can learn the complex mapping between noisy speech and clean speech and outperform the traditional methods. They generalize poorly, however, to laser speech from complex targets in environments that were not preset, because different targets have different frequency response characteristics. Therefore, to improve the quality of far-field speech captured by laser microphones, this paper proposes a laser microphone speech enhancement method based on the ResUnet and TFGAN networks.

    Methods: Using laboratory-built laser microphones, remote speech acquisition tests were carried out on four different types of target objects (Fig.6). The recorded speech was processed with the method described in this paper and compared against the nonlinear function harmonic reconstruction method and the DNN + harmonic reconstruction method (Fig.9). Finally, the objective speech quality measure PESQ (perceptual evaluation of speech quality) and the time-domain segmental signal-to-noise ratio (SNRseg) were used to evaluate the processed laser speech quantitatively (Fig.11).

    Results and Discussions: Compared with the two methods above, the proposed method, which enhances the collected laser speech in stages, suppresses broadband noise and impulse noise better and reconstructs more accurate high-frequency information. After processing with this method, the PESQ scores of the laser speech from the A4 paper, A4 paper box, corrugated box, and PET plastic bottle are 2.126, 1.818, 1.804, and 1.951, increases of 0.129, 0.113, 0.117, and 0.22, respectively. The corresponding SNRseg scores are -5.31 dB, -3.36 dB, -5.07 dB, and -3.40 dB, increases of 1 dB, 6.25 dB, 1.41 dB, and 0.17 dB, respectively. The experimental results show that the proposed ResUnet + TFGAN method effectively improves the laser speech quality for these targets.

    Conclusions: This study proposes a laser microphone speech enhancement method based on the ResUnet and TFGAN networks. Speech samples were collected from various targets with laboratory-built laser microphones, and the proposed method was validated experimentally. The results show that the method can enhance laser microphone speech from a variety of objects. Compared with the nonlinear function harmonic reconstruction method and the DNN + harmonic reconstruction method, its advantage is that the ResUnet and TFGAN networks handle clean Mel spectrum prediction and time-domain waveform recovery of the laser speech, respectively. This avoids the high-frequency noise that harmonic reconstruction introduces when rebuilding the speech signal, while recovering clearer high-frequency information. The PESQ and SNRseg results demonstrate that the proposed method improves the speech quality of the laser microphone. The method extends the application range of laser microphones to a certain extent, and we will further verify and improve it on objects with more complex materials and shapes.
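    For readers unfamiliar with the SNRseg metric reported in the abstract, the following is a minimal sketch of a time-domain segmental SNR, not the authors' evaluation code. It assumes the common textbook definition: the per-frame SNR in dB, averaged over non-overlapping frames. The function name `segmental_snr`, the frame length of 256 samples, and the clamping limits of -10 dB and 35 dB are illustrative assumptions; the source does not state which settings were used.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, min_db=-10.0, max_db=35.0):
    """Mean per-frame SNR in dB between a clean reference and a processed signal.

    frame_len, min_db and max_db are assumed conventions, not values from the paper.
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")
    eps = 1e-10  # guards against log(0) and division by zero
    frame_snrs = []
    # Walk over non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        est = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, est))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
        # Clamp each frame so silent or perfectly matched frames do not dominate the mean.
        frame_snrs.append(min(max(snr_db, min_db), max_db))
    if not frame_snrs:
        raise ValueError("signal is shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)


if __name__ == "__main__":
    reference = [math.sin(0.05 * n) for n in range(1024)]
    processed = [1.1 * x for x in reference]  # 10 % amplitude error
    print(round(segmental_snr(reference, processed), 2))
```

    Because each frame is scored separately, a few badly distorted frames pull the average down more than they would in a global SNR, which is one reason negative SNRseg values such as those reported above can occur even when the speech is intelligible.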