• Laser & Optoelectronics Progress
  • Vol. 59, Issue 13, 1307001 (2022)
Yankai Wang, Hua Long*, Yubin Shao, Qingzhi Du, and Yao Wang
Author Affiliations
  • Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
    DOI: 10.3788/LOP202259.1307001
    Yankai Wang, Hua Long, Yubin Shao, Qingzhi Du, Yao Wang. Language Identification Using Joint Voice Activity Detection and Dynamic Range Control[J]. Laser & Optoelectronics Progress, 2022, 59(13): 1307001

    Abstract

    In language identification systems, interference from silent segments and inconsistent decibel ranges across voice recordings degrade identification performance. In addition, algorithms that use spectrograms for language identification cannot effectively represent the low-frequency part of the signal, which further degrades performance. To mitigate these problems, we propose a language identification method based on joint voice activity detection and dynamic range control. First, we extract the first-dimension coefficient of the Mel-scale frequency cepstral coefficients. Second, we apply median filtering to smooth the feature parameters and perform voice activity detection to remove the silent segments of the voice. Next, we use dynamic range control to adjust the decibel range of different voices. Finally, we feed the log-scale spectrogram into a convolutional neural network for classification. Experimental results on the VoxForge public corpus with the ResNeSt network show that the proposed algorithm improves performance by 7.16 percentage points compared with the traditional spectrogram-based language identification algorithm. Under the same experimental settings, the log-scale spectrogram also outperforms other mainstream features, which validates the effectiveness of the proposed algorithm and features.
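
    The preprocessing steps named in the abstract (median smoothing of a per-frame feature, threshold-based voice activity detection, and dynamic range control) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no window length, decision threshold, or compressor settings, so the function names, the midpoint threshold heuristic, and the default values `width=5`, `threshold_db=-20.0`, and `ratio=4.0` are all assumptions made for illustration. The per-frame feature is taken as an already computed list of numbers, standing in for the first-dimension Mel-scale frequency cepstral coefficient.

```python
from statistics import median
import math


def median_smooth(values, width=5):
    """Sliding median over an odd window; edges use a truncated window."""
    half = width // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def voiced_mask(frame_feature, width=5, threshold=None):
    """Mark a frame as voiced when its smoothed feature exceeds a threshold."""
    smoothed = median_smooth(frame_feature, width)
    if threshold is None:
        # Assumed heuristic (not from the paper): midpoint between the
        # smallest and largest smoothed values.
        threshold = (min(smoothed) + max(smoothed)) / 2
    return [v > threshold for v in smoothed]


def compress_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Static compressor curve: the excess above the threshold is divided by ratio."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def apply_drc(samples, threshold_db=-20.0, ratio=4.0):
    """Sample-wise compression without attack/release smoothing (a simplification)."""
    out = []
    for s in samples:
        if s == 0.0:
            out.append(0.0)  # log10(0) is undefined; silence stays silence
            continue
        level_db = 20.0 * math.log10(abs(s))
        gain_db = compress_db(level_db, threshold_db, ratio) - level_db
        out.append(s * 10.0 ** (gain_db / 20.0))
    return out
```

    In this sketch the median filter suppresses isolated spikes in the feature track before thresholding, so a single loud frame inside a silent stretch is not marked as speech. The compressor lowers only the levels above the threshold, which narrows the decibel range between loud and quiet recordings.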