• Laser & Optoelectronics Progress
  • Vol. 59, Issue 13, 1307001 (2022)
Yankai Wang, Hua Long*, Yubin Shao, Qingzhi Du, and Yao Wang
Author Affiliations
  • Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
    DOI: 10.3788/LOP202259.1307001
    Yankai Wang, Hua Long, Yubin Shao, Qingzhi Du, Yao Wang. Language Identification Using Joint Voice Activity Detection and Dynamic Range Control[J]. Laser & Optoelectronics Progress, 2022, 59(13): 1307001
    Fig. 1. MFCC0 feature voice activity detection. (a) Voice waveform; (b) MFCC0 features; (c) MFCC0 feature voice activity detection result after median filtering
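Fig. 1 shows frame-level speech/non-speech decisions taken from the zeroth MFCC coefficient (MFCC0, which is closely related to frame log-energy) and then cleaned with a median filter. The Python sketch below illustrates only that decision stage. It assumes the per-frame MFCC0 values are already computed; the threshold rule (a fraction `alpha` of the way from the minimum to the maximum MFCC0) and the 5-frame median kernel are hypothetical choices, not the authors' settings.

```python
import numpy as np


def median_filter_1d(x, kernel=5):
    """Median-filter a 1-D sequence; edges are padded by repeating the end values."""
    if kernel % 2 == 0:
        raise ValueError("kernel must be odd")
    half = kernel // 2
    padded = np.pad(np.asarray(x, dtype=float), half, mode="edge")
    # Window i of the padded signal is centred on original frame i.
    return np.array([np.median(padded[i:i + kernel]) for i in range(len(x))])


def mfcc0_vad(c0, alpha=0.5, kernel=5):
    """Return a 0/1 speech mask per frame from MFCC0 values.

    Hypothetical rule: a frame is speech if its MFCC0 exceeds
    min + alpha * (max - min); isolated decisions are then smoothed
    by the median filter.
    """
    c0 = np.asarray(c0, dtype=float)
    threshold = c0.min() + alpha * (c0.max() - c0.min())
    raw = (c0 > threshold).astype(float)
    return median_filter_1d(raw, kernel).astype(int)
```

With a 5-frame kernel, an isolated run of one or two frames is flipped to match its neighbours, which removes short energy spikes in silence and short dips inside speech.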
    Fig. 2. DRC input/output processing unit
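As a hedged illustration of what a DRC input/output processing unit does, the sketch below implements a generic hard-knee downward compressor: levels above a threshold are reduced by a fixed ratio, and the resulting gain is smoothed with attack and release time constants. The threshold, ratio, time constants, and the per-sample peak level detector are all assumptions for illustration, not the paper's exact design.

```python
import numpy as np


def drc_static_curve(level_db, threshold_db=-20.0, ratio=4.0):
    """Output level (dB) of a hard-knee downward compressor for a given input level (dB)."""
    level_db = np.asarray(level_db, dtype=float)
    compressed = threshold_db + (level_db - threshold_db) / ratio
    # Below the threshold the signal passes unchanged.
    return np.where(level_db > threshold_db, compressed, level_db)


def apply_drc(x, fs, threshold_db=-20.0, ratio=4.0,
              attack_ms=5.0, release_ms=50.0, eps=1e-10):
    """Apply the static curve sample by sample with attack/release gain smoothing."""
    x = np.asarray(x, dtype=float)
    # Crude per-sample peak level detector (an assumption of this sketch).
    level_db = 20.0 * np.log10(np.abs(x) + eps)
    target_gain_db = drc_static_curve(level_db, threshold_db, ratio) - level_db  # <= 0 dB
    a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    gain_db = np.zeros_like(x)
    g = 0.0
    for n, target in enumerate(target_gain_db):
        # More gain reduction needed -> attack coefficient; recovering -> release coefficient.
        coeff = a_att if target < g else a_rel
        g = coeff * g + (1.0 - coeff) * target
        gain_db[n] = g
    return x * 10.0 ** (gain_db / 20.0)
```

Evaluating `drc_static_curve` over, say, -60 to 0 dB gives the usual input/output line that bends at the threshold: with the defaults, a 0 dB input maps to -15 dB, while anything below -20 dB is left untouched.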
    Fig. 3. Voice changes before and after DRC processing. (a) Voice waveform changes before and after DRC processing; (b) Spectrogram before DRC processing; (c) Spectrogram after DRC processing
    Fig. 4. Comparison of different frequency scales. (a) Linear scale spectrogram; (b) Log scale spectrogram
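One common way to obtain a log-frequency spectrogram like the one in Fig. 4(b), sketched here in Python as an assumption rather than the authors' documented procedure, is to compute an STFT magnitude spectrogram and then resample every frame onto logarithmically spaced frequencies. The frame settings (16 kHz audio, 256-sample Hann frames, 128-sample hop), the 128 output bins, and the 50 Hz lower edge are hypothetical. For what it is worth, a 3 s clip at 16 kHz with those frame settings yields 1 + (48000 - 256) // 128 = 374 frames, the frame count listed in Table 3, but the source does not confirm these parameters.

```python
import numpy as np


def stft_magnitude(x, n_fft=256, hop=128):
    """Magnitude spectrogram (frames x bins) with a Hann window and no padding."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape (n_frames, n_fft // 2 + 1)


def log_frequency_spectrogram(mag, fs, n_bins=128, f_min=50.0):
    """Resample each frame from linear to log-spaced frequencies by linear interpolation."""
    n_fft = 2 * (mag.shape[1] - 1)
    linear_freqs = np.arange(mag.shape[1]) * fs / n_fft  # 0 .. fs/2
    log_freqs = np.geomspace(f_min, fs / 2, n_bins)      # hypothetical log-spaced grid
    return np.stack([np.interp(log_freqs, linear_freqs, frame) for frame in mag])
```

Because the output grid is geometric, more of the 128 bins fall at low frequencies than on a linear axis of the same size, while the high-frequency region is sampled more coarsely.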
    Fig. 5. Flow chart of language recognition
    Fig. 6. Multi-classification task evaluation parameters
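Assuming the evaluation parameters of Fig. 6 are the usual per-class precision, recall, and F1 score plus overall accuracy (the figure itself is not reproduced here), all of them can be read off a confusion matrix, as in this Python sketch.

```python
import numpy as np


def classification_metrics(confusion):
    """Per-class precision, recall, F1 and overall accuracy.

    confusion[i, j] counts test clips whose true class is i and predicted class is j.
    Classes with an empty denominator get a score of 0 instead of a division warning.
    """
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # column sums: clips assigned to each class
    actual = cm.sum(axis=1)     # row sums: clips that truly belong to each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return precision, recall, f1, accuracy
```

For example, the two-class matrix `[[8, 2], [1, 9]]` gives precision 8/9 and 9/11, recall 0.8 and 0.9, and accuracy 17/20 = 0.85.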
    Fig. 7. Results of different frequency coordinate scales
    Fig. 8. ResNet classification results
    Fig. 9. ResNeSt classification results
    Fig. 10. Language recognition result confusion matrix
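A confusion matrix like the one in Fig. 10 can be accumulated from true and predicted labels as sketched below. The language order follows Table 2; the optional row normalisation (so each row shows how one language's test clips are distributed over the predictions) is an assumption about the presentation, not something the caption states.

```python
import numpy as np

# Class order taken from Table 2; any fixed order works.
LANGUAGES = ["French", "German", "Spanish", "English", "Italian", "Russian"]


def confusion_matrix(true_labels, predicted_labels, classes=LANGUAGES, normalize=False):
    """Rows are true languages, columns are predicted languages."""
    index = {name: i for i, name in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=float)
    for true, pred in zip(true_labels, predicted_labels):
        cm[index[true], index[pred]] += 1
    if normalize:
        # Divide each row by its total; rows with no test clips stay all-zero.
        row_sums = cm.sum(axis=1, keepdims=True)
        cm = np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)
    return cm
```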
    Probability distribution before VAD
    ξ | 1 | 0
    P | L2/S | L1/S
    Probability distribution after VAD
    ξ | 1 | 0
    P | (L2 + L3)/S | (L1 - L3)/S
    Table 1. Probability distribution change before and after VAD

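    Language | Training set: wav number | Training set: people number | Testing set: wav number | Testing set: people number | Total wav number | Duration /s
    French | 1200 | 150 | 300 | 149 | 1500 | 3
    German | 1200 | 150 | 300 | 150 | 1500 | 3
    Spanish | 1200 | 151 | 300 | 151 | 1500 | 3
    English | 1200 | 169 | 300 | 154 | 1500 | 3
    Italian | 1200 | 151 | 300 | 150 | 1500 | 3
    Russian | 1200 | 150 | 300 | 148 | 1500 | 3
    Total | 7200 | 921 | 1800 | 902 | 9000 |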
    Table 2. Data allocation of training set and testing set
    Feature | (Frame_number, Data_dimension) | Accuracy /%
    MFCC-SDC | (374, 56) | 65.72
    MFCC | (374, 39) | 80.88
    GFCC | (374, 32) | 85.44
    Log scale Fbank feature | (374, 64) | 93.05
    Linear scale spectrogram | (374, 128) | 93.66
    Log scale spectrogram (proposed) | (374, 128) | 97.94
    Table 3. Comparison of language identification results of several different features