• Laser & Optoelectronics Progress
  • Vol. 59, Issue 7, 0707001 (2022)
Weikang Tang, Yubin Shao*, Hua Long, Qingzhi Du, Yi Peng, and Liang Chen
Author Affiliations
  • School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
    DOI: 10.3788/LOP202259.0707001
    Weikang Tang, Yubin Shao, Hua Long, Qingzhi Du, Yi Peng, Liang Chen. Syllable Matching Algorithm with Spectral Peak Point Feature for Chinese Speech[J]. Laser & Optoelectronics Progress, 2022, 59(7): 0707001
    Fig. 1. Speech spectrograms and energy peak point distribution of Chinese phonetic syllables. (a) Original speech spectrogram; (b) speech envelope spectrogram; (c) energy peak point distribution
    Fig. 2. Syllable matching algorithm steps
    Fig. 3. Extraction of spectral peak point feature
    Fig. 4. Gray-scale spectrogram of speech signal
    Fig. 5. Envelope spectrogram of speech signal
    Fig. 6. Energy point after thresholding
    Fig. 7. Envelope spectrograms. (a) Envelope spectrogram of the pronunciation of the commonly used Chinese character “de”; (b) envelope spectrogram after discarding partial information below 300 Hz
    Fig. 8. Distribution of energy maximum points under different frequency band division methods. (a) Equally spaced frequency bands; (b) logarithmically divided frequency bands
    Fig. 9. Illustration of two maximum feature points in each signal frame
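
The captions of Figs. 7 to 9 mention discarding content below 300 Hz, dividing the frequency axis logarithmically, and keeping two maximum points per frame. The following is only a minimal sketch of how those three ideas could fit together; it is not the authors' implementation. The function names, the default of four bands, and the rule of ranking per-band maxima by energy are all assumptions made for illustration.

```python
import numpy as np

def log_band_edges(f_lo, f_hi, n_bands):
    """Return n_bands + 1 logarithmically spaced band edges in Hz."""
    return np.geomspace(f_lo, f_hi, n_bands + 1)

def frame_peak_points(spec, freqs, n_bands=4, f_lo=300.0, keep=2):
    """Frequencies of the `keep` strongest per-band maxima in each frame.

    spec  : (n_freqs, n_frames) array of non-negative energies
    freqs : (n_freqs,) array of bin centre frequencies in Hz, ascending
    Assumption: bins below f_lo are ignored and each band contributes
    at most one candidate point per frame.
    """
    edges = log_band_edges(f_lo, freqs[-1], n_bands)
    n_frames = spec.shape[1]
    peak_energy = np.full((n_bands, n_frames), -np.inf)
    peak_freq = np.zeros((n_bands, n_frames))
    for b in range(n_bands):
        mask = freqs >= edges[b]
        if b < n_bands - 1:
            # Every band except the last is half-open: [lo, hi).
            mask &= freqs < edges[b + 1]
        if not mask.any():
            continue
        sub = spec[mask]                      # (bins_in_band, n_frames)
        idx = sub.argmax(axis=0)              # strongest bin per frame
        peak_energy[b] = sub[idx, np.arange(n_frames)]
        peak_freq[b] = freqs[mask][idx]
    # Rank the band maxima by energy and keep the strongest `keep` per frame.
    order = np.argsort(-peak_energy, axis=0)[:keep]
    top = np.take_along_axis(peak_freq, order, axis=0)
    return np.sort(top, axis=0)               # (keep, n_frames)

# Toy spectrogram: 50 Hz bins from 0 to 4000 Hz, two frames.
freqs = np.linspace(0.0, 4000.0, 81)
spec = np.zeros((81, 2))
spec[2, 0] = 10.0    # 100 Hz, below the 300 Hz cut-off, ignored
spec[10, 0] = 5.0    # 500 Hz
spec[40, 0] = 3.0    # 2000 Hz
spec[20, 1] = 4.0    # 1000 Hz, same band as 1050 Hz
spec[21, 1] = 6.0    # 1050 Hz
spec[60, 1] = 2.0    # 3000 Hz
points = frame_peak_points(spec, freqs)
```

In the toy input, the strong 100 Hz component is excluded by the cut-off, and the 1000 Hz bin loses to the 1050 Hz bin because both fall in the same logarithmic band.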
    SNR /dB    Accuracy /%
               Nf=5    Nf=10   Nf=15   Nf=20
    30         61.2    75.6    70.7    65.6
    25         59.0    73.4    68.7    62.4
    20         57.3    71.8    66.4    60.1
    15         55.6    70.1    62.5    57.8
    Table 1. Matching accuracy of different frame numbers under different signal-to-noise ratios
    SNR /dB    Accuracy /%
               Nb=2    Nb=4    Nb=8    Nb=16
    30         35.6    75.6    70.1    58.6
    25         32.3    73.4    68.4    54.7
    20         30.1    71.8    65.8    51.2
    15         29.3    70.1    62.4    48.7
    Table 2. Matching accuracy of different logarithmic frequency bands under different signal-to-noise ratios
    Matching algorithm           Accuracy /%
    Mahalanobis distance [11]    62.3
    Cosine similarity [12]       71.6
    Our algorithm                80.4
    Table 3. Matching accuracy of different algorithms for the same person's pronunciation in a noise-free environment
    Matching algorithm           Accuracy /%
                                 SNR of 25 dB   SNR of 20 dB   SNR of 15 dB   SNR of 10 dB
    Mahalanobis distance [11]    58.3           56.8           54.6           51.1
    Cosine similarity [12]       68.4           66.7           64.0           61.2
    Our algorithm                76.4           74.8           72.2           71.1
    Table 4. Matching accuracy of different algorithms for the same person's pronunciation in a noisy environment
    Matching algorithm           Accuracy /%
    Mahalanobis distance [11]    58.3
    Cosine similarity [12]       65.5
    Our algorithm                74.4
    Table 5. Matching accuracy of different algorithms for different people's pronunciation in a noise-free environment
    Matching algorithm           Accuracy /%
                                 SNR of 25 dB   SNR of 20 dB   SNR of 15 dB   SNR of 10 dB
    Mahalanobis distance [11]    56.3           54.8           53.6           52.1
    Cosine similarity [12]       63.4           62.1           60.7           58.2
    Our algorithm                70.8           68.9           67.1           64.5
    Table 6. Matching accuracy of different algorithms for different people's pronunciation in a noisy environment
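
Tables 3 to 6 list cosine similarity as one of the baseline matching methods. For readers unfamiliar with it, the sketch below shows the generic idea of matching a query feature vector to the template with the highest cosine similarity. The feature vectors, the template names, and the helper `best_match` are hypothetical; this page does not describe how the cited baseline builds its features.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(a @ b / denom)

def best_match(query, templates):
    """Return the name of the most similar template and all scores.

    templates: dict mapping a syllable label to its feature vector
    (hypothetical layout, chosen only for this illustration).
    """
    scores = {name: cosine_similarity(query, vec) for name, vec in templates.items()}
    return max(scores, key=scores.get), scores

# Toy feature vectors, not taken from the paper.
templates = {"de": [1.0, 0.0, 1.0], "ma": [0.0, 1.0, 0.0]}
label, scores = best_match([2.0, 0.0, 2.0], templates)
```

Cosine similarity ignores vector length, so the query above matches "de" even though its magnitude is twice that of the template.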