Syllable Matching Algorithm with Spectral Peak Point Feature for Chinese Speech

Weikang Tang; Yubin Shao; Hua Long; Qingzhi Du; Yi Peng; Liang Chen

doi:10.3788/LOP202259.0707001

Laser & Optoelectronics Progress
Vol. 59, Issue 7, 0707001 (2022)

Author Affiliations

School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China

Weikang Tang, Yubin Shao, Hua Long, Qingzhi Du, Yi Peng, Liang Chen. Syllable Matching Algorithm with Spectral Peak Point Feature for Chinese Speech[J]. Laser & Optoelectronics Progress, 2022, 59(7): 0707001

Fig. 1. Speech spectrograms and energy peak point distribution of Chinese phonetic syllables. (a) Original speech spectrogram; (b) speech envelope spectrogram; (c) energy peak point distribution

Fig. 2. Syllable matching algorithm steps

Fig. 3. Extraction of spectral peak point feature

Fig. 4. Gray-scale spectrogram of speech signal
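Fig. 4 shows the gray-scale spectrogram from which the features are extracted. This excerpt does not give the framing parameters, so the following is only a minimal sketch of one common way to build such an image: a Hann-windowed short-time Fourier transform whose log magnitude is mapped linearly onto gray levels 0 to 255. The function name `gray_spectrogram`, the frame length of 256 samples and the hop of 128 samples are assumptions for illustration, not values from the paper.

```python
import numpy as np


def gray_spectrogram(signal: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Return an 8-bit gray-scale log-magnitude spectrogram of shape (frames, bins).

    frame_len and hop are illustrative assumptions, not values from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the Hann window to each.
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # one-sided magnitude spectrum per frame
    log_mag = 20.0 * np.log10(mag + 1e-10)     # dB scale; the epsilon avoids log(0)
    lo, hi = log_mag.min(), log_mag.max()
    # Map the dB range linearly onto gray levels 0..255.
    gray = (log_mag - lo) / (hi - lo + 1e-12) * 255.0
    return np.round(gray).astype(np.uint8)
```

For example, a 1 s, 440 Hz tone sampled at 8 kHz yields a 61 × 129 image in which the brightest bin of each frame is bin 14 (440 Hz divided by the 31.25 Hz bin spacing is about 14.1).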

Fig. 5. Envelope spectrogram of speech signal

Fig. 6. Energy points after thresholding
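Fig. 6 shows the energy points that remain after thresholding. How the threshold is chosen is not stated in this excerpt; the sketch below simply assumes a threshold relative to the largest energy in the spectrogram, with energy taken as squared magnitude. `threshold_points` and `rel_threshold` are hypothetical names.

```python
import numpy as np


def threshold_points(spec: np.ndarray, rel_threshold: float = 0.5) -> np.ndarray:
    """Return the (frame, bin) coordinates whose energy exceeds rel_threshold * max energy.

    Energy is the squared magnitude; the relative threshold is an assumption.
    """
    energy = np.asarray(spec, dtype=float) ** 2
    mask = energy > rel_threshold * energy.max()
    return np.argwhere(mask)  # shape (n_points, 2); each row is (frame, bin)
```

A relative rather than absolute threshold keeps the surviving point set independent of the overall recording level; lowering `rel_threshold` keeps more points.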

Fig. 7. Envelope spectrograms. (a) Envelope spectrogram of the commonly used Chinese character pronunciation “de”; (b) envelope spectrogram after discarding part of the information below 300 Hz
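Fig. 7(b) discards part of the information below 300 Hz. Assuming the spectrogram columns are the bins of an `n_fft`-point real FFT (an assumption, as are the names `drop_low_bins`, `n_fft` and `cutoff_hz`), one simple way to do this is to zero every bin whose centre frequency lies below the cutoff, which leaves the bin indices of the remaining points unchanged.

```python
import numpy as np


def drop_low_bins(spec: np.ndarray, sample_rate: float, n_fft: int,
                  cutoff_hz: float = 300.0) -> np.ndarray:
    """Return a copy of spec (frames x bins) with all bins below cutoff_hz set to zero."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)  # centre frequency of each bin in Hz
    if spec.shape[1] != freqs.size:
        raise ValueError("spec must have n_fft // 2 + 1 columns")
    out = spec.copy()
    out[:, freqs < cutoff_hz] = 0  # discard the low-frequency region
    return out
```

At 8 kHz with a 256-point FFT the bin spacing is 31.25 Hz, so bins 0 to 9 (up to 281.25 Hz) are cleared and bin 10 (312.5 Hz) is the first one kept.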

Fig. 8. Distribution of energy maximum points under different frequency-band division methods. (a) Equally spaced frequency bands; (b) logarithmically divided frequency bands
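Fig. 8 compares equally spaced and logarithmically divided frequency bands for locating energy maxima. The sketch below, whose names and parameters are hypothetical, builds either kind of band edges and then takes the largest-magnitude bin inside each half-open band [lo, hi) of one frame. With logarithmic division each edge is a constant ratio above the previous one, so the low-frequency region is divided more finely than the high-frequency region.

```python
import numpy as np


def band_edges(f_min: float, f_max: float, n_bands: int, log_spaced: bool = True) -> np.ndarray:
    """Return n_bands + 1 band-edge frequencies (Hz) from f_min to f_max."""
    if log_spaced:
        return np.geomspace(f_min, f_max, n_bands + 1)  # constant ratio between edges
    return np.linspace(f_min, f_max, n_bands + 1)       # constant width in Hz


def band_maxima(frame_mag: np.ndarray, freqs: np.ndarray, edges: np.ndarray) -> list:
    """Return the bin index of the largest magnitude inside each band [lo, hi) of one frame."""
    peaks = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = np.flatnonzero((freqs >= lo) & (freqs < hi))
        if idx.size:  # skip bands that contain no FFT bin
            peaks.append(int(idx[np.argmax(frame_mag[idx])]))
    return peaks
```

For example, four logarithmic bands between 300 Hz and 4800 Hz have edges 300, 600, 1200, 2400 and 4800 Hz, whereas four equal bands have edges 300, 1425, 2550, 3675 and 4800 Hz. Because the bands are half-open, a bin lying exactly at `f_max` is not counted.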

Fig. 9. Illustration of two maximum feature points in each signal frame
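Fig. 9 illustrates keeping two maximum feature points in each signal frame. A minimal sketch, assuming "maximum" simply means the two largest bins of each frame (this excerpt does not say whether the paper also requires the two points to be distinct spectral peaks); `top2_per_frame` is a hypothetical name.

```python
import numpy as np


def top2_per_frame(spec: np.ndarray) -> np.ndarray:
    """Return an (n_frames, 2) array with the bin indices of the two largest
    values in each frame (row), largest first."""
    if spec.shape[1] < 2:
        raise ValueError("each frame needs at least two bins")
    order = np.argsort(spec, axis=1)  # ascending bin order within each frame
    return order[:, ::-1][:, :2]      # reverse to descending, keep the top two
```

For long spectra `np.argpartition` would avoid a full sort; `argsort` is used here for clarity.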

SNR /dB	Accuracy /% (N_f=5)	Accuracy /% (N_f=10)	Accuracy /% (N_f=15)	Accuracy /% (N_f=20)
30	61.2	75.6	70.7	65.6
25	59.0	73.4	68.7	62.4
20	57.3	71.8	66.4	60.1
15	55.6	70.1	62.5	57.8

Table 1. Matching accuracy for different numbers of frames N_f under different signal-to-noise ratios

SNR /dB	Accuracy /% (N_b=2)	Accuracy /% (N_b=4)	Accuracy /% (N_b=8)	Accuracy /% (N_b=16)
30	35.6	75.6	70.1	58.6
25	32.3	73.4	68.4	54.7
20	30.1	71.8	65.8	51.2
15	29.3	70.1	62.4	48.7

Table 2. Matching accuracy for different numbers of logarithmic frequency bands N_b under different signal-to-noise ratios

Matching algorithm	Accuracy /%
Mahalanobis distance [11]	62.3
Cosine similarity [12]	71.6
Our algorithm	80.4

Table 3. Matching accuracy of different algorithms for the same person's pronunciation in a noise-free environment
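Tables 3 to 6 compare the proposed matching with two baselines, Mahalanobis distance [11] and cosine similarity [12]. This excerpt does not describe the feature vectors these baselines were applied to, so the sketch below only shows the two standard measures on hypothetical vectors; the covariance matrix passed to the Mahalanobis distance would typically be estimated from training data and is assumed here to be symmetric positive definite.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two feature vectors (1.0 means same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def mahalanobis_distance(a, b, cov) -> float:
    """sqrt((a - b)^T cov^-1 (a - b)); cov must be symmetric positive definite."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    # Solving cov @ x = diff is numerically preferable to forming cov^-1 explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
```

A larger cosine similarity or a smaller Mahalanobis distance indicates a closer match, so a template search would take the argmax or argmin respectively. With an identity covariance the Mahalanobis distance reduces to the Euclidean distance.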

Matching algorithm	Accuracy /% (SNR = 25 dB)	Accuracy /% (SNR = 20 dB)	Accuracy /% (SNR = 15 dB)	Accuracy /% (SNR = 10 dB)
Mahalanobis distance [11]	58.3	56.8	54.6	51.1
Cosine similarity [12]	68.4	66.7	64.0	61.2
Our algorithm	76.4	74.8	72.2	71.1

Table 4. Matching accuracy of different algorithms for the same person's pronunciation in a noisy environment
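Tables 4 and 6 report results at SNRs from 25 dB down to 10 dB, and Tables 1 and 2 from 30 dB down to 15 dB. This excerpt does not say which noise was used; one common way to build such test conditions, assumed here purely for illustration, is to add white Gaussian noise scaled so that 10·log10(P_signal / P_noise) equals the target SNR. `add_noise_at_snr` is a hypothetical helper.

```python
import numpy as np


def add_noise_at_snr(clean: np.ndarray, snr_db: float,
                     rng: np.random.Generator) -> np.ndarray:
    """Add white Gaussian noise so that the signal-to-noise ratio is snr_db (in dB).

    White Gaussian noise is an assumption; the paper's noise type is not given here.
    """
    clean = np.asarray(clean, dtype=float)
    signal_power = np.mean(clean ** 2)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)  # from SNR_dB = 10 log10(Ps / Pn)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=clean.shape)
    return clean + noise
```

Passing a seeded generator such as `np.random.default_rng(0)` makes the noisy test set reproducible.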

Matching algorithm	Accuracy /%
Mahalanobis distance [11]	58.3
Cosine similarity [12]	65.5
Our algorithm	74.4

Table 5. Matching accuracy of different algorithms for different people's pronunciation in a noise-free environment

Matching algorithm	Accuracy /% (SNR = 25 dB)	Accuracy /% (SNR = 20 dB)	Accuracy /% (SNR = 15 dB)	Accuracy /% (SNR = 10 dB)
Mahalanobis distance [11]	56.3	54.8	53.6	52.1
Cosine similarity [12]	63.4	62.1	60.7	58.2
Our algorithm	70.8	68.9	67.1	64.5

Table 6. Matching accuracy of different algorithms for different people's pronunciation in a noisy environment