Author Affiliations
School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
Fig. 1. Speech spectrograms and energy peak point distribution of Chinese phonetic syllables. (a) Original speech spectrogram; (b) speech envelope spectrogram; (c) energy peak point distribution
Fig. 2. Syllable matching algorithm steps
Fig. 3. Extraction of spectral peak point feature
Fig. 4. Gray-scale spectrogram of speech signal
Fig. 5. Envelope spectrogram of speech signal
Fig. 6. Energy point after thresholding
Fig. 7. Envelope spectrograms. (a) Envelope spectrogram of the commonly used Chinese character pronunciation “de”; (b) envelope spectrogram after discarding partial information below 300 Hz
Fig. 8. Distribution of energy maximum points using different division methods in frequency bands. (a) Frequency band is equally spaced; (b) logarithmic division of frequency bands
Fig. 9. Illustration of two maximum feature points in each signal frame

| SNR /dB | Accuracy /% (Nf=5) | Accuracy /% (Nf=10) | Accuracy /% (Nf=15) | Accuracy /% (Nf=20) |
| --- | --- | --- | --- | --- |
| 30 | 61.2 | 75.6 | 70.7 | 65.6 |
| 25 | 59.0 | 73.4 | 68.7 | 62.4 |
| 20 | 57.3 | 71.8 | 66.4 | 60.1 |
| 15 | 55.6 | 70.1 | 62.5 | 57.8 |

Table 1. Matching accuracy of different frame numbers under different signal-to-noise ratios

| SNR /dB | Accuracy /% (Nb=2) | Accuracy /% (Nb=4) | Accuracy /% (Nb=8) | Accuracy /% (Nb=16) |
| --- | --- | --- | --- | --- |
| 30 | 35.6 | 75.6 | 70.1 | 58.6 |
| 25 | 32.3 | 73.4 | 68.4 | 54.7 |
| 20 | 30.1 | 71.8 | 65.8 | 51.2 |
| 15 | 29.3 | 70.1 | 62.4 | 48.7 |

Table 2. Matching accuracy of different logarithmic frequency bands under different signal-to-noise ratios

| Matching algorithm | Accuracy /% |
| --- | --- |
| Mahalanobis distance[11] | 62.3 |
| Cosine similarity[12] | 71.6 |
| Our algorithm | 80.4 |

Table 3. Matching accuracy of different algorithms for the same person's pronunciation in a noise-free environment

| Matching algorithm | Accuracy /% (SNR of 25 dB) | Accuracy /% (SNR of 20 dB) | Accuracy /% (SNR of 15 dB) | Accuracy /% (SNR of 10 dB) |
| --- | --- | --- | --- | --- |
| Mahalanobis distance[11] | 58.3 | 56.8 | 54.6 | 51.1 |
| Cosine similarity[12] | 68.4 | 66.7 | 64.0 | 61.2 |
| Our algorithm | 76.4 | 74.8 | 72.2 | 71.1 |

Table 4. Matching accuracy of different algorithms for the same person's pronunciation in a noisy environment

| Matching algorithm | Accuracy /% |
| --- | --- |
| Mahalanobis distance[11] | 58.3 |
| Cosine similarity[12] | 65.5 |
| Our algorithm | 74.4 |

Table 5. Matching accuracy of different algorithms for different people's pronunciation in a noise-free environment

| Matching algorithm | Accuracy /% (SNR of 25 dB) | Accuracy /% (SNR of 20 dB) | Accuracy /% (SNR of 15 dB) | Accuracy /% (SNR of 10 dB) |
| --- | --- | --- | --- | --- |
| Mahalanobis distance[11] | 56.3 | 54.8 | 53.6 | 52.1 |
| Cosine similarity[12] | 63.4 | 62.1 | 60.7 | 58.2 |
| Our algorithm | 70.8 | 68.9 | 67.1 | 64.5 |

Table 6. Matching accuracy of different algorithms for different people's pronunciation in a noisy environment