• Laser & Optoelectronics Progress
  • Vol. 57, Issue 18, 181702 (2020)
Kailong Ren, Yi Wang*, Xiaodong Chen, and Huaiyu Cai
Author Affiliations
  • School of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China
    DOI: 10.3788/LOP57.181702
    Kailong Ren, Yi Wang, Xiaodong Chen, Huaiyu Cai. Speaker-Dependent Speech Recognition Algorithm for Laparoscopic Supporter Control[J]. Laser & Optoelectronics Progress, 2020, 57(18): 181702
    References

    [1] Abdulla W H, Chow D, Sin G. Cross-words reference template for DTW-based speech recognition systems[C]∥2003 Conference on Convergent Technologies for Asia-Pacific Region. 15-17 Oct. 2003, Bangalore, India., 1576-1579(2003).

    [2] Zhao X, Chen X D, Chang X et al. Parameter extraction and enhancing method for mixed phonetic features based on multi-fisher criterion[J]. Nanotechnology and Precision Engineering, 15, 317-322(2017).

    [3] Sak H, Senior A, Beaufays F. Long short-term memory recurrent neural network architectures for large scale acoustic modeling[C]∥2014 Proceedings of Annual Conference of International Speech Communication Association. [S.l.:s.n.], 338-342(2014).

    [4] Abdel-Hamid O, Mohamed A R, Jiang H et al. Convolutional neural networks for speech recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22, 1533-1545(2014).

    [5] Graves A, Mohamed A R, Hinton G. Speech recognition with deep recurrent neural networks[C]∥2013 IEEE International Conference on Acoustics, Speech and Signal Processing. 26-31 May 2013, Vancouver, BC, Canada., 6645-6649(2013).

    [6] Dehak N, Kenny P J, Dehak R et al. Front-end factor analysis for speaker verification[J]. IEEE Transactions on Audio, Speech, and Language Processing, 19, 788-798(2011).

    [7] Variani E, Lei X, McDermott E et al. Deep neural networks for small footprint text-dependent speaker verification[C]∥2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 4-9 May 2014, 4052-4056(2014).

    [8] Li Y X, Zhang J Q, Pan D et al. A study of speech recognition based on RNN-RBM language model[J]. Journal of Computer Research and Development, 51, 1936-1944(2014).

    [9] Yang H J, Yan Z, Wu Z L et al. Extraction method of interest text in image based on recurrent neural network[J]. Laser & Optoelectronics Progress, 56, 241501(2019).

    [10] Li J Y, Yu D, Huang J T et al. Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM[C]∥2012 IEEE Spoken Language Technology Workshop (SLT). 2-5 Dec. 2012, Miami, FL, USA., 131-136(2012).

    [11] Chen H K, Chen Y. Speaker identification based on multimodal long short-term memory with depth-gate[J]. Laser & Optoelectronics Progress, 56, 031007(2019).

    [12] Yao Y S. (2016-02-16)[2020-03-05]. https://arxiv.org/abs/1602.04874.

    [13] Scheffer N, Bonastre J F. UBM-GMM driven discriminative approach for speaker verification[C]∥2006 IEEE Odyssey - the Speaker and Language Recognition Workshop. 28-30 June 2006, San Juan, Puerto Rico., 1-7(2006).

    [14] Snyder D, Garcia-Romero D, Povey D. Time delay deep neural network-based universal background models for speaker recognition[C]∥2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). 13-17 Dec. 2015, Scottsdale, AZ, 92-97(2015).

    [15] Li P, Zhang Y. Video smoke detection based on Gaussian mixture model and convolutional neural network[J]. Laser & Optoelectronics Progress, 56, 211502(2019).

    [16] Garcia-Romero D, Espy-Wilson C Y. Analysis of i-vector length normalization in speaker recognition systems[C]∥Proceedings of the Annual Conference of the International Speech Communication Association. Florence, Italy:[s.n.], 249-252(2011).

    [17] Kenny P, Boulianne G, Ouellet P et al. Joint factor analysis versus eigenchannels in speaker recognition[J]. IEEE Transactions on Audio, Speech, and Language Processing, 15, 1435-1447(2007).

    [18] Kenny P, Boulianne G, Dumouchel P. Eigenvoice modeling with sparse training data[J]. IEEE Transactions on Speech and Audio Processing, 13, 345-354(2005).

    [19] Kenny P, Ouellet P, Dehak N et al. A study of interspeaker variability in speaker verification[J]. IEEE Transactions on Audio, Speech, and Language Processing, 16, 980-988(2008).

    [20] Gupta V, Kenny P, Ouellet P et al. I-vector-based speaker adaptation of deep neural networks for French broadcast audio transcription[C]∥2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 4-9 May 2014, 6334-6338(2014).

    [21] Li Z Y, Zhang W Q, He L et al. Total variability subspace adaptation based speaker recognition[J]. Acta Automatica Sinica, 40, 1836-1840(2014).

    [22] Zhang J C, Inoue N. (2018-04-01)[2020-03-05]. https://arxiv.org/abs/1804.00290v1.

    [23] Glembek O, Burget L, Matějka P et al. Simplification and optimization of i-vector extraction[C]∥2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)., 12176147(2011).

    [24] Chakroborty S, Saha G. Improved text-independent speaker identification using fused MFCC & IMFCC feature sets based on Gaussian filter[J]. International Journal of Signal Processing, 5, 11-19(2009).

    [25] Murty K S R, Yegnanarayana B. Combining evidence from residual phase and MFCC features for speaker recognition[J]. IEEE Signal Processing Letters, 13, 52-55(2006).

    [26] Ai O C, Hariharan M, Yaacob S et al. Classification of speech dysfluencies with MFCC and LPCC features[J]. Expert Systems with Applications, 39, 2157-2165(2012).

    [27] Huang G X, Tian Y, Kang J et al. Long short term memory recurrent neural network acoustic models using i-vector for low resource speech recognition[J]. Application Research of Computers, 34, 392-396(2017).
