Speaker-Dependent Speech Recognition Algorithm for Laparoscopic Supporter Control

Kailong Ren; Yi Wang; Xiaodong Chen; Huaiyu Cai

doi:10.3788/LOP57.181702

Laser & Optoelectronics Progress
Vol. 57, Issue 18, 181702 (2020)

Kailong Ren, Yi Wang^*, Xiaodong Chen, and Huaiyu Cai

Author Affiliations

School of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China

Kailong Ren, Yi Wang, Xiaodong Chen, Huaiyu Cai. Speaker-Dependent Speech Recognition Algorithm for Laparoscopic Supporter Control[J]. Laser & Optoelectronics Progress, 2020, 57(18): 181702.

Fig. 1. Diagram of simple RNN and its expansion

Fig. 2. Diagram of the unit of LSTM recurrent neural network hidden layer

Fig. 3. Diagram of BiLSTM RNN structure

Fig. 4. Diagram of LSTM recurrent neural network model with i-vector feature

Fig. 5. Diagrams of i-vector parameter fusion and adding rejection identification unit. (a) Parameter fusion of i-vector; (b) adding rejection identification unit

Layer ID	Name	Number of units	Activation function
1	Input layer	-	-
2	FC1	64	ReLU
3	FC2	64	ReLU
4	FC3	64	ReLU
5	BiLSTM	64	-
6	FC4	64	ReLU
7	FC5	64	ReLU
8	Output layer	-	Softmax

Table 1. LSTM recurrent neural network model structure with i-vector feature
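To give a feel for the size of the layer stack in Table 1, the following is a minimal sketch that counts trainable parameters layer by layer. Several values are assumptions that do not come from the source: the per-frame input dimension (`INPUT_DIM = 40`), the number of output classes (`NUM_CLASSES = 9`, taken here as eight command words plus one rejection class), the reading of "64" as units per BiLSTM direction with concatenated outputs, and the use of a single bias vector per gate. How the i-vector enters the network is not modeled.

```python
# Layer stack as listed in Table 1: (name, units, kind).
LAYERS = [
    ("FC1", 64, "fc"),
    ("FC2", 64, "fc"),
    ("FC3", 64, "fc"),
    ("BiLSTM", 64, "bilstm"),
    ("FC4", 64, "fc"),
    ("FC5", 64, "fc"),
]

# Assumptions for illustration only (not given in the source):
INPUT_DIM = 40    # hypothetical acoustic feature size per frame
NUM_CLASSES = 9   # hypothetical: 8 command words plus 1 rejection class


def fc_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def lstm_params(n_in, n_hidden):
    """One LSTM direction: 4 gates, each with input weights,
    recurrent weights, and a single bias vector (assumed)."""
    return 4 * (n_hidden * (n_in + n_hidden) + n_hidden)


def count_parameters(input_dim=INPUT_DIM, num_classes=NUM_CLASSES):
    """Return (per-layer parameter counts, total) for the Table 1 stack."""
    width = input_dim
    counts = {}
    for name, units, kind in LAYERS:
        if kind == "fc":
            counts[name] = fc_params(width, units)
            width = units
        else:
            # Assumed: 64 units per direction, forward and backward
            # outputs concatenated, so the next layer sees 2 * units.
            counts[name] = 2 * lstm_params(width, units)
            width = 2 * units
    counts["Output"] = fc_params(width, num_classes)
    return counts, sum(counts.values())


if __name__ == "__main__":
    per_layer, total = count_parameters()
    for layer_name, n in per_layer.items():
        print(f"{layer_name:8s} {n}")
    print(f"{'Total':8s} {total}")
```

Under these assumptions the BiLSTM layer accounts for most of the parameters; changing `INPUT_DIM` or `NUM_CLASSES` only affects the first and last layers.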

Word ID	DTW				GMM-HMM				LSTM RNN with i-vector
	Total	Correct	Error (FR)	Error (FA)	Total	Correct	Error (FR)	Error (FA)	Total	Correct	Error (FR)	Error (FA)
1	60	50	4	6	60	53	1	6	60	59	1	0
2	60	54	2	4	60	54	2	4	60	60	0	0
3	60	54	2	4	60	56	1	3	60	60	0	0
4	60	53	2	5	60	55	1	4	60	59	1	0
5	60	52	3	5	60	54	1	5	60	60	0	0
6	60	54	3	3	60	55	3	2	60	60	0	0
7	60	52	6	2	60	52	1	7	60	60	0	0
8	60	54	2	4	60	54	1	5	60	60	0	0
Sum	480	423	24	33	480	433	11	36	480	478	2	0

Table 2. Recognition results of surgeon speech by three models
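The "Sum" row of Table 2 can be turned into percentage rates with a few lines of arithmetic. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the rates are simply each count divided by the total number of test utterances per model.

```python
# "Sum" row of Table 2: (total, correct, FR, FA) for each model.
TABLE2_SUM = {
    "DTW": (480, 423, 24, 33),
    "GMM-HMM": (480, 433, 11, 36),
    "LSTM RNN with i-vector": (480, 478, 2, 0),
}


def rates(total, correct, fr, fa):
    """Convert raw counts into percentages of the total."""
    # Every utterance is either correct, an FR error, or an FA error.
    assert correct + fr + fa == total
    return {
        "accuracy": 100.0 * correct / total,
        "fr_rate": 100.0 * fr / total,
        "fa_rate": 100.0 * fa / total,
    }


summary = {model: rates(*counts) for model, counts in TABLE2_SUM.items()}

if __name__ == "__main__":
    for model, r in summary.items():
        print(f"{model}: accuracy {r['accuracy']:.2f}%, "
              f"FR {r['fr_rate']:.2f}%, FA {r['fa_rate']:.2f}%")
```

On these counts, accuracy rises from roughly 88% (DTW) and 90% (GMM-HMM) to about 99.6% for the LSTM RNN with i-vector.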

Word ID	DTW			GMM-HMM			LSTM RNN with i-vector
	Total	Rejection	FA	Total	Rejection	FA	Total	Rejection	FA
1	60	53	7	60	54	6	60	60	0
2	60	55	5	60	56	4	60	60	0
3	60	58	2	60	57	3	60	60	0
4	60	54	6	60	55	5	60	60	0
5	60	56	4	60	56	4	60	60	0
6	60	53	7	60	52	8	60	60	0
7	60	55	5	60	56	4	60	60	0
8	60	56	4	60	57	3	60	60	0
Sum	480	440	40	480	443	37	480	480	0

Table 3. Recognition results of assistant doctors' speech by three models

Word ID	DTW			GMM-HMM			LSTM RNN with i-vector
	Total	Rejection	FA	Total	Rejection	FA	Total	Rejection	FA
1	80	72	8	80	70	10	80	80	0
2	80	72	8	80	75	5	80	77	3
3	80	75	5	80	76	4	80	78	2
4	80	73	7	80	72	8	80	76	4
5	80	73	7	80	74	6	80	77	3
6	80	74	6	80	75	5	80	78	2
7	80	75	5	80	75	5	80	79	1
8	80	73	7	80	72	8	80	79	1
Sum	640	587	53	640	589	51	640	624	16

Table 4. Recognition results of interference speech by three models
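Tables 3 and 4 report only "Rejection" and "FA" counts, which suggests that every utterance in these two tests should be rejected, so FA divided by Total gives a false-acceptance rate. The sketch below computes that rate from the "Sum" rows under this reading; the variable and function names are hypothetical.

```python
# "Sum" rows of Tables 3 and 4: (total, rejection, FA) for each model.
TABLE3_SUM = {  # assistant doctors' speech
    "DTW": (480, 440, 40),
    "GMM-HMM": (480, 443, 37),
    "LSTM RNN with i-vector": (480, 480, 0),
}
TABLE4_SUM = {  # interference speech
    "DTW": (640, 587, 53),
    "GMM-HMM": (640, 589, 51),
    "LSTM RNN with i-vector": (640, 624, 16),
}


def false_acceptance_rate(total, rejected, fa):
    """Percentage of utterances accepted although they should be rejected."""
    # Each utterance is either rejected or falsely accepted.
    assert rejected + fa == total
    return 100.0 * fa / total


fa_assistant = {m: false_acceptance_rate(*c) for m, c in TABLE3_SUM.items()}
fa_interference = {m: false_acceptance_rate(*c) for m, c in TABLE4_SUM.items()}

if __name__ == "__main__":
    for model in TABLE3_SUM:
        print(f"{model}: assistant doctors {fa_assistant[model]:.2f}%, "
              f"interference {fa_interference[model]:.2f}%")
```

Read this way, DTW and GMM-HMM falsely accept roughly 8% of utterances in both tests, while the LSTM RNN with i-vector accepts none of the assistant doctors' utterances and 2.5% of the interference utterances.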
