Hierarchical LSTM-Based Audio and Video Emotion Recognition With Embedded Attention Mechanism

Tianbao Liu; Lingtao Zhang; Wentao Yu; Dongchuan Wei; Yijun Fan

doi:10.3788/LOP202158.0210017

Laser & Optoelectronics Progress
Vol. 58, Issue 2, 0210017 (2021)

Tianbao Liu, Lingtao Zhang^*, Wentao Yu, Dongchuan Wei, and Yijun Fan

Author Affiliations

College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, Hunan 410004, China

Tianbao Liu, Lingtao Zhang, Wentao Yu, Dongchuan Wei, Yijun Fan. Hierarchical LSTM-Based Audio and Video Emotion Recognition With Embedded Attention Mechanism[J]. Laser & Optoelectronics Progress, 2021, 58(2): 0210017

Fig. 1. Flow chart of audio and video emotion recognition system

Fig. 2. Structure of recursive neuron

Fig. 3. Schematic of attention mechanism

Fig. 4. Schematic of stacking LSTM model with attention mechanism

Fig. 5. Diagram of video emotion recognition system

Fig. 6. Relationship between LSTM layers and recognition rate

Fig. 7. Performance comparison of different feature fusion algorithms

Network	RML	AFEW6.0	eNTERFACE'05
SVM^[23]	0.6020	0.3790	0.4831
Random forest^[24]	0.6528	0.3508	0.4711
LSTM+CNN^[25]	0.8546		0.4915
CNN	0.8363		0.4691
CNN+LSTM	0.8446	0.4217	0.4952
Proposed network	0.9011	0.5473	0.5932

Table 1. Comparison of recognition rate in speech emotion recognition experiment

Dataset	3-layer LSTM (ordinary)	3-layer LSTM (with attention mechanism)
RML	0.8661	0.8873
AFEW6.0	0.4633	0.4965
eNTERFACE'05	0.5315	0.5739

Table 2. Recognition rate comparison of hierarchical attention mechanism

Dataset	Without penalty term	With penalty term
RML	0.8873	0.9011
AFEW6.0	0.4965	0.5473
eNTERFACE'05	0.5739	0.5932

Table 3. Recognition rate comparison with and without penalty terms

Video sequence feature	RML	AFEW6.0	eNTERFACE'05
EF-A	0.8653	0.5074	0.7458
EF-B	0.8812	0.5185	0.7974
EF-C	0.8232	0.4713	0.7515
EF-VGG	0.8346	0.4882	0.7627

Table 4. Recognition rate of facial expression

Dataset	Facial expression recognition	Speech expression recognition
RML	0.60	0.40
AFEW6.0	0.75	0.25
eNTERFACE'05	0.80	0.20

Table 5. Weight settings on three datasets
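
The weights in Table 5 can be read as the relative contribution of each modality when the two recognizers are combined. The following is a minimal sketch of one way such weights might be applied, assuming a decision-level fusion in which each recognizer outputs a class-probability vector and the fused score is their weighted sum. The fusion rule itself, the function name `fuse_scores`, and the example probability vectors are assumptions made for illustration and are not taken from the article; only the weight values come from Table 5.

```python
# Weights copied from Table 5: (facial expression, speech) per dataset.
FUSION_WEIGHTS = {
    "RML": (0.60, 0.40),
    "AFEW6.0": (0.75, 0.25),
    "eNTERFACE'05": (0.80, 0.20),
}


def fuse_scores(face_probs, speech_probs, dataset):
    """Weighted sum of two class-probability vectors.

    Assumed fusion rule (hypothetical helper, not the authors' code):
    fused[i] = w_face * face_probs[i] + w_speech * speech_probs[i]
    """
    w_face, w_speech = FUSION_WEIGHTS[dataset]
    if len(face_probs) != len(speech_probs):
        raise ValueError("both modalities must score the same set of classes")
    return [w_face * f + w_speech * s for f, s in zip(face_probs, speech_probs)]


# Made-up scores over three emotion classes, for illustration only.
face = [0.2, 0.5, 0.3]
speech = [0.6, 0.3, 0.1]

fused = fuse_scores(face, speech, "RML")
# Index of the class with the highest fused score.
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Because each pair of weights in Table 5 sums to 1, a weighted sum of two probability vectors is itself a probability vector under this assumed rule, so the fused scores remain directly comparable across classes.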
