• Laser & Optoelectronics Progress
  • Vol. 58, Issue 2, 0210017 (2021)
Tianbao Liu, Lingtao Zhang*, Wentao Yu, Dongchuan Wei, and Yijun Fan
Author Affiliations
  • College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, Hunan 410004, China
    DOI: 10.3788/LOP202158.0210017
    Tianbao Liu, Lingtao Zhang, Wentao Yu, Dongchuan Wei, Yijun Fan. Hierarchical LSTM-Based Audio and Video Emotion Recognition With Embedded Attention Mechanism[J]. Laser & Optoelectronics Progress, 2021, 58(2): 0210017
    Fig. 1. Flow chart of audio and video emotion recognition system
    Fig. 2. Structure of recursive neuron
    Fig. 3. Schematic of attention mechanism
    Fig. 4. Schematic of stacking LSTM model with attention mechanism
    Fig. 5. Diagram of video emotion recognition system
    Fig. 6. Relationship between LSTM layers and recognition rate
    Fig. 7. Performance comparison of different feature fusion algorithms
    Network | RML | AFEW6.0 | eNTERFACE'05
    SVM[23] | 0.6020 | 0.3790 | 0.4831
    Random forest[24] | 0.6528 | 0.3508 | 0.4711
    LSTM+CNN[25] | 0.8546 | 0.4915
    CNN | 0.8363 | 0.4691
    CNN+LSTM | 0.8446 | 0.4217 | 0.4952
    Proposed network | 0.9011 | 0.5473 | 0.5932
    Table 1. Comparison of recognition rate in speech emotion recognition experiment
    Dataset | 3-layer LSTM: Ordinary | 3-layer LSTM: Add attention mechanism
    RML | 0.8661 | 0.8873
    AFEW6.0 | 0.4633 | 0.4965
    eNTERFACE'05 | 0.5315 | 0.5739
    Table 2. Recognition rate comparison of hierarchical attention mechanism
    Dataset | Ordinary | Add penalty
    RML | 0.8873 | 0.9011
    AFEW6.0 | 0.4965 | 0.5473
    eNTERFACE'05 | 0.5739 | 0.5932
    Table 3. Recognition rate comparison under penalty items
    Video sequence feature | RML | AFEW6.0 | eNTERFACE'05
    EF-A | 0.8653 | 0.5074 | 0.7458
    EF-B | 0.8812 | 0.5185 | 0.7974
    EF-C | 0.8232 | 0.4713 | 0.7515
    EF-VGG | 0.8346 | 0.4882 | 0.7627
    Table 4. Recognition rate of facial expression
    Dataset | Facial expression recognition | Speech expression recognition
    RML | 0.60 | 0.40
    AFEW6.0 | 0.75 | 0.25
    eNTERFACE'05 | 0.80 | 0.20
    Table 5. Weight settings on three datasets
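
    The weights in Table 5 could be applied as a weighted sum of per-class scores from the two modalities. The sketch below is only an illustration under that assumption: this page does not state how the weights are combined, and the function names, the score format, and the example scores are all hypothetical. Only the weight values come from Table 5.

```python
# Hypothetical sketch of weighted decision-level fusion.
# Only the weights come from Table 5; the fusion rule (weighted sum of
# per-class scores) and all names below are assumptions for illustration.

# (facial expression weight, speech weight) per dataset, from Table 5
FUSION_WEIGHTS = {
    "RML": (0.60, 0.40),
    "AFEW6.0": (0.75, 0.25),
    "eNTERFACE'05": (0.80, 0.20),
}


def fuse_scores(face_probs, speech_probs, dataset):
    """Return the weighted sum of two per-class score lists."""
    if len(face_probs) != len(speech_probs):
        raise ValueError("both modalities must score the same classes")
    w_face, w_speech = FUSION_WEIGHTS[dataset]
    return [w_face * f + w_speech * s for f, s in zip(face_probs, speech_probs)]


def predict(face_probs, speech_probs, dataset):
    """Return the index of the class with the highest fused score."""
    fused = fuse_scores(face_probs, speech_probs, dataset)
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up scores for three classes, for illustration only
face = [0.2, 0.5, 0.3]
speech = [0.6, 0.1, 0.3]
fused = fuse_scores(face, speech, "RML")
label_rml = predict(face, speech, "RML")
label_enterface = predict(face, speech, "eNTERFACE'05")
```

    With these made-up scores, the two modalities disagree, and the larger facial expression weight for eNTERFACE'05 lets the facial expression branch decide the fused label, whereas under the RML weights the speech branch prevails.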