• Laser & Optoelectronics Progress
  • Vol. 58, Issue 2, 0210017 (2021)
Tianbao Liu, Lingtao Zhang*, Wentao Yu, Dongchuan Wei, and Yijun Fan
Author Affiliations
  • College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, Hunan 410004, China
    DOI: 10.3788/LOP202158.0210017
    Tianbao Liu, Lingtao Zhang, Wentao Yu, Dongchuan Wei, Yijun Fan. Hierarchical LSTM-Based Audio and Video Emotion Recognition With Embedded Attention Mechanism[J]. Laser & Optoelectronics Progress, 2021, 58(2): 0210017

    Abstract

    A single-layer long short-term memory (LSTM) network does not generalize well to complex speech emotion recognition problems. Therefore, a hierarchical LSTM model with a self-attention mechanism is proposed, and penalty terms are introduced to improve network performance. For emotion recognition on video sequences, an attention mechanism assigns a weight to each video frame according to the emotional information it carries, and the weighted frames are then classified. A weighted decision fusion method combines the facial expression and speech results to produce the final emotion prediction. Experimental results show that, compared with single-modal emotion recognition, the recognition accuracy of the proposed method on the selected data improves by approximately 4%, indicating that the proposed method achieves better recognition results.
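
The two aggregation steps described in the abstract, attention weighting of video frames and weighted decision fusion of the two modalities, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the attention scoring function, the fusion weights, or the class set, so the function names, the softmax normalization of frame scores, and the `audio_weight` parameter below are all assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_probs, frame_scores):
    """Combine per-frame class probabilities into one video-level vector.

    frame_scores are hypothetical attention scores, one per frame; they
    are normalized with softmax (an assumption) so the weights sum to 1.
    """
    weights = softmax(frame_scores)
    n_classes = len(frame_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, frame_probs))
        for c in range(n_classes)
    ]


def weighted_decision_fusion(audio_probs, video_probs, audio_weight=0.5):
    """Fuse two class-probability vectors as a convex combination.

    audio_weight is a hypothetical hyperparameter; the video branch
    receives the remaining weight (1 - audio_weight).
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return [
        audio_weight * a + (1.0 - audio_weight) * v
        for a, v in zip(audio_probs, video_probs)
    ]


def predict(fused_probs):
    """Return the index of the most probable class."""
    return max(range(len(fused_probs)), key=fused_probs.__getitem__)
```

In this sketch each modality is classified independently and only the class probabilities are combined, which is what makes it decision-level rather than feature-level fusion. In practice the fusion weight would typically be tuned on held-out data.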