• Laser & Optoelectronics Progress
  • Vol. 57, Issue 18, 181506 (2020)
Na Pan, Min Jiang*, and Jun Kong
Author Affiliations
  • Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, Jiangsu 214122, China
    DOI: 10.3788/LOP57.181506
    Na Pan, Min Jiang, Jun Kong. Human Action Recognition Algorithm Based on Spatio-Temporal Interactive Attention Model[J]. Laser & Optoelectronics Progress, 2020, 57(18): 181506
    Fig. 1. Framework of action recognition network based on spatio-temporal interactive attention model
    Fig. 2. Local_Mask feature maps generated from UCF101 dataset. (a) Balance beam; (b) walking with dog
    Fig. 3. Mask guided spatial attention model
    Fig. 4. Optical flow guided temporal attention model
    Fig. 5. Training and testing iteration curves of each algorithm on UCF101 dataset. (a) Proposed model; (b) proposed model with OGTAM; (c) proposed model with MGSAM; (d) proposed model with OGTAM+MGSAM
    Fig. 6. Visualization results of proposed algorithm on different datasets. (a) UCF101; (b) Penn Action
    | Parameter     | Value                            |
    |---------------|----------------------------------|
    | Loss function | Categorical cross-entropy        |
    | Optimizer     | Adam                             |
    | Learning rate | 0.0001                           |
    | Batch size    | 18                               |
    | Epoch         | 150 (Penn Action) / 250 (UCF101) |
    Table 1. Experimental parameters
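The settings in Table 1 can be gathered into a small configuration object. The following is a minimal sketch rather than the authors' code: the class name `TrainConfig`, the helpers `make_config` and `steps_per_epoch`, and the Keras-style identifier strings are illustrative assumptions; only the numeric values and the loss/optimizer choices come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters from Table 1 (field names are hypothetical)."""
    dataset: str
    epochs: int
    loss: str = "categorical_crossentropy"  # assumed Keras-style name
    optimizer: str = "adam"                 # assumed Keras-style name
    learning_rate: float = 1e-4
    batch_size: int = 18


# Epoch counts per dataset, as listed in Table 1.
EPOCHS = {"Penn Action": 150, "UCF101": 250}


def make_config(dataset: str) -> TrainConfig:
    """Build the configuration for one of the two datasets in Table 1."""
    if dataset not in EPOCHS:
        raise ValueError(f"no epoch setting listed for dataset {dataset!r}")
    return TrainConfig(dataset=dataset, epochs=EPOCHS[dataset])


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Batches needed to see every sample once (ceiling division)."""
    return -(-num_samples // batch_size)
```

For example, `make_config("UCF101")` yields 250 epochs with batch size 18. The sample count passed to `steps_per_epoch` depends on the train/test split used, which this page does not state.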
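    | Model      | RGB, with attention | RGB, without attention | TVNet, with attention | TVNet, without attention |
    |------------|---------------------|------------------------|-----------------------|--------------------------|
    | 3D ConvNet | 76.58               | 75.43                  | 82.79                 | 81.71                    |
    | Bi-LSTM    | 82.22               | 80.15                  | 80.36                 | 79.38                    |
    Table 2. Effects of optical flow guided temporal attention mechanism on UCF101 dataset (unit: %)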
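    | Modality  | With attention | Without attention |
    |-----------|----------------|-------------------|
    | RGB       | 85.44          | 80.15             |
    | TVNet     | 82.62          | 81.71             |
    | RGB+TVNet | 92.80          | 91.70             |
    Table 3. Effects of mask guided spatial attention mechanism on UCF101 dataset (unit: %)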
    | Model                             | Accuracy |
    |-----------------------------------|----------|
    | VideoLSTM-two stream [15]         | 89.2     |
    | Two-stream MLDF-3D [16]           | 91.3     |
    | Two-stream HHF [17]               | 91.2     |
    | Proposed model                    | 91.7     |
    | Proposed model (with OGTAM)       | 92.2     |
    | Proposed model (with MGSAM)       | 92.8     |
    | Proposed model (with OGTAM+MGSAM) | 94.9     |
    Table 4. Comparison of proposed model and other basic models on UCF101 dataset (unit: %)
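The contribution of each attention module in Table 4 can be read as a difference from the "Proposed model" baseline. The short sketch below works that arithmetic out; the function name `gains` and the dictionary layout are illustrative assumptions, while the accuracies are copied from the table.

```python
# UCF101 accuracies (%) from Table 4.
BASELINE = 91.7
VARIANTS = {
    "OGTAM": 92.2,
    "MGSAM": 92.8,
    "OGTAM+MGSAM": 94.9,
}


def gains(baseline: float, results: dict) -> dict:
    """Accuracy gain of each variant over the baseline, in percentage points."""
    return {name: round(acc - baseline, 1) for name, acc in results.items()}


ucf101_gains = gains(BASELINE, VARIANTS)
for name, delta in ucf101_gains.items():
    print(f"{name}: +{delta} points")
```

On these numbers, the two modules used together improve accuracy by more than the sum of their individual gains. The same function applies unchanged to any other baseline and variant accuracies reported in the same format.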
    | Model                             | Accuracy |
    |-----------------------------------|----------|
    | IDT+FV [18]                       | 85.9     |
    | IDT+HSV [19]                      | 87.9     |
    | MIFS [20]                         | 89.1     |
    | TSN (two modalities) [2]          | 94.0     |
    | Hidden two-stream [21]            | 93.1     |
    | MLDF-3D [16]                      | 94.4     |
    | MS-NET [22]                       | 93.9     |
    | Two-stream I3D [3]                | 98.0     |
    | Two-stream FCAN-comp [23]         | 92.0     |
    | VideoLSTM [15]                    | 89.2     |
    | JSTA [11]                         | 93.7     |
    | RSTAN [24]                        | 94.6     |
    | VideoYOLO [10]                    | 90.6     |
    | Proposed model                    | 91.7     |
    | Proposed model (with OGTAM+MGSAM) | 94.9     |
    Table 5. Comparison of accuracy of different algorithms on UCF101 dataset (unit: %)
    | Model                             | Accuracy |
    |-----------------------------------|----------|
    | Good-practice CNN                 | 88.6     |
    | JDD [25]                          | 87.4     |
    | C3D [25]                          | 86.0     |
    | TSN-S+T [2]                       | 93.8     |
    | GLTF [26]                         | 86.1     |
    | Im2Flow [27]                      | 77.4     |
    | Spatial                           | 81.7     |
    | Temporal                          | 83.4     |
    | Proposed model                    | 89.3     |
    | Proposed model (with OGTAM)       | 90.7     |
    | Proposed model (with MGSAM)       | 90.6     |
    | Proposed model (with OGTAM+MGSAM) | 91.7     |
    Table 6. Comparison of accuracy of different algorithms on Penn Action dataset (unit: %)