Human Action Recognition Algorithm Based on Spatio-Temporal Interactive Attention Model

Na Pan; Min Jiang; Jun Kong

doi:10.3788/LOP57.181506

Laser & Optoelectronics Progress
Vol. 57, Issue 18, 181506 (2020)

Na Pan, Min Jiang^*, and Jun Kong

Author Affiliations

Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, Jiangsu 214122, China

Na Pan, Min Jiang, Jun Kong. Human Action Recognition Algorithm Based on Spatio-Temporal Interactive Attention Model[J]. Laser & Optoelectronics Progress, 2020, 57(18): 181506

Fig. 1. Framework of action recognition network based on spatio-temporal interactive attention model

Fig. 2. Local_Mask feature maps generated from UCF101 dataset. (a) Balance beam; (b) walking with dog

Fig. 3. Mask guided spatial attention model

Fig. 4. Optical flow guided temporal attention model

Fig. 5. Training and testing iteration curves of each algorithm on UCF101 dataset. (a) Proposed model; (b) proposed model with OGTAM; (c) proposed model with MGSAM; (d) proposed model with OGTAM+MGSAM

Fig. 6. Visualization results of proposed algorithm on different datasets. (a) UCF101; (b) Penn Action

Parameter	Value
Loss function	Categorical cross-entropy
Optimizer	Adam
Learning rate	0.0001
Batch size	18
Epochs	150 (Penn Action) / 250 (UCF101)

Table 1. Experimental parameters
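
The settings in Table 1 can be collected into a single configuration object. The following is a minimal sketch, not the authors' code: the names `TrainConfig`, `EPOCHS_BY_DATASET`, and `config_for` are hypothetical, and the lowercase strings for the loss and optimizer are assumed labels rather than identifiers from any particular framework. Only the numeric values come from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the Table 1 hyperparameters."""
    epochs: int                             # dataset-dependent (see below)
    loss: str = "categorical_crossentropy"  # assumed label for the loss
    optimizer: str = "adam"                 # assumed label for the optimizer
    learning_rate: float = 1e-4             # Table 1: 0.0001
    batch_size: int = 18                    # Table 1: 18


# Table 1 lists a different epoch count for each dataset.
EPOCHS_BY_DATASET = {"Penn Action": 150, "UCF101": 250}


def config_for(dataset: str) -> TrainConfig:
    """Return the Table 1 settings for the named dataset."""
    return TrainConfig(epochs=EPOCHS_BY_DATASET[dataset])
```

Calling `config_for("UCF101")` yields the shared settings with 250 epochs; an unknown dataset name raises `KeyError` instead of silently falling back to a default.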

Model	RGB, with attention	RGB, without attention	TVNet, with attention	TVNet, without attention
3D ConvNet	76.58	75.43	82.79	81.71
Bi-LSTM	82.22	80.15	80.36	79.38

Table 2. Effects of optical flow guided temporal attention mechanism on UCF101 dataset (unit: %)

Attention	With	Without
RGB	85.44	80.15
TVNet	82.62	81.71
RGB+TVNet	92.80	91.70

Table 3. Effects of mask guided spatial attention mechanism on UCF101 dataset (unit: %)

Model	Accuracy
VideoLSTM-two stream^[15]	89.2
Two-stream MLDF-3D^[16]	91.3
Two-stream HHF^[17]	91.2
Proposed model	91.7
Proposed model (with OGTAM)	92.2
Proposed model (with MGSAM)	92.8
Proposed model (with OGTAM+MGSAM)	94.9

Table 4. Comparison of proposed model and other basic models on UCF101 dataset (unit: %)
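
The contribution of each attention module in Table 4 is easiest to read as a gain over the proposed model without attention. The sketch below only does that arithmetic on the reported accuracies; the dictionary keys and the function name `gain_over_baseline` are illustrative choices, not terms from the paper.

```python
# Accuracies (%) of the proposed model variants on UCF101, copied from Table 4.
UCF101_ACCURACY = {
    "baseline": 91.7,      # proposed model without attention modules
    "OGTAM": 92.2,         # with optical flow guided temporal attention
    "MGSAM": 92.8,         # with mask guided spatial attention
    "OGTAM+MGSAM": 94.9,   # with both modules
}


def gain_over_baseline(variant: str) -> float:
    """Accuracy gain of a variant over the baseline, in percentage points."""
    diff = UCF101_ACCURACY[variant] - UCF101_ACCURACY["baseline"]
    return round(diff, 1)  # reported values carry one decimal place


for name in ("OGTAM", "MGSAM", "OGTAM+MGSAM"):
    print(f"{name}: +{gain_over_baseline(name)} points")
```

On these figures the combined gain (3.2 points) exceeds the sum of the individual gains (0.5 + 1.1 = 1.6 points), so the two modules appear to complement each other on UCF101.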

Model	Accuracy
IDT+FV^[18]	85.9
IDT+HSV^[19]	87.9
MIFS^[20]	89.1
TSN (two modalities)^[2]	94.0
Hidden two-stream^[21]	93.1
MLDF-3D^[16]	94.4
MS-NET^[22]	93.9
Two-stream I3D^[3]	98.0
Two-stream FCAN-comp^[23]	92.0
VideoLSTM^[15]	89.2
JSTA^[11]	93.7
RSTAN^[24]	94.6
VideoYOLO^[10]	90.6
Proposed model	91.7
Proposed model (with OGTAM+MGSAM)	94.9

Table 5. Comparison of accuracy of different algorithms on UCF101 dataset (unit: %)

Model	Accuracy
Good-practice CNN	88.6
JDD^[25]	87.4
C3D^[25]	86.0
TSN-S+T^[2]	93.8
GLTF^[26]	86.1
Im2Flow^[27]	77.4
Spatial	81.7
Temporal	83.4
Proposed model	89.3
Proposed model (with OGTAM)	90.7
Proposed model (with MGSAM)	90.6
Proposed model (with OGTAM+MGSAM)	91.7

Table 6. Comparison of accuracy of different algorithms on Penn Action dataset (unit: %)
