Human Action Recognition Algorithm Based on Spatio-Temporal Interactive Attention Model

Na Pan; Min Jiang; Jun Kong

doi:10.3788/LOP57.181506

Abstract

A human action recognition algorithm is proposed based on spatio-temporal interactive attention model (STIAM) to solve the problem of low recognition accuracy. This problem is caused by the incapability of the two-stream network to effectively extract the valid frames in each video and the valid regions in each frame. Initially, the proposed algorithm applies two different deep learning networks to extract spatial and temporal features respectively. Subsequently, a mask-guided spatial attention model is designed to calculate the salient regions in each frame. Then, an optical flow-guided temporal attention model is designed to locate the saliency frames in each video. Finally, the weights obtained from temporal and spatial attention are weighted respectively with spatial features and temporal features to make this model realize the spatio-temporal interaction. Compared with the existing methods on UCF101 and Penn Action datasets, the experimental results show that STIAM has high feature extraction performance and the accuracy of action recognition is obviously improved.