Human Action Recognition Combining Sequential Dynamic Images and Two-Stream Convolutional Network

Wenqiang Zhang; Zengqiang Wang; Liang Zhang

doi:10.3788/LOP202158.0210007

Abstract

To better model the long-term temporal information of human actions, a human action recognition algorithm based on sequential dynamic images and a two-stream convolutional network is proposed. First, sequential dynamic images are constructed with a sequential pooling algorithm, which maps a video from three-dimensional space to two-dimensional space and captures both the appearance and the long-term temporal information of an action. Then, a two-stream convolutional network based on InceptionV3 is proposed, consisting of an appearance and long-term motion stream and a short-term motion stream. The inputs to the network are the sequential dynamic images and stacked optical-flow frame sequences, and training combines data augmentation, a pre-trained model, and sparse sampling. Finally, the classification scores output by each branch are fused by average pooling. Experimental results on the UCF101 and HMDB51 datasets show that, compared with the traditional two-stream convolutional network, this method makes effective use of the temporal and spatial information of an action and greatly improves the recognition rate, demonstrating its effectiveness and robustness.
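
The two steps that frame the pipeline, collapsing a frame sequence into a single dynamic image and averaging the per-stream class scores, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the sequential pooling step can be approximated by a fixed weighted sum of frames with the linear coefficients alpha_t = 2t - T - 1 (a simplified approximate rank pooling), and that fusion is a plain arithmetic mean of the score vectors. The function names `dynamic_image` and `fuse_scores`, and the representation of frames as flat lists of pixel values, are hypothetical choices made for illustration only.

```python
def dynamic_image(frames):
    """Collapse T frames into one image by a weighted temporal sum.

    frames: list of T frames, each a flat list of pixel values.
    Assumption: weights alpha_t = 2t - T - 1 (t = 1..T), a simplified
    approximate rank pooling; the paper's exact pooling may differ.
    """
    num_frames = len(frames)
    if num_frames == 0:
        raise ValueError("at least one frame is required")
    pooled = [0.0] * len(frames[0])
    for t, frame in enumerate(frames, start=1):
        alpha = 2 * t - num_frames - 1  # later frames get larger weights
        for i, value in enumerate(frame):
            pooled[i] += alpha * value
    return pooled


def fuse_scores(stream_scores):
    """Average the class-score vectors produced by each stream.

    stream_scores: list of score vectors, one per stream, equal length.
    Assumption: unweighted arithmetic mean across streams.
    """
    if not stream_scores:
        raise ValueError("at least one stream is required")
    num_streams = len(stream_scores)
    return [sum(per_class) / num_streams for per_class in zip(*stream_scores)]


if __name__ == "__main__":
    # Three tiny 2-pixel "frames" whose intensity grows over time.
    frames = [[0, 0], [1, 2], [2, 4]]
    print(dynamic_image(frames))

    # Hypothetical scores from the dynamic-image stream and the flow stream.
    fused = fuse_scores([[0.25, 0.75], [0.5, 0.5]])
    print(fused.index(max(fused)))  # index of the predicted class
```

Because the weights are negative for early frames and positive for late ones, pixels that brighten over time end up with positive values in the pooled image, which is how a single 2D image can encode the temporal order of the motion.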