Human Action Recognition Algorithm Based on Bi-LSTM-Attention Model

Mingkang Zhu; Xianling Lu

doi:10.3788/LOP56.151503

[1] Buri M, Pobar M, Kos M I. An overview of action recognition in videos. [C]∥2017 40th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), May 22-26, 2017, Opatija, Croatia. New York: IEEE, 1098-1103(2017).

[2] Luo H L, Wang C J, Lu F. Survey of video behavior recognition[J]. Journal on Communications, 39, 169-180(2018).

[3] Willems G, Tuytelaars T, van Gool L. An efficient dense and scale-invariant spatio-temporal interest point detector[M]. ∥Forsyth D, Torr P, Zisserman A. Computer vision-ECCV 2008. Lecture notes in computer science. Berlin, Heidelberg: Springer, 5303, 650-663(2008).

[4] Rapantzikos K, Avrithis Y, Kollias S. Dense saliency-based spatiotemporal feature points for action recognition. [C]∥2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 20-25, 2009, Miami, FL, USA. New York: IEEE, 1454-1461(2009).

[5] Abdulmunem A, Lai Y K, Sun X F. Saliency guided local and global descriptors for effective action recognition[J]. Computational Visual Media, 2, 97-106(2016). http://d.wanfangdata.com.cn/Periodical/jsksmt-e201601009

[6] Luo J J, Wang W, Qi H R. Spatio-temporal feature extraction and representation for RGB-D human action recognition[J]. Pattern Recognition Letters, 50, 139-148(2014). http://dl.acm.org/citation.cfm?id=2944096

[7] Liu A A, Su Y T, Nie W Z et al. Hierarchical clustering multi-task learning for joint human action grouping and recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 102-114(2017). http://dl.acm.org/citation.cfm?id=3024263

[8] Liu Z, Huang J T, Feng X. Action recognition model construction based on multi-scale deep convolution neural network[J]. Optics and Precision Engineering, 25, 799-805(2017).

[9] Zhu Y, Zhao J K, Wang Y N et al. A review of human action recognition based on deep learning[J]. Acta Automatica Sinica, 42, 848-857(2016).

[10] Charalampous K, Gasteratos A. On-line deep learning method for action recognition[J]. Pattern Analysis and Applications, 19, 337-354(2016). http://link.springer.com/article/10.1007/s10044-014-0404-8

[11] Ji S W, Xu W, Yang M et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 221-231(2013). http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.59

[12] Donahue J, Hendricks L A, Guadarrama S et al. Long-term recurrent convolutional networks for visual recognition and description. [C]∥2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 7-12, 2015, Boston, MA, USA. New York: IEEE, 2625-2634(2015).

[13] Gammulle H, Denman S, Sridharan S et al. Two stream LSTM: a deep fusion framework for human action recognition. [C]∥2017 IEEE Winter Conference on Applications of Computer Vision (WACV), March 24-31, 2017, Santa Rosa, CA, USA. New York: IEEE, 177-186(2017).

[14] Li Q H, Li A H, Wang T et al. Double-stream convolutional networks with sequential optical flow image for action recognition[J]. Acta Optica Sinica, 38, 0615002(2018).

[15] Das S, Koperski M, Bremond F et al. Deep-temporal LSTM for daily living action recognition. [C]∥2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), November 27-30, 2018, Auckland, New Zealand. New York: IEEE, 18455900(2018).

[16] Ullah A, Ahmad J, Muhammad K et al. Action recognition in video sequences using deep bi-directional LSTM with CNN features[J]. IEEE Access, 6, 1155-1166(2018). http://ieeexplore.ieee.org/document/8121994

[17] Szegedy C, Vanhoucke V, Ioffe S et al. Rethinking the inception architecture for computer vision. [C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 2818-2826(2016).

[18] Ravanbakhsh M, Mousavi H, Rastegari M et al. (2015-12-13)[2019-01-02]. https://arxiv.org/abs/1512.03980(2015).

[19] Yang X D, Tian Y L. Action recognition using super sparse coding vector with spatio-temporal awareness[M]. ∥Fleet D, Pajdla T, Schiele B, et al. Computer vision-ECCV 2014. Lecture notes in computer science. Cham: Springer, 8690, 727-741(2014).

[20] Wang J, Liu Z C, Wu Y et al. Mining actionlet ensemble for action recognition with depth cameras. [C]∥2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 16-21, 2012, Providence, RI, USA. New York: IEEE, 1290-1297(2012).

[21] Peng X J, Zou C Q, Qiao Y et al. Action recognition with stacked Fisher vectors[M]. ∥Fleet D, Pajdla T, Schiele B, et al. Computer vision-ECCV 2014. Lecture notes in computer science. Cham: Springer, 8693, 581-595(2014).

[22] Li Y D, Xu X P. Human action recognition by decision-making level fusion based on spatial-temporal features[J]. Acta Optica Sinica, 38, 0810001(2018).

[23] Sun L, Jia K, Chan T H et al. DL-SFA: deeply-learned slow feature analysis for action recognition. [C]∥2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 23-28, 2014, Columbus, OH, USA. New York: IEEE, 2625-2632(2014).

[24] Huang Y W, Wan C L, Feng H. Multi-feature fusion human behavior recognition algorithm based on convolutional neural network and long short term memory neural network[J]. Laser & Optoelectronics Progress, 56, 071505(2019).