• Laser & Optoelectronics Progress
  • Vol. 57, Issue 18, 181506 (2020)
Na Pan, Min Jiang*, and Jun Kong
Author Affiliations
  • Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, Jiangsu 214122, China
    DOI: 10.3788/LOP57.181506
    Na Pan, Min Jiang, Jun Kong. Human Action Recognition Algorithm Based on Spatio-Temporal Interactive Attention Model[J]. Laser & Optoelectronics Progress, 2020, 57(18): 181506
    References

    [1] Simonyan K, Zisserman A. Two-stream convolutional networks for action recognition in videos[C]∥Advances in Neural Information Processing Systems, December 8-13, 2014, Montreal, Quebec, Canada: Curran Associates, Inc., 568-576(2014).

    [2] Wang L M, Xiong Y J, Wang Z et al. Temporal segment networks: towards good practices for deep action recognition[C]∥European Conference on Computer Vision (ECCV). October 2016, Amsterdam, the Netherlands: Springer, 20-36(2016).

    [3] Carreira J, Zisserman A. Quo vadis, action recognition? A new model and the kinetics dataset[C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 21-26 July 2017, Honolulu, HI, USA, 4724-4733(2017).

    [4] Mnih V, Heess N, Graves A et al. Recurrent models of visual attention[C]∥NIPS'14: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, 2204-2212(2014).

    [5] Fan L F, Chen Y X, Wei P et al. Inferring shared attention in social scene videos[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 18-23 June 2018, Salt Lake City, UT, USA, 6460-6468(2018).

    [6] Lu M L, Li Z N, Wang Y M et al. Deep attention network for egocentric action recognition[J]. IEEE Transactions on Image Processing, 28, 3703-3713(2019).

    [7] Fu J, Liu J, Tian H J et al. Dual attention network for scene segmentation[C]∥2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 15-20 June 2019, Long Beach, CA, USA, 3141-3149(2019).

    [8] Zhu M K, Lu X L. Human action recognition algorithm based on Bi-LSTM-attention model[J]. Laser & Optoelectronics Progress, 56, 151503(2019).

    [9] Tang Y S, Tian Y, Lu J W et al. Deep progressive reinforcement learning for skeleton-based action recognition[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 18-23 June 2018, Salt Lake City, UT, USA, 5323-5332(2018).

    [10] Jing L L, Yang X D, Tian Y L. Video you only look once: overall temporal convolutions for action recognition[J]. Journal of Visual Communication and Image Representation, 52, 58-65(2018).

    [11] Yu T Z, Guo C X, Wang L F et al. Joint spatial-temporal attention for action recognition[J]. Pattern Recognition Letters, 112, 226-233(2018).

    [12] Lu L H, Di H J, Lu Y et al. Spatio-temporal attention mechanisms based model for collective activity recognition[J]. Signal Processing: Image Communication, 74, 162-174(2019).

    [13] He K M, Gkioxari G, Dollár P et al. Mask R-CNN[C]∥2017 IEEE International Conference on Computer Vision (ICCV). 22-29 Oct. 2017, Venice, Italy, 2980-2988(2017).

    [14] Fan L J, Huang W B, Gan C et al. End-to-end learning of motion representation for video understanding[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 18-23 June 2018, Salt Lake City, UT, USA, 6016-6025(2018).

    [15] Li Z Y, Gavrilyuk K, Gavves E et al. VideoLSTM convolves, attends and flows for action recognition[J]. Computer Vision and Image Understanding, 166, 41-50(2018).

    [16] Zhang J X, Hu H F. Deep spatiotemporal relation learning with 3D multi-level dense fusion for video action recognition[J]. IEEE Access, 7, 15222-15229(2019).

    [17] Khowaja S A, Lee S L. Hybrid and hierarchical fusion networks: a deep cross-modal learning architecture for action recognition[J]. Neural Computing and Applications, 1-12(2019).

    [18] Wang H, Schmid C. Action recognition with improved trajectories[C]∥2013 IEEE International Conference on Computer Vision. 1-8 Dec. 2013, Sydney, NSW, Australia, 3551-3558(2013).

    [19] Peng X J, Wang L M, Wang X X et al. Bag of visual words and fusion methods for action recognition: comprehensive study and good practice[J]. Computer Vision and Image Understanding, 150, 109-125(2016).

    [20] Lan Z Z, Lin M, Li X C et al. Beyond Gaussian pyramid: multi-skip feature stacking for action recognition[C]∥2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 7-12 June 2015, Boston, MA, USA, 204-212(2015).

    [21] Zhu Y, Lan Z Z, Newsam S et al. Hidden two-stream convolutional networks for action recognition[C]∥Asian Conference on Computer Vision (ACCV). Cham: Springer, 363-378(2019).

    [22] Tu Z G, Xie W, Dauwels J et al. Semantic cues enhanced multimodality multistream CNN for action recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 29, 1423-1437(2019).

    [23] Tran A, Cheong L F. Two-stream flow-guided convolutional attention networks for action recognition[C]∥2017 IEEE International Conference on Computer Vision Workshops (ICCVW). 22-29 Oct. 2017, Venice, Italy, 3110-3119(2017).

    [24] Du W B, Wang Y L, Qiao Y. Recurrent spatial-temporal attention network for action recognition in videos[J]. IEEE Transactions on Image Processing, 27, 1347-1360(2018).

    [25] Cao C Q, Zhang Y F, Zhang C J et al. Action recognition with joints-pooled 3D deep convolutional descriptors[C]∥IJCAI'16: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 3324-3330(2016).

    [26] Villegas R, Yang J, Zou Y et al. Learning to generate long-term future via hierarchical prediction[C]∥Proceedings of the 34th International Conference on Machine Learning - Volume 70, Aug 6-11, 2017, Sydney, Australia: JMLR.org, 3560-3569(2017).

    [27] Gao R H, Xiong B, Grauman K. Im2Flow: motion hallucination from static images for action recognition[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 18-23 June 2018, Salt Lake City, UT, USA, 5937-5947(2018).
