Improved Human Action Recognition Algorithm Based on Two-Stream Faster Region Convolutional Neural Network

Ruyi Guo; Jie Jin; Gaohua Liu; Kaiyan Liu; Shiqi Jiang

doi:10.3788/LOP57.241506

[1] Wang H Y, Qi J, Fang T E et al. Dynamic hand gesture recognition based on track template matching[J]. Microcontrollers & Embedded Systems, 17, 39-43, 46(2017).

[2] Chen S D, He B Q, Chen S Y et al. Human action recognition based on spatio-temporal interest point[J]. Journal of Chengdu University of Information Technology, 33, 143-148(2018).

[3] Dong S G, Hu D D, Li R J et al. Human action recognition based on foreground trajectory and motion difference descriptors[J]. Applied Sciences, 9, 2126(2019).

[4] Tran D, Bourdev L, Fergus R et al. Learning spatiotemporal features with 3D convolutional networks[C]//2015 IEEE International Conference on Computer Vision (ICCV), December 7-13, 2015, Santiago, Chile. New York: IEEE, 4489-4497(2015).

[5] Lea C, Flynn M D, Vidal R et al. Temporal convolutional networks for action segmentation and detection[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 1003-1012(2017).

[6] Feichtenhofer C, Pinz A, Wildes R P. Spatiotemporal residual networks for video action recognition[EB/OL]. [2020-04-15]. https://arxiv.org/abs/1611.02155.

[7] Zan B F, Kong J, Jiang M. Human action recognition based on discriminative collaborative representation classifier[J]. Laser & Optoelectronics Progress, 55, 011010(2018).

[8] Liu F, Yu F Q. Human action recognition based on global and local features[J]. Laser & Optoelectronics Progress, 57, 021004(2020).

[9] Wang H, Kläser A, Schmid C et al. Dense trajectories and motion boundary descriptors for action recognition[J]. International Journal of Computer Vision, 103, 60-79(2013).

[10] Wang H, Schmid C. Action recognition with improved trajectories[C]//2013 IEEE International Conference on Computer Vision, December 1-8, 2013, Sydney, NSW, Australia. New York: IEEE, 3551-3558(2013).

[11] Chen L C, Papandreou G, Kokkinos I et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40, 834-848(2018).

[12] Huang Y W, Wan C L, Feng H. Multi-feature fusion human behavior recognition algorithm based on convolutional neural network and long short term memory neural network[J]. Laser & Optoelectronics Progress, 56, 071505(2019).

[13] Shrivastava A, Gupta A, Girshick R. Training region-based object detectors with online hard example mining[EB/OL]. [2020-04-13]. https://arxiv.org/abs/1604.03540.

[14] Peng X J, Schmid C. Multi-region two-stream R-CNN for action detection[M]//Leibe B, Matas J, Sebe N, et al. Computer Vision-ECCV 2016. Lecture Notes in Computer Science. Cham: Springer, 9908, 744-759(2016).

[15] Gkioxari G, Girshick R, Dollár P et al. Detecting and recognizing human-object interactions[EB/OL]. [2020-04-13]. https://arxiv.org/abs/1704.07333.

[16] Ren S Q, He K M, Girshick R et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 1137-1149(2017).

[17] Hu J, Shen L, Albanie S et al. Squeeze-and-excitation networks[EB/OL]. [2020-04-12]. https://arxiv.org/abs/1709.01507.

[18] Howard A, Sandler M, Chu G et al. Searching for MobileNetV3[EB/OL]. [2020-04-14]. https://arxiv.org/pdf/1905.02244.

[19] Zheng Z H, Wang P, Liu W et al. Distance-IoU loss: faster and better learning for bounding box regression[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 34, 12993-13000(2020).

[20] Wang L M, Qiao Y, Tang X O. Action recognition with trajectory-pooled deep-convolutional descriptors[EB/OL]. [2020-04-15]. https://arxiv.org/abs/1505.04868.

[21] Feichtenhofer C, Pinz A, Wildes R P. Spatiotemporal residual networks for video action recognition[EB/OL]. [2020-04-11]. https://arxiv.org/abs/1611.02155.