• Opto-Electronic Engineering
  • Vol. 47, Issue 2, 190139 (2020)
Li Lianghua* and Wang Yongxiong
    DOI: 10.12086/oee.2020.190139
    Li Lianghua, Wang Yongxiong. Efficient 3D dense residual network and its application in human action recognition[J]. Opto-Electronic Engineering, 2020, 47(2): 190139
    References

    [1] He K M, Zhang X Y, Ren S Q, et al. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification[C]//2015 IEEE International Conference on Computer Vision (ICCV), Santiago, 2015: 1026–1034.

    [2] Shojaeilangari S, Yau W Y, Li J, et al. Dynamic facial expression analysis based on extended spatio-temporal histogram of oriented gradients[J]. International Journal of Biometrics, 2014, 6(1): 33–52.

    [3] Scovanner P, Ali S, Shah M. A 3-dimensional SIFT descriptor and its application to action recognition[C]//Proceedings of the 15th ACM International Conference on Multimedia (MM '07), New York, 2007: 357–360.

    [4] Laptev I, Marszalek M, Schmid C, et al. Learning realistic human actions from movies[C]//2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, 2008: 1–8.

    [5] Willems G, Tuytelaars T, Van Gool L. An efficient dense and scale-invariant spatio-temporal interest point detector[C]//European Conference on Computer Vision, Berlin, 2008: 650–663.

    [6] Wang H, Schmid C. Action recognition with improved trajectories[C]//2013 IEEE International Conference on Computer Vision, Sydney, 2013: 3551–3558.

    [7] Simonyan K, Zisserman A. Two-stream convolutional networks for action recognition in videos[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, Canada, 2014: 568–576.

    [8] Yao L, Torabi A, Cho K, et al. Describing videos by exploiting temporal structure[C]//2015 IEEE International Conference on Computer Vision (ICCV), Santiago, 2015: 199–211.

    [9] Shao L, Zhen X T, Tao D C, et al. Spatio-temporal laplacian pyramid coding for action recognition[J]. IEEE Transactions on Cybernetics, 2014, 44(6): 817–827.

    [10] Hara K, Kataoka H, Satoh Y. Learning spatio-temporal features with 3D residual networks for action recognition[C]//2017 IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, 2017: 3154–3160.

    [11] Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]//Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015: 448–456.

    [12] Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, 2017: 2261–2269.

    [13] Song T Z, Song Y, Wang Y X, et al. Residual network with dense block[J]. Journal of Electronic Imaging, 2018, 27(5): 053036.

    [15] Ji S W, Xu W, Yang M, et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(1): 221–231.

    [16] Tran D, Bourdev L, Fergus R, et al. Learning spatiotemporal features with 3D convolutional networks[C]//2015 IEEE International Conference on Computer Vision (ICCV), Santiago, 2015: 4489–4497.

    [17] Qiu Z F, Yao T, Mei T. Learning spatio-temporal representation with pseudo-3D residual networks[C]//2017 IEEE International Conference on Computer Vision (ICCV), Venice, 2017: 5534–5542.

    [18] He K M, Sun J. Convolutional neural networks at constrained time cost[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, 2015: 5353–5360.

    [19] Soomro K, Zamir A R, Shah M. UCF101: A dataset of 101 human actions classes from videos in the wild[Z]. arXiv:1212.0402, 2012.

    [20] Tran D, Torresani L. EXMOVES: mid-level features for efficient action recognition and video analysis[J]. International Journal of Computer Vision, 2016, 119(3): 239–253.

    [22] Wang X H, Gao L L, Wang P, et al. Two-stream 3-D ConvNet fusion for action recognition in videos with arbitrary size and length[J]. IEEE Transactions on Multimedia, 2018, 20(3): 634–644.

    [23] Varol G, Laptev I, Schmid C. Long-term temporal convolutions for action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(6): 1510–1517.

