Video Classification Based on Three-Dimensional Squeeze Excitation Module

Ningxiao Li; Guodong Wang; Yanjie Wang; Shiyu Hu; Liangliang Wang

doi:10.3788/LOP56.121004

[1] Wang K Z, Wang X L, Lin L et al. 3D human activity recognition with reconfigurable convolutional neural networks[C]∥Proceedings of the 22nd ACM International Conference on Multimedia, November 3-7, 2014, Orlando, Florida, USA, 97-106(2014).

[2] Liu H, Peng L, Wen J W. Multi-scale aware pedestrian detection algorithm based on improved full convolutional network[J]. Laser & Optoelectronics Progress, 55, 091504(2018).

[3] Karpathy A, Toderici G, Shetty S et al. Large-scale video classification with convolutional neural networks[C]∥2014 IEEE Conference on Computer Vision and Pattern Recognition, June 23-28, 2014, Columbus, OH, USA. New York: IEEE, 1725-1732(2014).

[4] Deng J, Dong W, Socher R et al. ImageNet: a large-scale hierarchical image database[C]∥2009 IEEE Conference on Computer Vision and Pattern Recognition, June 20-25, 2009, Miami, FL, USA. New York: IEEE, 248-255(2009).

[5] He K M, Zhang X Y, Ren S Q et al. Deep residual learning for image recognition[C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 770-778(2016).

[6] Kay W, Carreira J, Simonyan K et al. The Kinetics human action video dataset[EB/OL]. (2017-05-19)[2018-11-15]. https://arxiv.org/abs/1705.06950.

[7] Soomro K, Zamir A R, Shah M. UCF101: a dataset of 101 human actions classes from videos in the wild[EB/OL]. (2012-12-03)[2018-11-15]. https://arxiv.org/abs/1212.0402.

[8] Kuehne H, Jhuang H, Stiefelhagen R et al. HMDB51: a large video database for human motion recognition[M]∥Nagel W, Kröner D, Resch M. High Performance Computing in Science and Engineering '12. Berlin, Heidelberg: Springer, 571-582(2012).

[9] Huang G, Liu Z, Maaten L V D et al. Densely connected convolutional networks[C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 2261-2269(2017).

[10] Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-23, 2018, Salt Lake City, UT, USA. New York: IEEE, 7132-7141(2018).

[11] Hara K, Kataoka H, Satoh Y. Learning spatio-temporal features with 3D residual networks for action recognition[C]∥2017 IEEE International Conference on Computer Vision Workshops (ICCVW), October 22-29, 2017, Venice, Italy. New York: IEEE, 3154-3160(2017).

[12] Xu H Y, Kong J, Jiang M et al. Action recognition based on histogram of spatio-temporal oriented principal components[J]. Laser & Optoelectronics Progress, 55, 061009(2018).

[13] Hara K, Kataoka H, Satoh Y. Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet?[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-23, 2018, Salt Lake City, UT, USA. New York: IEEE, 6546-6555(2018).

[14] Liu D, Zhou Y Z, Sun X Y et al. Adaptive pooling in multi-instance learning for web video annotation[C]∥2017 IEEE International Conference on Computer Vision Workshops (ICCVW), October 22-29, 2017, Venice, Italy. New York: IEEE, 318-327(2017).

[15] Nair V, Hinton G E. Rectified linear units improve restricted Boltzmann machines[C]∥27th International Conference on Machine Learning, 2010, Haifa, Israel. Omnipress, 807-814(2010).

[16] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 9, 1735-1780(1997).

[17] He K M, Zhang X Y, Ren S Q et al. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification[C]∥2015 IEEE International Conference on Computer Vision (ICCV), December 7-13, 2015, Santiago, Chile. New York: IEEE, 1026-1034(2015).

[18] Bordes A, Bottou L, Gallinari P. SGD-QN: careful quasi-Newton stochastic gradient descent[J]. Journal of Machine Learning Research, 10, 1737-1754(2009). http://dl.acm.org/citation.cfm?id=1755842

[19] Kingma D P, Ba J. Adam: a method for stochastic optimization[EB/OL]. (2017-01-30)[2018-11-15]. https://arxiv.org/abs/1412.6980.

[20] Liu F, Liu P Y, Zhang J N et al. Joint detection of RGB-D images based on double flow convolutional neural network[J]. Laser & Optoelectronics Progress, 55, 021503(2018).

[21] Szegedy C, Liu W, Jia Y Q et al. Going deeper with convolutions[C]∥2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 7-12, 2015, Boston, MA, USA. New York: IEEE, 7298594(2015).

[22] Ng J Y H, Hausknecht M, Vijayanarasimhan S et al. Beyond short snippets: deep networks for video classification[C]∥2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 7-12, 2015, Boston, MA, USA. New York: IEEE, 4694-4702(2015).

[23] Wang H, Schmid C. Action recognition with improved trajectories[C]∥2013 IEEE International Conference on Computer Vision, December 1-8, 2013, Sydney, NSW, Australia. New York: IEEE, 3551-3558(2013).

[24] Qiu Z F, Yao T, Mei T. Learning spatio-temporal representation with pseudo-3D residual networks[C]∥2017 IEEE International Conference on Computer Vision (ICCV), October 22-29, 2017, Venice, Italy. New York: IEEE, 5534-5542(2017).

[25] Tran D, Bourdev L, Fergus R et al. Learning spatiotemporal features with 3D convolutional networks[C]∥2015 IEEE International Conference on Computer Vision (ICCV), December 7-13, 2015, Santiago, Chile. New York: IEEE, 4489-4497(2015).