Video Anomaly Event Detection Based on Two-Stream Residual Network

WANG Zixu; JIN Lizuo; ZHANG Shan; SU Guowei; CHEN Ruijie

doi:10.3969/j.issn.1671-637x.2022.08.016

[1] JI S W, XU W, YANG M, et al. 3D convolutional neural networks for human action recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 35(1):221-231.

[2] QIU Z F, YAO T, MEI T. Learning spatio-temporal representation with pseudo-3D residual networks [C]//IEEE International Conference on Computer Vision. Venice: IEEE, 2017:5534-5542.

[3] TRAN D, WANG H, TORRESANI L, et al. A closer look at spatiotemporal convolutions for action recognition [C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018:6450-6459.

[4] SUN L, JIA K, YEUNG D Y, et al. Human action recognition using factorized spatio-temporal convolutional networks [C]//IEEE International Conference on Computer Vision. Santiago: IEEE, 2015:4597-4605.

[5] ZHOU Y Z, SUN X Y, ZHA Z J, et al. MiCT: mixed 3D/2D convolutional tube for human action recognition [C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018:449-458.

[6] SIMONYAN K, ZISSERMAN A. Two-stream convolutional networks for action recognition in videos [EB/OL]. (2014-11-12) [2022-04-13]. https://arxiv.org/abs/1406.2199.

[7] FEICHTENHOFER C, PINZ A, ZISSERMAN A. Convolutional two-stream network fusion for video action recognition [C]//IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016:1933-1941.

[8] DONAHUE J, HENDRICKS L A, GUADARRAMA S, et al. Long-term recurrent convolutional networks for visual recognition and description [C]//IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015:2625-2634.

[9] NG J Y H, HAUSKNECHT M, VIJAYANARASIMHAN S, et al. Beyond short snippets: deep networks for video classification [C]//IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015:4694-4702.

[11] WANG L M, XIONG Y J, WANG Z, et al. Temporal segment networks: towards good practices for deep action recognition [C]//European Conference on Computer Vision. Amsterdam: Springer, 2016:20-36.

[12] ZHOU B L, ANDONIAN A, OLIVA A, et al. Temporal relational reasoning in videos [C]//European Conference on Computer Vision. Munich: Springer, 2018:831-846.

[13] LIN J, GAN C, HAN S. TSM: temporal shift module for efficient video understanding [C]//IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019:7082-7092.

[14] FEICHTENHOFER C, PINZ A, WILDES R P. Spatiotemporal residual networks for video action recognition [C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona: Curran Associates Inc., 2016:3476-3484.

[15] FEICHTENHOFER C, PINZ A, WILDES R P. Spatiotemporal multiplier networks for video action recognition [C]//IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017:7445-7454.

[16] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition [C]//IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016:770-778.

[17] WANG L M, XIONG Y J, WANG Z, et al. Temporal segment networks for action recognition in videos [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(11):2740-2755.

[18] SULTANI W, CHEN C, SHAH M. Real-world anomaly detection in surveillance videos [C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018:6479-6488.

[19] WU P, LIU J, SHI Y J, et al. Not only look, but also listen: learning multimodal violence detection under weak supervision [C]//European Conference on Computer Vision. Cham: Springer, 2020:322-339.