Algorithm for Video Temporal Action Proposal Combining Watershed and Regression Networks

Yunwen Huang; Fei Wang; Jinghong Li; Guorui Wang

doi:10.3788/CJL201946.1109001

[1] Oneata D, Verbeek J, Schmid C. The LEAR submission at THUMOS 2014[M]. ∥Fleet D, Pajdla T, Schiele B, et al. European conference on computer vision-ECCV 2014. lecture notes in computer science. Cham: Springer, 8692, 1-7(2014).

[2] Li Y D, Xu X P. Human action recognition by decision-making level fusion based on spatial-temporal features[J]. Acta Optica Sinica, 38, 0810001(2018).

[3] Li Q H, Li A H, Wang T et al. Double-stream convolutional networks with sequential optical flow image for action recognition[J]. Acta Optica Sinica, 38, 0615002(2018).

[4] Tran D, Bourdev L, Fergus R et al. Learning spatiotemporal features with 3D convolutional networks. [C]∥2015 IEEE International Conference on Computer Vision (ICCV), December 7-13, 2015, Santiago, Chile. New York: IEEE, 4489-4497(2015).

[5] Gorban A, Idrees H, Jiang Y G et al. [2019-05-25]. http://www.thumos.info/.(2015).

[6] Feng X Y, Mei W, Hu D S. Aerial target detection based on improved Faster R-CNN[J]. Acta Optica Sinica, 38, 0615004(2018).

[7] Xin P, Xu Y L, Tang H et al. Fast airplane detection based on multi-layer feature fusion of fully convolutional networks[J]. Acta Optica Sinica, 38, 0315003(2018).

[8] Felzenszwalb P F, Girshick R B, McAllester D et al. Object detection with discriminatively trained part-based models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 1627-1645(2010). http://ieeexplore.ieee.org/document/6756765

[9] Lazebnik S, Schmid C, Ponce J. Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. [C]∥2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), June 17-22, 2006, New York, NY, USA. New York: IEEE(2006).

[10] Shou Z, Wang D, Chang S F. Temporal action localization in untrimmed videos via multi-stage CNNs. [C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 1049-1058(2016).

[11] Donahue J, Hendricks L A, Guadarrama S et al. Long-term recurrent convolutional networks for visual recognition and description. [C]∥2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 7-12, 2015, Boston, MA, USA. New York: IEEE, 2625-2634(2015).

[12] Wang L M, Qiao Y, Tang X O et al. Actionness estimation using hybrid fully convolutional networks. [C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 2708-2717(2016).

[13] Wang L M, Xiong Y J, Wang Z et al. Temporal segment networks: towards good practices for deep action recognition[M]. ∥Leibe B, Matas J, Sebe N, et al. European conference on computer vision-ECCV 2016. lecture notes in computer science. Cham: Springer, 9912, 20-36(2016).

[14] Escorcia V, Caba Heilbron F, Niebles J C et al. DAPs: deep action proposals for action understanding[M]. ∥Leibe B, Matas J, Sebe N, et al. European conference on computer vision-ECCV 2016. lecture notes in computer science. Cham: Springer, 9907, 768-784(2016).

[15] Heilbron F C, Niebles J C, Ghanem B. Fast temporal activity proposals for efficient detection of human actions in untrimmed videos. [C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 1914-1923(2016).

[16] Buch S, Escorcia V, Shen C Q et al. SST: single-stream temporal action proposals. [C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 6373-6382(2017).

[17] Lin T W, Zhao X, Su H S et al. BSN: boundary sensitive network for temporal action proposal generation[M]. ∥Ferrari V, Hebert M, Sminchisescu C, et al. European conference on computer vision-ECCV 2018. lecture notes in computer science. Cham: Springer, 11208, 3-21(2018).

[18] Shou Z, Chan J, Zareian A et al. CDC: convolutional-de-convolutional networks for precise temporal action localization in untrimmed videos. [C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 1417-1426(2017).

[19] Dai X Y, Singh B, Zhang G Y et al. Temporal context network for activity localization in videos. [C]∥2017 IEEE International Conference on Computer Vision (ICCV), October 22-29, 2017, Venice, Italy. New York: IEEE, 5727-5736(2017).

[20] Heilbron F C, Barrios W, Escorcia V et al. SCC: semantic context cascade for efficient action detection. [C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 3175-3184(2017).