[1] Oneata D, Verbeek J, Schmid C. The LEAR submission at thumos 2014[M]. ∥Fleet D, Pajdla T, Schiele B,
[4] Tran D, Bourdev L, Fergus R et al. Learning spatiotemporal features with 3D convolutional networks. [C]∥2015 IEEE International Conference on Computer Vision (ICCV), December 7-13, 2015, Santiago, Chile. New York: IEEE, 4489-4497(2015).
[5] Gorban A, Idrees H, Jiang Y G et al. 2019-05-25]. http:∥www.thumos.info/.(2015).
[9] Lazebnik S, Schmid C, Ponce J. Beyond bags of features:spatial pyramid matching for recognizing natural scene categories. [C]∥2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), June 17-22, 2006, New York, NY, USA. New York: IEEE(2006).
[10] Shou Z, Wang D, Chang S F. Temporal action localization in untrimmed videos via multi-stage CNNs. [C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 1049-1058(2016).
[11] Donahue J, Hendricks L A, Guadarrama S et al. Long-term recurrent convolutional networks for visual recognition and description. [C]∥2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 7-12, 2015, Boston, MA, USA. New York: IEEE, 2625-2634(2015).
[12] Wang L M, Qiao Y, Tang X O et al. Actionness estimation using hybrid fully convolutional networks. [C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 2708-2717(2016).
[13] Wang L M, Xiong Y J, Wang Z et al. Temporal segment networks: towards good practices for deep action recognition[M]. ∥Leibe B, Matas J, Sebe N,
[14] Escorcia V, Caba Heilbron F, Niebles J C et al. DAPs: deep action proposals for action understanding[M]. ∥Leibe B, Matas J, Sebe N,
[15] Heilbron F C, Niebles J C, Ghanem B. Fast temporal activity proposals for efficient detection of human actions in untrimmed videos. [C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 1914-1923(2016).
[16] Buch S, Escorcia V, Shen C Q et al. SST: single-stream temporal action proposals. [C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 6373-6382(2017).
[17] Lin T W, Zhao X, Su H S et al. BSN: boundary sensitive network for temporal action proposal generation[M]. ∥Ferrari V, Hebert M, Sminchisescu C,
[18] Shou Z, Chan J, Zareian A et al. CDC: convolutional-de-convolutional networks for precise temporal action localization in untrimmed videos. [C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 1417-1426(2017).
[19] Dai X Y, Singh B, Zhang G Y et al. Temporal context network for activity localization in videos. [C]∥2017 IEEE International Conference on Computer Vision (ICCV), October 22-29, 2017, Venice, Italy. New York: IEEE, 5727-5736(2017).
[20] Heilbron F C, Barrios W, Escorcia V et al. SCC: semantic context cascade for efficient action detection. [C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 3175-3184(2017).