Laser & Optoelectronics Progress, Vol. 58, Issue 24, 2400005 (2021)
Jian Lu, Tengfei Yang*, Bo Zhao, Hangying Wang, Maoxin Luo, Yanran Zhou, and Zhe Li
Author Affiliations
  • School of Electronics and Information, Xi’an Polytechnic University, Xi’an, Shaanxi 710048, China
    DOI: 10.3788/LOP202158.2400005
    Jian Lu, Tengfei Yang, Bo Zhao, Hangying Wang, Maoxin Luo, Yanran Zhou, Zhe Li. Review of Deep Learning-Based Human Pose Estimation[J]. Laser & Optoelectronics Progress, 2021, 58(24): 2400005
    References

    [1] Wang C Y, Wang Y Z, Yuille A L. An approach to pose-based action recognition[C]. //2013 IEEE Conference on Computer Vision and Pattern Recognition, June 23-28, 2013, Portland, OR, USA, 915-922(2013).

    [2] Liang Z J, Wang X L, Huang R et al. An expressive deep model for human action parsing from a single image[C]. //2014 IEEE International Conference on Multimedia and Expo (ICME), July 14-18, 2014, Chengdu, China (2014).

    [3] Cho N G, Yuille A L, Lee S W. Adaptive occlusion state estimation for human pose tracking under self-occlusions[J]. Pattern Recognition, 46, 649-661(2013).

    [4] Nie B X, Xiong C M, Zhu S C. Joint action recognition and pose estimation from video[C]. //2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 7-12, 2015, Boston, MA, USA, 1293-1301(2015).

    [5] Huang Y W, Zhao P, You Y D. Pose-guided human image synthesis based on fusion feature feedback mechanism[J]. Laser & Optoelectronics Progress, 57, 141011(2020).

    [6] Shotton J, Fitzgibbon A, Cook M et al. Real-time human pose recognition in parts from single depth images[C]. //2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 20-25, 2011, Colorado Springs, CO, USA, 1297-1304(2011).

    [7] Ionescu C, Li F X, Sminchisescu C. Latent structured models for human pose estimation[C]. //2011 International Conference on Computer Vision, November 6-13, 2011, Barcelona, Spain, 2220-2227(2011).

    [8] LeCun Y, Ranzato M. Deep learning[M](2011).

    [9] Pishchulin L, Andriluka M, Gehler P et al. Poselet conditioned pictorial structures[C]. //2013 IEEE Conference on Computer Vision and Pattern Recognition, June 23-28, 2013, Portland, OR, USA, 588-595(2013).

    [10] Lifshitz I, Fetaya E, Ullman S. Human pose estimation using deep consensus voting[M]. //Leibe B, Matas J, Sebe N, et al. Computer vision-ECCV 2016. Lecture notes in computer science, 9906, 246-260(2016).

    [11] Ke L P, Chang M C, Qi H G et al. Multi-scale structure-aware network for human pose estimation[M]. //Ferrari V, Hebert M, Sminchisescu C, et al. Computer vision-ECCV 2018. Lecture notes in computer science, 11206, 731-746(2018).

    [12] Johnson S, Everingham M. Clustered pose and nonlinear appearance models for human pose estimation[C]. //Proceedings of the British Machine Vision Conference 2010, August 31-September 3, 2010, Aberystwyth (2010).

    [13] Chen X J, Yuille A L. Articulated pose estimation by a graphical model with image dependent pairwise relations[C]. //Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13, 2014, Montreal, Quebec, Canada, 1736-1744(2014).

    [14] Varamesh A, Tuytelaars T. Mixture dense regression for object detection and human pose estimation[C]. //Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 13-19, 2020, Seattle, WA, USA, 13086-13095(2020).

    [15] Toshev A, Szegedy C. DeepPose: human pose estimation via deep neural networks[C]. //2014 IEEE Conference on Computer Vision and Pattern Recognition, June 23-28, 2014, Columbus, OH, USA, 1653-1660(2014).

    [16] Pfister T, Simonyan K, Charles J et al. Deep convolutional neural networks for efficient pose estimation in gesture videos[M]. //Cremers D, Reid I, Saito H, et al. Computer vision-ACCV 2014. Lecture notes in computer science, 9003, 538-552(2015).

    [17] Fan X C, Zheng K, Lin Y W et al. Combining local appearance and holistic view: dual-source deep neural networks for human pose estimation[C]. //2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 7-12, 2015, Boston, MA, USA, 1347-1355(2015).

    [18] Pfister T, Charles J, Zisserman A. Flowing ConvNets for human pose estimation in videos[C]. //2015 IEEE International Conference on Computer Vision (ICCV), December 7-13, 2015, Santiago, Chile, 1913-1921(2015).

    [19] Bulat A, Tzimiropoulos G. Human pose estimation via convolutional part heatmap regression[M]. //Leibe B, Matas J, Sebe N, et al. Computer vision-ECCV 2016. Lecture notes in computer science, 9911, 717-732(2016).

    [20] Zhang N, Shelhamer E, Gao Y et al. Fine-grained pose prediction, normalization, and recognition[EB/OL]. (2015-11-22)[2020-11-10]. https://arxiv.org/abs/1511.07063

    [21] He K M, Zhang X Y, Ren S Q et al. Deep residual learning for image recognition[C]. //2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA, 770-778(2016).

    [22] Chu X, Yang W, Ouyang W L et al. Multi-context attention for human pose estimation[C]. //2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA, 5669-5678(2017).

    [23] Artacho B, Savakis A. UniPose: unified human pose estimation in single images and videos[C]. //2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 13-19, 2020, Seattle, WA, USA, 7033-7042(2020).

    [24] Artacho B, Savakis A. Waterfall atrous spatial pooling architecture for efficient semantic segmentation[J]. Sensors, 19, 5361(2019).

    [25] Chen L C, Papandreou G, Kokkinos I et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40, 834-848(2018).

    [26] Chou C J, Chien J T, Chen H T. Self adversarial training for human pose estimation[C]. //2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), November 12-15, 2018, Honolulu, HI, USA, 17-30(2018).

    [27] Chen Y, Shen C H, Wei X S et al. Adversarial PoseNet: a structure-aware convolutional network for human pose estimation[C]. //2017 IEEE International Conference on Computer Vision (ICCV), October 22-29, 2017, Venice, Italy, 1221-1230(2017).

    [28] Yang W, Li S, Ouyang W L et al. Learning feature pyramids for human pose estimation[C]. //2017 IEEE International Conference on Computer Vision (ICCV), October 22-29, 2017, Venice, Italy, 1290-1299(2017).

    [29] Newell A, Yang K Y, Deng J. Stacked hourglass networks for human pose estimation[M]. //Leibe B, Matas J, Sebe N, et al. Computer vision-ECCV 2016. Lecture notes in computer science, 9912, 483-499(2016).

    [30] Tian Y D, Zitnick C L, Narasimhan S G. Exploring the spatial hierarchy of mixture models for human pose estimation[M]. //Fitzgibbon A, Lazebnik S, Perona P, et al. Computer vision-ECCV 2012. Lecture notes in computer science, 7576, 256-269(2012).

    [31] Rothrock B, Park S, Zhu S C. Integrating grammar and segmentation for human pose estimation[C]. //2013 IEEE Conference on Computer Vision and Pattern Recognition, June 23-28, 2013, Portland, OR, USA, 3214-3221(2013).

    [32] Park S, Zhu S C. Attributed grammars for joint estimation of human attributes, part and pose[C]. //2015 IEEE International Conference on Computer Vision (ICCV), December 7-13, 2015, Santiago, Chile, 2372-2380(2015).

    [33] Tang W, Yu P, Wu Y. Deeply learned compositional models for human pose estimation[C]. //Ferrari V, Hebert M, Sminchisescu C, et al. Computer vision-ECCV 2018. Lecture notes in computer science, 11207, 197-214(2018).

    [34] Tompson J J, Jain A, LeCun Y et al. Joint training of a convolutional network and a graphical model for human pose estimation[C]. //Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13, 2014, Montreal, Quebec, Canada, 1799-1807(2014).

    [35] Gkioxari G, Hariharan B, Girshick R et al. Using k-poselets for detecting people and localizing their keypoints[C]. //2014 IEEE Conference on Computer Vision and Pattern Recognition, June 23-28, 2014, Columbus, OH, USA, 3582-3589(2014).

    [36] Chen X J, Yuille A. Parsing occluded people by flexible compositions[C]. //2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 7-12, 2015, Boston, MA, USA, 3945-3945(2015).

    [37] Yan F T, Wang P, Lü Z G et al. Real-time multi-person video-based pose estimation[J]. Laser & Optoelectronics Progress, 57, 021006(2020).

    [38] Pishchulin L, Insafutdinov E, Tang S Y et al. DeepCut: joint subset partition and labeling for multi person pose estimation[C]. //2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA, 4929-4937(2016).

    [39] Insafutdinov E, Pishchulin L, Andres B et al. DeeperCut: a deeper, stronger, and faster multi-person pose estimation model[M]. //Leibe B, Matas J, Sebe N, et al. Computer vision-ECCV 2016. Lecture notes in computer science, 9910, 34-50(2016).

    [40] Cheng B W, Xiao B, Wang J D et al. HigherHRNet: scale-aware representation learning for bottom-up human pose estimation[C]. //2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 13-19, 2020, Seattle, WA, USA, 5385-5394(2020).

    [41] Pishchulin L, Jain A, Andriluka M et al. Articulated people detection and pose estimation: reshaping the future[C]. //2012 IEEE Conference on Computer Vision and Pattern Recognition, June 16-21, 2012, Providence, RI, USA, 3178-3185(2012).

    [42] Sun M, Savarese S. Articulated part-based model for joint object detection and pose estimation[C]. //2011 International Conference on Computer Vision, November 6-13, 2011, Barcelona, Spain, 723-730(2011).

    [43] Huang J J, Zhu Z, Guo F et al. The devil is in the details: delving into unbiased data processing for human pose estimation[C]. //2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 13-19, 2020, Seattle, WA, USA, 5699-5708(2020).

    [44] Iqbal U, Gall J. Multi-person pose estimation with local joint-to-person associations[M]. //Hua G, Jégou H. Computer vision-ECCV 2016 workshops. Lecture notes in computer science, 9914, 627-642(2016).

    [45] Papandreou G, Zhu T, Kanazawa N et al. Towards accurate multi-person pose estimation in the wild[C]. //2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA, 3711-3719(2017).

    [46] Chen Y L, Wang Z C, Peng Y X et al. Cascaded pyramid network for multi-person pose estimation[C]. //2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-23, 2018, Salt Lake City, UT, USA, 7103-7112(2018).

    [47] Fang H S, Xie S Q, Tai Y W et al. RMPE: regional multi-person pose estimation[C]. //2017 IEEE International Conference on Computer Vision (ICCV), October 22-29, 2017, Venice, Italy, 2353-2362(2017).

    [48] Cao Z, Simon T, Wei S H et al. Realtime multi-person 2D pose estimation using part affinity fields[C]. //2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA, 1302-1310(2017).

    [49] Lin T Y, Maire M, Belongie S J et al. Microsoft COCO: common objects in context[M]. //Fleet D, Pajdla T, Schiele B, et al. Computer vision-ECCV 2014. Lecture notes in computer science, 8693, 740-755(2014).

    [50] Su Z H, Ye M, Zhang G H et al. Cascade feature aggregation for human pose estimation[EB/OL]. (2019-02-21)[2020-11-10]. https://arxiv.org/abs/1902.07837

    [51] Dong J T, Jiang W, Huang Q X et al. Fast and robust multi-person 3D pose estimation from multiple views[C]. //2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 15-20, 2019, Long Beach, CA, USA, 7784-7793(2019).

    [52] Chollet F. Xception: deep learning with depthwise separable convolutions[C]. //2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA, 1800-1807(2017).

    [53] Xie S N, Girshick R, Dollár P et al. Aggregated residual transformations for deep neural networks[C]. //2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA, 5987-5995(2017).

    [54] Zhang X Y, Zhou X Y, Lin M X et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices[C]. //2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-23, 2018, Salt Lake City, UT, USA, 6848-6856(2018).

    [55] Osokin D. Real-time 2D multi-person pose estimation on CPU: lightweight OpenPose[C]. //Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods, February 19-21, 2019, Prague, Czech Republic, 744-748(2019).

    [56] Howard A G, Zhu M L, Chen B et al. MobileNets: efficient convolutional neural networks for mobile vision applications[EB/OL]. (2017-04-17)[2020-11-10]. https://arxiv.org/abs/1704.04861

    [57] Zhang F, Zhu X T, Ye M. Fast human pose estimation[C]. //2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 15-20, 2019, Long Beach, CA, USA, 3512-3521(2019).

    [58] Zhang T, Qi G J, Xiao B et al. Interleaved group convolutions for deep neural networks[EB/OL]. (2017-07-10)[2020-11-10]. https://arxiv.org/abs/1707.02725

    [59] Bertasius G, Feichtenhofer C, Tran D et al. Learning temporal pose estimation from sparsely-labeled videos[EB/OL]. (2019-06-06)[2020-11-10]. https://arxiv.org/abs/1906.04016
