Text Detection Based on Split-Attention and Path Enhancement Feature Pyramid

Qi Cheng; Guodong Wang; Yi Zhao

doi:10.3788/LOP57.241023

[1] He K M, Zhang X Y, Ren S Q et al. Deep residual learning for image recognition. [C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 770-778(2016).

[2] Huang G, Liu Z, van der Maaten L et al. Densely connected convolutional networks. [C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 2261-2269(2017).

[3] Li X, Wang W H, Hu X L et al. Selective kernel networks. [C]∥2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 15-20, 2019, Long Beach, CA, USA. New York: IEEE, 510-519(2019).

[4] Zhu X Z, Hu H, Lin S et al. Deformable ConvNets V2: more deformable, better results. [C]∥2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 15-20, 2019, Long Beach, CA, USA. New York: IEEE, 9300-9308(2019).

[5] Chen L L, Zhang Z D, Peng L. Real-time detection based on improved single shot MultiBox detector[J]. Laser & Optoelectronics Progress, 56, 011001(2019).

[6] Wang D C, Chen X N, Zhao F et al. Vehicle detection algorithm based on convolutional neural network and RGB-D images[J]. Laser & Optoelectronics Progress, 56, 181003(2019).

[7] Shi F F, Zhang S L, Peng L. Salient object detection based on deep residual networks and edge supervised learning[J]. Laser & Optoelectronics Progress, 56, 151502(2019).

[8] Liu Y L, Jin L W, Zhang S T et al. Curved scene text detection via transverse and longitudinal sequence connection[J]. Pattern Recognition, 90, 337-345(2019). http://www.researchgate.net/publication/330878708_Curved_Scene_Text_Detection_via_Transverse_and_Longitudinal_Sequence_Connection

[9] Liu X H, Sun S Y, Gu L P et al. 3D object detection based on improved Frustum PointNet[J]. Laser & Optoelectronics Progress, 57, 201508(2020).

[10] Xue C H, Lu S J, Zhang W. MSR: multi-scale shape regression for scene text detection [2020-05-22]. https://arxiv.org/abs/1901.02596.

[11] Tian Z T, Shu M, Lyu P Y et al. Learning shape-aware embedding for scene text detection. [C]∥2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 15-20, 2019, Long Beach, CA, USA. New York: IEEE, 4229-4238(2019).

[12] Liao M H, Shi B G, Bai X et al. TextBoxes: a fast text detector with a single deep neural network [2020-05-28]. https://arxiv.org/abs/1611.06779.

[13] Liao M H, Shi B G, Bai X. TextBoxes++: a single-shot oriented scene text detector[J]. IEEE Transactions on Image Processing, 27, 3676-3690(2018).

[14] He P, Huang W L, He T et al. Single shot text detector with regional attention. [C]∥2017 IEEE International Conference on Computer Vision (ICCV), October 22-29, 2017, Venice, Italy. New York: IEEE, 3066-3074(2017).

[15] Liao M H, Zhu Z, Shi B G et al. Rotation-sensitive regression for oriented scene text detection. [C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-23, 2018, Salt Lake City, UT, USA. New York: IEEE, 5909-5918(2018).

[16] Zhang Z, Zhang C Q, Shen W et al. Multi-oriented text detection with fully convolutional networks. [C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 4159-4167(2016).

[17] Yao C, Bai X, Sang N et al. Scene text detection via holistic, multi-channel prediction [2020-05-22]. https://arxiv.org/abs/1606.09002.

[18] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. [C]∥2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 7-12, 2015, Boston, MA, USA. New York: IEEE, 3431-3440(2015).

[19] Lyu P Y, Yao C, Wu W H et al. Multi-oriented scene text detection via corner localization and region segmentation. [C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-23, 2018, Salt Lake City, UT, USA. New York: IEEE, 7553-7563(2018).

[20] Deng D, Liu H F, Li X L et al. PixelLink: detecting scene text via instance segmentation [2020-05-25]. https://arxiv.org/abs/1801.01315.

[21] Xie E Z, Zang Y H, Shao S et al. Scene text detection with supervised pyramid context network [2020-05-25]. https://arxiv.org/abs/1811.08605.

[22] Wang W H, Xie E Z, Li X et al. Shape robust text detection with progressive scale expansion network. [C]∥2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 15-20, 2019, Long Beach, CA, USA. New York: IEEE, 9328-9337(2019).

[23] Tan M X, Le Q V. EfficientNet: rethinking model scaling for convolutional neural networks [2020-05-25]. https://arxiv.org/abs/1905.11946.

[24] Lin T Y, Dollár P, Girshick R et al. Feature pyramid networks for object detection. [C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 936-944(2017).

[25] Zhang H, Wu C R, Zhang Z Y et al. ResNeSt: split-attention networks [2020-05-28]. https://arxiv.org/abs/2004.08955.

[26] Chollet F. Xception: deep learning with depthwise separable convolutions. [C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 1800-1807(2017).

[27] de Boer P T, Kroese D P, Mannor S et al. A tutorial on the cross-entropy method[J]. Annals of Operations Research, 134, 19-67(2005).

[28] Milletari F, Navab N, Ahmadi S A. V-Net: fully convolutional neural networks for volumetric medical image segmentation. [C]∥2016 Fourth International Conference on 3D Vision (3DV), October 25-28, 2016, Stanford, CA, USA. New York: IEEE, 565-571(2016).

[29] Sutskever I, Martens J, Dahl G et al. On the importance of initialization and momentum in deep learning. [C]∥Proceedings of the 30th International Conference on Machine Learning, June 16-21, 2013, Atlanta, Georgia, USA. New York: ACM, 28, 1139-1147(2013).

[30] Liu Y L, Jin L W, Zhang S T et al. Detecting curve text in the wild: new dataset and new solution [2020-05-26]. https://arxiv.org/abs/1712.02170.

[31] Ch'ng C K, Chan C S. Total-Text: a comprehensive dataset for scene text detection and recognition. [C]∥14th IAPR International Conference on Document Analysis and Recognition, November 9-15, 2017, Kyoto, Japan. New York: IEEE, 935-942(2017).

[32] Karatzas D, Gomez-Bigorda L, Nicolaou A et al. ICDAR 2015 competition on robust reading. [C]∥2015 13th International Conference on Document Analysis and Recognition (ICDAR), August 23-26, 2015, Tunis, Tunisia. New York: IEEE, 1156-1160(2015).

[33] Yao C, Bai X, Liu W et al. Detecting texts of arbitrary orientations in natural images. [C]∥2012 IEEE Conference on Computer Vision and Pattern Recognition, June 16-21, 2012, Providence, RI, USA. New York: IEEE, 1083-1090(2012).

[34] He K M, Zhang X Y, Ren S Q et al. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. [C]∥2015 IEEE International Conference on Computer Vision (ICCV), December 7-13, 2015, Santiago, Chile. New York: IEEE, 1026-1034(2015).

[35] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition [2020-05-26]. https://arxiv.org/abs/1409.1556.

[36] Hu J, Shen L, Sun G. Squeeze-and-excitation networks. [C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-23, 2018, Salt Lake City, UT, USA. New York: IEEE, 7132-7141(2018).

[37] Tian Z, Huang W L, He T et al. Detecting text in natural image with connectionist text proposal network[M]. ∥Leibe B, Matas J, Sebe N, et al. Computer Vision-ECCV 2016. Lecture Notes in Computer Science. Cham: Springer, 9912, 56-72(2016).

[38] Shi B G, Bai X, Belongie S. Detecting oriented text in natural images by linking segments. [C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 3482-3490(2017).

[39] Zhou X Y, Yao C, Wen H et al. EAST: an efficient and accurate scene text detector. [C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 2642-2651(2017).

[40] Long S B, Ruan J Q, Zhang W J et al. TextSnake: a flexible representation for detecting text of arbitrary shapes[M]. ∥Ferrari V, Hebert M, Sminchisescu C, et al. Computer Vision-ECCV 2018. Lecture Notes in Computer Science. Cham: Springer, 11206, 19-35(2018).