Matching Multi-Scale Features and Prediction Tasks for Real-Time Object Detection

Hongjie Du; Hanqing Sun; Jiale Cao; Yanwei Pang

doi:10.3788/LOP202158.1210014

[1] Yang L, Su J, Huang H et al. SAR ship detection based on convolutional neural network with deep multiscale feature fusion[J]. Acta Optica Sinica, 40, 0215002(2020).

[2] Song Y L, Pang Y W. Backbone network for object detection task[J]. Laser & Optoelectronics Progress, 57, 041021(2020).

[3] Ji Z, Kong Q K, Wang J et al. Object detection algorithm guided by dual attention models[J]. Laser & Optoelectronics Progress, 57, 061008(2020).

[4] Zhou B, Li R X, Shang Z H et al. Object detection algorithm based on improved Faster R-CNN[J]. Laser & Optoelectronics Progress, 57, 101009(2020).

[5] Ju M R, Luo J N, Wang Z B et al. Multi-scale target detection algorithm based on attention mechanism[J]. Acta Optica Sinica, 40, 1315002(2020).

[6] Yang Q L, Zhou B H, Zheng W et al. Dim and small target detection based on fully convolutional recursive network[J]. Acta Optica Sinica, 40, 1310002(2020).

[7] Lin T Y, Dollár P, Girshick R et al. Feature pyramid networks for object detection[C]. //2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA, 936-944(2017).

[8] Liu W, Anguelov D, Erhan D et al. SSD: single shot MultiBox detector[M]. //Leibe B, Matas J, Sebe N, et al. Computer vision-ECCV 2016. Lecture notes in computer science, 9905, 21-37(2016).

[9] Redmon J, Divvala S, Girshick R et al. You only look once: unified, real-time object detection[C]. //2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA, 779-788(2016).

[10] Oksuz K, Cam B C, Kalkan S et al. Imbalance problems in object detection: a review[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 99(2020). http://ieeexplore.ieee.org/document/9042296/

[11] Kong T, Sun F C, Liu H P et al. Consistent optimization for single-shot object detection[EB/OL]. (2019-01-19)[2020-09-24]. https://arxiv.org/abs/1901.06563v2

[12] Cao J L, Pang Y W, Han J G et al. Hierarchical shot detector[C]. //2019 IEEE/CVF International Conference on Computer Vision (ICCV), October 27-November 2, 2019, Seoul, Korea (South), 9704-9713(2019).

[13] Lin T Y, Goyal P, Girshick R et al. Focal loss for dense object detection[C]. //2017 IEEE International Conference on Computer Vision (ICCV), October 22-29, 2017, Venice, Italy, 2999-3007(2017).

[14] Zhou X Y, Wang D Q, Krähenbühl P et al. Objects as points[EB/OL]. (2019-04-25)[2020-09-24]. https://arxiv.org/abs/1904.07850

[15] Tian Z, Shen C H, Chen H et al. FCOS: fully convolutional one-stage object detection[C]. //2019 IEEE/CVF International Conference on Computer Vision (ICCV), October 27-November 2, 2019, Seoul, Korea (South), 9626-9635(2019).

[16] He K M, Zhang X Y, Ren S Q et al. Deep residual learning for image recognition[C]. //2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA, 770-778(2016).

[17] Yu F, Wang D Q, Shelhamer E et al. Deep layer aggregation[C]. //2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 18-23, 2018, Salt Lake City, UT, USA, 2403-2412(2018).

[18] Newell A, Yang K Y, Deng J et al. Stacked hourglass networks for human pose estimation[M]. //Leibe B, Matas J, Sebe N, et al. Computer vision-ECCV 2016. Lecture notes in computer science, 9912, 483-499(2016).

[19] Law H, Deng J. CornerNet: detecting objects as paired keypoints[J]. International Journal of Computer Vision, 128, 642-656(2020). http://link.springer.com/article/10.1007/s11263-019-01204-1

[20] He K M, Zhang X Y, Ren S Q et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, 1904-1916(2015). http://www.sciencedirect.com/science/article/pii/S0031320315004252

[21] Chen L C, Papandreou G, Kokkinos I et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40, 834-848(2018). http://europepmc.org/abstract/MED/28463186

[22] Rothe R, Guillaumin M, Gool L et al. Non-maximum suppression for object detection by passing messages between windows[M]. //Cremers D, Reid I, Saito H, et al. Computer vision-ACCV 2014. Lecture notes in computer science, 9003, 290-306(2015).

[23] Everingham M, Eslami S M A, Gool L et al. The pascal visual object classes challenge: a retrospective[J]. International Journal of Computer Vision, 111, 98-136(2015).

[24] Lin T Y, Maire M, Belongie S et al. Microsoft COCO: common objects in context[M]. //Fleet D, Pajdla T, Schiele B, et al. Computer vision-ECCV 2014. Lecture notes in computer science, 8693, 740-755(2014).

[25] Russakovsky O, Deng J, Su H et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 115, 211-252(2015).

[26] Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks[J]. Journal of Machine Learning Research, 9, 249-256(2010).

[27] Kingma D P, Ba J. Adam: a method for stochastic optimization[EB/OL]. (2017-01-30)[2020-09-24]. https://arxiv.org/abs/1412.6980

[28] Tan M X, Pang R M, Le Q V et al. EfficientDet: scalable and efficient object detection[C]. //2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 13-19, 2020, Seattle, WA, USA, 10778-10787(2020).

[29] Redmon J, Farhadi A. YOLOv3: an incremental improvement[EB/OL]. (2018-08-08)[2020-09-24]. https://arxiv.org/abs/1804.02767

[30] Liu Z L, Zheng T, Xu G D et al. Training-time-friendly network for real-time object detection[EB/OL]. (2019-11-24)[2020-09-24]. https://arxiv.org/abs/1909.00700v2