[1] Girshick R, Donahue J, Darrell T et al. Rich feature hierarchies for accurate object detection and semantic segmentation. [C]∥2014 IEEE Conference on Computer Vision and Pattern Recognition, June 23-28, 2014, Columbus, OH, USA. New York: IEEE, 580-587(2014).
[2] Girshick R. Fast R-CNN. [C]∥2015 IEEE International Conference on Computer Vision (ICCV), December 7-13, 2015, Santiago, Chile. New York: IEEE, 1440-1448(2015).
[3] Ren S Q, He K M, Girshick R et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 1137-1149(2017).
[4] Liu W, Anguelov D, Erhan D et al. SSD: single shot MultiBox detector[M]. ∥Leibe B, Matas J, Sebe N, et al. Computer vision-ECCV 2016. Lecture notes in computer science. Cham: Springer, 9905, 21-37(2016).
[5] Redmon J, Divvala S, Girshick R et al. You only look once: unified, real-time object detection. [C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 779-788(2016).
[7] Yu F. -04-30)[2019-07-09]. https:∥arxiv.gg363., site/abs/1511, 07122(2016).
[8] He K M, Zhang X Y, Ren S Q et al. Deep residual learning for image recognition. [C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 770-778(2016).
[11] Simonyan K. -04-10)[2019-07-09]. https:∥arxiv.gg363., site/abs/1409, 1556(2015).
[14] Deng J, Dong W, Socher R et al. ImageNet: a large-scale hierarchical image database. [C]∥2009 IEEE Conference on Computer Vision and Pattern Recognition, June 20-25, 2009, Miami, FL, USA. New York: IEEE, 248-255(2009).
[16] Lin T Y, Maire M, Belongie S et al. Microsoft COCO: common objects in context[M]. ∥Fleet D, Pajdla T, Schiele B, et al. Computer vision-ECCV 2014. Lecture notes in computer science. Cham: Springer, 8693, 740-755(2014).
[17] Bahdanau D, Cho K. -05-19)[2019-07-09]. https:∥arxiv.gg363., site/abs/1409, 0473(2016).
[18] Xu K, Ba J, Kiros R et al. Show,, 2048-2057(2015).
[19] Jaderberg M, Simonyan K, Zisserman A. Spatial transformer networks. [C]∥Advances in Neural Information Processing Systems, December 7-12, 2015, Montreal, Quebec, Canada. Canada: NIPS, 2017-2025(2015).
[20] Chen L C, Yang Y, Wang J et al. Attention to scale: scale-aware semantic image segmentation. [C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 3640-3649(2016).
[21] Lin T Y, Dollár P, Girshick R et al. Feature pyramid networks for object detection. [C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 936-944(2017).
[22] Zhu Y S, Zhao C Y, Wang J Q et al. CoupleNet: coupling global structure with local parts for object detection. [C]∥2017 IEEE International Conference on Computer Vision (ICCV), October 22-29, 2017, Venice, Italy. New York: IEEE, 4146-4154(2017).
[23] Liu Y, Wang R P, Shan S G et al. Structure inference net: object detection using scene-level context and instance-level relationships. [C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-23, 2018, Salt Lake City, UT, USA. New York: IEEE, 6985-6994(2018).
[24] Redmon J, Farhadi A. YOLO9000: better, faster, stronger. [C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 6517-6525(2017).
[25] Redmon J. -04-08)[2019-07-09]. https:∥arxiv.gg363., site/abs/1804, 02767(2018).
[26] Fu C Y, Liu W, Ranga A et al. -01-23)[2019-07-09]. https:∥arxiv.gg363., site/abs/1701, 06659(2017).
[27] Dai J, Li Y, He K et al. R-FCN: object detection via region-based fully convolutional networks. [C]∥Advances in Neural Information Processing Systems, December 4-9, 2017, Long Beach, CA, USA. Canada: NIPS, 379-387(2016).
[28] Huang J, Rathod V, Sun C et al. Speed/accuracy trade-offs for modern convolutional object detectors. [C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 3296-3305(2017).
[29] Tychsen-Smith L, Petersson L. DeNet: scalable real-time object detection with directed sparse sampling. [C]∥2017 IEEE International Conference on Computer Vision (ICCV), October 22-29, 2017, Venice, Italy. New York: IEEE, 428-436(2017).
[30] Wang H, Wang Q L, Gao M Q et al. Multi-scale location-aware kernel representation for object detection. [C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-23, 2018, Salt Lake City, UT, USA. New York: IEEE, 1248-1257(2018).
[31] Kong T, Sun F C, Yao A B et al. RON: reverse connection with objectness prior networks for object detection. [C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 5244-5252(2017).
[32] Hu H, Gu J, Zhang Z et al. Relation networks for object detection. [C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 18-22, 2018, Salt Lake City, Utah. New York: IEEE, 3588-3597(2018).