Improved Faster R-CNN Target Detection Algorithm Based on Attention Mechanism and Soft-NMS

Fengsui Wang; Qisheng Wang; Jingang Chen; Furong Liu

doi:10.3788/LOP202158.2420001
