Mask R-CNN Object Detection Method Based on Improved Feature Pyramid

Zhijun Ren; Suzhen Lin; Dawei Li; Lifang Wang; Jianhong Zuo

doi:10.3788/LOP56.041502

[1] Lin S Z, Zheng Y, Lu X F et al. Adaptive tracking algorithm for aerial small targets based on multi-domain convolutional neural networks and autoregression model[J]. Acta Optica Sinica, 37, 1215006(2017).

[2] Liu F, Shen T S, Ma X X. Convolutional neural network based multi-band ship target recognition with feature fusion[J]. Acta Optica Sinica, 37, 1015002(2017).

[3] He Z C, Zhao L Z, Chen C. Convolution neural network with multi-resolution feature fusion for facial expression recognition[J]. Laser & Optoelectronics Progress, 55, 071503(2018).

[4] Lowe D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 60, 91-110(2004). http://doi.ieeecomputersociety.org/resolve?ref_id=doi:10.1023/B:VISI.0000029664.99615.94&rfr_id=trans/tp/2008/10/ttp2008101683.htm

[5] Dalal N, Triggs B. Histograms of oriented gradients for human detection. [C]∥IEEE Conference on Computer Vision and Pattern Recognition, June 20-25, 2005, San Diego, CA, USA. New York: IEEE, 886-893(2005).

[6] Felzenszwalb P, McAllester D, Ramanan D. A discriminatively trained, multiscale, deformable part model. [C]∥IEEE Conference on Computer Vision and Pattern Recognition, June 23-28, 2008, Anchorage, AK, USA. New York: IEEE, 1-8(2008).

[7] Redmon J, Divvala S, Girshick R et al. You only look once: unified, real-time object detection. [C]∥IEEE Conference on Computer Vision and Pattern Recognition, June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 779-788(2016).

[8] Liu W, Anguelov D, Erhan D et al. SSD: single shot multibox detector. [C]∥Leibe B, Matas J, Sebe N, et al. European Conference on Computer Vision, Cham: Springer, 9905, 21-37(2016).

[9] Redmon J. (2018-04-08)[2018-07-31]. https://arxiv.org/abs/1804.02767

[10] Lin T Y, Goyal P, Girshick R et al. Focal loss for dense object detection. [C]∥IEEE International Conference on Computer Vision, October 22-29, 2017, Venice, Italy. New York: IEEE, 2999-3007(2017).

[11] Girshick R, Donahue J, Darrell T et al. Rich feature hierarchies for accurate object detection and semantic segmentation. [C]∥IEEE Conference on Computer Vision and Pattern Recognition, June 23-28, 2014, Columbus, OH, USA. New York: IEEE, 580-587(2014).

[12] He K M, Zhang X Y, Ren S Q et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, 1904-1916(2015). http://www.sciencedirect.com/science/article/pii/S0031320315004252

[13] Girshick R. Fast R-CNN. [C]∥IEEE International Conference on Computer Vision, December 7-13, 2015, Santiago, Chile. New York: IEEE, 1440-1448(2015).

[14] Ren S Q, He K M, Girshick R et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 1137-1149(2017). http://dl.acm.org/citation.cfm?id=3101780

[15] Feng X Y, Mei W, Hu D S. Aerial target detection based on improved faster R-CNN[J]. Acta Optica Sinica, 38, 0615004(2018).

[16] He K M, Gkioxari G, Dollár P et al. Mask R-CNN. [C]∥2017 IEEE International Conference on Computer Vision, October 22-29, 2017, Venice, Italy. New York: IEEE, 2980-2988(2017).

[17] Lin T Y, Dollár P, Girshick R et al. Feature pyramid networks for object detection. [C]∥IEEE Conference on Computer Vision and Pattern Recognition, July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 936-944(2017).

[18] Lin T Y, Maire M, Belongie S et al. Microsoft COCO: common objects in context. [C]∥Fleet D, Pajdla T, Schiele B, et al. European Conference on Computer Vision, Cham: Springer, 8693, 740-755(2014).

[19] He K M, Zhang X Y, Ren S Q et al. Deep residual learning for image recognition. [C]∥IEEE Conference on Computer Vision and Pattern Recognition, June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 770-778(2016).

[20] Dai J F, He K M, Sun J. Instance-aware semantic segmentation via multi-task network cascades. [C]∥IEEE Conference on Computer Vision and Pattern Recognition, June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 3150-3158(2016).

[21] Li Y, Qi H Z, Dai J F et al. Fully convolutional instance-aware semantic segmentation. [C]∥IEEE Conference on Computer Vision and Pattern Recognition, July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 4438-4446(2017).