[1] Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]. //2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), June 20-25, 2005, San Diego, CA, USA, 886-893(2005).
[2] Fang H, Gupta S, Iandola F et al. From captions to visual concepts and back[C]. //2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 7-12, 2015, Boston, MA, USA, 1473-1482(2015).
[4] He K M, Zhang X Y, Ren S Q et al. Deep residual learning for image recognition[C]. //2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA, 770-778(2016).
[5] Tao Z Y, Li J, Tang X L. Texture images classification algorithm combining wavelet transform and capsule network[J]. Laser & Optoelectronics Progress, 57, 241002(2020).
[6] Szegedy C, Ioffe S, Vanhoucke V et al. Inception-v4, inception-resnet and the impact of residual connections on learning[C]. //Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, 4278-4284(2017).
[7] Huang G, Liu Z, van der Maaten L et al. Densely connected convolutional networks[C]. //2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA, 2261-2269(2017).
[8] You Q Z, Jin H L, Wang Z W et al. Image captioning with semantic attention[C]. //2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA, 4651-4659(2016).
[10] Huang L, Wang W M, Chen J et al. Attention on attention for image captioning[C]. //2019 IEEE/CVF International Conference on Computer Vision (ICCV), October 27-November 2, 2019, Seoul, Korea (South), 4633-4642(2019).
[12] Yang X, Tang K H, Zhang H W et al. Auto-encoding scene graphs for image captioning[C]. //2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 15-20, 2019, Long Beach, CA, USA, 10677-10686(2019).
[14] Lu J S, Xiong C M, Parikh D et al. Knowing when to look: adaptive attention via a visual sentinel for image captioning[C]. //2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA, 3242-3250(2017).
[15] Yang Z C, He X D, Gao J F et al. Stacked attention networks for image question answering[C]. //2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA, 21-29(2016).
[17] Zhu Y K, Groth O, Bernstein M et al. Visual7W: grounded question answering in images[C]. //2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA, 4995-5004(2016).
[20] Wang Y H. Image caption based on multi-fusion model[J]. Henan Science and Technology, 34-36(2019).
[21] Vinyals O, Toshev A, Bengio S et al. Show and tell: a neural image caption generator[C]. //2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 7-12, 2015, Boston, MA, USA, 3156-3164(2015).
[23] Li R F, Liang H Y, Feng F X et al. Paragraph image captioning with deep fully convolutional neural networks[J]. Journal of Beijing University of Posts and Telecommunications, 42, 155-161(2019).
[24] Karpathy A, Li F F. Deep visual-semantic alignments for generating image descriptions[C]. //2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 7-12, 2015, Boston, MA, USA, 3128-3137(2015).
[25] Chen L, Zhang H W, Xiao J et al. SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning[C]. //2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA, 6298-6306(2017).
[26] Ren Z, Wang X Y, Zhang N et al. Deep reinforcement learning-based image captioning with embedding reward[C]. //2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA, 1151-1159(2017).
[31] Lin T Y, Maire M, Belongie S et al. Microsoft COCO: common objects in context[M]. //Fleet D, Pajdla T, Schiele B, et al. Computer Vision-ECCV 2014, 8693, 740-755(2014).
[33] Papineni K, Roukos S, Ward T et al. BLEU: a method for automatic evaluation of machine translation[C]. //Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL’02), July 7-12, 2002, Philadelphia, Pennsylvania, USA(2002).