A lightweight convolutional neural network for large-scale Chinese image caption

Dexin ZHAO; Ruixue YANG; Shutao GUO

doi:10.1007/s11801-021-0100-z

[1] Li X, Uricchio T, Ballan L, Bertini M, Snoek C G M and Del Bimbo A, ACM Computing Surveys 49, 1 (2016).

[2] Vinyals O, Toshev A, Bengio S and Erhan D, Show and Tell: A Neural Image Caption Generator, IEEE Conference on Computer Vision and Pattern Recognition, 3156 (2015).

[3] Jia X, Gavves E, Fernando B and Tuytelaars T, Guiding the Long-Short Term Memory Model for Image Caption Generation, IEEE International Conference on Computer Vision, 2407 (2015).

[4] Lu J, Yang J, Batra D and Parikh D, Neural Baby Talk, Conference on Computer Vision and Pattern Recognition, 7219 (2018).

[5] Rennie S J, Marcheret E, Mroueh Y, Ross J and Goel V, Self-Critical Sequence Training for Image Captioning, IEEE Conference on Computer Vision and Pattern Recognition, 7008 (2017).

[6] Yang J, Sun Y, Liang J, Ren B and Lai S, Neurocomputing 328, 56 (2019).

[7] Szegedy C, Vanhoucke V, Ioffe S, Shlens J and Wojna Z, Rethinking the Inception Architecture for Computer Vision, IEEE Conference on Computer Vision and Pattern Recognition, 2818 (2016).

[8] Liu Z, Ma L, Wu J and Sun L, Journal of Chinese Information Processing 31, 162 (2017). (in Chinese)

[9] Lan W, Wang X, Yang G and Li X, Chinese Journal of Computers 42, 136 (2019). (in Chinese)

[10] Zhao D, Chang Z and Guo S, Neurocomputing 329, 476 (2019).

[11] Srivastava R, Greff K and Schmidhuber J, Training Very Deep Networks, Advances in Neural Information Processing Systems, 2368 (2015).

[12] Kulkarni G, Premraj V, Dhar S, Li S, Choi Y, Berg A and Berg T, Baby Talk: Understanding and Generating Simple Image Descriptions, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2891 (2014).

[13] Wu J, Zheng H, Zhao B, Li Y, Yan B, Liang R, Wang W, Zhou S, Lin G, Fu Y, Wang Y and Wang Y, Large-Scale Datasets for Going Deeper in Image Understanding, IEEE International Conference on Multimedia and Expo (ICME), 1480 (2019).

[14] He K, Zhang X, Ren S and Sun J, Deep Residual Learning for Image Recognition, IEEE Conference on Computer Vision and Pattern Recognition, 770 (2016).

[15] Szegedy C, Ioffe S, Vanhoucke V and Alemi A A, Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, AAAI Conference on Artificial Intelligence, 4278 (2017).

[16] Papineni K, Roukos S, Ward T and Zhu W, Bleu: A Method for Automatic Evaluation of Machine Translation, 40th Annual Meeting of the Association for Computational Linguistics, 311 (2002).

[17] Banerjee S and Lavie A, METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments, Meeting of the Association for Computational Linguistics, 65 (2005).

[18] Lin C, ROUGE: A Package for Automatic Evaluation of Summaries, Meeting of the Association for Computational Linguistics, 74 (2004).

[19] Vedantam R, Zitnick C L and Parikh D, CIDEr: Consensus-Based Image Description Evaluation, Computer Vision and Pattern Recognition, 4566 (2015).