• Optics and Precision Engineering
  • Vol. 31, Issue 9, 1379 (2023)
Zhongmin LIU1,3,*, Heng CHEN1,3,*, and Wenjin HU2
Author Affiliations
  • 1College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, China
  • 2College of Mathematics and Computer Science, Northwest Minzu University, Lanzhou 730000, China
  • 3Key Laboratory of Gansu Advanced Control for Industrial Processes, Lanzhou 730050, China
    DOI: 10.37188/OPE.20233109.1379
    Zhongmin LIU, Heng CHEN, Wenjin HU. Application of SENet generative adversarial network in image semantics description[J]. Optics and Precision Engineering, 2023, 31(9): 1379

    Abstract

    An SENet-based generative adversarial network method for image semantic description is proposed to address inaccurate descriptive sentences and the lack of emotional color in image semantic descriptions. First, a channel attention mechanism is added to the feature extraction stage of the generator model so that the network can fully extract features from salient regions of the image; the extracted features are then input to the encoder. Second, a sentiment corpus is added to the original text corpus, and word vectors are generated through natural language processing. These word vectors are combined with the encoded image features and input to the decoder, which, through continuous adversarial training, generates sentiment-bearing description sentences that match the content depicted in the image. In simulation experiments, the proposed method improves the BLEU metric by approximately 15% compared with the SentiCap method, with improvements also observed in other related metrics. In self-comparison experiments, the method improves the CIDEr metric by approximately 3%. These results indicate that the proposed network extracts image features more effectively, yielding more accurate description sentences with richer emotional color.
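    The channel attention mechanism referred to in the abstract is the squeeze-and-excitation (SE) block. The sketch below is a minimal, illustrative PyTorch version of such a block, assuming a standard reduction-ratio design; the layer sizes and reduction ratio are assumptions for illustration, not the authors' exact configuration.

    ```python
    # Minimal sketch of a squeeze-and-excitation (SE) channel-attention block,
    # the kind of module added to the generator's feature-extraction stage.
    # The reduction ratio and tensor shapes are illustrative assumptions.
    import torch
    import torch.nn as nn

    class SEBlock(nn.Module):
        def __init__(self, channels: int, reduction: int = 16):
            super().__init__()
            # Squeeze: global average pooling collapses each channel map to a scalar.
            self.pool = nn.AdaptiveAvgPool2d(1)
            # Excitation: two fully connected layers learn per-channel weights.
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // reduction, bias=False),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels, bias=False),
                nn.Sigmoid(),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, _, _ = x.shape
            w = self.pool(x).view(b, c)        # (B, C) channel descriptors
            w = self.fc(w).view(b, c, 1, 1)    # (B, C, 1, 1) attention weights
            return x * w                       # reweight salient channels

    # Usage: reweight a hypothetical backbone feature map before the encoder.
    features = torch.randn(2, 256, 14, 14)
    attended = SEBlock(256)(features)          # same shape, channels reweighted
    ```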