Image Semantic Description Algorithm with Integrated Spatial Attention Mechanism

Lie Guo; Tuanshan Zhang; Weizhen Sun; Jielong Guo

doi:10.3788/LOP202158.1210030

Laser & Optoelectronics Progress
Vol. 58, Issue 12, 1210030 (2021)

Lie Guo¹,*, Tuanshan Zhang¹, Weizhen Sun², and Jielong Guo²

Author Affiliations

¹Xi'an Key Laboratory of Modern Intelligent Textile Equipment, College of Mechanical and Electrical Engineering, Xi'an Polytechnic University, Xi'an, Shaanxi 710600, China

²Quanzhou Institute of Equipment Manufacturing, Haixi Institutes, Chinese Academy of Sciences, Quanzhou, Fujian 362216, China

Lie Guo, Tuanshan Zhang, Weizhen Sun, Jielong Guo. Image Semantic Description Algorithm with Integrated Spatial Attention Mechanism[J]. Laser & Optoelectronics Progress, 2021, 58(12): 1210030

Fig. 1. Encoder-decoder model with integrated spatial attention mechanism

Fig. 2. Diagram of spatial attention module

Fig. 3. Diagram of encoder-decoder network with integrated spatial attention mechanism

Fig. 4. Experimental data loss curves of spatial attention mechanism in VGG network. (a) VGG(MSCOCO); (b) VGG(Flickr30k)

Fig. 5. Experimental data loss curves of spatial attention mechanism in ResNet network. (a) ResNet-50(MSCOCO); (b) ResNet-50(Flickr30k)

Fig. 6. Comparison of visualization results. (a) Test set; (b) SAT model visualization results; (c) proposed model visualization results

Fig. 7. Comparison of visualization results. (a) Test set; (b) SAT model visualization results; (c) proposed model visualization results

Fig. 8. Comparison of visualization results. (a) Test set; (b) SAT model visualization results; (c) proposed model visualization results

Fig. 9. Comparison of visualization results. (a) Test set; (b) SAT model visualization results; (c) proposed model visualization results

Component	Specification
Memory	16.0 GB
CPU	Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz
GPU	NVIDIA GeForce GTX 1080 Ti

Table 1. Server configuration used for the experiment

Dataset name	Train	Valid	Test
Flickr30k	29783	1000	1000
MSCOCO	82783	40504	40775

Table 2. Number of images in the training, validation, and test splits of each dataset

Model(VGG)	MSCOCO				Flickr30k
Model(VGG)	BLEU-1	BLEU-2	BLEU-3	BLEU-4	BLEU-1	BLEU-2	BLEU-3	BLEU-4
Deep VS	62.5	45.0	32.1	23.0	57.3	36.9	24.0	16.0
Log bilinear	70.8	48.9	34.4	24.3	60.0	38.0	25.4	17.1
SAT	70.7	49.2	34.4	24.3	61.0	40.5	27.3	18.2
Proposed	71.9	51.9	37.2	26.2	62.2	41.5	28.2	19.0

Table 3. Experimental comparison of spatial attention mechanism in VGG network

Model(ResNet-50)	MSCOCO				Flickr30k
Model(ResNet-50)	BLEU-1	BLEU-2	BLEU-3	BLEU-4	BLEU-1	BLEU-2	BLEU-3	BLEU-4
Deep VS	62.5	45.0	32.1	23.0	57.3	36.9	24.0	16.0
Google NIC	66.6	46.1	32.9	24.6	66.3	42.3	27.7	18.3
m-RNN	67.0	49.0	35.0	25.0	60.0	41.0	28.0	19.0
SAT	72.7	52.8	37.9	26.7	63.4	42.6	29.2	19.7
Proposed	73.0	53.5	39.0	27.9	64.6	43.9	30.3	20.6

Table 4. Experimental comparison of spatial attention mechanism in ResNet network
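
The gains of the proposed model over the SAT baseline can be read off Tables 3 and 4 by simple subtraction. The sketch below only illustrates that arithmetic: the scores are copied from the two tables, while the dictionary layout and the helper name `bleu_gains` are assumptions introduced here and are not part of the paper.

```python
# BLEU scores (in percent) copied from Tables 3 and 4, ordered BLEU-1..BLEU-4.
# The data layout and the helper below are illustrative assumptions.
scores = {
    ("VGG", "MSCOCO"): {
        "SAT": [70.7, 49.2, 34.4, 24.3],
        "Proposed": [71.9, 51.9, 37.2, 26.2],
    },
    ("VGG", "Flickr30k"): {
        "SAT": [61.0, 40.5, 27.3, 18.2],
        "Proposed": [62.2, 41.5, 28.2, 19.0],
    },
    ("ResNet-50", "MSCOCO"): {
        "SAT": [72.7, 52.8, 37.9, 26.7],
        "Proposed": [73.0, 53.5, 39.0, 27.9],
    },
    ("ResNet-50", "Flickr30k"): {
        "SAT": [63.4, 42.6, 29.2, 19.7],
        "Proposed": [64.6, 43.9, 30.3, 20.6],
    },
}


def bleu_gains(baseline, proposed):
    """Return per-metric differences (proposed minus baseline), rounded to 0.1."""
    return [round(p - b, 1) for b, p in zip(baseline, proposed)]


# Absolute gain in BLEU points for each (backbone, dataset) pair.
gains = {
    key: bleu_gains(row["SAT"], row["Proposed"]) for key, row in scores.items()
}

for (backbone, dataset), g in gains.items():
    labelled = ", ".join(f"BLEU-{n}: +{d}" for n, d in enumerate(g, start=1))
    print(f"{backbone} / {dataset}: {labelled}")
```

Under these numbers the proposed model is ahead of SAT on every BLEU-n score for both backbones and both datasets, with the largest margins (2.7 and 2.8 points) on BLEU-2 and BLEU-3 for VGG on MSCOCO.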
