Author Affiliations
1 Xi'an Key Laboratory of Modern Intelligent Textile Equipment, College of Mechanical and Electrical Engineering, Xi'an Polytechnic University, Xi'an, Shaanxi 710600, China
2 Quanzhou Institute of Equipment Manufacturing, Haixi Institutes, Chinese Academy of Sciences, Quanzhou, Fujian 362216, China
Fig. 1. Encoder-decoder model with integrated spatial attention mechanism
Fig. 2. Diagram of spatial attention module
Fig. 3. Diagram of encoder-decoder network with integrated spatial attention mechanism
Fig. 4. Experimental data loss curves of spatial attention mechanism in VGG network. (a) VGG(MSCOCO); (b) VGG(Flickr30k)
Fig. 5. Experimental data loss curves of spatial attention mechanism in ResNet network. (a) ResNet-50(MSCOCO); (b) ResNet-50(Flickr30k)
Fig. 6. Comparison of visualization results. (a) Test set; (b) SAT model visualization results; (c) proposed model visualization results
Fig. 7. Comparison of visualization results. (a) Test set; (b) SAT model visualization results; (c) proposed model visualization results
Fig. 8. Comparison of visualization results. (a) Test set; (b) SAT model visualization results; (c) proposed model visualization results
Fig. 9. Comparison of visualization results. (a) Test set; (b) SAT model visualization results; (c) proposed model visualization results
Item | Parameter |
---|---|
Memory | 16.0 GB |
CPU | Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz |
GPU | NVIDIA GeForce GTX 1080 Ti |
Table 1. Server configuration used for the experiment
Dataset name | Train | Valid | Test |
---|---|---|---|
Flickr30k | 29783 | 1000 | 1000 |
MSCOCO | 82783 | 40504 | 40775 |
Table 2. Training, validation, and test splits of the datasets used in the experiment
Model (VGG) | MSCOCO BLEU-1 | MSCOCO BLEU-2 | MSCOCO BLEU-3 | MSCOCO BLEU-4 | Flickr30k BLEU-1 | Flickr30k BLEU-2 | Flickr30k BLEU-3 | Flickr30k BLEU-4 |
---|---|---|---|---|---|---|---|---|
Deep VS | 62.5 | 45.0 | 32.1 | 23.0 | 57.3 | 36.9 | 24.0 | 16.0 |
Log bilinear | 70.8 | 48.9 | 34.4 | 24.3 | 60.0 | 38.0 | 25.4 | 17.1 |
SAT | 70.7 | 49.2 | 34.4 | 24.3 | 61.0 | 40.5 | 27.3 | 18.2 |
Proposed | 71.9 | 51.9 | 37.2 | 26.2 | 62.2 | 41.5 | 28.2 | 19.0 |
Table 3. Experimental comparison of spatial attention mechanism in VGG network
Model (ResNet-50) | MSCOCO BLEU-1 | MSCOCO BLEU-2 | MSCOCO BLEU-3 | MSCOCO BLEU-4 | Flickr30k BLEU-1 | Flickr30k BLEU-2 | Flickr30k BLEU-3 | Flickr30k BLEU-4 |
---|---|---|---|---|---|---|---|---|
Deep VS | 62.5 | 45.0 | 32.1 | 23.0 | 57.3 | 36.9 | 24.0 | 16.0 |
Google NIC | 66.6 | 46.1 | 32.9 | 24.6 | 66.3 | 42.3 | 27.7 | 18.3 |
m-RNN | 67.0 | 49.0 | 35.0 | 25.0 | 60.0 | 41.0 | 28.0 | 19.0 |
SAT | 72.7 | 52.8 | 37.9 | 26.7 | 63.4 | 42.6 | 29.2 | 19.7 |
Proposed | 73.0 | 53.5 | 39.0 | 27.9 | 64.6 | 43.9 | 30.3 | 20.6 |
Table 4. Experimental comparison of spatial attention mechanism in ResNet network