Fig. 1. Overall framework of DCA-Net
Fig. 2. Intra-modal feature channel grouping and reorganization module
Fig. 3. Aggregated feature attention mechanism module
Fig. 4. Channel attention module
Fig. 5. Spatial attention module
Fig. 6. Position attention module
Fig. 7. Comparison of visible images and infrared images
| Method | All-search r=1 | All-search r=10 | All-search r=20 | All-search mAP /% | Indoor-search r=1 | Indoor-search r=10 | Indoor-search r=20 | Indoor-search mAP /% |
|---|---|---|---|---|---|---|---|---|
| HOG[14] | 2.76 | 18.30 | 31.90 | 4.24 | 3.22 | 24.70 | 44.50 | 7.25 |
| BDTR[33] | 17.01 | 55.43 | 71.96 | 19.66 | | | | |
| HSME[23] | 20.68 | 32.74 | 77.95 | 23.12 | | | | |
| D2RL[22] | 28.90 | 70.60 | 82.40 | 29.20 | | | | |
| MAC[34] | 33.26 | 79.04 | 90.09 | 36.22 | 36.43 | 62.36 | 71.63 | 37.03 |
| MSR[35] | 37.35 | 83.40 | 93.34 | 38.11 | 39.64 | 89.29 | 97.66 | 50.88 |
| AlignGAN[11] | 42.40 | 85.00 | 93.70 | 40.70 | 45.90 | 87.60 | 94.40 | 54.30 |
| cmGAN[26] | 26.97 | 67.51 | 80.56 | 31.49 | 31.63 | 77.23 | 89.18 | 42.19 |
| HPILN[36] | 41.36 | 84.78 | 94.31 | 42.95 | 45.77 | 91.82 | 98.46 | 56.52 |
| LZM[37] | 45.00 | 89.06 | 95.77 | 45.94 | 49.66 | 92.47 | 97.15 | 59.81 |
| AGW[1] | 47.50 | 84.39 | 92.14 | 47.65 | 54.17 | 91.14 | 95.98 | 62.97 |
| X-modal[38] | 49.92 | 89.79 | 95.96 | 50.73 | | | | |
| DDAG[12] | 54.75 | 90.39 | 95.81 | 53.02 | 61.02 | 94.06 | 98.41 | 67.98 |
| Proposed method | 59.23 | 91.83 | 96.63 | 56.55 | 63.22 | 94.39 | 98.20 | 69.54 |
Table 1. Performance comparison of DCA-Net and current state-of-the-art methods on the SYSU-MM01 dataset
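The r=k and mAP columns reported in these tables are standard retrieval metrics: r=k is the fraction of queries whose first correct gallery match appears within the top k results, and mAP is the mean over queries of the average precision. The following is a minimal sketch of how such values can be computed from a query-to-gallery distance matrix. The function name `cmc_and_map` and its parameters are hypothetical, and dataset-specific protocol details (such as camera-based gallery filtering or repeated random gallery splits) are deliberately omitted, so this is not the evaluation code used to produce the numbers above.

```python
def cmc_and_map(dist, query_ids, gallery_ids, ranks=(1, 10, 20)):
    """Return ({k: rank-k accuracy}, mAP), both as fractions in [0, 1].

    dist[q][g] is the distance between query q and gallery item g
    (smaller means more similar). Hypothetical helper for illustration.
    """
    hits_at_k = {k: 0 for k in ranks}
    average_precisions = []

    for row, qid in zip(dist, query_ids):
        # Gallery indices sorted from nearest to farthest.
        order = sorted(range(len(gallery_ids)), key=lambda g: row[g])
        matches = [gallery_ids[g] == qid for g in order]
        if not any(matches):
            continue  # query identity absent from the gallery: skip it

        # Rank-k: the first correct match must appear within the top k.
        first_hit = matches.index(True)  # 0-based position
        for k in ranks:
            if first_hit < k:
                hits_at_k[k] += 1

        # Average precision: mean of precision at each correct match.
        hits_seen = 0
        precisions = []
        for position, is_match in enumerate(matches, start=1):
            if is_match:
                hits_seen += 1
                precisions.append(hits_seen / position)
        average_precisions.append(sum(precisions) / len(precisions))

    n = len(average_precisions)
    if n == 0:
        raise ValueError("no query has a matching gallery identity")
    cmc = {k: hits_at_k[k] / n for k in ranks}
    return cmc, sum(average_precisions) / n


# Toy example: 2 queries, 3 gallery items (identities are made up).
toy_dist = [[0.1, 0.9, 0.5],
            [0.2, 0.8, 0.3]]
toy_cmc, toy_map = cmc_and_map(toy_dist, [1, 2], [1, 2, 1], ranks=(1, 2, 3))
```

Multiplying the returned fractions by 100 gives percentages in the same form as the table entries.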
| Method | Visible to thermal r=1 | Visible to thermal r=10 | Visible to thermal r=20 | Visible to thermal mAP /% | Thermal to visible r=1 | Thermal to visible r=10 | Thermal to visible r=20 | Thermal to visible mAP /% |
|---|---|---|---|---|---|---|---|---|
| HCML[24] | 24.44 | 47.53 | 56.78 | 20.08 | 21.70 | 45.02 | 55.58 | 22.24 |
| BDTR[33] | 33.56 | 58.61 | 67.43 | 32.76 | 32.92 | 58.46 | 68.43 | 31.96 |
| D2RL[22] | 43.40 | 66.10 | 76.30 | 44.10 | | | | |
| HSME[23] | 50.85 | 73.36 | 81.66 | 47.00 | 50.15 | 72.40 | 81.07 | 46.16 |
| MAC[39] | 36.43 | 62.36 | 71.63 | 37.03 | 36.20 | 61.68 | 70.99 | 36.63 |
| MSR[35] | 48.43 | 70.32 | 79.95 | 48.67 | | | | |
| EDFL[40] | 52.58 | 72.10 | 81.47 | 52.98 | 51.89 | 72.09 | 81.04 | 52.13 |
| AlignGAN[11] | 57.90 | | | 53.60 | 56.30 | | | 53.40 |
| LZM[37] | 57.03 | 76.10 | 84.34 | 58.06 | | | | |
| X-modal[38] | 62.21 | 83.13 | 91.72 | 60.18 | | | | |
| AGW[1] | 70.05 | 86.21 | 91.55 | 66.37 | 70.49 | 87.12 | 91.84 | 65.90 |
| DDAG[12] | 69.34 | 86.19 | 91.49 | 63.46 | 68.06 | 85.15 | 90.31 | 61.80 |
| Proposed method | 78.16 | 91.75 | 94.66 | 71.18 | 77.62 | 91.60 | 94.47 | 70.56 |
Table 2. Performance comparison of DCA-Net and current state-of-the-art methods on the RegDB dataset
| Baseline | CGSA | AFA | ICGR | SYSU-MM01 Rank-1 | SYSU-MM01 mAP |
|---|---|---|---|---|---|
| √ | | | | 48.18 | 47.64 |
| √ | √ | | | 50.75 | 49.73 |
| √ | √ | √ | | 57.73 | 54.42 |
| √ | √ | √ | √ | 59.23 | 56.55 |
Table 3. Ablation study on the SYSU-MM01 dataset
| Baseline | Conv2 | Conv3 | Conv4 | SYSU-MM01 Rank-1 | SYSU-MM01 mAP |
|---|---|---|---|---|---|
| √ | | | | 57.73 | 54.42 |
| √ | √ | | | 58.27 | 54.73 |
| √ | √ | √ | | 59.19 | 56.19 |
| √ | √ | √ | √ | 59.23 | 56.55 |
Table 4. Experimental results of inserting ICGR at different positions on the SYSU-MM01 dataset
| Loss function | SYSU-MM01 Rank-1 | SYSU-MM01 mAP | RegDB Rank-1 | RegDB mAP |
|---|---|---|---|---|
| | 56.89 | 54.75 | 70.63 | 62.03 |
| | 57.73 | 54.42 | 72.18 | 66.03 |
| | 59.23 | 56.55 | 78.16 | 71.18 |
Table 5. Effect of different loss functions on model performance
| Model | Model memory /MB | Training time /s |
|---|---|---|
| AGW | 273 | 234.33 |
| DDAG | 362.48 | 299.82 |
| DCA-Net | 364 | 237.07 |
Table 6. Model complexity analysis