Author Affiliations
1 School of Computer and Information, Hefei University of Technology, Hefei, Anhui 230009, China
2 Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei, Anhui 230009, China
3 Science and Technology Information Section of Bengbu Public Security Bureau, Bengbu, Anhui 233040, China
Fig. 1. Comparison of VIS images generated from the NIR domain by several algorithms (first row) with the real VIS images (last row)
Fig. 2. Structure diagram of the proposed method. To keep the diagram simple, the identity loss is not shown; see Section 2.4.4 for details
Fig. 3. Structure diagram of the generator in the proposed method
Fig. 4. Cropping the facial regions and extracting edges from face images under NIR and VIS conditions, respectively
Fig. 5. Comparison results on the two datasets. From left to right: input NIR face image, CycleGAN, CSGAN, CDGAN, UNIT, Pix2pixHD, the proposed method, and real VIS face image. Rows Ⅰ~Ⅲ are from the NIR-VIS Sx1 dataset, and rows Ⅳ~Ⅶ are from the NIR-VIS Sx2 dataset
Fig. 6. Results of the ablation experiments on the two datasets. From left to right: input NIR face image, the Baseline method, the proposed method without StyleGAN2, GAN, IDT, PMC, and FEE, respectively, the proposed method, and real VIS face image. Rows Ⅰ~Ⅱ are from the NIR-VIS Sx1 dataset, and rows Ⅲ~Ⅳ are from the NIR-VIS Sx2 dataset
Fig. 7. Comparison of edge images obtained by using each edge extraction method separately. From left to right: real face image, Roberts operator, Prewitt operator, Sobel operator, Laplacian operator, Canny operator
Fig. 8. The effect of different values of [...] on the performance of our method on the NIR-VIS Sx1 dataset
| Method | Mean SSIM | Mean PSNR (dB) |
| --- | --- | --- |
| CycleGAN | 0.7433 | 29.0987 |
| CSGAN | 0.7964 | 29.9471 |
| CDGAN | 0.7636 | 29.4922 |
| UNIT | 0.7935 | 29.8568 |
| Pix2pixHD | 0.8023 | 31.6584 |
| Ours | 0.8096 | 31.0976 |
Table 1. Performance comparison of image translation networks on the NIR-VIS Sx1 dataset
| Method | Mean SSIM | Mean PSNR (dB) |
| --- | --- | --- |
| CycleGAN | 0.6317 | 28.7974 |
| CSGAN | 0.6891 | 28.8176 |
| CDGAN | 0.5283 | 28.1679 |
| UNIT | 0.6986 | 29.0634 |
| Pix2pixHD | 0.7894 | 30.5449 |
| Ours | 0.8135 | 31.2393 |
Table 2. Performance comparison of image translation networks on the NIR-VIS Sx2 dataset
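As a reading aid for the Mean PSNR column in Tables 1 and 2, the following is a minimal sketch of the standard PSNR definition, averaged over image pairs. It is not the authors' evaluation code: the function names `psnr` and `mean_psnr` are hypothetical, images are assumed to be grayscale lists of rows, and the peak value of 255 assumes 8-bit images, which the source does not state.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are lists of rows of pixel intensities (assumed grayscale).
    max_value=255 assumes 8-bit images; this is an assumption, not
    something stated in the paper.
    """
    if len(reference) != len(generated):
        raise ValueError("images must have the same number of rows")
    squared_error = 0.0
    count = 0
    for ref_row, gen_row in zip(reference, generated):
        if len(ref_row) != len(gen_row):
            raise ValueError("rows must have the same length")
        for ref_pixel, gen_pixel in zip(ref_row, gen_row):
            squared_error += (ref_pixel - gen_pixel) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    mse = squared_error / count
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def mean_psnr(pairs, max_value=255.0):
    """Average PSNR over an iterable of (reference, generated) pairs."""
    scores = [psnr(ref, gen, max_value) for ref, gen in pairs]
    return sum(scores) / len(scores)
```

Higher PSNR means the generated image is closer to the real VIS image in the mean-squared-error sense, so it complements SSIM, which compares local structure rather than raw pixel error.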
| Method | FID (NIR-VIS Sx1) | FID (NIR-VIS Sx2) | Time (s) |
| --- | --- | --- | --- |
| CycleGAN | 142.2574 | 171.3596 | 0.181 |
| CSGAN | 70.2146 | 102.6718 | 0.344 |
| CDGAN | 123.7183 | 212.4299 | 0.098 |
| UNIT | 74.8315 | 95.7638 | 0.358 |
| Pix2pixHD | 67.1044 | 106.3615 | 0.079 |
| Ours | 58.5286 | 46.9364 | 0.337 |
Table 3. Comparison of FID performance and average single test time of each image translation network on different datasets
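For reference when reading Table 3, FID is conventionally defined as the Fréchet distance between two Gaussians fitted to Inception features of the real and generated images. The symbols below are notation introduced here, not taken from the paper: $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ denote the feature mean and covariance of the real and generated sets, respectively.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the generated images are statistically closer to the real VIS images.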
| Method | Mean SSIM | Mean PSNR (dB) |
| --- | --- | --- |
| Baseline | 0.5279 | 28.3419 |
| Ours w/o StyleGAN2 | 0.5293 | 28.4381 |
| Ours w/o GAN | 0.3617 | 11.5007 |
| Ours w/o IDT | 0.6864 | 29.2308 |
| Ours w/o PMC | 0.6359 | 28.6156 |
| Ours w/o FEE | 0.7982 | 30.2057 |
| Ours | 0.8096 | 31.0976 |
Table 4. Performance comparison of ablation methods on the NIR-VIS Sx1 dataset
| Method | Mean SSIM | Mean PSNR (dB) |
| --- | --- | --- |
| Ours (Prewitt) | 0.7924 | 30.2815 |
| Ours (Sobel) | 0.8096 | 31.0976 |
Table 5. Performance comparison of the proposed method using the Prewitt operator and the Sobel operator on the NIR-VIS Sx1 dataset
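The Prewitt and Sobel operators compared in Table 5 differ only in their kernel weights: Sobel weights the centre row (or column) by 2, while Prewitt weights all three equally. The sketch below illustrates this with the textbook 3×3 kernels on a grayscale image stored as a list of rows. It is an illustration only, not the authors' edge-extraction pipeline; the names `SOBEL_X`, `PREWITT_X`, and `gradient_magnitude` are hypothetical, and leaving border pixels at zero is an assumption made for brevity.

```python
import math

# Textbook horizontal-gradient kernels; the vertical kernels are their transposes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]


def transpose(kernel):
    """Return the transpose of a square kernel given as a list of rows."""
    return [list(column) for column in zip(*kernel)]


def gradient_magnitude(image, kernel_x):
    """Edge strength sqrt(gx^2 + gy^2) for each interior pixel.

    The kernels are applied as a cross-correlation; the sign flip relative
    to a true convolution does not change the magnitude. Border pixels are
    left at 0.0 (a simplification assumed here).
    """
    kernel_y = transpose(kernel_x)
    height, width = len(image), len(image[0])
    edges = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = 0.0
            gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += kernel_x[dy][dx] * pixel
                    gy += kernel_y[dy][dx] * pixel
            edges[y][x] = math.hypot(gx, gy)
    return edges
```

On the same intensity step, the Sobel response is larger than the Prewitt response because of the doubled centre weight, which also makes Sobel slightly less sensitive to noise in the neighbouring rows.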