• Opto-Electronic Engineering
  • Vol. 49, Issue 4, 210317 (2022)
Rui Sun1,2, Xiaoquan Shan1,2,*, Qijing Sun1,2, Chunjun Han3, and Xudong Zhang1
Author Affiliations
  • 1School of Computer and Information, Hefei University of Technology, Hefei, Anhui 230009, China
  • 2Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei, Anhui 230009, China
  • 3Science and Technology Information Section of Bengbu Public Security Bureau, Bengbu, Anhui 233040, China
    DOI: 10.12086/oee.2022.210317
    Rui Sun, Xiaoquan Shan, Qijing Sun, Chunjun Han, Xudong Zhang. NIR-VIS face image translation method with dual contrastive learning framework[J]. Opto-Electronic Engineering, 2022, 49(4): 210317
    Fig. 1. Comparison of the VIS images (first row) generated by several algorithms from the NIR domain with the real visible images (last row)
    Fig. 2. The structure diagram of the proposed method. To simplify the diagram, the identity loss is not shown in the figure; see Section 2.4.4 for details
    Fig. 3. The structure diagram of the generator in the proposed method
    Fig. 4. Facial regions cropped out and edges extracted from face images under NIR and VIS conditions, respectively
    Fig. 5. The comparative experimental results on the two datasets. From left to right: input NIR face image, CycleGAN, CSGAN, CDGAN, UNIT, Pix2pixHD, the proposed method, and the real VIS face image. Rows Ⅰ~Ⅲ are from the NIR-VIS Sx1 dataset, and rows Ⅳ~Ⅶ are from the NIR-VIS Sx2 dataset
    Fig. 6. Results of the ablation experiments on the two datasets. From left to right: input NIR face image, the baseline method, the proposed method without StyleGAN2, LGAN, LIDT, LPMC, and LFEE respectively, the proposed method, and the real VIS face image. Rows Ⅰ~Ⅱ are from the NIR-VIS Sx1 dataset, and rows Ⅲ~Ⅳ are from the NIR-VIS Sx2 dataset
    Fig. 7. Comparison of edge images obtained by using each edge extraction method separately. From left to right: real face image, Roberts operator, Prewitt operator, Sobel operator, Laplacian operator, Canny operator
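    The paper does not include its preprocessing code, so the following is a minimal sketch of how the five edge maps compared in Fig. 7 could be produced with OpenCV. The file name `face.png` and the Canny thresholds are illustrative assumptions, not values from the paper; Roberts and Prewitt have no built-in OpenCV function, so their kernels are applied with filter2D.

    ```python
    import cv2
    import numpy as np

    # Load a face image in grayscale (placeholder path).
    img = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)
    imgf = img.astype(np.float32)

    # Roberts: 2x2 diagonal-difference kernels, combined as gradient magnitude.
    roberts_x = np.array([[1, 0], [0, -1]], dtype=np.float32)
    roberts_y = np.array([[0, 1], [-1, 0]], dtype=np.float32)
    roberts = cv2.magnitude(cv2.filter2D(imgf, -1, roberts_x),
                            cv2.filter2D(imgf, -1, roberts_y))

    # Prewitt: 3x3 unweighted first-difference kernels.
    prewitt_x = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=np.float32)
    prewitt_y = prewitt_x.T
    prewitt = cv2.magnitude(cv2.filter2D(imgf, -1, prewitt_x),
                            cv2.filter2D(imgf, -1, prewitt_y))

    # Sobel, Laplacian, and Canny are available directly in OpenCV.
    sobel = cv2.magnitude(cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3),
                          cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3))
    laplacian = cv2.Laplacian(img, cv2.CV_32F)
    canny = cv2.Canny(img, 100, 200)  # thresholds chosen for illustration
    ```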
    Fig. 8. The effect of different values of λFEE on the performance of our method on the NIR-VIS Sx1 dataset
    Method       Mean SSIM   Mean PSNR/dB
    CycleGAN     0.7433      29.0987
    CSGAN        0.7964      29.9471
    CDGAN        0.7636      29.4922
    UNIT         0.7935      29.8568
    Pix2pixHD    0.8023      31.6584
    Ours         0.8096      31.0976
    Table 1. Performance comparison of image translation networks on the NIR-VIS Sx1 dataset
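    For reference, the mean SSIM and mean PSNR reported in Tables 1, 2, 4, and 5 could be computed over paired generated/real VIS images along the following lines. This is a sketch using scikit-image, not the authors' evaluation code; the function name `mean_ssim_psnr` and the uint8/RGB input assumption are hypothetical.

    ```python
    import numpy as np
    from skimage.metrics import structural_similarity, peak_signal_noise_ratio

    def mean_ssim_psnr(generated, reference):
        """Average SSIM and PSNR over paired generated/real VIS images.

        `generated` and `reference` are equal-length lists of uint8 RGB arrays
        of the same shape (illustrative sketch, not the paper's pipeline).
        """
        ssims, psnrs = [], []
        for fake, real in zip(generated, reference):
            ssims.append(structural_similarity(fake, real, channel_axis=-1))
            psnrs.append(peak_signal_noise_ratio(real, fake, data_range=255))
        return float(np.mean(ssims)), float(np.mean(psnrs))
    ```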
    Method       Mean SSIM   Mean PSNR/dB
    CycleGAN     0.6317      28.7974
    CSGAN        0.6891      28.8176
    CDGAN        0.5283      28.1679
    UNIT         0.6986      29.0634
    Pix2pixHD    0.7894      30.5449
    Ours         0.8135      31.2393
    Table 2. Performance comparison of image translation networks on the NIR-VIS Sx2 dataset
    Method       FID (NIR-VIS Sx1)   FID (NIR-VIS Sx2)   Time/s
    CycleGAN     142.2574            171.3596            0.181
    CSGAN        70.2146             102.6718            0.344
    CDGAN        123.7183            212.4299            0.098
    UNIT         74.8315             95.7638             0.358
    Pix2pixHD    67.1044             106.3615            0.079
    Ours         58.5286             46.9364             0.337
    Table 3. Comparison of FID and average single-image test time of each image translation network on the two datasets
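    The FID scores in Table 3 measure the Fréchet distance between the Inception activation statistics of the generated and real image sets. Below is a minimal numpy/scipy sketch of the standard FID formula; the means and covariances are assumed to be precomputed from Inception-v3 features, and this is not the paper's code.

    ```python
    import numpy as np
    from scipy import linalg

    def fid(mu1, sigma1, mu2, sigma2):
        """Frechet Inception Distance from activation statistics.

        FID = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * (sigma1 @ sigma2)^(1/2)),
        where mu/sigma are the mean vector and covariance matrix of Inception
        activations for each image set (illustrative sketch only).
        """
        diff = mu1 - mu2
        covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
        covmean = covmean.real  # drop tiny imaginary parts from numerical error
        return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
    ```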
    Method                  Mean SSIM   Mean PSNR/dB
    Baseline                0.5279      28.3419
    Ours w/o StyleGAN2      0.5293      28.4381
    Ours w/o GAN            0.3617      11.5007
    Ours w/o IDT            0.6864      29.2308
    Ours w/o PMC            0.6359      28.6156
    Ours w/o FEE            0.7982      30.2057
    Ours                    0.8096      31.0976
    Table 4. Performance comparison of ablation methods on the NIR-VIS Sx1 dataset
    Method           Mean SSIM   Mean PSNR/dB
    Ours (Prewitt)   0.7924      30.2815
    Ours (Sobel)     0.8096      31.0976
    Table 5. Performance comparison of applying the Prewitt operator and Sobel operator respectively on the NIR-VIS Sx1 dataset