• Acta Photonica Sinica
  • Vol. 53, Issue 6, 0610003 (2024)
Fan YANG1, Zhishe WANG1,*, Jing SUN1, and Zhaofa YU2
Author Affiliations
  • 1School of Applied Science, Taiyuan University of Science and Technology, Taiyuan 030024, China
  • 2Ordnance NCO Academy, Army Engineering University of PLA, Wuhan 430075, China
    DOI: 10.3788/gzxb20245306.0610003
    Fan YANG, Zhishe WANG, Jing SUN, Zhaofa YU. Infrared and Visible Image Fusion Method via Interactive Self-attention[J]. Acta Photonica Sinica, 2024, 53(6): 0610003
    Fig. 1. The framework of the interactive self-attention fusion method
    Fig. 2. The frameworks of Token ViT, Channel ViT and vision transformer
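Fig. 2 contrasts attention computed over spatial tokens (Token ViT) with attention computed over channels (Channel ViT). As a rough, self-contained sketch only (the class names, single-head scaled dot-product formulation, and dimensions below are illustrative assumptions, not the authors' implementation), the two variants differ in whether the affinity matrix is built across the N tokens or across the C channels:

```python
# Illustrative sketch (not the authors' code): token-wise vs channel-wise
# scaled dot-product self-attention over a flattened feature map.
import torch
import torch.nn as nn


class TokenAttention(nn.Module):
    """Attention across spatial tokens: the affinity matrix is (N x N)."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):                     # x: (B, N, C), N spatial tokens
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale        # (B, N, N)
        return self.proj(attn.softmax(dim=-1) @ v)


class ChannelAttention(nn.Module):
    """Attention across channels: the affinity matrix is (C x C)."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                     # x: (B, N, C)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.transpose(-2, -1) for t in (q, k, v))   # (B, C, N)
        attn = (q @ k.transpose(-2, -1)) * (q.shape[-1] ** -0.5)  # (B, C, C)
        return self.proj((attn.softmax(dim=-1) @ v).transpose(-2, -1))


if __name__ == "__main__":
    feat = torch.randn(1, 64 * 64, 32)        # one 64x64 feature map, 32 channels
    print(TokenAttention(32)(feat).shape)     # torch.Size([1, 4096, 32])
    print(ChannelAttention(32)(feat).shape)   # torch.Size([1, 4096, 32])
```

The channel-wise branch keeps its affinity matrix at C×C regardless of image size, which is why such a branch is typically paired with token-wise attention so that both global spatial and cross-channel dependencies are modelled.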
    Fig. 3. The subjective comparison results of different fusion models
    Fig. 4. The subjective comparison of “Nato_camp”, “Kaptein_1123” and “Street” selected from the TNO dataset
    Fig. 5. The objective comparison results of different methods for the TNO dataset
    Fig. 6. The subjective comparison of “00443” and “03989” selected from the M3FD dataset
    Fig. 7. The objective comparison results of different methods for the M3FD dataset
    Fig. 8. The subjective comparison of “FLIR_08910” and “FLIR_06307” selected from the Roadscene dataset
    Fig. 9. The objective comparison results of different methods for the Roadscene dataset
    Methods      AG      MI      PC      FMIp    Qe      Qabf    MS_SSIM  VIF
    w/o Trans    4.3654  2.9292  0.3448  0.9104  0.3616  0.4292  0.8648   0.3996
    w/o CNN      4.7317  2.4740  0.2406  0.8936  0.3243  0.4440  0.9103   0.3900
    w/o Channel  4.9164  2.8528  0.3580  0.9111  0.5078  0.5549  0.9249   0.4309
    w/o Token    5.2964  2.9254  0.3728  0.9099  0.5014  0.5752  0.9126   0.4471
    with PE      5.2211  3.2884  0.3905  0.9086  0.5022  0.6042  0.9044   0.4475
    WP1          5.0357  3.1774  0.3461  0.9015  0.4835  0.5188  0.8836   0.4357
    WP2          5.5843  3.0959  0.3604  0.9038  0.4258  0.5111  0.8830   0.4465
    Ours         5.4921  3.3581  0.3935  0.9105  0.5117  0.6095  0.9119   0.4477
    Table 1. The objective comparison results of different fusion models
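Two of the Table 1 metrics, average gradient (AG) and mutual information (MI), are commonly computed as in the sketch below. This is an illustrative example only (the function names and the 256-bin joint histogram are assumptions), not necessarily the paper's exact implementation:

```python
# Illustrative definitions (common formulations, not necessarily the exact
# ones used in the paper) of two Table 1 metrics: AG and MI.
import numpy as np


def average_gradient(img: np.ndarray) -> float:
    """AG: mean magnitude of horizontal/vertical intensity gradients."""
    img = img.astype(np.float64)
    gx = np.diff(img, axis=1)[:-1, :]   # crop both to a common (H-1, W-1) grid
    gy = np.diff(img, axis=0)[:, :-1]
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))


def mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 256) -> float:
    """MI between two images, estimated from their joint gray-level histogram."""
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = hist / hist.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px[:, None] * py[None, :])[nz])))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ir, vis = rng.integers(0, 256, (2, 128, 128))
    fused = (0.5 * ir + 0.5 * vis).astype(np.uint8)
    print(average_gradient(fused))
    # Fusion MI is usually reported as MI(fused, ir) + MI(fused, vis).
    print(mutual_information(fused, ir) + mutual_information(fused, vis))
```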
    Methods     U2Fusion  RFN-Nest  FusionGAN  GANMcC  YDTR   SwinFusion  SwinFuse  Ours
    TNO         1.515     0.235     0.513      0.785   0.201  2.312       0.223     0.210
    M3FD        4.646     0.864     0.988      1.257   0.771  6.257       0.946     0.833
    Roadscene   0.932     0.170     0.563      1.014   0.087  1.564       0.145     0.096
    Table 2. The comparison results of computational efficiency for different fusion methods (unit: s)
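Per-image runtimes like those in Table 2 are typically obtained by averaging inference time over a set of image pairs after a few warm-up runs. The sketch below shows one such harness; the `fuse` callable, the warm-up count, and the CUDA synchronization step are assumptions about the measurement setup, not details reported in the paper.

```python
# Minimal timing sketch for per-image fusion runtime (Table 2 style).
# `fuse` is a placeholder for any fusion model's inference call.
import time
import torch


@torch.no_grad()
def mean_runtime(fuse, pairs, warmup: int = 3) -> float:
    """Average seconds per infrared/visible pair, excluding warm-up runs."""
    for ir, vis in pairs[:warmup]:           # warm-up: build kernels and caches
        fuse(ir, vis)
    if torch.cuda.is_available():
        torch.cuda.synchronize()              # ensure queued GPU work has finished
    start = time.perf_counter()
    for ir, vis in pairs:
        fuse(ir, vis)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / len(pairs)


if __name__ == "__main__":
    dummy = [(torch.rand(1, 1, 256, 256), torch.rand(1, 1, 256, 256))
             for _ in range(10)]
    naive_fuse = lambda ir, vis: 0.5 * ir + 0.5 * vis   # stand-in for a real model
    print(f"{mean_runtime(naive_fuse, dummy):.4f} s per image pair")
```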