Acta Photonica Sinica
Vol. 52, Issue 11, 1110003 (2023)
Xin FENG, Jieming YANG*, Hongde ZHANG, and Guohang QIU
Author Affiliations
  • School of Mechanical Engineering, Key Laboratory of Manufacturing Equipment Mechanism Design and Control of Chongqing, Chongqing Technology and Business University, Chongqing 400067, China
    DOI: 10.3788/gzxb20235211.1110003
    Citation: Xin FENG, Jieming YANG, Hongde ZHANG, Guohang QIU. Infrared and Visible Image Fusion Based on Dual Channel Residual Dense Network[J]. Acta Photonica Sinica, 2023, 52(11): 1110003

    Abstract

    In the infrared and visible image fusion task, the visible image contains abundant texture and background information, while the infrared image contains salient target information. The two are complementary and together can represent the visual information of a scene effectively and comprehensively. To reduce the loss of source-image features in the fused image and to fully extract the feature information of the infrared and visible images, this paper proposes an improved dual-channel deep-learning auto-encoder network for infrared and visible image fusion. The encoder consists of three cascaded dual-channel layers, each composed of cascaded residual-connection and dense-connection modules. The source image is split into two paths and fed simultaneously into the residual-connection network and the dense-connection network. The residual connections are effective at highlighting target features, while the dense connections excel at preserving the texture details of the source image, so the encoder structure can fully extract the multi-level features of infrared and visible images. In the fusion layer, a spatial L1-norm strategy and a channel attention mechanism are used, respectively, to fuse the cascaded residual and dense channel features. The spatial L1-norm fusion strategy uses the L1 norm to compute the activity-level measure and emphasizes the fusion of spatial information. The channel attention mechanism obtains a weight map for each channel through a global pooling operation; the information contained in each channel is measured by this weight so that channel information can be fused effectively. Finally, a corresponding decoder is designed to reconstruct the fused feature image; the decoder processes the dense and residual features differently according to the characteristics of the encoder.
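The two fusion strategies described above can be sketched as follows. This is a minimal NumPy illustration operating on (C, H, W) feature maps, not the authors' implementation; the exact weight normalizations (per-pixel ratio, softmax over pooled scores) are assumptions.

```python
import numpy as np

def spatial_l1_fusion(feat_ir, feat_vis):
    """Spatial L1-norm strategy: at each pixel, the channel-wise L1 norm
    serves as the activity-level measure, and the two (C, H, W) feature
    maps are blended with the resulting per-pixel weights."""
    act_ir = np.abs(feat_ir).sum(axis=0)            # (H, W) activity maps
    act_vis = np.abs(feat_vis).sum(axis=0)
    w_ir = act_ir / (act_ir + act_vis + 1e-8)       # per-pixel weight in [0, 1]
    return w_ir * feat_ir + (1.0 - w_ir) * feat_vis

def channel_attention_fusion(feat_ir, feat_vis):
    """Channel attention strategy: global average pooling scores each channel,
    and a softmax over the two scores weights the blend per channel."""
    s_ir = feat_ir.mean(axis=(1, 2))                # (C,) pooled scores
    s_vis = feat_vis.mean(axis=(1, 2))
    w_ir = np.exp(s_ir) / (np.exp(s_ir) + np.exp(s_vis))
    w_ir = w_ir[:, None, None]                      # broadcast over H, W
    return w_ir * feat_ir + (1.0 - w_ir) * feat_vis
```

When the two inputs are identical, both strategies return the input unchanged, which is a useful sanity check for a weighted-average fusion rule.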
The high-dimensional dense feature layers are deeper, so more convolutional sampling layers are used to restore those features; the low-dimensional residual feature layers are shallower, so the number of convolutional layers is reduced. In this way, features from different channels and levels are combined to obtain the final fusion result. In the training stage, the fusion layer is removed, and 5 000 images randomly selected from the ImageNet dataset serve as the training set for the auto-encoder network. The sum of pixel loss, gradient loss, and structural-similarity loss is used as the loss function to guide the optimization of the network parameters. In the experimental phase, ablation experiments are conducted on the network structure and the fusion strategy. For the network structure, comparison with single residual-channel or dense-channel networks shows that the dual-channel structure indeed improves the feature-extraction ability. For the fusion strategy, comparison with classical strategies such as addition, mean, and maximum shows that the dual-path fusion strategy exploits the advantages of the dual-channel structure and can effectively integrate the salient features and detail features of the source image. Finally, the proposed method is compared with traditional algorithms and recent deep-learning algorithms. The results show that, subjectively, the proposed method better reflects target features and background contour information, and it maintains the information balance between infrared and visible images in most fusion scenes, yielding high-quality fused images. Among the objective indicators, the method leads in CC, PSNR, and MI, while the remaining indicators are at a mid-level, giving excellent overall performance.
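The training loss described above (pixel loss + gradient loss + structural-similarity loss) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the loss weights `lam_grad` and `lam_ssim` are assumptions, and the SSIM term is simplified to a single global window rather than the usual sliding-window form.

```python
import numpy as np

def pixel_loss(out, ref):
    """Mean squared intensity difference between output and reference."""
    return np.mean((out - ref) ** 2)

def gradient_loss(out, ref):
    """Penalizes differences in horizontal/vertical finite-difference
    gradients, encouraging the output to keep the reference's edges."""
    dx = lambda im: np.diff(im, axis=1)
    dy = lambda im: np.diff(im, axis=0)
    return np.mean((dx(out) - dx(ref)) ** 2) + np.mean((dy(out) - dy(ref)) ** 2)

def ssim_global(out, ref, c1=1e-4, c2=9e-4):
    """Single-window (global) SSIM; practical implementations slide a
    local window over the image instead."""
    mu_x, mu_y = out.mean(), ref.mean()
    var_x, var_y = out.var(), ref.var()
    cov = ((out - mu_x) * (ref - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def total_loss(out, ref, lam_grad=1.0, lam_ssim=1.0):
    """Sum of the three terms; the weights are illustrative assumptions."""
    return pixel_loss(out, ref) + lam_grad * gradient_loss(out, ref) \
           + lam_ssim * (1.0 - ssim_global(out, ref))
```

For identical input and reference images all three terms vanish (SSIM equals 1), so the total loss is zero, matching the intuition that a perfect reconstruction incurs no penalty.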