• Acta Photonica Sinica
  • Vol. 51, Issue 4, 0410006 (2022)
Tao ZHOU1,2, Yali DONG1,*, Shan LIU1, Huiling LU3, Zongjun MA4, Senbao HOU1, and Shi QIU5
Author Affiliations
  • 1School of Computer Science and Technology, North Minzu University, Yinchuan 750021, China
  • 2The Key Laboratory of Images & Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China
  • 3School of Science, Ningxia Medical University, Yinchuan 750004, China
  • 4Department of Orthopedics, Ningxia Medical University General Hospital, Yinchuan 750004, China
  • 5Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, China
    DOI: 10.3788/gzxb20225104.0410006
    Tao ZHOU, Yali DONG, Shan LIU, Huiling LU, Zongjun MA, Senbao HOU, Shi QIU. Cross-modality Multi-encoder Hybrid Attention U-Net for Lung Tumors Images Segmentation[J]. Acta Photonica Sinica, 2022, 51(4): 0410006

    Abstract

    Segmentation of lung lesions in medical imaging is an important task, but several challenges remain. Lesion delineation relies on manual segmentation by experienced clinicians, which is time-consuming and labor-intensive given the complex anatomy of the human body. Lung tumor images also exhibit low contrast, lesions of varying size, shape, and location, and an unbalanced data distribution. U-Net can segment lesions from a small number of datasets and has been widely used in medical image segmentation of lesions and organs. However, U-Net has three problems. First, U-Net applies uniform parameters to each feature map; for lesions of different sizes and complex shapes, the network may have poor spatial perception, which degrades segmentation performance. Second, the channel dimension of U-Net doubles with each down-sampling step, and the feature maps of the encoder layers are concatenated to the decoder layers through skip connections; yet different channels contribute unequally to the segmentation task. Third, most current multi-encoder segmentation networks extract features from a single-modal target slice and its neighboring slices to improve segmentation performance, but ignore the differing abilities of medical imaging modalities to express lesion characteristics. To address these problems, this paper proposes the MEAU-Net network, which extracts complementary features from multi-modal images. First, to handle the unbalanced data distribution, the Hough transform is used to detect the lines drawn by the doctor on the lung Computed Tomography (CT) image to obtain the region of interest, and the image is cropped from 356 pixel×356 pixel to 50 pixel×50 pixel.
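The ROI extraction step above can be illustrated with a minimal sketch. The exact marker-detection and cropping procedure used by the authors is not given in the abstract, so the following is an assumption-laden toy version: a bare-bones Hough transform voting over (rho, theta) for the pixels of a doctor-drawn marker line, followed by a clamped 50×50 crop around a chosen center. The function names `hough_lines` and `crop_roi` and the binary-mask input are illustrative, not from the paper.

```python
import numpy as np

def hough_lines(mask, n_theta=180):
    """Toy Hough transform: each marked pixel votes for every (rho, theta)
    line passing through it; the accumulator maximum is the dominant line."""
    ys, xs = np.nonzero(mask)
    h, w = mask.shape
    diag = int(np.ceil(np.hypot(h, w)))          # max possible |rho|
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((2 * diag, n_theta), dtype=int)
    for y, x in zip(ys, xs):
        # rho = x cos(theta) + y sin(theta), shifted so indices are >= 0
        rhos = (x * np.cos(thetas) + y * np.sin(thetas)).astype(int) + diag
        acc[rhos, np.arange(n_theta)] += 1
    r, t = np.unravel_index(acc.argmax(), acc.shape)
    return r - diag, thetas[t]

def crop_roi(image, cy, cx, size=50):
    """Crop a size x size ROI centred on (cy, cx), clamped inside the image."""
    h, w = image.shape
    y0 = min(max(cy - size // 2, 0), h - size)
    x0 = min(max(cx - size // 2, 0), w - size)
    return image[y0:y0 + size, x0:x0 + size]
```

For a horizontal marker line at row 100, the dominant vote lands near rho = 100, theta = 90°, and the crop yields a 50 pixel×50 pixel patch, matching the sizes quoted in the abstract.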
Then, to address the low contrast of medical images, an exposure-fusion contrast enhancement method improves the contrast between the lesion and the background of the lung CT image. To extract lesion features from multi-modal medical images, this paper proposes a multi-encoder hybrid attention network, MEAU-Net. Positron Emission Tomography (PET) images provide metabolic information about lesions, CT images provide anatomical information, and Positron Emission Tomography/Computed Tomography (PET/CT) images combine their advantages, exploiting their complementarity and redundancy. The MEAU-Net encoder path comprises three branches, PET/CT, PET, and CT, which extract the features of the corresponding modalities. In the skip connections of the network, a hybrid attention mechanism is used, combining spatial attention and channel attention. The spatial attention mechanism uses the PET/CT and CT features to emphasize key areas in the feature map and suppress irrelevant background. The channel attention mechanism extracts a weight value for each channel of the three branches (PET/CT, CT, and PET); after a sigmoid on each branch, it selects the maximum weight value per channel and multiplies it into the corresponding channel, so that important channels receive a higher weighting coefficient and channel selection is realized. The network feeds the feature maps produced by the hybrid attention mechanism into the corresponding decoder layers, so that the network focuses on the lesion region in the medical image, suppresses useless background information, and achieves accurate segmentation of the lesion. Finally, for the semantic features at different scales along the decoding path, this paper uses a multi-scale feature aggregation block to map the decoder features and refine the lesion segmentation.
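The per-channel max-selection rule described above can be sketched as follows. This is a simplified numpy illustration under stated assumptions: the learned layers that the real network would use to produce per-channel weights are replaced by a plain global average pool followed by a sigmoid, and only the reweighting of one fused branch is shown. The function name `channel_attention_max` and the pooling choice are assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention_max(feat_petct, feat_ct, feat_pet):
    """Simplified channel attention over three (C, H, W) branches:
    pool each branch to per-channel scores, squash with a sigmoid,
    keep the per-channel maximum across branches, and reweight the
    PET/CT feature map. (Learned FC/conv layers are omitted here.)"""
    weights = [sigmoid(f.mean(axis=(1, 2)))       # (C,) score per branch
               for f in (feat_petct, feat_ct, feat_pet)]
    w = np.maximum.reduce(weights)                # per-channel max over branches
    return feat_petct * w[:, None, None], w       # broadcast over H, W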
We compared our model with four classical segmentation models on our dataset, including SegNet, Attention U-Net, and WNet. The experimental results show that our model exploits multi-modal medical image features to effectively segment lung lesions with complex shapes and outperforms all other methods on our dataset. The average DSC, Recall, VOE, and RVD of the MEAU-Net segmentation results are 96.4%, 97.27%, 7.0%, and 6.94%, respectively.
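The four evaluation metrics quoted above have standard definitions for binary masks, which the following sketch computes; the function name `seg_metrics` is illustrative, and RVD is reported here signed (papers sometimes report its absolute value).

```python
import numpy as np

def seg_metrics(pred, gt):
    """Standard overlap metrics for binary segmentation masks P (pred), G (gt):
    DSC    = 2|P∩G| / (|P| + |G|)   (Dice similarity coefficient)
    VOE    = 1 - |P∩G| / |P∪G|      (volumetric overlap error)
    RVD    = (|P| - |G|) / |G|      (relative volume difference, signed)
    Recall = |P∩G| / |G|            (true-positive rate)"""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dsc = 2 * inter / (pred.sum() + gt.sum())
    voe = 1 - inter / union
    rvd = (pred.sum() - gt.sum()) / gt.sum()
    recall = inter / gt.sum()
    return dsc, voe, rvd, recall
```

For example, a prediction that covers half of the ground truth with an equal number of false positives gives DSC = 0.5, VOE = 2/3, RVD = 0, and Recall = 0.5.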