Chinese Journal of Lasers, Vol. 50, Issue 15, 1507107 (2023)
Yanqi Lu, Minghui Chen*, Kaibo Qin, Yuquan Wu, Zhijie Yin, and Zhengqi Yang
Author Affiliations
  • Shanghai Engineering Research Center of Interventional Medical Device, the Ministry of Education of Medical Optical Engineering Center, School of Health Sciences and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
    DOI: 10.3788/CJL230624
    Yanqi Lu, Minghui Chen, Kaibo Qin, Yuquan Wu, Zhijie Yin, Zhengqi Yang. Super‐Resolution Reconstruction of OCT Image Based on Pyramid Long‐Range Transformer[J]. Chinese Journal of Lasers, 2023, 50(15): 1507107

    Abstract

    Objective

    Optical coherence tomography (OCT) is widely employed for ophthalmic imaging and diagnosis because of its low latency, noncontact nature, noninvasiveness, high resolution, and high sensitivity. However, two major issues have hindered the development of OCT diagnostics for ophthalmology. First, OCT images are inevitably corrupted by scattering noise owing to the low-coherence interferometric imaging modality, which severely degrades image quality. Second, low sampling rates are often used in clinical practice to accelerate acquisition and reduce the impact of involuntary patient motion, which lowers the resolution of OCT images. With the development of deep learning, super-resolution reconstruction of OCT images using neural networks has compensated for the shortcomings of traditional methods and has gradually become mainstream. Most current super-resolution OCT reconstruction networks are convolutional neural networks (CNNs), which rely mainly on local feature extraction to recover low-resolution OCT images. However, CNN-based models typically suffer from two fundamental problems that originate in the underlying convolutional layers: the interaction between the image and the convolutional kernel is content-independent, and applying the same kernel to different image regions may not be optimal. These limitations often lead to excessive smoothing, missing edge structures, and unreliable reconstruction of pathological structures. In addition, the acquisition of real OCT images limits the training of previous models. Deep learning models usually require large amounts of training data to avoid overfitting, yet large numbers of real OCT images are difficult to obtain. Moreover, even a model with excellent results is of little clinical value if it is not trained on images acquired with the OCT devices commonly used in today's clinics. To address these problems, this study proposes a new OCT image super-resolution model that retains the advantages of convolutional neural networks while incorporating a transformer to compensate for their disadvantages; the data problem is addressed by training on recent real clinical images with data augmentation to increase the generalizability of the model.

    Methods

    In this study, a transformer-based super-resolution network for OCT images, TESR, was constructed. It consists of three parts: a shallow feature extraction module, a deep feature extraction module, and an image reconstruction module. First, the input image is fused with extracted edge details using the edge enhancement module, and shallow feature extraction is then performed with a basic 3×3 convolution block. The deep feature extraction module comprises six feature fusion blocks (FIBs) and a convolution block that extract more abstract semantic information. Each FIB comprises six newly proposed pyramid long-range transformer (PLT) layers and a convolutional block. The PLT layer fuses two mechanisms for acquiring local and global information: a shifted-convolution extraction module expands the receptive field and effectively extracts local image features, while a pyramid pooling self-attention module strengthens the attention relationships between different parts of the image and captures long-range feature dependencies. Finally, image reconstruction is completed using a pixel-shuffle module.
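As an illustration of the pyramid pooling self-attention idea described above (this is not the authors' code), the following minimal NumPy sketch computes queries from full-resolution tokens and keys/values from multi-scale average-pooled tokens. The function names, the pooling sizes, and the omission of learned projections and multi-head splitting are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def avg_pool(x, k):
    """Non-overlapping k x k average pooling over an (H, W, C) feature map."""
    H, W, C = x.shape
    Hk, Wk = H // k, W // k
    return x[:Hk * k, :Wk * k].reshape(Hk, k, Wk, k, C).mean(axis=(1, 3))

def pyramid_pool_attention(x, pool_sizes=(2, 4, 8)):
    """Queries from full-resolution tokens; keys/values from a pooled pyramid."""
    H, W, C = x.shape
    q = x.reshape(H * W, C)                        # (HW, C) query tokens
    kv = np.concatenate(                           # multi-scale pooled tokens
        [avg_pool(x, s).reshape(-1, C) for s in pool_sizes], axis=0)
    attn = softmax(q @ kv.T / np.sqrt(C))          # (HW, n_pooled) weights
    return (attn @ kv).reshape(H, W, C)            # aggregate back to a map
```

Because the key/value set shrinks from H×W tokens to the pooled pyramid (for an 8×8 map with these pooling sizes, 16+4+1 = 21 tokens), the attention cost drops while long-range context is retained, which matches the motivation given for the PLT layer.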

    Results and Discussions

    We compare our model with four classical super-resolution reconstruction models, namely SRGAN, RCAN, IPT, and SwinIR, for 2× and 4× reconstruction. Quantitative evaluation metrics include the peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and learned perceptual image patch similarity (LPIPS). For qualitative evaluation, we provide 4× reconstructed images sampled from both datasets for comparison. The experimental results show that TESR outperforms the other methods on both datasets. Objectively, for 4× reconstruction, the PSNR of TESR improves on the four baselines by 7.1%, 6.5%, 3.2%, and 1.9%, the SSIM by 5.9%, 5.3%, 3.5%, and 2.2% (Table 1), and the LPIPS decreases by 0.1, 0.13, 0.06, and 0.01 (Table 2). Similar results are obtained for 2× reconstruction. Zooming in on key reconstructed areas shows that the TESR-reconstructed images better restore the hierarchical information of the retina thanks to the edge enhancement module and image feature extraction (Fig. 9). The retinal edge structure is sharp, the texture details are clear, and there are no obvious noise or artifact problems (Fig. 10). The overall image is clean, highly realistic, and close to the HR reference image. The experiments verify the effectiveness and superiority of TESR for super-resolution reconstruction of OCT images.
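For readers unfamiliar with the reported metrics, the sketch below (not tied to the paper's evaluation code) shows PSNR and a deliberately simplified single-window SSIM in NumPy. The published results presumably use the standard sliding-window SSIM and a pretrained LPIPS network, neither of which is reproduced here.

```python
import numpy as np

def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = np.mean((ref - test) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

def global_ssim(ref, test, data_range=1.0):
    """Simplified SSIM from global image statistics (single window only)."""
    c1 = (0.01 * data_range) ** 2   # standard SSIM stabilization constants
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = ref.mean(), test.mean()
    var_x, var_y = ref.var(), test.var()
    cov = ((ref - mu_x) * (test - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

Higher PSNR and SSIM indicate better fidelity to the HR reference, whereas LPIPS is a learned distance, so lower is better; that is why the abstract reports PSNR/SSIM increases alongside LPIPS decreases.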

    Conclusions

    To address the problems that OCT image super-resolution reconstruction algorithms focus too heavily on local features, ignore the internal knowledge of the overall image, and lack extraction of retinal edge details, we propose a transformer-based edge-enhancement OCT image super-resolution network, TESR. TESR restores the edge detail information of OCT images with high quality through the new edge enhancement module while suppressing image noise. The PLT module used in deep feature extraction further fuses the local and global information of the image to model its overall internal information over long ranges. This approach eliminates the artifact problem that tended to occur in previous algorithms and improves the realism of the reconstructed images. The experiments show that the TESR model proposed in this study outperforms other classical methods in terms of PSNR and SSIM, is excellent in terms of LPIPS, and achieves a significant improvement in subjective visual quality. Additionally, the model has strong generalization ability. In the future, more efficient self-attention implementations will be explored to reduce the computational complexity of the transformer and improve the convenience of the super-resolution reconstruction technique for clinical practice.
