• Chinese Journal of Lasers
  • Vol. 48, Issue 15, 1507004 (2021)
Kun Yuan and Li Huo*
Author Affiliations
  • Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
    DOI: 10.3788/CJL202148.1507004
    Kun Yuan, Li Huo. Multiple-Scale Inpainting Convolutional Neural Network for Retinal OCT Image Segmentation[J]. Chinese Journal of Lasers, 2021, 48(15): 1507004

    Abstract

    Objective Optical coherence tomography (OCT) has become the de facto gold standard for diagnosis in ophthalmology. In recent years, with the rapid improvement in imaging speed and the wide adoption of OCT angiography (OCTA), a single clinical scan can generate a large number of retinal OCT B-scans. Automatic and effective retinal tissue segmentation is required to keep pace with this trend. Conventional segmentation algorithms based on path searching are time-consuming and error-prone when dealing with morbid retinas. In such cases, neural-network (NN)-based methods such as U-Net and ReLayNet are promising approaches. These NN-based methods differ in complexity and performance. For the desktop computers used in current mainstream OCT equipment, an NN with a moderate number of parameters and high performance is highly desirable. In this study, we demonstrate a novel end-to-end method for retinal layer segmentation, named the multiple-scale inpainting convolutional NN (MsiNet). MsiNet is based on human visual characteristics and can be implemented on a desktop computer with high performance. The framework was validated on two retinal image datasets against U-Net and ReLayNet, which are well established in retinal OCT image segmentation. MsiNet outperformed both methods in retinal layer segmentation accuracy and morbid tissue segmentation, with a moderate parameter set size suitable for desktop computers.

    Methods MsiNet is based on semantic segmentation with an encoder-decoder architecture and convolutional NNs (CNNs). We treated the retinal layers as distinct categories and predicted, for each pixel, the probability of belonging to each category. The human visual system usually detects objects in two steps: first, it obtains semantic information (outline, location, etc.); second, it focuses on details. Inspired by this, we employed a small-scale network as a decoder to refine semantic information and then used inpainting networks to extract spatial structures from high-resolution feature maps and inpaint the low-resolution results. Thus, semantic and detailed information could be refined directly in different stages with less redundancy. To fit MsiNet within limited computational resources, we reduced the number of parameters and floating-point operations using two new structures: the interlaced residual unit (IRU) and the biased fusion unit (BFU). We also adopted a single-stage decoder instead of a traditional decoder and improved the segmentation results stage by stage. Further, we designed a joint weighting method that increases the loss penalty for certain special pixels. Losses were applied at multiple resolution stages to obtain results at different resolutions. Compared with two well-established NN-based methods, the segmentation accuracy on edges was significantly improved.
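The per-pixel classification, the extra penalty on special pixels, and the losses at several resolutions can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the edge mask here simply marks pixels whose label differs from a 4-neighbor (it is not the paper's weighting scheme), and `edge_weight`, `stage_weights`, and the strided label downsampling are hypothetical choices made for this example.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def edge_weight_mask(labels, edge_weight=5.0):
    """Weight 1 everywhere, 1 + edge_weight where a 4-neighbor has a different label.

    Assumption: a plain label-transition mask standing in for the paper's weighting.
    """
    edges = np.zeros(labels.shape, dtype=bool)
    diff_v = labels[1:, :] != labels[:-1, :]   # vertical label transitions
    edges[1:, :] |= diff_v
    edges[:-1, :] |= diff_v
    diff_h = labels[:, 1:] != labels[:, :-1]   # horizontal label transitions
    edges[:, 1:] |= diff_h
    edges[:, :-1] |= diff_h
    return 1.0 + edge_weight * edges

def weighted_cross_entropy(logits, labels, weights):
    """Weighted mean of per-pixel cross-entropy; logits has shape (H, W, C)."""
    probs = softmax(logits, axis=-1)
    rows, cols = np.indices(labels.shape)
    picked = probs[rows, cols, labels]         # probability of the true class
    return float((weights * -np.log(picked + 1e-12)).sum() / weights.sum())

def multi_scale_loss(logits_per_stage, labels, stage_weights):
    """Sum of weighted losses, one per resolution stage.

    Assumption: labels are downsampled by strided sampling to match each stage.
    """
    total = 0.0
    for logits, stage_weight in zip(logits_per_stage, stage_weights):
        factor = labels.shape[0] // logits.shape[0]
        stage_labels = labels[::factor, ::factor]
        mask = edge_weight_mask(stage_labels)
        total += stage_weight * weighted_cross_entropy(logits, stage_labels, mask)
    return total
```

As a usage example, a two-layer 8×8 label map with one full-resolution stage and one half-resolution stage could be scored with `multi_scale_loss([logits_full, logits_half], labels, [1.0, 0.5])`, where `logits_full` has shape `(8, 8, 2)` and `logits_half` has shape `(4, 4, 2)`. Pixels on either side of the layer boundary contribute more to the loss than interior pixels.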

    Results and Discussions MsiNet was tested on two retinal OCT datasets: the SD-OCT dataset and a dataset provided by a third-party company (TOWARD π Medical Technology). We compared MsiNet with two well-established NN-based methods: U-Net and ReLayNet. First, using the GSE (generalized semantic boundary) weighting method, MsiNet predicted edges and morbid tissue more accurately than U-Net and ReLayNet (Table 1). Second, by comparing the outputs of different stages in Fig. 6, we confirmed that a high-level decoder significantly improves the accuracy of low-level outputs. Third, MsiNet outperformed U-Net and ReLayNet in terms of both retinal layer and morbid tissue segmentation accuracy, with a moderate parameter set size suitable for desktop computers. The results of the three methods are shown in Fig. 7 and Table 1.

    Conclusions Based on human visual characteristics, we propose MsiNet for retinal tissue segmentation. MsiNet replaces the traditional decoder, which merges feature maps of different resolutions, with a single-stage decoder at low resolution, and an inpainting network is designed to rectify segmentation errors and add structural information to the result of each stage. An extended GSE mask is applied to the loss function to adjust the weights of edge pixels. Because of the clear semantic information, the parameter set size is significantly reduced. Experiments show that MsiNet outperforms U-Net and ReLayNet in terms of both layer segmentation and morbid tissue segmentation, mainly due to the improvement in edge point classification.
