High-speed multimode fiber imaging system based on conditional generative adversarial network

Zhenming Yu, Zhenyu Ju, Xinlei Zhang, Ziyi Meng, Feifei Yin, and Kun Xu*

Author Affiliations
  • State Key Laboratory of Information Photonics and Optical Communications, Beijing University of Posts and Telecommunications, Beijing 100876, China

Chinese Optics Letters, Vol. 19, Issue 8, 081101 (2021)
DOI: 10.3788/COL202119.081101

    Abstract

    The multimode fiber (MMF) has great potential to transmit high-resolution images in a less invasive way in endoscopy due to its large number of spatial modes and small core diameter. However, crosstalk between spatial modes inevitably occurs in MMFs, which turns the received images into speckle patterns. A conditional generative adversarial network (GAN) composed of a generator and a discriminator was utilized to reconstruct the original images from the received speckles. We built an MMF imaging experimental system that transmits images over a 1 m MMF with a 50 μm core. Compared with the conventional U-net method, this conditional GAN can reconstruct images from fewer training data while achieving the same performance and shows higher feature extraction capability.

    1. Introduction

    Endoscopes are devices that acquire images or other information through thin tubular structures and have gradually become popular tools in medical and industrial fields[1]. Optical fiber bundles consisting of many single-mode fiber cores are widely applied in endoscopes for image transmission[2]. However, the imaging resolution of the fiber bundle suffers from the coupling of light between adjacent cores when the cores are not sufficiently spaced. With the increasing demand for imaging in medical and other fields, imaging systems with both higher resolution and smaller access diameter are urgently needed. Multimode fibers (MMFs) support a large number of transmittable modes for unparalleled information transport in an ultra-thin form factor, which makes single-fiber imaging possible. However, due to the strong mode coupling and interference within MMFs, the images cannot be obtained directly. When the modulated coherent light passes through the MMF, a variety of transmission modes are excited and coupled with each other, which turns the received images at the distal end into random speckle patterns. Moreover, local defects along the MMF also cause mode coupling and inter-mode interference, so speckle patterns can form even after transmission over a few millimeters[3,4]. Therefore, the main challenge of MMF imaging is how to reconstruct the original images from speckle patterns.

    Some approaches developed from computational imaging[5,6], such as phase conjugation, digital scanning, and holographic techniques, have been used to address these issues in MMF imaging systems[7–10]. However, these approaches are mostly calibration-based, which makes them difficult to use in practical applications due to their low speed. Deep learning has shown the potential to speed up the image reconstruction of MMFs and to improve robustness to environmental changes. By training a deep neural network (DNN) with a large amount of captured data, the images transmitted through MMFs can be recovered very fast [∼milliseconds (ms)] by the DNN, and this process is calibration free[11,12]. U-net, which is powerful for image segmentation and restoration, has been verified to provide decent results for image reconstruction from MMF speckles. However, the conventional training strategy, i.e., using an L2 loss function, usually requires tens of thousands of image pairs for training, which makes training data acquisition challenging.

    Generative adversarial networks (GANs) are utilized to optimize the DNN method. The GAN was first proposed to generate samples from the natural image distribution starting from a random vector[13]. However, the original GAN is trained without any conditioning: all training samples are fed into the model without constraints. The GAN model is therefore uncontrollable when used for image generation, which makes the generated images unpredictable. A conditional GAN framework was proposed that conditions on an input image and generates a corresponding output image, and it shows high performance in image-to-image translation tasks[14]. In this paper, we explore a new training strategy, which employs the framework of the conditional GAN to recover the images transmitted through MMFs.

    2. Operating Principle

    The structure of the conditional GAN is shown in Fig. 1. Figure 1(a) is the network of the generator, which is essentially a U-net. The sizes of the input layer and output layer are determined by the resolution of the input image. The generator is divided into a contraction path and an expansion path. The discriminator is a convolutional neural network. The input of the discriminator is the ground truth or the output image of the generator concatenated with the corresponding speckle pattern, which follows the idea of conditional GANs, as shown in Fig. 1(b). The generator is trained to confuse the discriminator, i.e., to make the discriminator fail to distinguish the output of the generator from the real images, while the discriminator is trained to classify the output of the generator as fake as reliably as possible. Unlike the discriminator in the conventional GAN, which determines whether the image generated by the generator matches the distribution of the real sample set, the discriminator in this conditional GAN determines whether the image generated by the generator matches the label image of the real sample set. The objective of the conditional GAN is defined as[14]
    $$\mathcal{L}_{\mathrm{cGAN}}(G,D)=\mathbb{E}_{x,y}[\log D(x,y)]+\mathbb{E}_{x,z}\{\log\{1-D[x,G(x,z)]\}\},\tag{1}$$
    where x is the received speckle pattern, y is the ground truth, and z is the random noise introduced by the dropout layers of the generator. The generator G aims to minimize this objective, while the discriminator D aims to maximize it. In addition, a traditional L1 loss is also applied to make the generator produce images close to the target images and reduce image blur, which is defined as
    $$\mathcal{L}_{1}(G)=\mathbb{E}_{x,y,z}\left[\,\|y-G(x,z)\|_{1}\,\right].\tag{2}$$

    Figure 1. Structure of the conditional GAN: (a) architecture of the generator; (b) principle of the discriminator. G, generator; D, discriminator.

    Therefore, the final objective is[14]
    $$G^{*}=\arg\min_{G}\max_{D}\;\mathcal{L}_{\mathrm{cGAN}}(G,D)+\lambda\,\mathcal{L}_{1}(G),\tag{3}$$
    where λ is a hyperparameter that balances the effects of the discriminator and the L1 loss on generator training.
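    As a concrete illustration of Eqs. (1)–(3), the sketch below shows how the two loss terms could be assembled in PyTorch; the function names, the binary cross-entropy form of the adversarial term, and the value of λ are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # adversarial term of Eq. (1)
l1 = nn.L1Loss()              # traditional L1 term of Eq. (2)

def generator_loss(discriminator, x, y, fake, lam=100.0):
    """x: speckle pattern, y: ground truth, fake: G(x, z)."""
    pred_fake = discriminator(torch.cat([x, fake], dim=1))
    adv = bce(pred_fake, torch.ones_like(pred_fake))     # try to fool D
    return adv + lam * l1(fake, y)                       # Eq. (3)

def discriminator_loss(discriminator, x, y, fake):
    pred_real = discriminator(torch.cat([x, y], dim=1))
    pred_fake = discriminator(torch.cat([x, fake.detach()], dim=1))
    real_term = bce(pred_real, torch.ones_like(pred_real))    # log D(x, y)
    fake_term = bce(pred_fake, torch.zeros_like(pred_fake))   # log{1 - D[x, G(x, z)]}
    return real_term + fake_term
```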

    3. Experimental Setup and Results

    The experimental setup shown in Fig. 2 is built to acquire training data. A He–Ne laser (Thorlabs HNL210LB) operating at 632.8 nm generates a narrow laser beam. The laser beam illuminates a 1024×768 pixel digital micromirror device (DMD, V-7001, VIALUX) after passing through a beam expander (Edmund #2186) and an attenuator (NDC-100C-4M). The DMD modulates the gray-scale images onto the incident beam in amplitude by controlling the deflection of its internal micromirrors. Then, the beam is coupled into a stable 1-m-long MMF (50 µm core, Thorlabs) by a microscope objective lens (OBJ1, PLN 40× objective and single port tube lens, Olympus), and the output is magnified by OBJ2 (the same as OBJ1). After that, the output speckle patterns are captured by a 1280×768 pixel CCD camera (PL-D721, PixeLINK).

    Figure 2. Experiment setup. DMD, digital micromirror device; OBJ, microscope objective lens; MMF, multimode fiber; CCD, charge-coupled device.

    The Modified National Institute of Standards and Technology (MNIST) database and the Fashion-MNIST database are used as input datasets. The images, with a resolution of 28×28 pixels, are padded to 32×32 pixels as the ground truth. The input images are then adapted to the 1024×768 pixel DMD by zero-padding and up-sampling. We acquire one 512×512 pixel speckle pattern for every single image displayed on the DMD. Finally, the speckle patterns are resized to 32×32 pixels in the MNIST experiment and 128×128 pixels in the Fashion-MNIST experiment to facilitate subsequent processing.
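    A minimal sketch of this preprocessing step is given below, assuming NumPy and OpenCV are used for the padding and resizing; the function name, interpolation mode, and normalization to [0, 1] are illustrative choices, not details from the paper.

```python
import numpy as np
import cv2

def prepare_pair(mnist_img, speckle_img, speckle_size=32):
    # Pad the 28x28 input image to 32x32 (ground truth).
    gt = np.pad(mnist_img, pad_width=2, mode="constant", constant_values=0)
    # Resize the 512x512 captured speckle pattern to the network input size
    # (32x32 for MNIST, 128x128 for Fashion-MNIST).
    speckle = cv2.resize(speckle_img, (speckle_size, speckle_size),
                         interpolation=cv2.INTER_AREA)
    # Scale both to [0, 1] for training.
    return speckle.astype(np.float32) / 255.0, gt.astype(np.float32) / 255.0
```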

    U-net has been shown to solve the image reconstruction problem with the MNIST dataset in MMF imaging systems. In this conditional GAN, the structure of the generator is similar to the U-net, as shown in Fig. 3, and contains a down-sampling unit and an up-sampling unit. The down-sampling unit is made of several strided convolutional layers, which use rectified linear units (ReLU) as the activation function. The features of the input image are concentrated in the bottleneck layer with a resolution of 1×1×256. The up-sampling unit is distributed symmetrically with the down-sampling unit and consists of several deconvolutional layers. Skip connections are added between the down-sampling layers and the up-sampling layers of the same size. Due to the higher complexity of the Fashion-MNIST dataset, the resolution of the input speckle pattern and the depth of the network are increased. We use an asymmetric U-net-like network with an input resolution of 128×128×1 and an output resolution of 32×32×1 as the generator. At the same time, the resolution of the bottleneck layer is set to 2×2×512 to prevent overfitting caused by an excessively deep network.
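    The sketch below illustrates a generator of this kind for the 32×32 MNIST case in PyTorch: a strided-convolution encoder down to a 1×1×256 bottleneck, a transposed-convolution decoder, skip connections between layers of the same size, and dropout as the source of the noise z. The channel widths, dropout placement, and output activation are assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

def down(cin, cout):
    # Strided convolution halves the spatial resolution.
    return nn.Sequential(nn.Conv2d(cin, cout, 4, stride=2, padding=1), nn.ReLU())

def up(cin, cout, dropout=False):
    # Transposed convolution doubles the spatial resolution.
    layers = [nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1), nn.ReLU()]
    if dropout:
        layers.append(nn.Dropout(0.5))  # source of the random noise z
    return nn.Sequential(*layers)

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        # 32 -> 16 -> 8 -> 4 -> 2 -> 1 (bottleneck 1x1x256)
        self.d1, self.d2, self.d3 = down(1, 32), down(32, 64), down(64, 128)
        self.d4, self.d5 = down(128, 256), down(256, 256)
        self.u1, self.u2 = up(256, 256, dropout=True), up(512, 128, dropout=True)
        self.u3, self.u4 = up(256, 64), up(128, 32)
        self.out = nn.Sequential(nn.ConvTranspose2d(64, 1, 4, 2, 1), nn.Sigmoid())

    def forward(self, x):
        e1 = self.d1(x); e2 = self.d2(e1); e3 = self.d3(e2); e4 = self.d4(e3)
        b = self.d5(e4)                                  # 1x1x256 bottleneck
        x = self.u1(b)
        x = self.u2(torch.cat([x, e4], dim=1))           # skip connections
        x = self.u3(torch.cat([x, e3], dim=1))
        x = self.u4(torch.cat([x, e2], dim=1))
        return self.out(torch.cat([x, e1], dim=1))       # 32x32x1 output
```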

    Figure 3. Structures of the generator in (a) the MNIST experiment and (b) the Fashion-MNIST experiment.

    The discriminator is a convolutional neural network, as shown in Fig. 4. Its input is the speckle pattern concatenated with either the image generated by the generator or the ground truth. The 32×32×2 input is followed by several convolutional layers. Finally, an eigenmatrix is output for discrimination. When the input contains the ground truth, the label is set to all ones, indicating that the input is real; otherwise, it is set to all zeros, indicating that the input is fake.
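    A minimal PyTorch sketch of such a discriminator is given below: the speckle pattern and the (real or generated) image are concatenated along the channel axis, reduced by strided convolutions, and a final 1×1 convolution produces the score map that is compared against the all-ones or all-zeros label. The channel width and default layer count are assumptions; Table 1 below relates the number of layers to the output resolution and receptive field.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self, n_layers=2, width=64):
        super().__init__()
        layers, cin = [], 2  # 2 input channels: speckle + image
        for i in range(n_layers):
            cout = width * (2 ** i)
            layers += [nn.Conv2d(cin, cout, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2)]
            cin = cout
        layers += [nn.Conv2d(cin, 1, 1)]  # one real/fake score per patch
        self.net = nn.Sequential(*layers)

    def forward(self, speckle, image):
        # For a 32x32 input and n_layers=2, the output is an 8x8 score map.
        return self.net(torch.cat([speckle, image], dim=1))
```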

    It is known that an image often has low-frequency and high-frequency parts. The L1 loss in Eq. (2) can enforce correctness at the low frequencies[15]. In order to process the high-frequency part, a discriminator architecture called "PatchGAN" is applied, which divides the whole image into small patches and discriminates the authenticity of each patch separately[14]. Each patch corresponds to the receptive field of the convolutional neural network[16]. A larger receptive field maps to a larger area of image pixels and therefore better reflects global and holistic structure, whereas a smaller receptive field maps to a smaller area and captures more locality and detail. Therefore, a larger patch size leads to a deeper discriminator network with higher discriminating ability in the low-frequency parts of images, while a smaller patch size leads to a shallower discriminator network with higher discriminating ability in the high-frequency parts. "PatchGAN" also gives the network fewer parameters and faster runs while keeping high-quality results. In the following experiments, different sizes of the receptive field are tried in the training of the discriminator to achieve the best performance. The size of the receptive field is adjusted through the hyperparameters of the convolutional layers. The input 32×32 image is convolved with a kernel size of 4×4 and a stride of two. In addition, we separately convolve the input image with a kernel size of 1×1 and a stride of one to discriminate the image pixel by pixel. The output resolution and receptive field of the discriminator vary with the number of convolutional layers, as shown in Table 1.

    Number of Convolutional Layers | Kernel Size | Stride | Output Resolution | Receptive Field
    1                              | 1×1         | 1      | 32×32             | 1×1
    1                              | 4×4         | 2      | 16×16             | 4×4
    2                              | 4×4         | 2      | 8×8               | 10×10
    3                              | 4×4         | 2      | 4×4               | 22×22
    4                              | 4×4         | 2      | 2×2               | 46×46
    5                              | 4×4         | 2      | 1×1               | 94×94

    Table 1. Relationship between the Receptive Field and Convolutional Layer
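    The receptive-field column of Table 1 can be reproduced with the standard recurrence for stacked convolutions (see Ref. [16]); the short check below is illustrative and not part of the paper's code.

```python
def receptive_field(layers):
    """layers: list of (kernel, stride) tuples, applied in order."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer widens the field by (k - 1) * jump
        jump *= s              # stride compounds the effective step size
    return rf

# Five stacked 4x4, stride-2 convolutions reproduce the 4, 10, 22, 46, 94 column.
for n in range(1, 6):
    print(n, receptive_field([(4, 2)] * n))
```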

    Firstly, the networks are trained with different output resolutions and receptive fields of the discriminator for 100 epochs. We collect 500 speckle patterns of the MNIST dataset. All of the programs are run in a Python 3.7 environment with an NVIDIA GeForce GTX 1080 graphics processing unit (GPU). We train our networks with an Adam optimizer, and the learning rate is set to 10⁻⁴[17]. In the following experiments, the loss function and the optimizer are kept the same. The peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are utilized to evaluate the similarity between the reconstructed image and the ground truth. As shown in Fig. 5, with the increase of output resolution, the details of the generated images become clearer. However, when the output resolution is too large, the discriminator attaches too much importance to the high-frequency part of the image, which makes the generated image appear somewhat fuzzy.
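    A sketch of the optimizer setup and of the PSNR/SSIM evaluation described above, assuming PyTorch for training and scikit-image for the metrics; `generator` and `discriminator` refer to networks like the ones sketched earlier, and the choice of libraries is an assumption.

```python
import torch
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Adam optimizers with the learning rate of 1e-4 used in the experiments (Ref. [17]).
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

def evaluate(recon, truth):
    """Compare a reconstructed image with the ground truth (both scaled to [0, 1])."""
    psnr = peak_signal_noise_ratio(truth, recon, data_range=1.0)
    ssim = structural_similarity(truth, recon, data_range=1.0)
    return psnr, ssim
```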

    In the MNIST experiment, reconstructed images with an 8×8 output resolution achieve the best performance in both PSNR and SSIM, and the corresponding receptive field is 10×10, as shown in Fig. 5(a). In the Fashion-MNIST experiment, the images are relatively complex and have more high-frequency features; therefore, the receptive field of the discriminator should be relatively smaller. We also obtain the reconstructed images with different output resolutions, as shown in Fig. 5(b). The best reconstruction performance is achieved when the output resolution is 16×16 and the receptive field size is 4×4. Setting an appropriate output resolution can reduce the network training time while keeping high-quality results. Accordingly, we set the discriminator to the above resolutions in the following experiments, and the discriminator network structure is shown in Fig. 4.

    Figure 4. Structures of the discriminator in (a) the MNIST experiment and (b) the Fashion-MNIST experiment.

    Figure 5. Reconstruction performances with different output resolutions of the discriminator in (a) the MNIST experiment and (b) the Fashion-MNIST experiment.

    After training the conditional GAN, we compare its performance with that of U-net. The structure of the U-net used for comparison is similar to that of the generator. In the U-net of the MNIST experiment, the bottleneck layer is 4×4×256 after only three convolutional layers. In the U-net of the Fashion-MNIST experiment, the bottleneck layer is 4×4×512 after only four convolutional layers. The conditional GAN and the U-net in the two experiments are trained with 2800 and 4800 training pairs, respectively, for 100 epochs. As shown in Fig. 6, with the increase of training epochs, the loss decreases gradually, and both networks finally converge. Due to the adversarial training process, the training of the conditional GAN is less stable than that of U-net. The same images in the test sets are chosen to compare the reconstruction quality in Fig. 7. The conditional GAN shows slightly better reconstruction performance than U-net on some typical image features, such as boundaries and brightness changes. The advantage of the conditional GAN is more obvious for images with more details, such as Fashion-MNIST, which also validates our previous analysis. U-net is proved to have good reconstruction performance for simple images with fewer high-frequency features in the MNIST dataset. However, when the images contain more high-frequency features, the reconstruction performance of U-net is poor. The discriminator of the conditional GAN enables the generator to better mine the high-frequency features of the image, which improves its feature extraction ability. The conditional GAN thus opens the possibility of reconstructing complicated images from speckle patterns.
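    For reference, a sketch of one adversarial training iteration (alternating discriminator and generator updates) is shown below, reusing the loss functions and optimizers sketched earlier; the update order and batch handling are assumptions, not the authors' exact training loop.

```python
def train_step(x, y):
    """x: batch of speckle patterns, y: batch of ground-truth images."""
    fake = generator(x)

    # Update the discriminator on a real pair and a fake pair.
    d_opt.zero_grad()
    d_loss = discriminator_loss(discriminator, x, y, fake)
    d_loss.backward()
    d_opt.step()

    # Update the generator to fool the discriminator and match y in L1.
    g_opt.zero_grad()
    g_loss = generator_loss(discriminator, x, y, fake, lam=100.0)
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```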

    Figure 6. Loss for the training process of U-net and the conditional GAN in (a) the MNIST experiment and (b) the Fashion-MNIST experiment.

    Figure 7. Reconstruction results of U-net and the conditional GAN in (a) the MNIST experiment and (b) the Fashion-MNIST experiment.

    The size of the training set is an important parameter that affects the performance of the networks. In order to compare the performance of the two networks with different training set sizes, the conditional GAN and U-net are each trained with training sets of different sizes for 100 epochs. Figure 8 shows the comparison of the reconstructions between our network and U-net on the test set, evaluated by PSNR and SSIM. The horizontal axis is the size of the training set. Generally, increasing the size of the training set significantly improves the quality of image reconstruction for both networks. With the same amount of training data, the reconstruction quality of the conditional GAN is better than that of U-net, and the gap is more obvious with small training sets and complex images. In other words, the conditional GAN requires a smaller training set than U-net to achieve the same reconstruction quality. It can also be concluded that this conditional GAN has stronger feature extraction capabilities than U-net.

    Figure 8. PSNR and SSIM at each training set size for U-net and the conditional GAN in (a) the MNIST experiment and (b) the Fashion-MNIST experiment.

    4. Conclusion

    Considering the structure of the conditional GAN, the generator is essentially a U-net, while the additional discriminator guides the training of the generator to converge faster. Moreover, the discriminator enables the generator to better discriminate and constrain both the high- and low-frequency parts of the image, which improves the feature extraction ability of the generator. Therefore, smaller datasets suffice for reconstruction with the conditional GAN. The experimental results show that this conditional GAN can reconstruct images with fewer training data and shows higher feature extraction capability compared with the conventional U-net method. For both MNIST and Fashion-MNIST, the training set can be reduced by hundreds of images with this conditional GAN while achieving the same accuracy of the reconstructed images. In addition, this conditional GAN also has higher feature extraction capability because of the discriminator and its advantage in high-frequency processing. However, the network only performs well for samples similar to the training data. In the future, it will be important to use stronger networks or transfer learning methods to improve the generalization ability of the models.

    References

    [1] M. Kyrish, T. S. Tkaczyk. Achromatized endomicroscope objective for optical biopsy. Biomed. Opt. Express, 4, 287(2013).

    [2] M. Hughes, T. P. Chang, G.-Z. Yang. Fiber bundle endocytoscopy. Biomed. Opt. Express, 4, 2781(2013).

    [3] T. Čižmár, K. Dholakia. Exploiting multimode waveguides for pure fibre-based imaging. Nat. Commun., 3, 1027(2012).

    [4] C. Liu, L. Deng, D. Liu, L. Su. Modeling of a single multimode fiber imaging system(2016).

    [5] H. Shen, J. Gao. Deep learning virtual colorful lens-free on-chip microscopy. Chin. Opt. Lett., 18, 121705(2020).

    [6] X. Wang, H. Liu, M. Chen, Z. Liu, S. Han. Imaging through dynamic scattering media with stitched speckle patterns. Chin. Opt. Lett., 18, 042604(2020).

    [7] I. N. Papadopoulos, S. Farahi, C. Moser, D. Psaltis. Focusing and scanning light through a multimode optical fiber using digital phase conjugation. Opt. Express, 20, 10583(2012).

    [8] Y. Choi, C. Yoon, M. Kim, T. D. Yang, C. Fang-Yen, R. R. Dasari, K. J. Lee, W. Choi. Scanner-free and wide-field endoscopic imaging by using a single multimode optical fiber. Phys. Rev. Lett., 109, 203901(2012).

    [9] D. B. Conkey, E. Kakkava, T. Lanvin, D. Loterie, N. Stasio, E. Morales-Delgado, C. Moser, D. Psaltis. High power, ultrashort pulse control through a multi-core fiber for ablation. Opt. Express, 25, 11491(2017).

    [10] R. Di Leonardo, S. Bianchi. Hologram transmission through multi-mode optical fibers. Opt. Express, 19, 247(2011).

    [11] B. Rahmani, D. Loterie, G. Konstantinou, D. Psaltis, C. Moser. Multimode optical fiber transmission with a deep learning network. Light: Sci. Appl., 7, 69(2018).

    [12] N. Borhani, E. Kakkava, C. Moser, D. Psaltis. Learning to see through multimode fibers. Optica, 5, 960(2018).

    [13] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, et al. Generative adversarial nets. 27th International Conference on Neural Information Processing Systems, 2672(2014).

    [14] P. Isola, J.-Y. Zhu, T. Zhou, A. A. Efros. Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1125(2017).

    [15] A. B. L. Larsen, S. K. Sønderby, H. Larochelle, O. Winther. Autoencoding beyond pixels using a learned similarity metric. International Conference on Machine Learning, 1558(2016).

    [16] V. Dumoulin, F. Visin. A guide to convolution arithmetic for deep learning(2016).

    [17] D. P. Kingma, J. Ba. Adam: a method for stochastic optimization(2014).
