Compact planar-waveguide integrated diffractive optical neural network chip

Jianan Feng; Chang Li; Dahai Yang; Yang Liu; Jianyang Hu; Chen Chen; Yiqun Wang; Jie Lin; Lei Wang; Peng Jin

doi:10.1117/1.APN.4.1.016010


Abstract

Diffractive optical neural networks (DONNs) offer the advantages of parallelism, high speed, and low power consumption. However, existing DONNs based on free-space diffractive optical elements are bulky and mechanically unstable. In this study, we propose a planar-waveguide integrated diffractive neural network chip architecture in which three diffractive layers are engraved on the same side of a quartz wafer. The three-layer chip occupies a 32-mm³ processing space and enables a computing speed of 3.1 × 10⁹ tera-operations per second (TOPS). The chip achieves 73.4% experimental accuracy on the Modified National Institute of Standards and Technology (MNIST) database and remains robust in a cycle test: the consistency of the experiments is 88.6%, and the arithmetic mean standard deviation of the results is ~4.7%. The proposed chip architecture offers a highly robust platform for high-resolution optical processing tasks.

Keywords

diffractive neural network; high robustness; optical computing; planar waveguide

1 Introduction

Artificial neural networks (ANNs) are developing rapidly and are widely used in fields such as computer vision,1 natural language processing,2 medical diagnosis,3 and decision-making.4 Although ANNs have achieved notable performance gains at the algorithmic level, their execution is fundamentally limited by the energy consumption and computing speed of electronic computers.5

Recently, optical neural networks (ONNs) have attracted increasing attention as a way to address these limitations, owing to their low power consumption, low processing latency, and high computational bandwidth.6–14 Various ONN implementations have been proposed, including coherent photonic integrated circuits,15–21 phase-change materials,22–24 diffractive optical processors,25–28 dielectric metasurfaces,29–31 and optical delay lines.32 Among these ONNs, diffractive optical neural networks (DONNs) have attracted particular interest because of their large computational scale.33 However, existing DONNs based on discrete diffractive optical elements (DOEs) are bulky and mechanically unstable.34–46

In this study, we propose a compact planar-waveguide integrated DONN chip. The three diffractive layers are engraved on the same side of a quartz wafer, which ensures high-precision alignment between the layers. Meanwhile, the optical field propagates inside the transparent waveguide, shielded from free-space noise. The three-layer chip is designed with a 32-mm³ processing space, and this compact architecture enables a computing speed of 3.1 × 10⁹ tera-operations per second (TOPS). The chip achieves 73.4% experimental accuracy on the Modified National Institute of Standards and Technology (MNIST) database and remains robust in a cycle test: the consistency of the experiments is 88.6%, and the arithmetic mean standard deviation of the classification results is 4.7%. Furthermore, the chip can be combined with a complementary metal-oxide-semiconductor (CMOS) image sensor to achieve higher integration. This work provides a high-density, highly robust integration solution for high-resolution optical processing tasks.


2 Methods

2.1 Oblique Forward Propagation Model

For mainstream DONNs, the forward propagation model is based on the angular spectrum (AS) theory of diffraction and the fast Fourier transform. According to AS theory, the diffracted field of a beam propagating a distance $z$ in free space can be expressed as

$$U_{\mathrm{out}}(x, y) = \mathcal{F}^{-1}\{\mathcal{F}\{U_{\mathrm{in}}(x, y)\}\, H(\xi, \eta)\}, \tag{1}$$

where $\mathcal{F}\{\cdot\}$ and $\mathcal{F}^{-1}\{\cdot\}$ denote the Fourier transform and inverse Fourier transform, respectively; $U_{\mathrm{in}}(x, y)$ and $U_{\mathrm{out}}(x, y)$ are the complex amplitude distributions of the light field on the input and output planes, respectively; and $(\xi, \eta)$ are the spatial frequencies conjugate to $(x, y)$. The transfer function $H(\xi, \eta)$ is

$$H(\xi, \eta) = \exp\left[j k z \sqrt{1 - \lambda^{2}(\xi^{2} + \eta^{2})}\right], \tag{2}$$

where $k = 2\pi / \lambda$ and $\lambda$ is the wavelength in air.
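As an illustration of Eqs. (1) and (2), the following is a minimal NumPy sketch of FFT-based angular-spectrum propagation. The function names, the choice to zero evanescent components, and the 1-mm example distance are assumptions for illustration; the grid reuses the chip's 500 × 500 neurons of 4 μm and the 632.8-nm He–Ne wavelength (in air, whereas the chip model would use $\lambda_w$).

```python
import numpy as np

def transfer_function(n, dx, wavelength, z):
    """Eq. (2) sampled on an n x n FFT grid with pixel pitch dx.

    Evanescent components (lambda^2 (xi^2 + eta^2) >= 1) are set to zero; this
    is a common numerical choice, not something Eq. (2) itself prescribes.
    """
    freqs = np.fft.fftfreq(n, d=dx)                 # spatial frequencies xi, eta
    xi, eta = np.meshgrid(freqs, freqs, indexing="xy")
    arg = 1.0 - wavelength**2 * (xi**2 + eta**2)
    k = 2 * np.pi / wavelength
    h = np.exp(1j * k * z * np.sqrt(np.maximum(arg, 0.0)))
    return np.where(arg > 0, h, 0.0)

def propagate(u_in, dx, wavelength, z):
    """Eq. (1): U_out = F^-1{ F{U_in} * H } (FFT-based, periodic boundaries)."""
    h = transfer_function(u_in.shape[0], dx, wavelength, z)
    return np.fft.ifft2(np.fft.fft2(u_in) * h)

# Example with the chip's sampling: 500 x 500 neurons of 4 um, He-Ne wavelength.
n, dx, lam = 500, 4e-6, 632.8e-9
coords = (np.arange(n) - n // 2) * dx
x, y = np.meshgrid(coords, coords, indexing="xy")
aperture = ((np.abs(x) < 0.2e-3) & (np.abs(y) < 0.2e-3)).astype(complex)
u_out = propagate(aperture, dx, lam, z=1e-3)
```

On this grid every sampled frequency propagates ($|H| = 1$), so the total energy is conserved and $z = 0$ returns the input unchanged, which makes both properties convenient sanity checks.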

For the proposed chip architecture, diffraction within the waveguide space must be modeled. Between the $l$th and $(l+1)$th diffractive layers, a reflection is introduced to match the waveguide geometry. The $l$th diffractive layer, with transmittance function $t(x_i, y_i)$, is obliquely illuminated at incidence angle $\theta$, where the index $i$ denotes the neuron located at $(x_i, y_i)$ in layer $l$. The center of the input plane is at the origin of the coordinate system, whereas the regions of interest on the output plane are not located at the origin. Consequently, the observation window must be repositioned after each oblique propagation step in the simulation. For one such step, the diffracted field can be expressed as

$$U_{\mathrm{out}}(x, y) = \mathcal{F}^{-1}\{\mathcal{F}\{t(x_i, y_i) \exp(j k_w x \sin\theta)\}\, H(\xi, \eta)\}, \tag{3}$$

where $k_w = 2\pi / \lambda_w$ and $\lambda_w$ is the wavelength in the waveguide. In Eq. (3), the coordinates $(x, y)$ on the output plane are by default identical to the coordinates $(x_i, y_i)$ on the input plane. Each reflection during propagation changes the phase; following previous research,47–49 we assume a phase change of $\pi$ at the reflecting interface. The propagation through the waveguide space can therefore be constructed from a sequence of diffractions and reflections.
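One layer-to-layer hop of this model can be sketched as follows, under stated assumptions: a 1D cross-section is used for brevity, the back-surface mirror is "unfolded" so that layer → mirror → next layer becomes free propagation over twice the substrate thickness, and the $\pi$ reflection phase is applied once per hop. The helper names and toy parameters (in micrometers) are hypothetical and were chosen so that the tilt term $\exp(j k_w x \sin\theta)$ is well sampled; on the real 4-μm neuron grid this carrier would need separate care. The sketch shows the lateral walk-off of about $2t\tan\theta$ per hop, which is why the observation window must be repositioned.

```python
import numpy as np

def angular_spectrum_1d(u, dx, wavelength, z):
    """1D version of Eqs. (1)-(2); evanescent components are dropped."""
    fx = np.fft.fftfreq(u.size, d=dx)
    arg = 1.0 - (wavelength * fx) ** 2
    k = 2 * np.pi / wavelength
    h = np.where(arg > 0, np.exp(1j * k * z * np.sqrt(np.maximum(arg, 0.0))), 0.0)
    return np.fft.ifft(np.fft.fft(u) * h)

def oblique_hop(t, x, wavelength_w, theta, thickness):
    """One layer-to-layer hop inside the waveguide (hypothetical helper).

    Assumption: the mirror is unfolded, so layer -> mirror -> next layer is
    modeled as propagation over 2 * thickness along z, followed by the pi
    phase change assumed at the reflecting interface.
    """
    k_w = 2 * np.pi / wavelength_w
    tilted = t * np.exp(1j * k_w * x * np.sin(theta))   # oblique term of Eq. (3)
    u = angular_spectrum_1d(tilted, x[1] - x[0], wavelength_w, 2 * thickness)
    return u * np.exp(1j * np.pi)                        # reflection phase of pi

# Toy example in micrometers (illustrative values, not the chip's parameters).
n, dx = 4096, 0.1
x = (np.arange(n) - n // 2) * dx
beam = np.exp(-(x / 10.0) ** 2)                          # Gaussian "layer output"
theta = np.deg2rad(10.0)
out = oblique_hop(beam, x, wavelength_w=0.5, theta=theta, thickness=50.0)
intensity = np.abs(out) ** 2
centroid = np.sum(x * intensity) / np.sum(intensity)    # ~ 2 * 50 * tan(theta)
```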

2.2 Chip and Target Fabrication

The input targets are fabricated by laser direct writing on a soda glass substrate. The glass substrate is first cleaned with acetone and isopropyl alcohol. A chromium (Cr) layer a few hundred nanometers thick is then deposited on the cleaned substrate by electron-beam evaporation. After a positive photoresist is spin-coated and prebaked, the handwritten digit patterns are exposed by laser direct writing. The exposed resist is removed with a developer, and the uncovered Cr is removed with a chromium etchant. Any remaining resist is cleaned off with acetone and isopropyl alcohol. In total, 50 amplitude-encoded targets covering 10 categories are fabricated; for each category, five test digits are randomly selected from the MNIST test data set.

The phase value of each neuron is restricted to the range 0 to $2\pi$. The neuron phase value $\Delta\varphi$ is converted into a relative height map $\Delta h = \lambda \Delta\varphi / (2\pi \Delta n)$, where $\Delta n$ is the refractive index difference between the substrate and air. The chip layers are fabricated on a quartz wafer. After photoresist spin-coating and exposure, the exposed resist is removed with a developer. Magnetic neutral loop discharge etching is then applied, followed by an oxygen plasma sizing treatment. This process is repeated until the multilevel layer structures are complete. A more detailed description of the fabrication process is provided in the Appendix.
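The height conversion $\Delta h = \lambda \Delta\varphi / (2\pi \Delta n)$ can be sketched in a few lines. The value $\Delta n \approx 0.457$ is an assumption (a typical fused-silica index of about 1.457 at 632.8 nm, minus 1 for air); with it, the $\pi/2$ step comes out at about 346 nm, consistent with the etch depths listed in the Appendix.

```python
import numpy as np

def phase_to_height(delta_phi, wavelength_nm, delta_n):
    """Relative height (nm) for a phase delay: dh = lambda * dphi / (2 * pi * dn)."""
    return wavelength_nm * np.asarray(delta_phi, dtype=float) / (2 * np.pi * delta_n)

# Assumed values: 632.8 nm He-Ne wavelength and dn ~= 0.457
# (fused-silica index ~1.457 at 632.8 nm, minus 1 for air).
levels = np.array([0.5, 1.0, 1.5, 2.0]) * np.pi
heights_nm = phase_to_height(levels, 632.8, 0.457)
# heights_nm ~= [346.2, 692.3, 1038.5, 1384.7]
```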

3 Results

The schematic of existing free-space DONNs is shown in Fig. 1(a). Discrete DOEs are distributed independently in free space, which makes the entire system bulky and mechanically unstable. Moreover, the beams are susceptible to free-space noise. For comparison, the schematic of the proposed chip is shown in Fig. 1(b). The DOEs are fabricated on the same quartz wafer, and a reflective coating is deposited on the back of the transparent substrate. The beam carrying the target information propagates through the transparent waveguide space via successive diffractions and reflections. The output beam is received by a charge-coupled device (CCD) at the detection plane, which records its intensity distribution. Because all diffractive layers are fabricated on the same surface, the cascaded layers are aligned with high precision. The chip design therefore achieves a compact and stable optical processing architecture.

Figure 1. Schemes of (a) existing DONNs and (b) the proposed chip.

A phase-only three-layer chip is designed for the classification task. To match the fabrication capability, the chip structure parameters are as follows. Each diffractive layer is 2 mm × 2 mm and contains 250,000 (500 × 500) diffractive neurons, each 4 μm in size. The horizontal gap between cascaded layers is 1 mm, the transparent substrate is 2 mm thick, and the incidence angle in air is 60 deg. The waveguide space for beam propagation is therefore only 32 mm³. The chip is trained using 55,000 amplitude-encoded handwritten digits and, after training, is tested on 10,000 amplitude-encoded handwritten digits. The simulated classification accuracy is 75.4%. Some simulation results are shown in Fig. 2.
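A quick geometric check of these parameters is sketched below. Two assumptions are made and are not stated in the text: the footprint along the propagation direction is three 2-mm layers plus two 1-mm gaps (8 mm), and the quartz index is about 1.457 at 632.8 nm. Under these assumptions the volume is 8 mm × 2 mm × 2 mm = 32 mm³, and Snell's law gives an internal angle of about 36.5 deg, so one down-and-back hop through the 2-mm substrate shifts the beam laterally by about 2.96 mm, close to the 3-mm center-to-center layer pitch.

```python
import math

layer_mm, gap_mm, thickness_mm = 2.0, 1.0, 2.0   # from the text
n_layers = 3
theta_air = math.radians(60.0)                   # incidence angle in air
n_quartz = 1.457                                 # assumed index at 632.8 nm

# Assumed footprint: three layers plus two gaps along the propagation direction.
length_mm = n_layers * layer_mm + (n_layers - 1) * gap_mm   # 8 mm
volume_mm3 = length_mm * layer_mm * thickness_mm            # 32 mm^3

# Snell's law for the internal angle; one hop = down to the mirror and back up.
theta_w = math.asin(math.sin(theta_air) / n_quartz)
hop_shift_mm = 2 * thickness_mm * math.tan(theta_w)         # ~2.96 mm
pitch_mm = layer_mm + gap_mm                                # 3 mm between layer centers
```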

Figure 2. Simulation classification for the designed chip. (a) Input digits. (b) Simulation results. (c) Intensity distributions.

Subsequently, a three-layer chip is fabricated, with the phase values of the diffractive layers discretized to simplify fabrication. The experimental setup is shown in Figs. 3(a) and 3(b). A He–Ne laser (25-STP-912-230, Melles Griot, Rochester, New York, United States) with a wavelength of 632.8 nm and a power of 5 mW is collimated by lens1 and lens2, with a pinhole used as a spatial filter. The collimated beam illuminates the input plane, and the intensity distributions in the output plane are detected by a CCD (DFK 33UX174, Sony, Minato, Tokyo, Japan). The fabricated device, without the reflective coating, is shown in Fig. 3(c).

Figure 3. Schemes of the experimental setup and fabricated chip. (a) Schematic diagram of the experimental setup. (b) Photo of the experimental setup. (c) The fabricated chip. (d) Partial enlarged view of the chip.

We randomly select 50 handwritten digits and fabricate them by laser direct writing; the fabricated digits are detailed in the Appendix. Some experimental results are shown in Fig. 4. To assess the ability to classify different handwritten digits, handwritten “1,” “8,” and “9” are chosen as input targets, as shown in Fig. 4(a). The experimental output intensity distributions are shown in Fig. 4(b). The output intensity distributions are normalized to compensate for power fluctuations of the He–Ne laser, and the intensity ratios of the 10 preset regions are then computed. As shown in Fig. 4(c), the maximum intensity appears in the preset region corresponding to the label of the input digit.
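The read-out step can be sketched as follows, assuming that normalization means dividing each region's energy by the total over all 10 regions (the text does not give the exact formula). Such a ratio is unchanged by a global scale factor, which is what makes it insensitive to laser power drift. The region layout and the synthetic frame are hypothetical.

```python
import numpy as np

def region_ratios(intensity, regions):
    """Energy in each preset detector region divided by the total over all regions.

    Dividing by the total removes a global scale factor, such as slow power
    drift of the laser (assumed normalization; the text gives no formula).
    """
    energies = np.array([intensity[r0:r1, c0:c1].sum() for r0, r1, c0, c1 in regions])
    return energies / energies.sum()

def classify(intensity, regions):
    """Predicted label = index of the region with the largest intensity ratio."""
    return int(np.argmax(region_ratios(intensity, regions)))

# Hypothetical layout: 10 square regions (10 x 10 pixels) in two rows of five.
regions = [(20 + 40 * (k // 5), 30 + 40 * (k // 5), 5 + 20 * (k % 5), 15 + 20 * (k % 5))
           for k in range(10)]
rng = np.random.default_rng(0)
frame = 0.1 * rng.random((100, 100))       # background stand-in for a CCD frame
r0, r1, c0, c1 = regions[8]
frame[r0:r1, c0:c1] += 1.0                  # bright spot in the region for label 8
```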

Figure 4. Experimental classification for the designed chip. (a) Input digits. (b) Experimental results. (c) Intensity distributions.

Furthermore, a 10-cycle test is performed to validate the reliability and stability of the three-layer chip. In each cycle, the chip is removed from the experimental optical path while the rest of the path remains unchanged, and it is then reinstalled. The same test procedure is repeated under the same conditions to complete the 10 cycles. For each handwritten digit, we calculate the standard deviation of the intensity ratio over the cycles; the error bars are shown in Fig. 5. Over all 500 test results, the arithmetic mean standard deviation of the intensity ratio is 4.7%. The experimental classification accuracy is 73.4% [Fig. 6(b)], and the experimental confusion matrix is shown in Fig. 6(a2). The statistical consistency of the 10-cycle test is 88.6% [Fig. 6(c)]. The consistency falls short of 100% because of small misalignments (including rotation and translation errors) between the input plane and the chip during the cycle test. The effect of such misalignment was analyzed in our previous work.40
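The two cycle-test statistics can be sketched as below, under stated assumptions: the array layout (cycles × digits × regions) is hypothetical, the standard deviation is taken per digit across cycles on the true-label ratio using the population formula (ddof = 0), and "consistency" is omitted because its exact definition is not given here. The synthetic data contain one misclassified result out of 500.

```python
import numpy as np

def cycle_statistics(ratios, labels):
    """Summarize a repeated (cycle) test.

    ratios: array (n_cycles, n_digits, 10) of normalized region intensity ratios.
    labels: array (n_digits,) of true labels.
    Returns (accuracy over all n_cycles * n_digits results,
             mean over digits of the std across cycles of the true-label ratio).
    """
    preds = ratios.argmax(axis=-1)                              # (n_cycles, n_digits)
    accuracy = float(np.mean(preds == labels[None, :]))
    true_ratio = ratios[:, np.arange(labels.size), labels]      # (n_cycles, n_digits)
    mean_std = float(true_ratio.std(axis=0).mean())             # population std (ddof=0)
    return accuracy, mean_std

# Synthetic example: 10 cycles x 50 digits (five per class).
labels = np.repeat(np.arange(10), 5)
ratios = np.full((10, 50, 10), 0.05)
ratios[:, np.arange(50), labels] = 0.55                         # correct and stable
ratios[0, 0, labels[0]], ratios[0, 0, 1] = 0.01, 0.59           # one wrong result
acc, mean_std = cycle_statistics(ratios, labels)
```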

Figure 5. Cycle-test intensity results. (a) Intensity distribution of digit “1.” (b) Intensity distribution of digit “8.” (c) Intensity distribution of digit “9.”

Figure 6. Cycle-test consistency results. (a1) Simulation and (a2) experimental confusion matrices. (b) Accuracy of the 10-cycle test. (c) Consistency of the 10-cycle test.

Each layer of the designed three-layer chip contains 250,000 neurons, and the cascaded layers are fully connected. The total number of operations is 1.25 × 10¹¹. The propagation distance for one such forward pass is ~12 mm, which takes ~4 × 10⁻¹¹ s. Hence, the processing speed is ~3.1 × 10⁹ TOPS. In our previous free-space DONN work,40 the distance required for the same interlayer propagation was 10 cm. The propagation path, and hence the propagation time, of the proposed chip is therefore roughly one-eighth (~12 mm versus 10 cm) of that of the free-space diffractive neural network.
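The throughput arithmetic can be reproduced as follows. The quoted ~4 × 10⁻¹¹ s matches 12 mm divided by the vacuum speed of light; the second call, which slows light by an assumed quartz index of about 1.457, is an illustrative variant and not a figure from the text.

```python
C_VACUUM = 299_792_458.0     # speed of light in vacuum, m/s

def optical_tops(total_ops, path_m, speed_m_s=C_VACUUM):
    """Throughput in TOPS when one forward pass takes path_m / speed_m_s seconds."""
    latency_s = path_m / speed_m_s
    return total_ops / latency_s / 1e12, latency_s

# Values from the text: 1.25e11 operations over a ~12 mm optical path.
speed, latency = optical_tops(1.25e11, 12e-3)        # ~3.1e9 TOPS, ~4.0e-11 s
# Same estimate with light slowed by an assumed quartz index of ~1.457:
speed_in_quartz, _ = optical_tops(1.25e11, 12e-3, C_VACUUM / 1.457)
```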

4 Conclusion

In this work, we proposed a compact planar-waveguide integrated diffractive neural network chip. Fabricated using micro-electro-mechanical system (MEMS) technology, the chip has a compact processing volume of 32 mm³, and this compact architecture enables a computing speed of 3.1 × 10⁹ TOPS. In a 10-cycle test of 50 handwritten digits, the experimental accuracy is 73.4% and the consistency of the experiments is 88.6%. The arithmetic mean standard deviation is 4.7% over all 500 experimental normalized intensity ratios. This architecture paves the way toward an on-chip all-optical information processing unit with precise alignment, high density, high reliability, and a small footprint, providing a highly robust solution for high-resolution optical processing tasks.

5 Appendix

5.1 Tensorflow-based Training

Here, a three-layer chip is constructed using the oblique forward propagation model. Between two adjacent layers, propagation consists of one reflection and two diffractions. The output intensity distribution is obtained at the detection plane, and the mean square error (MSE) is used as the loss function, with the aim of maximizing the intensity in the regions of interest while minimizing the total intensity outside all regions of interest. The trainable parameters are the modulation values of each layer, which are optimized by backpropagation using the adaptive moment estimation (Adam) optimizer with a learning rate of 10⁻⁴. To demonstrate the performance of the chip, we use 55,000 images from the MNIST data set for classification training. The chip is implemented using TensorFlow 1.12.0 (Google Inc.) and Python 3.7.0. On a desktop computer (GeForce GTX 1660 graphics processing unit, AMD Ryzen 5 3600X CPU @ 3.8 GHz, 32 GB of random access memory, and Microsoft Windows 10), the three-layer chip is trained for ~20 h over 20 epochs.
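A minimal NumPy sketch of such an MSE loss is shown below. Normalizing both the output and the target to unit total energy, and the hypothetical detector layout, are assumptions; the point is that a loss of this form rewards light inside the target region and penalizes light everywhere else. In TensorFlow 1.x the optimizer described above would typically be created with `tf.train.AdamOptimizer(learning_rate=1e-4)`, but that part is not reproduced here.

```python
import numpy as np

def target_map(shape, regions, label):
    """Ideal output: uniform intensity inside the label's region, zero elsewhere."""
    target = np.zeros(shape)
    r0, r1, c0, c1 = regions[label]
    target[r0:r1, c0:c1] = 1.0
    return target

def mse_loss(intensity, target):
    """MSE between energy-normalized output intensity and normalized target.

    Normalizing both maps to unit total energy is an assumption of this sketch;
    the loss then rewards light inside the target region and penalizes light
    everywhere else, including outside all regions of interest.
    """
    return float(np.mean((intensity / intensity.sum() - target / target.sum()) ** 2))

regions = [(20 + 40 * (k // 5), 30 + 40 * (k // 5), 5 + 20 * (k % 5), 15 + 20 * (k % 5))
           for k in range(10)]                      # hypothetical detector layout
target = target_map((100, 100), regions, label=3)
focused = target + 0.01                              # most light in region 3
uniform = np.ones((100, 100))                        # light spread everywhere
```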

5.2 Fabrication of the Designed Three-layer Chip

In this paper, the phases of the designed three-layer chip are trained in the range 0 to $2\pi$. During training, the phases are continuously distributed. To facilitate fabrication, the trained phases are quantized into four levels: 0, $\pi/2$, $\pi$, and $3\pi/2$. The chip is fabricated on a $\mathrm{SiO_2}$ wafer. The etching depths in $\mathrm{SiO_2}$ corresponding to the phases 0, $\pi/2$, $\pi$, and $3\pi/2$ are 1038, 692, 346, and 0 nm, respectively. The fabrication steps are shown in Fig. 7.
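The quantization and depth mapping can be sketched as follows, assuming nearest-level rounding (the text does not state the rounding rule) with phases near $2\pi$ wrapping to level 0. The etch depths are taken directly from the text; a larger phase corresponds to less etching, with 346 nm per $\pi/2$ step.

```python
import numpy as np

# Etch depths (nm) for the four phase levels 0, pi/2, pi, 3pi/2, from the text.
ETCH_DEPTH_NM = np.array([1038.0, 692.0, 346.0, 0.0])

def quantize_phase(phi):
    """Index k of the nearest level k * pi/2; phases near 2*pi wrap to level 0."""
    wrapped = np.mod(phi, 2 * np.pi)
    return np.round(wrapped / (np.pi / 2)).astype(int) % 4

def etch_depth_nm(phi):
    """Etch depth for each trained (continuous) phase value."""
    return ETCH_DEPTH_NM[quantize_phase(phi)]

trained = np.array([0.1, 1.5, 3.2, 4.8, 6.2])    # example continuous phases (rad)
levels = quantize_phase(trained)                 # -> [0, 1, 2, 3, 0]
depths = etch_depth_nm(trained)                  # -> [1038, 692, 346, 0, 1038]
```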

Figure 7. Fabrication steps for the three-layer chip.

5.3 Simulation and Experimental Classification Results for 10 Categories of Handwritten Digits

Here, we show the classification results for 10 different categories of handwritten digits. As shown in Figs. 8(a) and 9(a), handwritten digits “0 to 9” are used as input targets. The corresponding simulation and experimental classification results are shown in Figs. 8(b), 9(b), 8(c), and 9(c). In a similar way, we plot the error bars for the 10-cycle test in Figs. 8(d) and 9(d).

Figure 8. Handwritten digit “0 to 4” classification for a three-layer chip. (a) Input digits. (b) Simulation results. (c) Experimental results. (d) Intensity distributions.

Figure 9. Handwritten digit “5 to 9” classification for a three-layer chip. (a) Input digits. (b) Simulation results. (c) Experimental results. (d) Intensity distributions.

5.4 Influence of the Layer Number on Recognition Accuracy

As shown in Fig. 10, the recognition accuracy of the chip increases with the number of layers, but it changes only slightly from three to five layers. A three-layer chip is therefore analyzed.

Figure 10. Simulation classification accuracy for different numbers of layers for 10,000 test targets in the MNIST test data set.

5.5 Comparison between Different Architectures on TOPS

Researchers have proposed many different architectures for optical diffractive neural networks. For a fair benchmark, we choose architectures similar to ours. The comparison between these works and ours is shown in Table 1.


| Work | Dimension | Neuron size (μm) | Neurons per layer | Propagation distance | Accuracy (%) | TOPS |
|---|---|---|---|---|---|---|
| Ref. 33 | 2D, separate | 400 | 200 × 200 | 12 cm | 91.75 | 1.6 × 10⁷ |
| Ref. 40 | 2D, separate | 4 | 1000 × 1000 | 20 cm | 84 | 6 × 10⁹ |
| Ref. 27 | 1D, integrated | 2 | 186 | 500 μm | 86.7 | 1.38 × 10⁴ |
| Our work | 2D, integrated | 4 | 500 × 500 | 1.2 cm | 73.4 | 3.1 × 10⁹ |

Table 1. Comparison between different architectures.


On the one hand, our work achieves a more compact, integrated architecture than the separate architectures. Its computing power is of the same order of magnitude as that of Ref. 40 and can be further increased by increasing the number of neurons in the phase layers.

On the other hand, our work has substantially higher computing power than the one-dimensional (1D) integrated architecture and can directly process two-dimensional (2D) input objects. Moreover, the architecture is scalable while retaining stability.

5.6 Total Power Consumption of the Proposed Design

The experimental setup contains a He–Ne laser, a lens, a pinhole, a square aperture, an input plane, a three-layer chip, and a CCD. The He–Ne laser (light source) and CCD (detector) are active devices, and the others are passive devices.

The power of the He–Ne laser is 5 mW. The working current of the CCD is ~720 mA at 5 V, and the power of the CCD is ~3.75 W. The total power consumption of our scheme is ~3.755 W.
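A quick $P = IV$ check of this budget is sketched below using only the values stated above. The stated current and voltage give ~3.6 W for the CCD, whereas the quoted ~3.75 W would correspond to ~750 mA at 5 V. Either way, the total is a few watts and is dominated by the active detector rather than the light source.

```python
LASER_W = 5e-3                    # He-Ne laser power (5 mW)
CCD_CURRENT_A = 0.720             # CCD working current (~720 mA)
CCD_VOLTAGE_V = 5.0               # CCD supply voltage (5 V)

ccd_w = CCD_CURRENT_A * CCD_VOLTAGE_V        # P = I * V, ~3.6 W
total_w = LASER_W + ccd_w                    # ~3.605 W, dominated by the detector
current_for_3_75_w = 3.75 / CCD_VOLTAGE_V    # 0.75 A would give the quoted 3.75 W
```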

5.7 Heights of the Steps of the Fabricated Devices

The designed step height of the diffractive layers is 346 nm. We measured the step heights of the fabricated device with a confocal laser scanning microscope; a representative measurement is shown in Fig. 11.


Figure 11. 3D microscope characterization of the step thickness of the proposed diffractive neural networks.

Three-dimensional (3D) microscope characterization of the step thickness of the proposed diffractive neural networks is shown in Fig. 11. The measured step heights are 302, 341, and 320 nm. The error of the multistep photolithography-etching process is < 30 nm. Although this kind of error cannot be avoided, it has only a minor influence on the performance of the diffractive networks.
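To put the measured step heights in phase terms, the height formula from the Methods can be inverted as $\delta\varphi = 2\pi \Delta n\, \delta h / \lambda$. The sketch below assumes $\Delta n \approx 0.457$ (quartz against air at 632.8 nm) and compares each measured step with the 346-nm design. The largest deviation corresponds to about 0.2 rad (~11 deg), well below the $\pi/2$ spacing between phase levels.

```python
import math

WAVELENGTH_NM = 632.8
DELTA_N = 0.457                       # assumed quartz-air index difference at 632.8 nm
DESIGN_STEP_NM = 346.0                # designed step height, from the text
measured_nm = [302.0, 341.0, 320.0]   # measured step heights, from Fig. 11

def phase_error(height_error_nm):
    """Phase error (rad) from a height error: dphi = 2*pi*dn*dh/lambda."""
    return 2 * math.pi * DELTA_N * height_error_nm / WAVELENGTH_NM

errors = [phase_error(abs(h - DESIGN_STEP_NM)) for h in measured_nm]
# errors ~= [0.200, 0.023, 0.118] rad
```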

5.8 Experimental Fabricated Targets

The fabricated targets are shown in Fig. 12. All 50 amplitude-encoded targets are randomly selected from the MNIST test dataset.

Figure 12. Amplitude-encoded experimental fabricated targets.

Jianan Feng received his bachelor’s degree in optoelectronic information science and engineering from the Harbin Institute of Technology (Weihai) in 2018. He has been a PhD student in instrument science and technology at the Harbin Institute of Technology since 2019. His research interests include optical diffraction neural networks.

Chang Li received his BS and MS degrees from the Harbin Institute of Technology in 2019 and 2021, respectively. He has been a PhD student in instrument science and technology at the Harbin Institute of Technology since 2021. His research interests include optical metasurfaces and MEMS devices.

Dahai Yang received his BS degree from Lingnan Normal University, his MS degree from the Harbin Institute of Technology (Weihai), and his PhD from the Harbin Institute of Technology in 2015, 2019, and 2024, respectively. He is currently a postdoc at the School of Physical Sciences of Great Bay University. His research interests include optical metasurfaces and OAM.

Yang Liu received his BS degree from the Harbin Institute of Technology (Weihai) in 2018 and his MS degree from the Harbin Institute of Technology in 2020. He has been pursuing a PhD in instrument science and technology at the Harbin Institute of Technology since 2020. His research interests include RF MEMS and millimeter-wave microsystem integration technology.

Jianyang Hu received his BS and MS degrees from the School of Physics and Optoelectronic Engineering at Harbin Engineering University in 2018 and 2021, respectively. He has been a PhD student in instrument science and technology at the Harbin Institute of Technology since 2021. His research interests include micro-nano optics and MEMS devices.

Chen Chen received his BEng and PhD degrees in instrument science and technology from the Harbin Institute of Technology in 2015 and 2022, respectively. He is currently a postdoc at the Nanofabrication Facility of Suzhou Institute of Nano-Tech and Nano-Bionics (Chinese Academy of Sciences). His research interests include optical metasurfaces and diffractive optics.

Yiqun Wang received his BEng and MEng degrees from the Changchun University of Science and Technology in 2005 and 2008, respectively, and his PhD in instrument science and technology from the Harbin Institute of Technology in 2021. He is currently a professor at the Nanofabrication Facility of Suzhou Institute of Nano-Tech and Nano-Bionics (Chinese Academy of Sciences). His research interests include micro- and nano-fabrication and integrated packages.

Jie Lin received his BS, MS, and PhD degrees in optics from the Harbin Institute of Technology in 2002, 2004, and 2007, respectively. He is currently a professor at the School of Physics, Harbin Institute of Technology. He has authored or coauthored more than 100 publications in international technical journals and conference proceedings. His research interests include the theory and application of micro-nano optics, metasurface, optical sensing, and imaging.

Lei Wang received his BEng, MEng, and PhD degrees in instrument science and technology from the Harbin Institute of Technology in 2000, 2002, and 2005, respectively. He is currently a professor at the School of Instrumentation and Engineering, Harbin Institute of Technology. His current research interests include vibration isolation, ultraprecision motion control, measurement and instrumentation, and sensors and actuators.

Peng Jin received his BS degree in physics from Jilin University in 1994 and his MEng and PhD degrees in instrument science and technology from the Harbin Institute of Technology in 2001. He is currently a professor at the School of Instrumentation and Engineering, Harbin Institute of Technology. His research interests include fabrication and applications of micro-electro-mechanical systems, microwave passive components integration, and advanced processing technology for RF/microwave components.

References

[1] A. Krizhevsky, I. Sutskever, G. Hinton. ImageNet classification with deep convolutional neural networks. Commun. ACM, 60, 84–90 (2017).

[2] J. Hirschberg, C. Manning. Advances in natural language processing. Science, 349, 261–266 (2015).

[3] G. Litjens et al. A survey on deep learning in medical image analysis. Med. Image Anal., 42, 60–88 (2017).

[4] I. Kruglov, O. Mishulina, M. Bakirov. Quantile based decision making rule of the neural networks committee for ill-posed approximation problems. Neurocomputing, 96, 74–82 (2012).

[5] M. M. Waldrop. The semiconductor industry will soon abandon its pursuit of Moore’s Law. Now things could get a lot more interesting. Nature, 530, 144–147 (2016).

[6] Q. M. Zhang et al. Artificial neural networks enabled by nanophotonics. Light Sci. Appl., 8, 42 (2019).

[7] G. Wetzstein et al. Inference in artificial intelligence with deep optics and photonics. Nature, 588, 39–47 (2020).

[8] B. J. Shastri et al. Photonics for artificial intelligence and neuromorphic computing. Nat. Photonics, 15, 102–114 (2021).

[9] D. Perez et al. Multipurpose silicon photonics signal processor core. Nat. Commun., 8, 636 (2017).

[10] L. Mennel et al. Ultrafast machine vision with 2D material neural network image sensors. Nature, 579, 62–66 (2020).

[11] J. M. Wu et al. Analog optical computing for artificial intelligence. Engineering, 10, 133–145 (2022).