As artificial neural networks (ANNs) continue to make strides in wide-ranging and diverse fields of technology, the search for more efficient hardware implementations beyond conventional electronics is gaining traction. In particular, optical implementations potentially offer extraordinary gains in terms of speed and reduced energy consumption due to the intrinsic parallelism of free-space optics. At the same time, a physical nonlinearity—a crucial ingredient of an ANN—is not easy to realize in free-space optics, which restricts the potential of this platform. This problem is further exacerbated by the need to also perform the nonlinear activation in parallel for each data point to preserve the benefit of linear free-space optics. Here, we present a free-space optical ANN with diffraction-based linear weight summation and nonlinear activation enabled by the saturable absorption of thermal atoms. We demonstrate, via both simulation and experiment, image classification of handwritten digits using only a single layer, and observe a 6% improvement in classification accuracy due to the optical nonlinearity compared to a linear model. Our platform preserves the massive parallelism of free-space optics even with physical nonlinearity, and thus opens the way for novel designs and wider deployment of optical ANNs.
Photonics Research, Vol. 9, Issue 4, B128 (2021)
1. INTRODUCTION
Artificial neural networks (ANNs) have recently proven phenomenally successful in tasks such as image, sound, and language recognition and translation [1]. The increasing deployment of ANNs, from facial recognition on smartphones to self-driving cars, has brought new attention to improving their hardware implementation in terms of speed, energy consumption, and latency [2]. In contrast to conventional electronics-based platforms, optical implementations stand out due to light's intrinsically massive parallelism. For instance, the ability of a simple lens to carry out a two-dimensional (2D) Fourier transform with zero energy has long been utilized in optical signal processing [3]. In particular, free-space optics (FSO) with an aperture of area A and wavelength λ can potentially provide an extremely large number of information channels, on the order of A/λ², thanks to the availability of two spatial dimensions.
One of the biggest hurdles for an optical implementation of an ANN, however, is the lack of a physical optical nonlinearity. While the parallelism of FSO naturally lends itself to carrying out linear operations, the absence of a correspondingly parallel nonlinearity that does not require high-powered lasers or active optical components has led to a multitude of non-FSO workaround solutions. Shen et al., for instance, demonstrated deep learning with coherent nanophotonic circuits, an on-chip platform [4], while other approaches combine photonic front ends with electronic processing [5,7,9] or employ all-optical spiking networks [6].
Recently, Zuo et al. demonstrated an all-optical neural network with nonlinear activation functions based on laser-cooled atoms [10], and Ballarini et al. showed that polaritonic neuromorphic computing can outperform linear classifiers [11].
In this paper, we propose and demonstrate a fully optical ANN that utilizes the optical nonlinearity of thermal atomic vapor. Specifically, we exploit the saturable absorption of room-temperature rubidium atoms housed in a vapor cell. We observe the nonlinearity in a single pass without any cavity, which allows point-by-point nonlinear activation of an incident image [12]. For the linear operations, we employ the diffractive model, in which phase masks directly set the trainable weights of the neural network [8]. We emphasize that both the linear and the nonlinear components of our neural network operate on a "pixel-by-pixel" basis, within the diffraction-induced limit set by the propagation length, thus preserving and fully exploiting the intrinsic massive parallelism of FSO. Via numerical simulations, we observed a 10% increase in classification accuracy of a single-linear-layer ANN due to the atomic nonlinearity. After training our optical neural network in simulation with experimentally relevant parameters, we experimentally demonstrated an image recognition task on handwritten digits using a spatial light modulator (SLM), observing a 6% increase in classification accuracy with the addition of the nonlinear layer. We attribute the moderate classification accuracy (33%) of our experimental system to the use of only a single diffractive linear operation, currently limited by the number of SLMs in our setup. Our work, combining machine learning with optics and atomic physics, opens a new front in the ongoing effort to advance optical ANN theory and hardware.
2. OPTICAL NEURAL NETWORK ARCHITECTURE
A. Overview
A typical deep neural network consists of multiple layers of neurons. Except for those in the first layer, each neuron receives input signals from neurons in the previous layer. Excluding batch normalization, the neuron takes the sum of the signals multiplied by adjustable weights and performs a nonlinear operation, the output of which subsequently becomes an input signal for one or more neurons in the following layer.
Many variations in the neural network architectures exist, along with different training algorithms for specific applications. For a typical image classification task under supervised learning, the network is presented with a set of training data and corresponding labels. By repeatedly comparing the result of the output against the labels, the network can gradually adjust its weights until finally the weights converge on an optimum solution.
Our optical neural network follows a similar architecture: a 2D, monochromatic wavefront containing the input data propagates sequentially through a series of linear and nonlinear layers before being imaged on a camera. However, as explained earlier, due to a limited number of available SLMs, we only implemented one single layer that combines the input and one layer of neurons. Below, we describe each component and its physical implementation.
B. Input Layer
The input layer is the direct representation of 2D data encoded as spatially varying intensity of light, or an image. In order to convert electronic data into optical images, we use an SLM, which can manipulate the amplitude, phase, or both of an incident laser beam’s wavefront. The use of coherent, monochromatic light is crucial for the reported optical network, since we utilize diffraction and light–matter interaction to perform both linear and nonlinear operations, as will be described next.
C. Linear Layer
In a generic ANN, the role of a linear layer is to perform a weighted summation of signals from the previous layer before passing the result to a nonlinear layer. A direct implementation of matrix–matrix multiplication in FSO is possible but complex and requires many optical elements [3]. Instead, we adopt an alternative approach, in which the linear layer is implemented by first element-wise multiplying an image with a phase mask and then letting the image propagate in free space. The first step is enabled by the SLM, which can directly display the product of an input image with the phase mask. The second step mixes the signals of neighboring pixels via diffraction; such a diffractive model was demonstrated with several phase masks in the terahertz regime [8]. The amount of mixing depends on the propagation distance, the wavelength, and the spatial frequency spectrum of the image. While it is difficult to map this phase-mask-based approach onto the traditional convolutional or fully connected layers of an ANN, the mixing of neighboring pixels is effectively a convolution whose kernel depends on the propagation length. Multiple layers can be implemented using a stack of diffractive optics [8], metasurfaces [13], or more than one SLM.
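As a sketch, the diffractive linear layer amounts to an element-wise phase multiplication followed by free-space propagation; the function names and the identity stand-in for propagation below are ours, not from the paper:

```python
import numpy as np

def linear_layer(field, phase_mask, propagate):
    """Diffractive linear layer: element-wise phase modulation by the
    SLM, then free-space propagation, which mixes neighboring pixels
    like a propagation-length-dependent convolution."""
    modulated = field * np.exp(1j * phase_mask)
    return propagate(modulated)

# With a zero mask and an identity stand-in for propagation,
# the layer leaves the field unchanged:
field = np.ones((4, 4), dtype=complex)
out = linear_layer(field, np.zeros((4, 4)), propagate=lambda f: f)
```

The trainable weights are the entries of `phase_mask`; the propagation step is what couples neighboring pixels.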
D. Nonlinear Layer
The nonlinear layer is implemented by a glass cell containing rubidium vapor. The phenomenon of saturable absorption is briefly outlined here and further detailed in the Appendix. When a near-resonant photon is incident on an atom, the atom absorbs the photon and reaches an excited state. After a short time, inversely proportional to the atomic linewidth, the excited atom emits a photon and returns to the ground state. The emitted photon travels in a random direction and is "lost" from the undisturbed wavefront, which continues to propagate in the original direction. Thus, for a fixed density of atoms, a low-intensity beam passing through the gas is attenuated, whereas a high-intensity beam can excite all the available atoms, saturating the medium. The input–output curve for a beam of varying intensity therefore has a nonlinear shape, similar to the "SmoothReLU" activation function commonly used in machine learning. The key to our nonlinear layer is that atomic saturation is a local effect: different parts of an incident image, viewed as a collection of beams with each beam denoting one pixel, undergo the nonlinear activation independently.
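A sketch of the resulting activation function, using the saturable-absorber transmission form derived in the Appendix (the `od` and `i_sat` values here are illustrative, not the calibrated experimental parameters):

```python
import numpy as np

def saturable_absorption(intensity, od=2.0, i_sat=1.0):
    """Transmitted intensity through a saturable absorber. Low
    intensities are strongly attenuated; high intensities saturate
    the atoms and pass with little loss, giving a SmoothReLU-like
    activation. od = optical depth, i_sat = saturation intensity."""
    return intensity * np.exp(-od / (1.0 + intensity / i_sat))

# Transmission I_out/I_in rises from exp(-od) toward 1 with intensity:
low = saturable_absorption(1e-6) / 1e-6      # close to exp(-2)
high = saturable_absorption(100.0) / 100.0   # close to 1
```

Because the function acts element-wise on an intensity array, every pixel of an image undergoes the activation independently, mirroring the local nature of atomic saturation.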
E. Output Layer
The optical signal after the vapor cell is imaged on a CCD camera. The intensity pattern of the captured image becomes a direct representation of the final output of the neural network. For an image classification task with multiple categories, we can predefine certain physical locations on the camera plane to correspond to those categories. These locations then can be read by either a human or a computer to identify the categories.
We note that the absolute squaring operator inherent in taking the intensity is in itself a nonlinear process; however, as it is bound to the final measurement, we take it as part of the output layer and only refer to the independent saturable absorption layer as our nonlinearity.
3. SIMULATION OF A TWO-LAYER OPTICAL NEURAL NETWORK FOR IMAGE CLASSIFICATION
While atomic vapors provide a nonlinear input–output relationship, it is not clear a priori whether this particular nonlinearity improves the performance of a neural network. To answer this question, we numerically simulated our optical neural network on a standard image classification task, handwritten digits from the MNIST dataset, with and without the nonlinear layer.
The raw input data are 8-bit, 28 × 28 pixel images of handwritten digits. Before feeding them to the model, we make the following modifications. First, so that the image remains reasonably collimated during tens-of-centimeters-long free-space propagation in the experiment, we upsample the original 28 × 28 pixel images to a larger pixel array. Second, we embed the rescaled image within a still larger, zero-padded array; this larger dimension allows us to directly employ the angular spectrum method without applying any band limit [14]. Finally, all pixel values are normalized so that the maximum value is 1. The pixel size is set to 8 μm to match the physical pitch of our SLM. The first operation on the modified input is element-wise multiplication by a phase mask, an array of complex numbers of unit magnitude whose phases are the trainable variables.
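A minimal preprocessing sketch (the scale factor and canvas size here are placeholders, not the paper's exact dimensions):

```python
import numpy as np

def preprocess(img28, scale=4, canvas=256):
    """Upsample a 28x28 MNIST digit by nearest-neighbor replication,
    zero-pad it into the center of a larger canvas (so the angular
    spectrum method needs no band limit), and normalize the peak to 1."""
    big = np.kron(img28.astype(float), np.ones((scale, scale)))
    out = np.zeros((canvas, canvas))
    n = big.shape[0]
    o = (canvas - n) // 2
    out[o:o + n, o:o + n] = big
    peak = out.max()
    return out / peak if peak > 0 else out

digit = np.zeros((28, 28))
digit[10:18, 10:18] = 255.0   # a toy "digit"
x = preprocess(digit)
```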
After the phase mask, the image is propagated along the optical axis by a distance z, which is a hyperparameter of our neural network, via the angular spectrum method. The method consists of decomposing a given wavefront into plane waves traveling in different directions, applying a z-dependent transfer function to each plane wave, and finally reconstructing the propagated wave. Computationally, the process involves a pair of forward and inverse fast Fourier transforms, with a Hadamard product by the transfer-function matrix in between [12].
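The angular spectrum step can be sketched as follows; the 780 nm wavelength (the rubidium D2 line) and 8 μm pixel pitch are our assumptions, consistent with the text:

```python
import numpy as np

def angular_spectrum(field, z, wavelength=780e-9, pixel=8e-6):
    """Angular spectrum propagation: FFT to plane waves, apply the
    z-dependent transfer function exp(i*kz*z), inverse FFT.
    Evanescent components (kz imaginary) are suppressed."""
    n = field.shape[0]
    k = 2 * np.pi / wavelength
    fx = np.fft.fftfreq(n, d=pixel)
    kx, ky = np.meshgrid(2 * np.pi * fx, 2 * np.pi * fx)
    kz_sq = k**2 - kx**2 - ky**2
    kz = np.sqrt(np.maximum(kz_sq, 0.0))
    h = np.exp(1j * kz * z) * (kz_sq > 0)   # transfer function
    return np.fft.ifft2(np.fft.fft2(field) * h)

# A smooth test field; propagation by z = 0 returns it unchanged:
xv = np.linspace(-1, 1, 64)
f = np.exp(-(xv[None, :]**2 + xv[:, None]**2)).astype(complex)
g = angular_spectrum(f, z=0.0)
```

Because |h| = 1 for all propagating components, the operation conserves optical energy, as free-space diffraction must.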
After propagation, the image undergoes a nonlinear activation. The nonlinearity is a function of the optical intensity, so we take the absolute square of the image field, apply a nonlinear function, and take the square root, all while preserving the phase of the original wavefront. The functional form of the nonlinearity is derived in the Appendix; the nonlinear parameters were determined by a calibration process described in Section 4.B.
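The phase-preserving activation just described can be sketched as follows (the helper name is ours):

```python
import numpy as np

def nonlinear_layer(field, activation):
    """Apply an intensity-domain nonlinearity to a complex field while
    preserving its phase: take |E|^2, apply the activation, take the
    square root, and reattach the original phase."""
    intensity = np.abs(field)**2
    return np.sqrt(activation(intensity)) * np.exp(1j * np.angle(field))

# With the identity activation the field is unchanged:
f = np.array([[1 + 1j, 0.5j], [2.0 + 0j, -1.0 + 0j]])
g = nonlinear_layer(f, activation=lambda i: i)
```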
Finally, for detection, the intensity at the output of the nonlinear layer is element-wise multiplied by a detector layer that defines where the light of a given MNIST digit should go. In our simulation, the detector layer consists of ten circles equidistant from the center. The result is a list of ten numbers, each the sum of the image intensity values within one circle. The index of the maximum, indicating the location with the highest intensity, is the final output of the neural network for the given sample image.
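A sketch of this detector readout (the canvas size, ring radius, and circle radius are illustrative, not the trained layout):

```python
import numpy as np

def detector_readout(intensity, centers, radius):
    """Sum the intensity within each circular detector region and
    return the index of the brightest region as the predicted digit."""
    n = intensity.shape[0]
    yy, xx = np.mgrid[0:n, 0:n]
    scores = [intensity[(yy - cy)**2 + (xx - cx)**2 <= radius**2].sum()
              for (cy, cx) in centers]
    return int(np.argmax(scores)), scores

# Ten circles equidistant from the center of a 64x64 plane:
centers = [(32 + int(24 * np.cos(2 * np.pi * k / 10 - np.pi / 2)),
            32 + int(24 * np.sin(2 * np.pi * k / 10 - np.pi / 2)))
           for k in range(10)]
img = np.zeros((64, 64))
img[centers[3]] = 1.0          # put all the light on detector 3
pred, scores = detector_readout(img, centers, radius=6)
```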
Figure 1. Trained optical neural network (ONN). (a) The detector layer determines where the light from the individual digits should be focused; the layout of the layer is a hyperparameter in our training, with each label corresponding to one bright circle.
Figure 2. Accuracy versus epoch for the linear model (blue dot) and the nonlinear model (red cross).
4. EXPERIMENTAL RESULTS
A. Setup
Figure 3. Experimental setup. (a) Schematic layout of the setup.
B. Nonlinearity
Here we describe the calibration process used to derive the nonlinear parameters for both simulation and experiment. The nonlinear input–output curve (see Appendix A) takes the form I_out = I_in exp[−OD/(1 + I_in/I_sat)], with the optical depth OD and the saturation intensity I_sat as the parameters to be calibrated.
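A sketch of such a calibration, fitting optical depth and saturation intensity to a measured input–output curve; here the "measured" data are synthetic with known parameters, and a coarse grid search stands in for a full least-squares fit:

```python
import numpy as np

def sat_abs(i_in, od, i_sat):
    """Saturable-absorption model fitted during calibration."""
    return i_in * np.exp(-od / (1.0 + i_in / i_sat))

# Synthetic "measured" curve generated from known parameters; real
# calibration would use camera counts versus input power.
true_od, true_isat = 2.5, 0.8
i_in = np.linspace(0.01, 10.0, 50)
i_meas = sat_abs(i_in, true_od, true_isat)

ods = np.linspace(0.5, 4.0, 36)      # grid over optical depth
isats = np.linspace(0.2, 2.0, 37)    # grid over saturation intensity
errs = [(np.sum((sat_abs(i_in, od, s) - i_meas)**2), od, s)
        for od in ods for s in isats]
_, od_fit, isat_fit = min(errs)      # parameters with least squared error
```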
In the experiment, it is very difficult to implement the simulated model of the neural network exactly, due to attenuation by the many optical elements as well as the fact that the vapor cell itself has a finite length on the order of many centimeters. The latter presents a serious challenge: the simulation assumed that the nonlinear effect took place in a single plane, whereas in the experiment the nonlinearity occurs over a continuous distance, so a propagating image experiences continuously varying attenuation as it diffracts through the cell.
Figure 4. Nonlinear function showing the input–output curve for the incident intensity.
C. Results
As described before, we trained a new neural network with the nonlinear parameters derived directly from the camera, using 10,000 training images and 1000 test images (100 per digit); this required specifying the input intensity in terms of pixel values rather than milliwatts. The resulting simulation with the experimental parameters yielded a phase mask similar in appearance to that obtained with the ideal parameters in Section 3. However, the predicted accuracy dropped to 66.4% and 66.6% for the linear and the nonlinear networks, respectively, so there was virtually no difference between the two networks in terms of accuracy. While it is possible in principle to recover the original simulation regime by calibrating each optical element and reconciling simulation and experiment with more advanced techniques such as split-step nonlinear angular propagation [15], the required experimental effort and computational resources would be prohibitive, so we proceeded with the experiment.
In our experiment, we used as input the same 1000 test images, modified as outlined in Section 3. Because our SLM is a phase-only modulator, it cannot directly display an intensity-varying image or the complex field formed by the product of an image with a phase mask; we therefore resorted to holography, which allows us to synthesize the complex field in a conjugate plane using phase control alone [16]. For detection, we calibrated the CCD camera for image magnification and rotation with separate calibration images. First, we tested the neural network without any phase mask. The overall accuracy was 14.7% for the linear network and 14.2% for the nonlinear network. As expected, without the phase mask there is no significant difference between the two networks, and both accuracies sit close to the 10% baseline of random prediction.
Next, we repeated the test, this time incorporating the phase mask via the SLM. The overall accuracy was measured to be 26.7% for the linear network and 33.0% for the nonlinear network. We attribute the overall reduction in accuracy compared to the simulation results to imperfections in the experimental system, including errors in the distances between optical elements, phase errors in the SLM, and the finite length of the vapor cell. It is surprising, however, that the accuracy is greater with the nonlinearity, whereas the simulation with experimental parameters showed similar performance with and without it. We attribute this to the robustness of the nonlinear network to experimental noise. There is a large body of ongoing research in the machine-learning community on the effect of noise in training deep neural networks [17–19], and the exact nature of the robustness of our nonlinear optical neural network remains to be investigated. Table 1 summarizes all the accuracy results for both simulation and experiment.

Table 1. Summary of ONN Accuracy in Percent

                                           Linear Network   Nonlinear Network
  Simulation with ideal parameters              74.2              84.2
  Simulation with experimental parameters       66.4              66.6
  Experiment without phase mask                 14.7              14.2
  Experiment with phase mask                    26.7              33.0
We note that the simulated and experimentally measured accuracies are significantly lower than those of state-of-the-art neural networks. However, our network has only one layer, and we expect the accuracy to increase with a larger number of layers. Currently, our experiment is limited by available resources, i.e., a single SLM, which, while commercially available, is a significant laboratory expenditure. Creating multiple layers also has technical challenges of its own, including optical loss in each layer. The reported optical nonlinearity is tunable via the temperature of the atoms, and can thus be adjusted for each layer. Moreover, because we use thermal rather than cold atoms, the setup is significantly simpler. Nevertheless, optical regeneration techniques will be needed if the depth of the network grows too large [20]. Finally, an electronic back end can be paired with the optical front end to enhance the classification accuracy; such a back end requires only a one-time transduction and does not add to the overall latency, unlike schemes requiring repeated signal transduction.
D. Speed and Power Performance
Our reported optical ANN uses a commercial liquid-crystal SLM with 1 million pixels, each with 8-bit precision. At the SLM's refresh rate of 100 Hz, the effective supported bit rate of the optical ANN is 800 Mbps. Using a grating-light-valve-type mechanical SLM with a 350 kHz refresh rate [21], the data rate can be increased to 2.8 Tbps. At that speed, however, we need a faster detector, e.g., an event-based camera with microsecond-level response time [22]. The power consumption of the reported optical ANN comes primarily from the SLM; since we implement only inference, a fixed diffractive phase mask could replace the SLM, reducing that energy to zero. The thermal-atom-based nonlinearity requires no extra energy for either active preparation or maintenance. Additionally, the reported optical ANN exploits the full parallelism offered by FSO, and thus does not require any excess energy for time or wavelength multiplexing. To actuate the nonlinearity, the light intensity must be on the order of the atomic saturation intensity; the required optical power therefore depends on the pixel size inside the vapor cell and the optics used to guide the light through the nonlinear thermal atomic vapor. Reducing the channel area to a diffraction-limited spot would lower this power substantially.
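The throughput figures follow from simple arithmetic; the 100 Hz liquid-crystal refresh rate is inferred from the quoted 800 Mbps figure, and the 350 kHz rate is that of the grating light valve in Ref. [21]:

```python
# Back-of-the-envelope throughput of the SLM front end.
pixels = 1_000_000          # 1-megapixel SLM
bits_per_pixel = 8          # 8-bit precision
lc_refresh_hz = 100         # liquid-crystal SLM (inferred from 800 Mbps)
glv_refresh_hz = 350_000    # grating light valve [21]

lc_rate_bps = pixels * bits_per_pixel * lc_refresh_hz     # 800 Mbps
glv_rate_bps = pixels * bits_per_pixel * glv_refresh_hz   # 2.8 Tbps
```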
5. CONCLUSION
We have shown that an atomic vapor cell can perform a local nonlinear activation in two dimensions, and consequently, a fully optical ANN can be implemented for image recognition of handwritten digits. Such a network can handle a large amount of data in parallel. Furthermore, except for the input and the output that are fed and detected by the SLM and the CCD camera, respectively, all data processing occurs in the time light takes to traverse the physical distance of the network. Although the model accuracy of 33% is rather low, our proof-of-concept demonstrates the feasibility of using a simple, off-the-shelf atomic vapor cell as the source of fully parallel optical nonlinearity. Along with another commercially available device, the SLM, the vapor cell solves the enduring challenge of the missing optical nonlinearity that fully exploits the intrinsic massive parallelism of free-space light in two dimensions. Our work is a first step towards creating an all-optical neural network that can handle a massive amount of data and surpass the performance of an electronic neural network.
Acknowledgment
Acknowledgment. A. R. acknowledges support from the IC Fellowship. M. Z. acknowledges support from the NSF Graduate Research Fellowship. P. A. acknowledges support from Canada First Excellence Research Fund.
APPENDIX A: SATURABLE ABSORPTION
Saturable absorption is a general phenomenon that appears in many different physical systems with discrete energy levels of finite lifetime. Here we consider a simple system of two-level atoms; a detailed derivation can be found in standard references [23].
Consider a beam of photons of intensity I passing through a medium with N atoms per unit volume. If the thickness of the medium is dz, then the number of atoms per unit area is N dz. If we now assign an absorption cross section σ to each atom, then σN dz is the fraction of the target area covered by the atoms. It is also the probability that an incident photon will be absorbed by the atoms, or, in the case of many photons, the total fraction of photons that are absorbed. The change in the beam intensity is then dI = −σN I dz, which, upon integration, yields Beer's law: I(z) = I(0) e^(−αz), where the absorption coefficient α = σN.
If we assume that the atoms have two levels, a ground state with population N₁ and an excited state with population N₂ (per unit volume), then the absorption coefficient becomes α = σ(N₁ − N₂). Imposing the conservation of atom number [N₁ + N₂ = N] and the conservation of energy [absorption balanced by spontaneous emission at the decay rate Γ], we arrive at the steady-state population difference N₁ − N₂ = N/(1 + I/I_sat), where we have defined the saturation intensity I_sat = ħωΓ/(2σ).
Thus, the output intensity as a function of the input intensity is given by I_out = I_in exp[−OD/(1 + I_in/I_sat)], where the optical depth OD = αL and L is the effective interaction length of the vapor cell. We use OD and I_sat as our nonlinear parameters. We note that for the atomic vapor system, OD can be controlled by changing the temperature of the cell, which sets the atomic density N. Thus, the demonstrated nonlinearity is tunable, which can be exploited in a multilayer optical neural network, where OD would be gradually decreased to accommodate the signal loss in each layer.
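A quick numerical check of the limiting cases of this expression (the OD value is illustrative):

```python
import math

def transmission(i_in, od, i_sat):
    """I_out/I_in for the saturable absorber: T = exp(-OD / (1 + I_in/I_sat))."""
    return math.exp(-od / (1.0 + i_in / i_sat))

od = 3.0
t_low = transmission(1e-9, od, 1.0)    # low intensity: Beer's law, exp(-OD)
t_high = transmission(1e9, od, 1.0)    # high intensity: near-full transparency
```

In the weak-field limit the medium obeys ordinary Beer's-law attenuation, while far above saturation it becomes nearly transparent, which is exactly the SmoothReLU-like behavior used as the activation.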
References
[1] Y. LeCun, Y. Bengio, G. Hinton. Deep learning. Nature, 521, 436-444(2015).
[2] V. Sze, Y. Chen, J. Emer, A. Suleiman, Z. Zhang. Hardware for machine learning: challenges and opportunities. IEEE Custom Integrated Circuits Conference, 1-8(2017).
[3] J. Goodman. Introduction to Fourier Optics(2005).
[4] Y. Shen, N. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, M. Soljacic. Deep learning with coherent nanophotonic circuits. Nat. Photonics, 11, 441-446(2017).
[5] S. Colburn, Y. Chu, E. Shilzerman, A. Majumdar. Optical frontend for a convolutional neural network. Appl. Opt., 58, 3179-3186(2019).
[6] J. Feldmann, N. Youngblood, C. Wright, H. Bhaskaran, W. Pernice. All-optical spiking neurosynaptic networks with self-learning capabilities. Nature, 569, 208-214(2019).
[7] V. Bangari, B. Marquez, H. Miller, A. Tait, M. Nahmias, T. Lima, H. Peng, P. Prucnal, B. Shastri. Digital electronics and analog photonics for convolutional neural networks (DEAP-CNNs). IEEE J. Sel. Top. Quantum Electron., 26, 7701213(2020).
[8] X. Lin, Y. Rivenson, N. Yardimci, M. Veli, Y. Luo, M. Jarrahi, A. Ozcan. All-optical machine learning using diffractive deep neural networks. Science, 361, 1004-1008(2018).
[9] J. Chang, V. Sitzmann, X. Dun, W. Heidrich, G. Wetzstein. Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification. Sci. Rep., 8, 12324(2018).
[10] Y. Zuo, B. Li, Y. Zhao, Y. Jiang, Y. Chen, P. Chen, G. Jo, J. Liu, S. Du. All-optical neural network with nonlinear activation functions. Optica, 6, 1132-1137(2019).
[11] D. Ballarini, A. Gianfrate, R. Panico, A. Opala, S. Ghosh, L. Dominici, V. Ardizzone, G. Milena, G. Lerario, G. Gigli, T. Liw, M. Matuszewski, D. Sanvitto. Polaritonic neuromorphic computing outperforms linear classifiers. Nano Lett., 20, 3506-3512(2020).
[12] A. Ryou, S. Colburn, A. Majumdar. Image enhancement in a miniature self-imaging degenerate optical cavity. Phys. Rev. A, 101, 013824(2020).
[13] S. Colburn, A. Zhan, A. Majumdar. Varifocal zoom imaging with large area focal length adjustable metalenses. Optica, 5, 825-831(2018).
[14] K. Matsushima, T. Shimobaba. Band-limited angular spectrum method for numerical simulation of free-space propagation in far and near fields. Opt. Express, 17, 19662-19673(2009).
[15] G. Agrawal. Nonlinear Fiber Optics(2013).
[16] J. Davis, D. Cottrell, J. Campos, M. Yzuel, I. Moreno. Encoding amplitude information onto phase-only filters. Appl. Opt., 38, 5004-5013(1999).
[17] H. Noh, J. Mun, B. Han. Regularizing deep neural networks by noise: its interpretation and optimization. Advances in Neural Information Processing Systems, 5109-5118(2017).
[18] N. Nagabushan, N. Satish, S. Raghuram. Effect of injected noise in deep neural networks. IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), 1-5(2016).
[19] B. Poole, J. Sohl-Dickstein, S. Ganguli. Analyzing noise in autoencoders and deep networks(2014).
[20] L. Li, P. Patki, Y. Kwon, V. Stelmakh, B. Campbell, M. Annamalai, T. Lakoba, M. Vasilyev. All-optical regenerator of multi-channel signals. Nat. Commun., 8, 884(2017).
[21] O. Tzang, E. Niv, S. Singh, S. Labouesse, G. Myatt, R. Piestun. Wavefront shaping in complex media with a 350 kHz modulator via a 1D-to-2D transform. Nat. Photonics, 13, 788-793(2019).
[23] C. Foot. Atomic Physics(2005).