As artificial neural networks (ANNs) continue to make strides in wide-ranging and diverse fields of technology, the search for more efficient hardware implementations beyond conventional electronics is gaining traction. In particular, optical implementations potentially offer extraordinary gains in terms of speed and reduced energy consumption due to the intrinsic parallelism of free-space optics. At the same time, a physical nonlinearity—a crucial ingredient of an ANN—is not easy to realize in free-space optics, which restricts the potential of this platform. This problem is further exacerbated by the need to also perform the nonlinear activation in parallel for each data point to preserve the benefit of linear free-space optics. Here, we present a free-space optical ANN with diffraction-based linear weight summation and nonlinear activation enabled by the saturable absorption of thermal atoms. We demonstrate, via both simulation and experiment, image classification of handwritten digits using only a single layer, and observe a 6% improvement in classification accuracy due to the optical nonlinearity compared to a linear model. Our platform preserves the massive parallelism of free-space optics even with physical nonlinearity, and thus opens the way for novel designs and wider deployment of optical ANNs.
Photonics Research, Vol. 9, Issue 4, B128 (2021)
1. INTRODUCTION
Artificial neural networks (ANNs) have recently proven phenomenally successful in tasks such as image, sound, and language recognition and translation [1]. The increasing deployment of ANNs, from facial recognition on smartphones to self-driving cars, has brought new attention to improving their hardware implementation in terms of speed, energy consumption, and latency [2]. In contrast to conventional electronics-based platforms, optical implementations stand out due to light's intrinsically massive parallelism. For instance, the ability of a simple lens to carry out a two-dimensional (2D) Fourier transform with zero energy has long been utilized in optical signal processing [3]. In particular, free-space optics (FSO) with an aperture of area A and wavelength λ can potentially provide an extremely large number of information channels, on the order of A/λ², thanks to the availability of two spatial dimensions.
One of the biggest hurdles for an optical implementation of an ANN, however, is the lack of a physical optical nonlinearity. While the parallelism of FSO naturally lends itself to carrying out linear operations, the absence of a correspondingly parallel nonlinearity that does not require high-powered lasers or active optical components has led to a multitude of non-FSO workaround solutions. Shen et al., for instance, demonstrated deep learning with coherent nanophotonic circuits, an on-chip platform [4], while other approaches combine photonic front ends with electronic processing [5,7,9] or employ all-optical spiking networks [6].
Recently, Zuo et al. demonstrated an all-optical neural network with nonlinear activation functions based on laser-cooled atoms [10], and Ballarini et al. showed that polaritonic neuromorphic computing can outperform linear classifiers [11].
In this paper, we propose and demonstrate a fully optical ANN that utilizes the optical nonlinearity of thermal atomic vapor. Specifically, we exploit the saturable absorption of room-temperature rubidium atoms housed in a vapor cell. We observe the nonlinearity in a single pass without any cavity, which allows point-by-point nonlinear activation of an incident image [12]. For the linear operations, we employ the diffractive model, in which phase masks directly set the trainable weights of the neural network [8]. We emphasize that both the linear and the nonlinear components of our neural network operate on a "pixel-by-pixel" basis, within the diffraction-induced limit set by the propagation length, thus preserving and fully exploiting the intrinsic massive parallelism of FSO. Via numerical simulations, we observed a 10% increase in classification accuracy of a single-linear-layer ANN due to the atomic nonlinearity. After training our optical neural network in simulation with experimentally relevant parameters, we experimentally demonstrated an image recognition task on handwritten digits using a spatial light modulator (SLM), observing a 6% increase in classification accuracy with the addition of the nonlinear layer. We attribute the moderate classification accuracy (33%) of our experimental system to the use of only a single diffractive linear operation, currently limited by the number of SLMs in our setup. Our work, combining machine learning with optics and atomic physics, opens a new front in the ongoing effort to advance optical ANN theory and hardware.
2. OPTICAL NEURAL NETWORK ARCHITECTURE
A. Overview
A typical deep neural network consists of multiple layers of neurons. Except for those in the first layer, each neuron receives input signals from neurons in the previous layer. Excluding batch normalization, the neuron takes the sum of the signals multiplied by adjustable weights and performs a nonlinear operation, the output of which subsequently becomes an input signal for one or more neurons in the following layer.
Many variations in the neural network architectures exist, along with different training algorithms for specific applications. For a typical image classification task under supervised learning, the network is presented with a set of training data and corresponding labels. By repeatedly comparing the result of the output against the labels, the network can gradually adjust its weights until finally the weights converge on an optimum solution.
Our optical neural network follows a similar architecture: a 2D, monochromatic wavefront containing the input data propagates sequentially through a series of linear and nonlinear layers before being imaged on a camera. However, as explained earlier, due to a limited number of available SLMs, we only implemented one single layer that combines the input and one layer of neurons. Below, we describe each component and its physical implementation.
B. Input Layer
The input layer is the direct representation of 2D data encoded as spatially varying intensity of light, or an image. In order to convert electronic data into optical images, we use an SLM, which can manipulate the amplitude, phase, or both of an incident laser beam’s wavefront. The use of coherent, monochromatic light is crucial for the reported optical network, since we utilize diffraction and light–matter interaction to perform both linear and nonlinear operations, as will be described next.
C. Linear Layer
In a generic ANN, the role of a linear layer is to perform a weighted summation of signals from the previous layer before passing the result to a nonlinear layer. A direct implementation of matrix–matrix multiplication in FSO is possible but complex and requires many optical elements [3]. Instead, we adopt an alternative approach, in which the linear layer is implemented by first element-wise multiplying an image with a phase mask and then letting the image propagate in free space. The first step is enabled by the SLM, which can directly display the product of an input image with the phase mask. The second step mixes the signals of neighboring pixels via diffraction; such a diffractive model was demonstrated with several phase masks in the terahertz regime [8]. The amount of mixing depends on the propagation distance, the wavelength, and the spatial frequency spectrum of the image. While it is difficult to map this phase-mask-based approach onto the traditional convolutional or fully connected layers of an ANN, the mixing of neighboring pixels is effectively a convolution whose kernel depends on the propagation length. Multiple layers can be implemented using a stack of diffractive optics [8], metasurfaces [13], or more than one SLM.
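As a sketch, the diffractive linear layer amounts to an element-wise phase multiplication followed by free-space propagation; the function names and the identity stand-in for propagation below are ours, not from the paper:

```python
import numpy as np

def linear_layer(field, phase_mask, propagate):
    """Diffractive linear layer: element-wise phase modulation by the
    SLM, then free-space propagation, which mixes neighboring pixels
    like a propagation-length-dependent convolution."""
    modulated = field * np.exp(1j * phase_mask)
    return propagate(modulated)

# With a zero mask and an identity stand-in for propagation,
# the layer leaves the field unchanged:
field = np.ones((4, 4), dtype=complex)
out = linear_layer(field, np.zeros((4, 4)), propagate=lambda f: f)
```

The trainable weights are the entries of `phase_mask`; the propagation step is what couples neighboring pixels.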
D. Nonlinear Layer
The nonlinear layer is implemented by a glass cell containing rubidium vapor. The phenomenon of saturable absorption is briefly outlined here and further detailed in the Appendix. When a near-resonant photon is incident on an atom, the atom absorbs the photon and reaches an excited state. After a short time, inversely proportional to the atomic linewidth, the excited atom emits a photon and returns to the ground state. The emitted photon travels in a random direction and is "lost" from the undisturbed wavefront, which continues to propagate in the original direction. Thus, for a fixed density of atoms, a low-intensity beam passing through the gas is attenuated, whereas a high-intensity beam can excite all the available atoms, saturating the medium. The input–output curve for a beam of varying intensity therefore has a nonlinear shape, similar to the "SmoothReLU" activation function commonly used in machine learning. The key to our nonlinear layer is that atomic saturation is a local effect: different parts of an incident image, viewed as a collection of beams with each beam denoting one pixel, undergo the nonlinear activation independently.
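A sketch of the resulting activation function, using the saturable-absorber transmission form derived in the Appendix (the `od` and `i_sat` values here are illustrative, not the calibrated experimental parameters):

```python
import numpy as np

def saturable_absorption(intensity, od=2.0, i_sat=1.0):
    """Transmitted intensity through a saturable absorber. Low
    intensities are strongly attenuated; high intensities saturate
    the atoms and pass with little loss, giving a SmoothReLU-like
    activation. od = optical depth, i_sat = saturation intensity."""
    return intensity * np.exp(-od / (1.0 + intensity / i_sat))

# Transmission I_out/I_in rises from exp(-od) toward 1 with intensity:
low = saturable_absorption(1e-6) / 1e-6      # close to exp(-2)
high = saturable_absorption(100.0) / 100.0   # close to 1
```

Because the function acts element-wise on an intensity array, every pixel of an image undergoes the activation independently, mirroring the local nature of atomic saturation.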
E. Output Layer
The optical signal after the vapor cell is imaged on a CCD camera. The intensity pattern of the captured image becomes a direct representation of the final output of the neural network. For an image classification task with multiple categories, we can predefine certain physical locations on the camera plane to correspond to those categories. These locations then can be read by either a human or a computer to identify the categories.
We note that the absolute squaring operator inherent in taking the intensity is in itself a nonlinear process; however, as it is bound to the final measurement, we take it as part of the output layer and only refer to the independent saturable absorption layer as our nonlinearity.
3. SIMULATION OF A TWO-LAYER OPTICAL NEURAL NETWORK FOR IMAGE CLASSIFICATION
While atomic vapors provide a nonlinear input–output relationship, it is not clear a priori whether this particular nonlinearity improves the performance of a neural network. To answer this question, we numerically simulated our optical neural network on a standard image classification task, handwritten digits from the MNIST dataset, with and without the nonlinear layer.
The raw input data are 8-bit, 28 × 28 pixel images of handwritten digits. Before feeding them to the model, we make the following modifications. First, so that the image remains reasonably collimated during tens-of-centimeters-long free-space propagation in the experiment, we upsample the original 28 × 28 pixel images to a larger pixel array. Second, we embed the rescaled image within a still larger, zero-padded array; this larger dimension allows us to directly employ the angular spectrum method without applying any band limit [14]. Finally, all pixel values are normalized so that the maximum value is 1. The pixel size is set to 8 μm to match the physical pitch of our SLM. The first operation on the modified input is element-wise multiplication by a phase mask, an array of complex numbers of unit magnitude whose phases are the trainable variables.
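A minimal preprocessing sketch (the scale factor and canvas size here are placeholders, not the paper's exact dimensions):

```python
import numpy as np

def preprocess(img28, scale=4, canvas=256):
    """Upsample a 28x28 MNIST digit by nearest-neighbor replication,
    zero-pad it into the center of a larger canvas (so the angular
    spectrum method needs no band limit), and normalize the peak to 1."""
    big = np.kron(img28.astype(float), np.ones((scale, scale)))
    out = np.zeros((canvas, canvas))
    n = big.shape[0]
    o = (canvas - n) // 2
    out[o:o + n, o:o + n] = big
    peak = out.max()
    return out / peak if peak > 0 else out

digit = np.zeros((28, 28))
digit[10:18, 10:18] = 255.0   # a toy "digit"
x = preprocess(digit)
```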
After the phase mask, the image is propagated along the optical axis by a distance z, which is a hyperparameter of our neural network, via the angular spectrum method. The method consists of decomposing a given wavefront into plane waves traveling in different directions, applying a z-dependent transfer function to each plane wave, and finally reconstructing the propagated wave. Computationally, the process involves a pair of forward and inverse fast Fourier transforms, with a Hadamard product by the transfer-function matrix in between [12].
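The angular spectrum step can be sketched as follows; the 780 nm wavelength (the rubidium D2 line) and 8 μm pixel pitch are our assumptions, consistent with the text:

```python
import numpy as np

def angular_spectrum(field, z, wavelength=780e-9, pixel=8e-6):
    """Angular spectrum propagation: FFT to plane waves, apply the
    z-dependent transfer function exp(i*kz*z), inverse FFT.
    Evanescent components (kz imaginary) are suppressed."""
    n = field.shape[0]
    k = 2 * np.pi / wavelength
    fx = np.fft.fftfreq(n, d=pixel)
    kx, ky = np.meshgrid(2 * np.pi * fx, 2 * np.pi * fx)
    kz_sq = k**2 - kx**2 - ky**2
    kz = np.sqrt(np.maximum(kz_sq, 0.0))
    h = np.exp(1j * kz * z) * (kz_sq > 0)   # transfer function
    return np.fft.ifft2(np.fft.fft2(field) * h)

# A smooth test field; propagation by z = 0 returns it unchanged:
xv = np.linspace(-1, 1, 64)
f = np.exp(-(xv[None, :]**2 + xv[:, None]**2)).astype(complex)
g = angular_spectrum(f, z=0.0)
```

Because |h| = 1 for all propagating components, the operation conserves optical energy, as free-space diffraction must.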
After propagation, the image undergoes a nonlinear activation. The nonlinearity is a function of the optical intensity, so we take the absolute square of the image field, apply a nonlinear function, and take the square root, all while preserving the phase of the original wavefront. The functional form of the nonlinearity is derived in the Appendix; the nonlinear parameters were determined by a calibration process described in Section 4.B.
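The phase-preserving activation just described can be sketched as follows (the helper name is ours):

```python
import numpy as np

def nonlinear_layer(field, activation):
    """Apply an intensity-domain nonlinearity to a complex field while
    preserving its phase: take |E|^2, apply the activation, take the
    square root, and reattach the original phase."""
    intensity = np.abs(field)**2
    return np.sqrt(activation(intensity)) * np.exp(1j * np.angle(field))

# With the identity activation the field is unchanged:
f = np.array([[1 + 1j, 0.5j], [2.0 + 0j, -1.0 + 0j]])
g = nonlinear_layer(f, activation=lambda i: i)
```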
Finally, for detection, the intensity at the output of the nonlinear layer is element-wise multiplied by a detector layer that defines where the light of a given MNIST digit should go. In our simulation, the detector layer consists of ten circles equidistant from the center. The result is a list of ten numbers, each the sum of the image intensity values within one circle. The index of the maximum, indicating the location with the highest intensity, is the final output of the neural network for the given sample image.
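A sketch of this detector readout (the canvas size, ring radius, and circle radius are illustrative, not the trained layout):

```python
import numpy as np

def detector_readout(intensity, centers, radius):
    """Sum the intensity within each circular detector region and
    return the index of the brightest region as the predicted digit."""
    n = intensity.shape[0]
    yy, xx = np.mgrid[0:n, 0:n]
    scores = [intensity[(yy - cy)**2 + (xx - cx)**2 <= radius**2].sum()
              for (cy, cx) in centers]
    return int(np.argmax(scores)), scores

# Ten circles equidistant from the center of a 64x64 plane:
centers = [(32 + int(24 * np.cos(2 * np.pi * k / 10 - np.pi / 2)),
            32 + int(24 * np.sin(2 * np.pi * k / 10 - np.pi / 2)))
           for k in range(10)]
img = np.zeros((64, 64))
img[centers[3]] = 1.0          # put all the light on detector 3
pred, scores = detector_readout(img, centers, radius=6)
```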
Figure 1. Trained optical neural network (ONN). (a) The detector layer determines where the light from the individual digits should be focused; the layout of the layer is a hyperparameter in our training, with each label corresponding to one bright circle.
Figure 2. Accuracy versus epoch for the linear model (blue dot) and the nonlinear model (red cross).
4. EXPERIMENTAL RESULTS
A. Setup
Figure 3. Experimental setup. (a) Schematic layout of the setup.
B. Nonlinearity
Here we describe the calibration process used to derive the nonlinear parameters for both simulation and experiment. The nonlinear input–output curve (see Appendix A) takes the form I_out = I_in exp[−OD/(1 + I_in/I_sat)], with the optical depth OD and the saturation intensity I_sat as the parameters to be calibrated.
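A sketch of such a calibration, fitting optical depth and saturation intensity to a measured input–output curve; here the "measured" data are synthetic with known parameters, and a coarse grid search stands in for a full least-squares fit:

```python
import numpy as np

def sat_abs(i_in, od, i_sat):
    """Saturable-absorption model fitted during calibration."""
    return i_in * np.exp(-od / (1.0 + i_in / i_sat))

# Synthetic "measured" curve generated from known parameters; real
# calibration would use camera counts versus input power.
true_od, true_isat = 2.5, 0.8
i_in = np.linspace(0.01, 10.0, 50)
i_meas = sat_abs(i_in, true_od, true_isat)

ods = np.linspace(0.5, 4.0, 36)      # grid over optical depth
isats = np.linspace(0.2, 2.0, 37)    # grid over saturation intensity
errs = [(np.sum((sat_abs(i_in, od, s) - i_meas)**2), od, s)
        for od in ods for s in isats]
_, od_fit, isat_fit = min(errs)      # parameters with least squared error
```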
In the experiment, it is very difficult to implement the simulated model of the neural network exactly, due to attenuation by the many optical elements as well as the fact that the vapor cell itself has a finite length on the order of many centimeters. The latter presents a serious challenge: the simulation assumed that the nonlinear effect took place in a single plane, whereas in the experiment the nonlinearity occurs over a continuous distance, so a propagating image experiences continuously varying attenuation as it diffracts through the cell.
Figure 4. Nonlinear function showing the input–output curve for the incident intensity.
C. Results
As described before, we trained a new neural network with the nonlinear parameters derived directly from the camera, using 10,000 training images and 1000 test images (100 per digit); this required specifying the input intensity in terms of pixel values rather than milliwatts. The resulting simulation with the experimental parameters yielded a phase mask similar in appearance to that obtained with the ideal parameters in Section 3. However, the predicted accuracy dropped to 66.4% and 66.6% for the linear and the nonlinear networks, respectively, so there was virtually no difference between the two networks in terms of accuracy. While it is possible in principle to recover the original simulation regime by calibrating each optical element and reconciling simulation and experiment with more advanced techniques such as split-step nonlinear angular propagation [15], the required experimental effort and computational resources would be prohibitive, so we proceeded with the experiment.
In our experiment, we used as input the same 1000 test images, modified as outlined in Section 3. Because our SLM is a phase-only modulator, it cannot directly display an intensity-varying image or the complex field formed by the product of an image with a phase mask; we therefore resorted to holography, which allows us to synthesize the complex field in a conjugate plane using phase control alone [16]. For detection, we calibrated the CCD camera for image magnification and rotation with separate calibration images. First, we tested the neural network without any phase mask. The overall accuracy was 14.7% for the linear network and 14.2% for the nonlinear network. As expected, without the phase mask there is no significant difference between the two networks, and both accuracies sit close to the 10% baseline of random prediction.
Next, we repeated the test, this time incorporating the phase mask via the SLM. The overall accuracy was measured to be 26.7% for the linear network and 33.0% for the nonlinear network. We attribute the overall reduction in accuracy compared to the simulation results to imperfections in the experimental system, including errors in the distances between optical elements, phase errors in the SLM, and the finite length of the vapor cell. It is surprising, however, that the accuracy is greater with the nonlinearity, whereas the simulation with experimental parameters showed similar performance with and without it. We attribute this to the robustness of the nonlinear network to experimental noise. There is a large body of ongoing research in the machine-learning community on the effect of noise in training deep neural networks [17–19], and the exact nature of the robustness of our nonlinear optical neural network remains to be investigated. Table 1 summarizes all the accuracy results for both simulation and experiment.

Table 1. Summary of ONN Accuracy in Percent

                                           Linear Network   Nonlinear Network
  Simulation with ideal parameters              74.2              84.2
  Simulation with experimental parameters       66.4              66.6
  Experiment without phase mask                 14.7              14.2
  Experiment with phase mask                    26.7              33.0
We note that the simulated and experimentally measured accuracies are significantly lower than those of state-of-the-art neural networks. However, our network has only one layer, and we expect the accuracy to increase with a larger number of layers. Currently, our experiment is limited by available resources, i.e., a single SLM, which, while commercially available, is a significant laboratory expenditure. Creating multiple layers also has technical challenges of its own, including optical loss in each layer. The reported optical nonlinearity is tunable via the temperature of the atoms, and can thus be adjusted for each layer. Moreover, because we use thermal rather than cold atoms, the setup is significantly simpler. Nevertheless, optical regeneration techniques will be needed if the depth of the network grows too large [20]. Finally, an electronic back end can be paired with the optical front end to enhance the classification accuracy; such a back end requires only a one-time transduction and does not add to the overall latency, unlike schemes requiring repeated signal transduction.
D. Speed and Power Performance
Our reported optical ANN uses a commercial liquid-crystal SLM with 1 million pixels, each with 8-bit precision. At the SLM's refresh rate of 100 Hz, the effective supported bit rate of the optical ANN is 800 Mbps. Using a grating-light-valve-type mechanical SLM with a 350 kHz refresh rate [21], the data rate can be increased to 2.8 Tbps. At that speed, however, we need a faster detector, e.g., an event-based camera with microsecond-level response time [22]. The power consumption of the reported optical ANN comes primarily from the SLM; since we implement only inference, a fixed diffractive phase mask could replace the SLM, reducing that energy to zero. The thermal-atom-based nonlinearity requires no extra energy for either active preparation or maintenance. Additionally, the reported optical ANN exploits the full parallelism offered by FSO, and thus does not require any excess energy for time or wavelength multiplexing. To actuate the nonlinearity, the light intensity must be on the order of the atomic saturation intensity; the required optical power therefore depends on the pixel size inside the vapor cell and the optics used to guide the light through the nonlinear thermal atomic vapor. Reducing the channel area to a diffraction-limited spot would lower this power substantially.
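The throughput figures follow from simple arithmetic; the 100 Hz liquid-crystal refresh rate is inferred from the quoted 800 Mbps figure, and the 350 kHz rate is that of the grating light valve in Ref. [21]:

```python
# Back-of-the-envelope throughput of the SLM front end.
pixels = 1_000_000          # 1-megapixel SLM
bits_per_pixel = 8          # 8-bit precision
lc_refresh_hz = 100         # liquid-crystal SLM (inferred from 800 Mbps)
glv_refresh_hz = 350_000    # grating light valve [21]

lc_rate_bps = pixels * bits_per_pixel * lc_refresh_hz     # 800 Mbps
glv_rate_bps = pixels * bits_per_pixel * glv_refresh_hz   # 2.8 Tbps
```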
5. CONCLUSION
We have shown that an atomic vapor cell can perform a local nonlinear activation in two dimensions, and consequently, a fully optical ANN can be implemented for image recognition of handwritten digits. Such a network can handle a large amount of data in parallel. Furthermore, except for the input and the output that are fed and detected by the SLM and the CCD camera, respectively, all data processing occurs in the time light takes to traverse the physical distance of the network. Although the model accuracy of 33% is rather low, our proof-of-concept demonstrates the feasibility of using a simple, off-the-shelf atomic vapor cell as the source of fully parallel optical nonlinearity. Along with another commercially available device, the SLM, the vapor cell solves the enduring challenge of the missing optical nonlinearity that fully exploits the intrinsic massive parallelism of free-space light in two dimensions. Our work is a first step towards creating an all-optical neural network that can handle a massive amount of data and surpass the performance of an electronic neural network.
Acknowledgment
Acknowledgment. A. R. acknowledges support from the IC Fellowship. M. Z. acknowledges support from the NSF Graduate Research Fellowship. P. A. acknowledges support from Canada First Excellence Research Fund.
APPENDIX A: SATURABLE ABSORPTION
Saturable absorption is a general phenomenon that appears in many different physical systems with discrete energy levels of finite lifetime. Here we consider a simple system of two-level atoms; a detailed derivation can be found in standard references [23].
Consider a beam of photons of intensity I passing through a medium with N atoms per unit volume. If the thickness of the medium is dz, then the number of atoms per unit area is N dz. If we now assign an absorption cross section σ to each atom, then σN dz is the fraction of the target area covered by the atoms. It is also the probability that an incident photon will be absorbed by the atoms, or, in the case of many photons, the total fraction of photons that are absorbed. The change in the beam intensity is then dI = −σN I dz, which, upon integration, yields Beer's law: I(z) = I(0) e^(−αz), where the absorption coefficient α = σN.
If we assume that the atoms have two levels, a ground state with population N₁ and an excited state with population N₂ (per unit volume), then the absorption coefficient becomes α = σ(N₁ − N₂). Imposing the conservation of atom number [N₁ + N₂ = N] and the conservation of energy [absorption balanced by spontaneous emission at the decay rate Γ], we arrive at the steady-state population difference N₁ − N₂ = N/(1 + I/I_sat), where we have defined the saturation intensity I_sat = ħωΓ/(2σ).
Thus, the output intensity as a function of the input intensity is given by I_out = I_in exp[−OD/(1 + I_in/I_sat)], where the optical depth OD = αL and L is the effective interaction length of the vapor cell. We use OD and I_sat as our nonlinear parameters. We note that for the atomic vapor system, OD can be controlled by changing the temperature of the cell, which sets the atomic density N. Thus, the demonstrated nonlinearity is tunable, which can be exploited in a multilayer optical neural network, where OD would be gradually decreased to accommodate the signal loss in each layer.
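A quick numerical check of the limiting cases of this expression (the OD value is illustrative):

```python
import math

def transmission(i_in, od, i_sat):
    """I_out/I_in for the saturable absorber: T = exp(-OD / (1 + I_in/I_sat))."""
    return math.exp(-od / (1.0 + i_in / i_sat))

od = 3.0
t_low = transmission(1e-9, od, 1.0)    # low intensity: Beer's law, exp(-OD)
t_high = transmission(1e9, od, 1.0)    # high intensity: near-full transparency
```

In the weak-field limit the medium obeys ordinary Beer's-law attenuation, while far above saturation it becomes nearly transparent, which is exactly the SmoothReLU-like behavior used as the activation.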
References
[1] Y. LeCun, Y. Bengio, G. Hinton. Deep learning. Nature, 521, 436-444(2015).
[2] V. Sze, Y. Chen, J. Emer, A. Suleiman, Z. Zhang. Hardware for machine learning: challenges and opportunities. IEEE Custom Integrated Circuits Conference, 1-8(2017).
[3] J. Goodman. Introduction to Fourier Optics(2005).
[4] Y. Shen, N. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, M. Soljacic. Deep learning with coherent nanophotonic circuits. Nat. Photonics, 11, 441-446(2017).
[5] S. Colburn, Y. Chu, E. Shilzerman, A. Majumdar. Optical frontend for a convolutional neural network. Appl. Opt., 58, 3179-3186(2019).
[6] J. Feldmann, N. Youngblood, C. Wright, H. Bhaskaran, W. Pernice. All-optical spiking neurosynaptic networks with self-learning capabilities. Nature, 569, 208-214(2019).
[7] V. Bangari, B. Marquez, H. Miller, A. Tait, M. Nahmias, T. Lima, H. Peng, P. Prucnal, B. Shastri. Digital electronics and analog photonics for convolutional neural networks (DEAP-CNNs). IEEE J. Sel. Top. Quantum Electron., 26, 7701213(2020).
[8] X. Lin, Y. Rivenson, N. Yardimci, M. Veli, Y. Luo, M. Jarrahi, A. Ozcan. All-optical machine learning using diffractive deep neural networks. Science, 361, 1004-1008(2018).
[9] J. Chang, V. Sitzmann, X. Dun, W. Heidrich, G. Wetzstein. Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification. Sci. Rep., 8, 12324(2018).
[10] Y. Zuo, B. Li, Y. Zhao, Y. Jiang, Y. Chen, P. Chen, G. Jo, J. Liu, S. Du. All-optical neural network with nonlinear activation functions. Optica, 6, 1132-1137(2019).
[11] D. Ballarini, A. Gianfrate, R. Panico, A. Opala, S. Ghosh, L. Dominici, V. Ardizzone, G. Milena, G. Lerario, G. Gigli, T. Liw, M. Matuszewski, D. Sanvitto. Polaritonic neuromorphic computing outperforms linear classifiers. Nano Lett., 20, 3506-3512(2020).
[12] A. Ryou, S. Colburn, A. Majumdar. Image enhancement in a miniature self-imaging degenerate optical cavity. Phys. Rev. A, 101, 013824(2020).
[13] S. Colburn, A. Zhan, A. Majumdar. Varifocal zoom imaging with large area focal length adjustable metalenses. Optica, 5, 825-831(2018).
[14] K. Matsushima, T. Shimobaba. Band-limited angular spectrum method for numerical simulation of free-space propagation in far and near fields. Opt. Express, 17, 19662-19673(2009).
[15] G. Agrawal. Nonlinear Fiber Optics(2013).
[16] J. Davis, D. Cottrell, J. Campos, M. Yzuel, I. Moreno. Encoding amplitude information onto phase-only filters. Appl. Opt., 38, 5004-5013(1999).
[17] H. Noh, J. Mun, B. Han. Regularizing deep neural networks by noise: its interpretation and optimization. Advances in Neural Information Processing Systems, 5109-5118(2017).
[18] N. Nagabushan, N. Satish, S. Raghuram. Effect of injected noise in deep neural networks. IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), 1-5(2016).
[19] B. Poole, J. Sohl-Dickstein, S. Ganguli. Analyzing noise in autoencoders and deep networks(2014).
[20] L. Li, P. Patki, Y. Kwon, V. Stelmakh, B. Campbell, M. Annamalai, T. Lakoba, M. Vasilyev. All-optical regenerator of multi-channel signals. Nat. Commun., 8, 884(2017).
[21] O. Tzang, E. Niv, S. Singh, S. Labouesse, G. Myatt, R. Piestun. Wavefront shaping in complex media with a 350 kHz modulator via a 1D-to-2D transform. Nat. Photonics, 13, 788-793(2019).
[23] C. Foot. Atomic Physics(2005).