
- Photonics Research
- Vol. 7, Issue 8, 823 (2019)


1. INTRODUCTION

Artificial neural networks (ANNs) have shown exciting potential in a wide range of applications, but they also require ever-increasing computing power. This has prompted a search for alternative computing methods that are faster and more energy efficient. One interesting approach is optical neural computing [1–7]. This analog computing method can be passive, with minimal energy consumption, and, more importantly, its intrinsic parallelism can greatly accelerate computing.

Most optical neural computing follows the architecture of digital ANNs, using a layered feed-forward network, as shown in Fig. 1(a). Free-space diffraction [4,8] or integrated waveguides [1,3,9] are used as the connections between layered activation units. Similar to digital signals in an ANN, optical signals pass through optical networks in the forward direction once (light reflection propagating in the backward direction is avoided or neglected). However, it is reflection that provides the feedback mechanism that gives rise to rich wave physics. It holds the key to the miniaturization of optical devices such as laser cavities [10], photonic crystals [11], metamaterials [12], and ultracompact beam splitters [13–15]. Here we show that by leveraging optical reflection, it is possible to go beyond the paradigm of layered feed-forward networks and realize artificial neural computing in a continuous and layer-free fashion. Figure 1(b) shows the proposed nanophotonic neural medium (NNM), in which linear scatterers spatially redistribute the optical energy and nonlinear scatterers act similarly to rectified linear units (ReLU) [16], where they allow signals with intensities above a threshold to pass and block signals with intensities below that threshold. To better illustrate this behavior, an implementation of such a nonlinear material is shown in Fig. 1(d) (based on a saturable absorber [17]). Although the value of the threshold is chosen arbitrarily here, for the saturable absorber used in practice this threshold can be calculated using the method explained in Ref. [1]. This threshold also determines the minimum energy the device must operate with in practice.
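The thresholding behavior attributed to the nonlinear scatterers can be sketched numerically. The following is a minimal illustration only, not the paper's actual saturable-absorber model; the threshold value is an arbitrary placeholder, as in the text:

```python
import numpy as np

def optical_relu(intensity, threshold=1.0):
    """ReLU-like intensity response: signals with intensity above the
    threshold pass unchanged, signals below it are blocked.
    The threshold here is a placeholder; for a real saturable absorber
    it follows from the material properties."""
    intensity = np.asarray(intensity, dtype=float)
    return np.where(intensity > threshold, intensity, 0.0)
```

In practice the threshold would be computed from the absorber's saturation intensity rather than chosen by hand.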

Figure 1.(a) Conventional ANN architecture, where the information propagates only in the forward direction (depicted by the green line that goes through the nodes from input to output); (b) proposed NNM. Passive neural computing is performed by light passing through the nanostructured medium, which contains both linear and nonlinear scatterers. (c) Full-wave simulation of light scattered by nanostructures, which spatially redistribute the optical energy in different directions. (d) The behavior of the implementation of such a nonlinear material in one dimension (output intensity versus input intensity for light at a fixed wavelength).

2. IMPLEMENTATION

Figure 2(a) shows a two-dimensional NNM trained to recognize handwritten digits; the wave dynamics are simulated with the full-wave finite-difference frequency-domain (FDFD [18]) method. The size of the NNM is …
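As a rough illustration of the frequency-domain formulation behind FDFD, the toy solver below discretizes the one-dimensional Helmholtz equation on a uniform grid and solves for the field of a point source. This is a sketch only: the paper's solver is 2D/3D, handles nonlinear materials, and uses absorbing boundaries rather than the hard walls assumed here.

```python
import numpy as np

def fdfd_1d(eps, wavelength, dx, src_index):
    """Toy 1D FDFD: discretize (d^2/dx^2 + k0^2 * eps(x)) E = -i*k0*J
    on a uniform grid with E = 0 at both ends, then solve the
    resulting linear system for the complex field E."""
    n = len(eps)
    k0 = 2 * np.pi / wavelength
    A = np.zeros((n, n), dtype=complex)
    for i in range(n):
        A[i, i] = -2.0 / dx**2 + k0**2 * eps[i]   # second-difference stencil
        if i > 0:
            A[i, i - 1] = 1.0 / dx**2
        if i < n - 1:
            A[i, i + 1] = 1.0 / dx**2
    b = np.zeros(n, dtype=complex)
    b[src_index] = -1j * k0                        # point current source
    return np.linalg.solve(A, b)
```

For realistic problems the operator is stored sparse and solved iteratively; the dense solve here is only for clarity.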


Figure 2.(a) NNM trained to recognize handwritten digits. The input wave encodes the image as an intensity distribution. On the right side of the NNM, the optical energy concentrates at different locations depending on the image's classification label. (b) Two samples of the digit 2 and their optical fields inside the NNM. Although the field distributions differ for images of the same digit, both are classified as the same digit. (c) The same as (b) but for two samples of the digit 8. In both (b) and (c), the boundaries of the trained medium are shown with black borderlines.

Nonlinear nanophotonic media can provide ultra-high computing density by tapping into sub-wavelength features. In theory, every atom in the medium can be varied to influence wave propagation; in practice, features below 10 nm would be too challenging to fabricate. Even at this scale, the potential number of weights exceeds 10 billion parameters per square millimeter in a 2D implementation. This is a much greater computing density than that of both free-space [8,19] and on-chip [1,3] optical neural networks. In addition, NNM has a few other attractive features. It has stronger expressive power than layered optical networks; in fact, layered networks are a subset of NNM, as a medium can be shaped into connected waveguides that form a layered network. Furthermore, it does not suffer from the diminishing-gradient problem of deep neural networks: Maxwell's equations, as the governing principle, guarantee that the underlying linear operation is always unitary, which has neither diminishing nor exploding gradients [20]. Lastly, NNM does not have to follow any specific geometry, so it can easily be shaped and integrated into existing vision or communication devices as a first step of optical preprocessing.
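The density figure above can be sanity-checked with simple arithmetic: at a 10 nm feature pitch, a 1 mm × 1 mm area holds (10⁶ nm / 10 nm)² independent cells.

```python
feature_nm = 10            # smallest fabricable feature pitch (from the text)
side_nm = 1_000_000        # 1 mm expressed in nanometers
cells_per_side = side_nm // feature_nm
params_per_mm2 = cells_per_side ** 2
print(params_per_mm2)      # 10_000_000_000: 10 billion parameters per mm^2
```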

3. TRAINING PROCESS

We now discuss the training of NNM. Although one could envision training performed directly in the optical domain [3], here we focus on training in the digital domain and use the NNM only for inference. The underlying dynamics of the NNM are governed by the nonlinear Maxwell's equations, which, in the frequency domain, can be written as ∇ × (∇ × E) − ω²μ₀ε(E)E = −iωμ₀J, where the permittivity ε of the nonlinear inclusions depends on the local field. One could compute the gradient of the loss with respect to each structural parameter by direct perturbation, as is done for small nanophotonic devices [13]. However, each such gradient calculation requires solving the full-wave nonlinear Maxwell's equations, which is prohibitively costly for NNM, which can easily have millions of gradients. Here, we use the adjoint state method (ASM) to compute all gradients in one step: the adjoint field is obtained by solving the same wave equation driven by the derivative of the loss with respect to the field, and the gradient at each location follows from the product of the forward and adjoint fields [21]. The training process is illustrated in Fig. 3(a).
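The computational advantage of the ASM can be seen on a generic discretized linear system A(p)x = b standing in for the Maxwell operator. This is a sketch under simplifying assumptions (real-valued, linear, a single scalar parameter p), not the paper's nonlinear formulation: one forward solve plus one adjoint solve yields the gradient of the loss, however many parameters there are. A finite-difference check confirms the adjoint gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A0 = np.eye(n) * 4 + rng.standard_normal((n, n)) * 0.1
D = np.diag(rng.standard_normal(n))   # dA/dp for the single parameter p
b = rng.standard_normal(n)
target = rng.standard_normal(n)

def loss_at(p):
    """L(x) = 0.5 * ||x - target||^2 with x solving A(p) x = b."""
    x = np.linalg.solve(A0 + p * D, b)
    return 0.5 * np.sum((x - target) ** 2)

def adjoint_grad(p):
    """dL/dp via the adjoint state method: one forward solve and one
    adjoint solve, instead of one extra solve per parameter."""
    A = A0 + p * D
    x = np.linalg.solve(A, b)                 # forward solve
    lam = np.linalg.solve(A.T, x - target)    # adjoint solve, RHS = dL/dx
    return -lam @ (D @ x)                     # dL/dp = -lam^T (dA/dp) x

p = 0.3
g = adjoint_grad(p)
h = 1e-6
g_fd = (loss_at(p + h) - loss_at(p - h)) / (2 * h)   # finite-difference check
```

For a medium with millions of pixels, the same two solves would give every pixel's gradient at once, which is what makes the approach tractable.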

Figure 3.(a) Training starts by encoding an image as a vector of current source densities in the FDFD simulation. This step is followed by an iterative process to solve for the electric field in the nonlinear medium. Next, we use the ASM to calculate the gradient, which is then used to update the level-set function and, consequently, the medium itself. Here we use mini-batch stochastic gradient descent (SGD; explained in the supplementary materials of Ref. [17]). In training with mini-batches, we sum the cost functions calculated for the different images in a batch and then compute the gradients. (b)–(d) show an NNM in training after 1, 33, and 66 training iterations, respectively. (After iteration 66, the medium has already seen each of the training samples at least once, since we use batches of 100 images.) At each step, the boundary between the host material and the inclusions is shown, along with the field distribution for the same randomly selected digit 8, as well as the accuracy of the medium on the test set at that stage of training.
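The mini-batch procedure above (summing the per-image costs over a batch before each update) can be sketched generically. Here `grad_fn` is a hypothetical stand-in for the ASM gradient of one image's cost; it is not part of the paper's code:

```python
import numpy as np

def sgd_minibatch(images, labels, params, grad_fn, batch_size=100, lr=0.01):
    """One epoch of mini-batch SGD: shuffle, then for each batch sum
    the per-image gradients (equivalent to differentiating the summed
    batch cost) and take a single update step."""
    idx = np.random.permutation(len(images))
    for start in range(0, len(images), batch_size):
        batch = idx[start:start + batch_size]
        g = sum(grad_fn(images[i], labels[i], params) for i in batch)
        params = params - lr * g
    return params
```

With batches of 100, one pass over the training set performs one update per 100 images, which is why the medium in (d) has seen every sample after 66 iterations.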

The above process is then repeated, but for the next image in the training queue instead of the same image. This gradient descent process is stochastic, which is quite different from the typical use of ASM in nanophotonics [14,15], where gradient descent is performed repeatedly for very few inputs until the loss function converges. In those traditional optimizations, the device needs to function for only those few specific inputs. If such a process were used here, the medium would do extremely well on particular images but fail to generalize to other images.

The gradient descent process treats the dielectric constant as a continuous variable, but in practice its value is discrete, determined by the material present at each location. For example, in a medium built from two materials, each pixel must take one of two permittivity values, a constraint similar to that of binarized neural networks [22]. Here, we need to take special care to further constrain the optimization process. This is done by using a level-set function [23], where the two materials (the host material and the linear inclusion material) are assigned to the two levels of the level-set function [14,15]. The training starts with randomly distributed inclusions, both linear and nonlinear, throughout the host medium. The boundaries between the two materials then evolve during training: the level-set function φ is updated by advecting its zero contour with a velocity field derived from the gradient of the loss with respect to the dielectric constant, φ ← φ − Δt·v|∇φ|. The step size and other hyperparameters of this process can be chosen using standard hyperparameter-search techniques [24–26].
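A minimal sketch of the level-set bookkeeping follows. The permittivity values are hypothetical placeholders, and the update omits the upwind differencing and periodic reinitialization used in practice:

```python
import numpy as np

def levelset_to_eps(phi, eps_host=1.0, eps_incl=4.0):
    """Binary material assignment: the sign of the level-set function
    phi selects between the two allowed materials at each pixel.
    The two permittivity values here are illustrative only."""
    return np.where(phi > 0, eps_incl, eps_host)

def levelset_step(phi, velocity, dt=0.1):
    """Advect the material boundary: phi <- phi - dt * v * |grad phi|,
    where the velocity v is derived from the loss gradient with
    respect to the dielectric constant."""
    gy, gx = np.gradient(phi)
    grad_mag = np.sqrt(gx**2 + gy**2)
    return phi - dt * velocity * grad_mag
```

Because only the sign of φ matters for the material layout, the boundary can move smoothly during training while the realized medium stays strictly two-valued.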

As a specific example, we now discuss the training of the 2D medium shown in Fig. 2.

Next, we show another example based on a three-dimensional (3D) medium, whose size is … [27]. Thus, here we allow the dielectric constant to vary continuously. To save on computational resources, we allow only 5% variation; in an experimental realization, a smaller variation range can always be compensated for by using a larger medium. The trained 3D NNM reached an accuracy of about 84% on the test set; the confusion matrix is shown in Fig. 4(b).

Figure 4.(a) 3D NNM case. Different colors indicate different values of the permittivity. The input image is projected onto the top surface, computing is performed as the wave propagates through the 3D medium, and the field distribution on the bottom surface is used to recognize the image. Full-wave simulation shows that the optical energy concentrates at the location with the correct class label, in this case 6. (b) The confusion matrix. The rows show the true labels of the input images, and the columns give the labels the medium assigned to each input. The diagonal elements therefore show the number of correct classifications out of every 10 samples.
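The bookkeeping described in (b) is the standard confusion matrix; a minimal version of the computation:

```python
import numpy as np

def confusion_matrix(true_labels, predicted, n_classes=10):
    """Entry [i, j] counts how often an input with true label i was
    classified as j; rows are true labels, columns predictions."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted):
        cm[t, p] += 1
    return cm

def accuracy(cm):
    """Overall accuracy: correct (diagonal) counts over all counts."""
    return cm.trace() / cm.sum()
```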

4. CONCLUSION

Here we show that the wave dynamics of Maxwell's equations are capable of performing highly sophisticated computing. There is an intricate connection between the differential equations that govern many physical phenomena and neural computing (see the supplementary materials for more discussion), which could be further explored. From the perspective of optics, the function of most nanophotonic devices can be described as mode mapping [28]. In traditional nanophotonic devices, mode mapping mostly occurs between eigenmodes; for example, a polarization beam splitter [13] maps each polarization eigenmode to a spatial eigenmode. Here, we introduce a class of nanophotonic media that can perform complex and nonlinear mode mapping equivalent to artificial neural computing. The neural computing media shown here have the appearance of disordered media. It would also be interesting to see how disordered media, which support rich physics such as Anderson localization, could provide a new platform for neural computing. Combined with their ultrahigh computing density, NNM could be used in a wide range of information devices as an analog preprocessing unit.

Acknowledgment

The authors thank W. Shin for his help on improving the computational speed of the implementation of this method. E. Khoram and Z. Yu thank Shanhui Fan and Momchil Minkov for helpful discussion on nonlinear Maxwell's equations.

References

[5] P. R. Prucnal, B. J. Shastri. Neuromorphic Photonics (2017).

[6] H. G. Chen, S. Jayasuriya, J. Yang, J. Stephen, S. Sivaramakrishnan, A. Veeraraghavan, A. Molnar. ASP vision: optically computing the first layer of convolutional neural networks using angle sensitive pixels. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 903-912 (2016).

[9] M. Hermans, T. Van Vaerenbergh. "Towards trainable media: using waves for neural network-style training" (2015).

[11] J. D. Joannopoulos, S. G. Johnson, J. N. Winn, R. D. Meade. Photonic Crystals: Molding the Flow of Light (2011).

[12] W. Cai, V. Shalaev. Optical Metamaterials: Fundamentals and Applications (2009).

[16] V. Nair, G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. Proceedings of the 27th International Conference on Machine Learning (ICML-10), 807-814 (2010).

[17] E. Khoram, A. Chen, D. Liu, Q. Wang, Z. Yu, L. Ying. Nanophotonic media for artificial neural inference (2018).

[20] L. Jing, Y. Shen, T. Dubcek, J. Peurifoy, S. Skirlo, Y. LeCun, M. Tegmark, M. Soljačić. Tunable efficient unitary neural networks (EUNN) and their application to RNNs. Proceedings of the 34th International Conference on Machine Learning (PMLR), 1733-1741 (2017).

[22] M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, Y. Bengio. Binarized neural networks: training deep neural networks with weights and activations constrained to +1 or -1 (2016).

[24] J. Bergstra, Y. Bengio. Random search for hyper-parameter optimization. J. Mach. Learn. Res., 13, 281-305 (2012).

[25] J. Snoek, H. Larochelle, R. P. Adams. Practical Bayesian optimization of machine learning algorithms. Advances in Neural Information Processing Systems, 2951-2959 (2012).

[26] S. Saxena, J. Verbeek. Convolutional neural fabrics. Advances in Neural Information Processing Systems, 4053-4061 (2016).

Erfan Khoram, Ang Chen, Dianjing Liu, Lei Ying, Qiqi Wang, Ming Yuan, Zongfu Yu. Nanophotonic media for artificial neural inference[J]. Photonics Research, 2019, 7(8): 823
