Over the past decades, photonics has transformed many areas in both fundamental research and practical applications. In particular, we can manipulate light in a desired and prescribed manner by rationally designed subwavelength structures. However, constructing complex photonic structures and devices is still a time-consuming process, even for experienced researchers. As a subset of artificial intelligence, artificial neural networks serve as one potential solution to bypass the complicated design process, enabling us to directly predict the optical responses of photonic structures or perform the inverse design with high efficiency and accuracy. In this review, we will introduce several commonly used neural networks and highlight their applications in the design process of various optical structures and devices, particularly those in recent experimental works. We will also comment on the future directions to inspire researchers from different disciplines to collectively advance this emerging research field.
- Photonics Research
- Vol. 9, Issue 4, B135 (2021)
Abstract
1. INTRODUCTION
Novel optical devices consisting of elaborately designed structures have become an extremely dynamic and fruitful research area because of their capability of manipulating light flow down to the nanoscale. Thanks to advanced numerical simulation, fabrication, and characterization techniques, researchers are able to design, fabricate, and demonstrate dielectric and metallic micro- and nano-structures with sophisticated geometries and arrangements. For instance, metamaterials and metasurfaces comprising subwavelength structures, called meta-atoms, can show extraordinary properties beyond those of natural materials [1]. Many reported metadevices offer enormous opportunities for technological breakthroughs in a wide range of applications, including light steering [2–5], holography [6–9], imaging [10–14], sensing [15–17], and polarization control [18–21].
At present, we can handle most photonic design problems by accurately solving Maxwell’s equations with numerical algorithms such as the finite element method (FEM) and the finite-difference time-domain (FDTD) method. However, these methods often require a great deal of time and computational resources, especially for the inverse design problem, which aims to retrieve the optimal structure from target optical responses and functionalities. In the conventional procedure, we normally start with full-wave simulations of an initial design based on empirical knowledge and then adjust the geometric/material parameters iteratively to approach the customer-specific requirements. Such a trial-and-error process is time consuming, even for the most experienced researchers. The initial design strongly relies on our experience and intuition, and usually some basic structures are chosen, including split-ring resonators [22,23], helix [24], cross [25], bowtie [26], L-shape [2], and H-shape [27,28] structures. Although a specific type of structure is known to produce a certain optical response (e.g., strong magnetic resonance from split-ring resonators and chiroptical response from helical structures), such well-established knowledge may sometimes limit our aspiration to seek entirely new designs for the same applications, or for more complicated ones where the traditional approach is not applicable.
Artificial neural networks (ANNs) provide a new and powerful approach for photonic design [29–37]. ANNs can build an implicit relationship between the input (i.e., geometric/material parameters) and the output (i.e., optical responses), loosely mimicking the nonlinear nerve conduction process in the human body. With the help of well-trained ANNs, we can bypass the complicated and time-consuming design process that heavily relies on numerical simulations and optimization. Most ANN models for photonic design serve two functions: forward prediction and inverse design. A forward prediction network determines the optical responses from the geometric/material parameters and can serve as a substitute for full-wave simulations. An inverse design network aims to efficiently retrieve the optimal structure from given optical responses, which is usually the more important and challenging task in the design process. One main advantage of ANN models is speed. For example, producing the spectrum of a meta-atom with a well-trained forward prediction model takes only a few milliseconds, orders of magnitude faster than typical full-wave simulations based on FEM or FDTD [38–40]. At the same time, the accuracy of ANN models is comparable to that of rigorous simulations. For instance, the mean squared error of spectrum prediction is typically on the order of to [40,41]. Moreover, ANNs can unlock the nonintuitive and nonunique relationship between the physical structure and the optical response, and hence potentially point researchers toward entirely new classes of structures.
Solving the photonic design problem with ANNs is a data-driven approach, which means that large training data sets containing both geometric/material parameters and optical responses are needed. Once the ANN model works well on the training data set, it can be evaluated on a test set or a real problem. The test and training data sets should be drawn from the same design framework but contain completely different data. The general workflow for a forward prediction network includes four steps. First, a large number of input structures and the corresponding output optical responses are generated from either simulations or experiments. In most published works, the amount of data is on the order of . Note that the performance of a neural network depends on both the size and the quality of the data. To improve the quality of training data, some researchers have applied rule-based optimization methods in the generation of initial training data [42] or progressively enlarged the training data set with new samples obtained from the trained model [43]. Second, we design the ANN with a certain network structure, such as a neural network based on fully connected layers (FCLs) or a convolutional neural network (CNN). Third, the training data set is fed into the network, and the weight and bias of each node are optimized. Finally, the well-trained ANN can be used to predict the responses of other input structures outside the training and test data sets. For the inverse design problem, one can simply swap the input and output and use a similar network structure; some problems, however, require more sophisticated methods and algorithms.
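As a minimal sketch of the first step of this workflow, the Python snippet below generates a toy data set and splits it into disjoint training and test sets drawn from the same design space. The function `toy_spectrum`, its two parameters (a hypothetical resonator `length` and `width`), the parameter ranges, and the 1000-sample size are all illustrative assumptions; in practice each response would come from an FEM/FDTD simulation or a measurement.

```python
import numpy as np

def toy_spectrum(length, width, freqs):
    """Hypothetical stand-in for a full-wave solver: one Lorentzian resonance.

    Toy rules (assumptions): a longer resonator resonates at a lower frequency,
    and a wider one has a broader linewidth.
    """
    centre = 1.0 / length
    gamma = 0.05 + 0.1 * width
    return gamma**2 / ((freqs - centre) ** 2 + gamma**2)

def make_dataset(n_samples, n_freqs=64, test_fraction=0.2, seed=0):
    rng = np.random.default_rng(seed)
    freqs = np.linspace(0.5, 2.0, n_freqs)
    # Sample geometric parameters (length, width) and "simulate" each response.
    params = rng.uniform([0.6, 0.1], [1.8, 0.5], size=(n_samples, 2))
    spectra = np.stack([toy_spectrum(length, width, freqs) for length, width in params])
    # Disjoint index sets: same design space, completely different samples.
    order = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (params[train_idx], spectra[train_idx]), (params[test_idx], spectra[test_idx])

(x_train, y_train), (x_test, y_test) = make_dataset(1000)
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
# (800, 2) (800, 64) (200, 2) (200, 64)
```

Splitting by a shuffled index array, rather than sampling the two sets separately, is one simple way to guarantee that no structure appears in both sets.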
This review is devoted to the topic of designing photonic structures and devices with ANNs. After introducing the widely used ANNs, we will focus on very recent works on this topic, especially experimental demonstrations. The remainder of the review is organized as follows. In Section 2, we will discuss basic FCLs and their application in the prediction of design parameters. In Section 3, we will focus on CNNs, which are used to retrieve much more complicated structures described by pixelated images. In Section 4, we will discuss other useful and efficient hybrid algorithms that combine deep learning and conventional optimization methods for photonic design. In the last section, we will conclude the review by discussing the achievements, current challenges, and future outlook.
2. PHOTONIC DESIGN BY FULLY CONNECTED NEURAL NETWORK
A. Introduction of FCLs
Figure 1.(a) Illustration of a biological neuron. (b) FCLs-based neural network, in which all neurons in adjacent layers are connected. (c) Three widely used activation functions: Sigmoid, tanh, and ReLU.
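The layer-by-layer computation in Fig. 1(b) and the three activation functions in Fig. 1(c) can be sketched in a few lines of NumPy. The layer sizes (two geometric parameters mapped to a 64-point spectrum) and the random weights are hypothetical; a trained network would use learned weights instead.

```python
import numpy as np

# The three activation functions shown in Fig. 1(c).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(z, 0.0)

def dense_forward(x, layers, activation=relu):
    """Forward pass through fully connected layers.

    Every neuron in one layer is connected to every neuron in the next via
    W @ h + b; hidden layers apply the activation, the last layer stays linear
    (a common choice for regression outputs such as spectra).
    """
    h = x
    for i, (W, b) in enumerate(layers):
        z = W @ h + b
        h = z if i == len(layers) - 1 else activation(z)
    return h

rng = np.random.default_rng(0)
sizes = [2, 16, 16, 64]  # hypothetical: 2 parameters -> 2 hidden layers -> 64 spectral points
layers = [(rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
y = dense_forward(np.array([1.2, 0.3]), layers, activation=tanh)
print(y.shape)                                # (64,)
print(sigmoid(0.0), tanh(0.0), relu(-1.0))    # 0.5 0.0 0.0
```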
The training process of a fully connected neural network is quite straightforward. The training set contains pairs of input vectors and output vectors (the output can be a vector of complex/real values for regression problems or a vector of discrete integers as labels for classification problems). The performance of the model is highly dependent on the quantity and quality of the training data set. During training, the network first takes the input vector and calculates the output through tensor operations and activations from left to right. Then a loss function (or cost function) is defined to quantify the performance of the neural network, and this loss is minimized. For instance, we can use the mean squared error (MSE) for regression problems and the cross-entropy loss for classification problems. The next step, the backpropagation of error, is the most critical part of training ANNs. An ANN contains a series of learnable parameters to be optimized, i.e., the weights and biases of each layer. We then need the partial derivative of the loss with respect to each parameter. To calculate these values, we apply the chain rule layer by layer from the end of the ANN to the front, which is why the process is called “backpropagation.” Finally, all the parameters are optimized by the stochastic gradient descent method: $\theta \leftarrow \theta - \eta\,\partial \mathcal{L}/\partial \theta$, where $\theta$ denotes any weight or bias, $\mathcal{L}$ is the loss evaluated on a mini-batch of training samples, and $\eta$ is the learning rate.
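A minimal sketch of this training loop, assuming a network with one tanh hidden layer, an MSE loss, and a toy regression target y = sin(x) in place of real spectra. The gradients are derived by hand with the chain rule (backpropagation), and each mini-batch step applies the update θ ← θ − η ∂L/∂θ with learning rate `lr`; the architecture and hyperparameters are illustrative assumptions, not values from any cited work. Deep-learning frameworks compute these gradients automatically.

```python
import numpy as np

def init_params(n_in, n_hidden, n_out, rng):
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(p, x):
    h = np.tanh(x @ p["W1"] + p["b1"])   # hidden layer; rows of x are samples
    return h, h @ p["W2"] + p["b2"]      # linear output layer

def mse_and_grads(p, x, y):
    h, y_hat = forward(p, x)
    loss = np.mean((y_hat - y) ** 2)
    # Backpropagation: chain rule from the loss back towards the input.
    d_out = 2.0 * (y_hat - y) / y.size               # dL/dy_hat
    grads = {"W2": h.T @ d_out, "b2": d_out.sum(axis=0)}
    d_hid = (d_out @ p["W2"].T) * (1.0 - h**2)       # tanh'(z) = 1 - tanh(z)^2
    grads["W1"] = x.T @ d_hid
    grads["b1"] = d_hid.sum(axis=0)
    return loss, grads

def train_sgd(x, y, n_hidden=32, lr=0.05, epochs=200, batch=32, seed=0):
    rng = np.random.default_rng(seed)
    p = init_params(x.shape[1], n_hidden, y.shape[1], rng)
    history = [mse_and_grads(p, x, y)[0]]            # loss before training
    for _ in range(epochs):
        order = rng.permutation(len(x))              # reshuffle every epoch
        for start in range(0, len(x), batch):
            idx = order[start:start + batch]
            _, g = mse_and_grads(p, x[idx], y[idx])
            for k in p:                              # theta <- theta - lr * dL/dtheta
                p[k] -= lr * g[k]
        history.append(mse_and_grads(p, x, y)[0])
    return p, history

rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=(256, 1))
y = np.sin(x)                                        # toy target instead of spectra
params, history = train_sgd(x, y)
print(f"loss before: {history[0]:.4f}, after: {history[-1]:.4f}")
```

Comparing a backpropagated gradient against a finite-difference estimate, as in the check below, is a cheap way to catch errors in hand-derived gradients.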
B. Design Parameterized Structure by FCLs-Based ANNs
Figure 2.(a) Top: Schematic of the tandem neural network and
Subsequent works have further confirmed the good performance of the tandem network architecture. For instance, S. So
Besides the tandem network, other approaches have been introduced to improve the performance of the FCLs-based neural network. In 2019, Y. Chen
Figure 3.(a) Left: Schematic illustration of the metasurface, the unit cell, and matrix encoding method. Right: Predicted S-parameter and absorptivity with the REACTIVE method. (b) Illustration of the neural network architecture consisting of BaseNet and TransferNet. (c) The trend of spectrum error when
Due to the data-driven nature of deep learning, the performance of a well-trained ANN relies heavily on the training set, and the prediction error is likely to increase as the inputs deviate from the training set. Therefore, a challenge in deep-learning-aided inverse design lies in extending the capability of ANNs to a data set that is very different from the training data. Usually, one needs to generate an entirely new training set for similar but different physical scenarios. In this context, reducing the demand for computational data is an efficient way to accelerate the training of deep learning models. Y. Qu
FCLs have also been utilized for the inverse design problem in reinforcement learning [50–53], another active area of machine learning. Reinforcement learning has already achieved remarkable success in robotics, system control, and game playing (e.g., AlphaGo). Instead of predicting the optimized geometry directly, the ANNs in reinforcement learning act within an iterative optimization process. In each step, an action to optimize the geometric parameters is predicted; for instance, the action can be increasing or decreasing several parameters by a certain amount. The advantage of this approach is that it can adapt to specific problems and provide guidance for conventional trial-and-error optimization methods.
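The action-based optimization loop can be illustrated with a deliberately simplified sketch: tabular Q-learning, in which a lookup table replaces the ANN that the cited works use to estimate the value of each action. The single discretized geometric parameter, the target level, and the reward (the reduction in distance to the target) are hypothetical stand-ins for a real simulated figure of merit.

```python
import numpy as np

N_LEVELS = 21        # hypothetical: one geometric parameter discretized into 21 values
TARGET = 14          # hypothetical: level whose (toy) response matches the target
ACTIONS = (-1, +1)   # decrease or increase the parameter by one step

def step(state, action_idx):
    nxt = int(np.clip(state + ACTIONS[action_idx], 0, N_LEVELS - 1))
    # Toy reward: how much closer the parameter moved to the target level.
    reward = abs(state - TARGET) - abs(nxt - TARGET)
    return nxt, reward, nxt == TARGET

def train_q_table(episodes=2000, alpha=0.5, gamma=0.9, eps=0.2, max_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((N_LEVELS, len(ACTIONS)))   # table standing in for the ANN
    for _ in range(episodes):
        s = int(rng.integers(N_LEVELS))      # random initial design
        for _ in range(max_steps):
            if s == TARGET:
                break
            # Epsilon-greedy: mostly exploit the table, sometimes explore.
            if rng.random() < eps:
                a = int(rng.integers(len(ACTIONS)))
            else:
                a = int(np.argmax(q[s]))
            s2, r, done = step(s, a)
            td_target = r if done else r + gamma * np.max(q[s2])
            q[s, a] += alpha * (td_target - q[s, a])
            s = s2
    return q

def greedy_rollout(q, start, max_steps=50):
    """Apply the learned policy: pick the best-valued action at every step."""
    path = [start]
    while path[-1] != TARGET and len(path) <= max_steps:
        path.append(step(path[-1], int(np.argmax(q[path[-1]])))[0])
    return path

q = train_q_table()
print(greedy_rollout(q, start=3))
```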
Figure 4.(a) Left: Architecture of the proposed neural network for nonlinear layers. Right: Predicted, simulated, and measured transmission spectra of two gold nanostructures under different polarization conditions. (b) Left: Illustrations of MANN used for reconstruction of 3D vectorial field. Right: Experimental approach and characterizations of 3D vectorial holography based on a vectorial hologram. (c) Left: Schematic of a deep-learning-enabled self-adaptive metasurface cloak. Right: Demonstration of the self-adaptive cloak response subject to random backgrounds and incidence with varied angles and frequencies. (a) is reproduced from Ref. [54] with permission; (b) is reproduced from Ref. [63] with permission; (c) is reproduced from Ref. [64] with permission.
In addition to spectrum prediction [55,56], the FCLs-based ANNs have also been used in the inverse design to realize other functionalities and benefit real-world applications [57–62]. Holographic images, for example, can be optimized by ANNs to achieve a wide viewing angle and three-dimensional vectorial field as recently demonstrated by H. Ren
Another exciting work enabled by ANNs is a self-adaptive cloak that can respond within milliseconds to ever-changing incident waves and surrounding environments without human intervention [64]. A pretrained ANN was adopted to achieve this function. As schematically illustrated in the left panel of Fig. 4(c), a single layer of active meta-atoms was applied at the surface of the cloak, and the reflection spectrum of each meta-atom was controlled independently by the DC bias voltage applied to its varactor diode. To achieve the invisibility function, the bias voltages were determined by the pretrained ANN, which took the characteristics of the incident wave (such as the incident angle, frequency, and reflection amplitude) as input. Simulations of the temporal response of the cloak showed a fast transient response of 16 ms. The authors then conducted experiments in which a p-polarized Gaussian beam obliquely illuminated a chameleon object covered by the cloak. Two detectors were used to extract the signals from the background and the incident wave to characterize the cloak. The right panel of Fig. 4(c) shows the experimental results at two incident angles (9° and 21°) and two frequencies (6.7 and 7.4 GHz). The magnetic field distribution for the cloaked object is similar to that when only the background is present, while it is distinctly different from that of the bare object. Differential radar cross-section (RCS) measurements further confirmed the performance of the cloak.
3. RETRIEVE COMPLEX STRUCTURES BY CONVOLUTIONAL NEURAL NETWORKS
A. Introduction of CNNs
The desired designs and structures are often hard to parameterize, especially when the structure of interest contains many basic shapes [41,65] or is freeform [66,67]. In some cases, we also need to deal with complex optical responses as the input [68]. Therefore, converting the structure into a 2D or 3D image is usually a good approach in these studies, and it also offers many more degrees of freedom in the design process. However, preprocessing is required to handle the image input if we still want to use an FCLs-based model. Reshaping the image into a one-dimensional vector and applying feature extraction with linear embeddings, such as principal component analysis and random projection, are two effective ways to preprocess the image so that the input is compatible with the FCLs. However, the performance is usually not satisfactory, because these conversions either break the correlation between neighboring pixels in the vertical direction within an individual image or miss part of the information describing the image as a whole. An extremely large input dimension is another major issue, as it increases the number of connections between layers quadratically. For conventional parameter inputs, the input dimension is usually a few tens or hundreds, while even a vectorized 64 × 64 pixel image results in a 4096-dimensional input vector. CNNs are well suited to such circumstances. CNNs accept an image input without preprocessing, and several filters then move along the horizontal and vertical directions of the image to extract different features. Each filter has a set of weights and performs a convolution operation on each subarea of the image, that is, the summation of the pointwise multiplication between the values of the subarea and the weights of the filter.
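The filter operation described above, together with the input-size arithmetic (a 64 × 64 image flattens to 4096 values, while a 3 × 3 filter has only 9 shared weights), can be sketched as follows. The bar-shaped binary image and the edge-detecting filter are hypothetical; as in most deep-learning libraries, the operation is implemented as a cross-correlation, i.e., without flipping the kernel.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image`; at each position, sum the pointwise product
    of the kernel weights and the image subarea ("valid" mode, no padding)."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Hypothetical 64 x 64 binary image of a meta-atom: a rectangular bar of material.
img = np.zeros((64, 64))
img[20:44, 28:36] = 1.0
edge_filter = np.array([[1.0, 0.0, -1.0]] * 3)   # responds to vertical edges
features = conv2d_valid(img, edge_filter)
print(img.size, features.shape)                  # 4096 (62, 62)
```

The same 9 filter weights are reused at every position, which is why the parameter count of a convolutional layer does not grow with the image size.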
Figure 5.(a) Schematic of the convolution operation, in which the filters map the subarea in the input image to a single value in the output image. (b) Schematic of the pooling operation, in which the subarea in the input image is pooled into a single value in the output according to the maximum or mean value. (c) The workflow of a conventional CNN. The input images pass through several CNNs, and then the extracted features are passed into the FCLs to predict the response (e.g., transmission, reflection, and absorption spectra).
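The pooling operation of Fig. 5(b) can be sketched as follows, assuming non-overlapping 2 × 2 windows (a common but not universal choice); rows or columns that do not fill a complete window are discarded.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling: each size x size subarea becomes one value."""
    h = (x.shape[0] // size) * size
    w = (x.shape[1] // size) * size
    # Axes 1 and 3 index the positions inside each window.
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(x, mode="max"))    # -> [[5, 7], [13, 15]]
print(pool2d(x, mode="mean"))   # -> [[2.5, 4.5], [10.5, 12.5]]
```

In the workflow of Fig. 5(c), the outputs of stacked convolution and pooling stages would then be flattened and passed to FCLs.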
B. Design Complex Photonic Structures by CNNs
Figure 6.(a) Top: Examples of cDCGAN-suggested images and the simulation results. Bottom: Entirely new structures suggested by the cDCGAN for desired spectra. (b) Top: The proposed deep generative model for metamaterial design, which consists of the prediction, recognition, and generation models. Bottom: Evaluation of the proposed model. The desired spectra either generated with user-defined function or simulated from an existing structure are plotted in the first column. The reconstructed structures with the simulated spectra are plotted in the second and third columns. (c) Left: Flowchart of the VAE-ES framework. Right: Test results of designed photonic structures from the proposed model and the simulated spectra. (a) is reproduced from Ref. [69] with permission; (b) is reproduced from Ref. [41] with permission; (c) is reproduced from Ref. [79] with permission.
W. Ma
Z. Liu
Figure 7.(a) Left: One example of 1-bit coding elements with regular phase differences. Right: Comparison of the simulated and measured results of the dual- and triple-beam coding metasurfaces. (b) Schematic of the proposed 3D CNN model to characterize the near-field and far-field properties of arbitrary dielectric and plasmonic nanostructures. (c) Left: Sketch of the nanostructure geometry and the 1D CNN-based ANNs. Right: Training convergence and readout accuracy of the ANNs. (d) Left: The workflow of designing the DMD pattern for light control through scattering media with ANNs. Right: The structures of the FCLs-based single-layer neural network and the CNNs, together with the simulated and measured results for the focusing effect. (a) is reproduced from Ref. [80] with permission; (b) is reproduced from Ref. [81] with permission; (c) is reproduced from Ref. [86] with permission; (d) is reproduced from Ref. [87] with permission.
CNNs are widely applied in 2D image processing. Their significance is attributed to their ability to keep local segments of the input intact, a property that in principle holds in any number of dimensions. Taking advantage of this property, P. R. Wiecha and O. L. Muskens built a model with 3D CNNs to predict the near-field and far-field electric/magnetic response of arbitrary nanostructures [81]. They pixelated the dielectric or plasmonic nanostructure of interest into a 3D image and fed the image into several layers of 3D CNNs. An output 3D image of the same size as the input was then predicted, representing the electric field under a fixed wavelength and polarization in the same coordinate system, as shown in Fig. 7(b). The residual and shortcut connections in the network are known as residual learning [82] and U-Net [83] blocks, which help to stabilize the gradients and allow the network to be made deeper without compromising its performance [84,85]. From the predicted near-field response, other physical quantities, such as far-field scattering patterns, energy flux, and electromagnetic chirality, can then be deduced. The authors studied two cases: 2D gold nanostructures with random polygonal shapes and 3D silicon structures consisting of several pillars. Each scheme was trained with simulation data of 30,000 distinct geometries. With the well-trained model, the authors reproduced several nano-optical effects from the near fields predicted by the 3D CNNs, such as the antenna behavior of gold nanorods and Kerker-type scattering of Si nanoblocks. The model can potentially serve as an extremely fast alternative to current full-wave simulation methods, at the cost of slightly decreased accuracy.
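Because a convolution only needs a local window, the same code works in any number of dimensions. The sketch below, which relies on NumPy's `sliding_window_view` (NumPy 1.20 or later), applies a zero-padded N-dimensional convolution that preserves the input size and wraps two such convolutions in a residual (shortcut) connection. It is a schematic of the building blocks only, not the network of Ref. [81]: the 16 × 16 × 16 voxel structure and the random, untrained kernels are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_nd_same(x, kernel):
    """N-dimensional convolution (cross-correlation) with zero padding so the
    output has the same shape as the input; assumes odd kernel sizes."""
    pad = [(k // 2, k // 2) for k in kernel.shape]
    windows = sliding_window_view(np.pad(x, pad), kernel.shape)
    # Sum the pointwise product over the trailing kernel axes of every window.
    return np.tensordot(windows, kernel, axes=kernel.ndim)

def residual_block(x, k1, k2):
    """Two convolutions with a ReLU in between, plus an identity shortcut:
    the block only has to learn a correction F(x) that is added to x."""
    h = np.maximum(conv_nd_same(x, k1), 0.0)
    return x + conv_nd_same(h, k2)

rng = np.random.default_rng(0)
voxels = (rng.random((16, 16, 16)) > 0.7).astype(float)   # hypothetical 3D structure
k1 = rng.normal(0.0, 0.1, (3, 3, 3))
k2 = rng.normal(0.0, 0.1, (3, 3, 3))
field = residual_block(voxels, k1, k2)
print(field.shape)   # (16, 16, 16): same size as the input
```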
In parallel, a one-dimensional (1D) CNN was also introduced to analyze the scattering spectra of silicon nanostructures for optical information storage as demonstrated by P. R. Wiecha
CNNs are not always the best choice for image inputs as found by A. Turpin
4. OTHER INTELLIGENT ALGORITHMS FOR PHOTONIC DESIGNS
Figure 8.(a) Left: Illustration of meta-molecules. Right: Fabricated samples and the measured and simulated results of polarization conversion. (b) Top: Schematic of a silicon metagrating that deflects light to a certain angle. Bottom: The proposed conditional GLOnet for metagrating optimization. (c) Top: Schematic of structure refinement and filtering for the high-efficiency thermal emitter. Bottom: The efficiency, emissivity, and normalized emission of the well-optimized thermal emitter. (d) Top: Illustration of the unit cell consisting of three metallic patches connected via PIN diodes and a photograph of the fabricated metasurface. Bottom: Experimental results for reconstructing human body imaging. (a) is reproduced from Ref. [95] with permission; (b) is reproduced from Ref. [100] with permission; (c) is reproduced from Ref. [42] with permission; (d) is reproduced from Ref. [104] with permission.
Another widely used optimization algorithm for inverse design is gradient-based topology optimization [21,96–103]. In the optimization process, the design space is discretized into pixels whose properties (e.g., the refractive index) are represented by a set of design parameters. This parameter set is optimized toward a prescribed target response by maximizing (or minimizing) a user-defined objective function. Starting from an initial parameter set, a forward simulation and an adjoint simulation are performed to calculate the gradient of the objective function with respect to every parameter; only these two simulations are needed per iteration, regardless of the number of parameters. The parameters are then updated according to the gradient ascent (descent) method, and this iterative process continues until the objective function is well optimized. Taking advantage of topology optimization, J. Jiang
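The forward/adjoint gradient computation in topology optimization can be sketched on a toy linear problem A(p)x = b, where A(p) = L + diag(p) is a hypothetical stand-in for a discretized Maxwell operator, the pixel parameters p lie in [0, 1], and the objective is J = cᵀx, the field at a monitor pixel. One forward solve and one adjoint solve (Aᵀλ = c) give the full gradient dJ/dp_i = −λ_i x_i, after which projected gradient ascent updates p. In this deliberately simple toy the optimum is the trivial p = 0, so the sketch illustrates the gradient machinery rather than a meaningful design.

```python
import numpy as np

N, SRC, MON = 20, 3, 15   # hypothetical grid size, source pixel, monitor pixel
L = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)   # toy 1D operator
b = np.zeros(N)
b[SRC] = 1.0              # source term
c = np.zeros(N)
c[MON] = 1.0              # J = c^T x = field at the monitor pixel

def objective_and_grad(p):
    A = L + np.diag(p)                 # design-dependent system matrix
    x = np.linalg.solve(A, b)          # forward simulation
    lam = np.linalg.solve(A.T, c)      # adjoint simulation
    # dA/dp_i = e_i e_i^T, so dJ/dp_i = -lam^T (dA/dp_i) x = -lam_i * x_i.
    return c @ x, -lam * x

def optimise(p0, lr=20.0, iters=100):
    p = p0.copy()
    history = []
    for _ in range(iters):
        J, g = objective_and_grad(p)
        history.append(J)
        p = np.clip(p + lr * g, 0.0, 1.0)   # gradient ascent, projected onto [0, 1]
    history.append(objective_and_grad(p)[0])
    return p, history

p_opt, history = optimise(np.full(N, 0.2))
print(history[0], history[-1])
```

The cost of the gradient is two linear solves per iteration whatever the number of pixels, which is what makes adjoint-based topology optimization practical for large design spaces.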
Combining topology optimization and ANNs, Z. A. Kudyshev
Conventional machine learning methods, such as Bayesian learning [106], clustering [107], and manifold learning [104], are also very helpful in solving photonic design problems. In 2019, L. Li
5. CONCLUSION AND OUTLOOK
In this review, we have introduced the basic idea of applying ANNs and other advanced algorithms to accelerate and optimize photonic designs, including plasmonic nanostructures and metamaterials. We have highlighted some representative works in this field and discussed the performance and applications of the proposed models. For inverse design problems, the neural networks are usually built upon FCLs and CNNs, integrated with other neural network units such as ResNets and RNNs. It is beneficial to combine ANNs with conventional optimization methods such as genetic algorithms and topology optimization, because the conventional methods can help to perform global optimization and provide feedback to further improve the ANNs. These methods offer a great opportunity to increase the structural complexity of devices, which can realize much more complex and novel functionalities.
Figure 9.(a) Top: Comparison between the all-optical
ANNs are typically considered a “black box” because the relationship between inputs and outputs learned by the network is usually implicit. In some published works, researchers visualized the output of each individual layer to provide some information on which features are learned (or which function is performed) by each layer [40], which is a good first attempt. If we could further extract the relationship explicitly from well-trained ANNs, it would help us find new classes of structures that lie outside the conventional geometries (such as H-shape, C-shape, and bowtie structures) and would also provide guidelines and insights for the design of optical devices. Another important direction is to extend the generality of ANN models. When ANNs are applied to traditional tasks, such as image recognition and natural language processing, we want the neural networks to learn the information and distributions that lie within the natural images or languages themselves and to reconstruct or approximate these distributions. ANNs have been shown to work well in learning and summarizing such distributions, and it is relatively easy to extend a model to other kinds of images or languages. The inverse design tasks in photonics, however, are more complicated, because the ANNs need to learn the implicit physical rules (such as Maxwell’s equations) that link the structures to their optical responses, rather than the information and distribution associated with the structures themselves. Therefore, extending the capability of a well-trained neural network in inverse design problems remains a challenge. Most of the ANNs described in this review are specific to a certain design platform or application. A model can be fine-tuned to handle different tasks, but it then needs to be retrained, which requires an additional training data set.
When the original training set contains data for multiple tasks, multiple design rules are likely to be involved and learned by the ANN. In this case, the model will typically perform worse on each individual task than a model trained only on the data set for that task, because the rules for the other tasks act as perturbations or noise. Finding a good trade-off between generality and task-specific accuracy is therefore very important.
Over the past decades, photonics and artificial intelligence have been evolving largely as two separate research disciplines. The intersection and combination of these two topics in recent years have brought exciting achievements. On one hand, the innovative ANN models provide a powerful tool to accelerate the optical design and implementation process. Some nonintuitive structures and phenomena have been discovered by this new strategy. On the other hand, the developed optical designs are expected to produce a variety of real-world applications, such as optical imaging, holography, communications, and information encryption, with high efficiency, fidelity, and robustness. Toward this goal, we need to include the practical fabrication constraints and underlying material properties into the design space in order to globally optimize the devices and systems. We believe that the field of interfacing photonics and artificial intelligence will significantly move forward as more researchers from different backgrounds join this effort.
References