• Advanced Photonics
  • Vol. 5, Issue 1, 016003 (2023)
Jingxi Li1,2,3, Tianyi Gan1,3, Bijie Bai1,2,3, Yi Luo1,2,3, Mona Jarrahi1,3, and Aydogan Ozcan1,2,3,*
Author Affiliations
  • 1University of California, Electrical and Computer Engineering Department, Los Angeles, California, United States
  • 2University of California, Bioengineering Department, Los Angeles, California, United States
  • 3University of California, California NanoSystems Institute, Los Angeles, California, United States
    DOI: 10.1117/1.AP.5.1.016003
    Jingxi Li, Tianyi Gan, Bijie Bai, Yi Luo, Mona Jarrahi, Aydogan Ozcan. Massively parallel universal linear transformations using a wavelength-multiplexed diffractive optical network. Advanced Photonics, 2023, 5(1): 016003

    Abstract

    Large-scale linear operations are the cornerstone for performing complex computational tasks. Using optical computing to perform linear transformations offers potential advantages in terms of speed, parallelism, and scalability. Previously, the design of successive spatially engineered diffractive surfaces forming an optical network was demonstrated to perform statistical inference and compute an arbitrary complex-valued linear transformation using narrowband illumination. We report deep-learning-based design of a massively parallel broadband diffractive neural network for all-optically performing a large group of arbitrarily selected, complex-valued linear transformations between an input and output field of view, each with Ni and No pixels, respectively. This broadband diffractive processor is composed of Nw wavelength channels, each of which is uniquely assigned to a distinct target transformation; a large set of arbitrarily selected linear transformations can be individually performed through the same diffractive network at different illumination wavelengths, either simultaneously or sequentially (wavelength scanning). We demonstrate that such a broadband diffractive network, regardless of its material dispersion, can successfully approximate Nw unique complex-valued linear transforms with a negligible error when the number of diffractive neurons (N) in its design is ≥2NwNiNo. We further report that the spectral multiplexing capability can be increased by increasing N; our numerical analyses confirm these conclusions for Nw > 180 and indicate that it can further increase to Nw ∼ 2000, depending on the upper bound of the approximation error. Massively parallel, wavelength-multiplexed diffractive networks will be useful for designing high-throughput intelligent machine-vision systems and hyperspectral processors that can perform statistical inference and analyze objects/scenes with unique spectral properties.

    1 Introduction

    Computing plays an increasingly vital role in constructing intelligent, digital societies. The exponentially growing power consumption of digital computers brings some important challenges for large-scale computing. Optical computing can potentially provide advantages in terms of power efficiency, processing speed, and parallelism. Motivated by these advantages, we have witnessed various research and development efforts on advancing optical computing over the last few decades.1–32 Synergies between optics and machine learning have enabled the design of novel optical components using deep-learning-based optimization,33–44 while also allowing the development of advanced optical/photonic information processing platforms for artificial intelligence.5,20–32,45

    Among different optical computing designs, diffractive optical neural networks represent a free-space-based framework that can be used to perform computation, statistical inference, and inverse design of optical elements.22 A diffractive neural network is composed of multiple transmissive and/or reflective diffractive layers (or surfaces), which leverage light–matter interactions to jointly perform modulation of the input light field to generate the desired output field. These passive diffractive layers, each containing thousands of spatially engineered diffractive features (termed “diffractive neurons”), are designed (optimized) in a computer using deep-learning tools, e.g., stochastic gradient descent and error backpropagation. Once the training process converges, the resulting diffractive layers are fabricated to form a passive, free-space optical processing unit that does not consume any power except the illumination light. This framework is also scalable, since it can adapt to changes in the input field of view (FOV) or data dimensions by adjusting the size and/or the number of diffractive layers. Diffractive networks can directly access the 2D/3D input information of a scene or object and process the optical information encoded in the amplitude, phase, spectrum, and polarization of the input light, making them highly suitable as intelligent optical front ends for machine-vision systems.

    Diffractive neural networks have been used to perform various optical information processing tasks, including object classification,22,46–57 image reconstruction,52,58,59 all-optical phase recovery and quantitative phase imaging,60 class-specific imaging,61 super-resolution image displays,62 and logical operations.63–65 Employing successive spatially engineered diffractive surfaces as the backbone for inverse design of deterministic optical elements also enabled numerous applications, such as spatially controlled wavelength demultiplexing,66 pulse engineering,67 and orbital angular momentum multiplexing/demultiplexing.68

    In addition to these task-specific applications, diffractive networks also serve as general-purpose computing modules that can be used to create compact, power-efficient all-optical processors. Recent efforts have shown that a diffractive network can be used to all-optically perform an arbitrarily selected, complex-valued linear transformation between its input and output FOVs with a negligible error when the number of trainable diffractive neurons (N) approaches NiNo, where Ni and No represent the number of pixels at the input and output FOVs, respectively.69 Using nontrainable, predetermined polarizer arrays within an isotropic diffractive network, a polarization-encoded diffractive processor was also demonstrated to accurately perform a group of Np=4 distinct complex-valued linear transformations using a single system with N≥NpNiNo=4NiNo; in this case, each one of these four optical transformations can be accessed through a different combination of the input/output polarization states.70 This polarization-encoded diffractive system is limited to a multiplexing factor of Np=4, since any additional desired transformation matrix that could be assigned to a new combination of input–output polarization states can be written as a linear combination of the four linear transforms that are already learned by the diffractive processor.70 These former works involved monochromatic diffractive networks, where a single illumination wavelength encoded the input information channels.

    In this paper, we rigorously address and analyze the following question. Let us imagine an optical black box (composed of diffractive surfaces and/or reconfigurable spatial light modulators): how can that black box be designed to simultaneously implement, e.g., Nw>1000 independent linear transformations corresponding to >1000 different matrix multiplications (with >1000 different independent matrices) at Nw>1000 unique wavelengths? More specifically, here we report the use of a wavelength multiplexing scheme to create a broadband diffractive optical processor, which massively increases the throughput of all-optical computing by performing a group of distinct linear transformations in parallel using a single diffractive network. By encoding the input/output information of the target linear transforms using Nw different wavelengths (i.e., λ1, λ2, …, λNw), we created a single broadband diffractive network to simultaneously perform a group of Nw arbitrarily selected, complex-valued linear transforms with negligible error. We demonstrate that N≥2NwNiNo diffractive neurons are required to successfully implement Nw complex-valued linear transforms using a broadband diffractive processor, where the thickness values of its diffractive neurons constitute the only variables optimized during the deep-learning-based training process. Without loss of generality, we numerically demonstrate wavelength-multiplexed universal linear transformations with Nw>180, which can be further increased to Nw≈2000 based on the approximation error threshold that is acceptable. We also demonstrate that these wavelength-multiplexed universal linear transformations can be implemented even with a flat material dispersion, where the refractive index (n) of the material at the selected wavelength channels is the same, i.e., n(λ)=n0 for λ ∈ {λ1, λ2, …, λNw}. The training process of these wavelength-multiplexed diffractive networks was adaptively balanced across the different wavelengths of operation such that the all-optical linear transformation accuracies of the different channels were similar to each other, without introducing a bias toward any wavelength channel or the corresponding linear transform.

    It is important to emphasize that the goal of this work is not to train the broadband diffractive network to implement the correct linear transformations for only a few input–output field pairs. We are not aiming to use the diffractive layers as a form of metamaterial that can output different images or optical fields at different wavelengths. Instead, our goal is to generalize the performance of our broadband diffractive processor to infinitely many pairs of input and output complex fields that satisfy the target linear transformation at each spectral channel, thus achieving universal all-optical computing of multiple complex-valued matrix–vector multiplications, accessed by a set of illumination wavelengths (Nw≫1).

    Moreover, we would like to clarify that the wavelength multiplexing scheme used for our framework in this paper should not be confused with other efforts that integrated wavelength-division multiplexing (WDM) technologies into optical neural computing, such as in Refs. 71–73. In these earlier works, WDM was utilized to encode the 1D input/output information to perform a vector–matrix multiplication operation, where the optical network was designed to perform only one linear transformation based on a single input data vector, producing a single output vector that is spectrally encoded. However, in our work, we use wavelength multiplexing to perform multiple independent linear transformations (Nw≫1) within a single optical network architecture, where each of these complex-valued linear transformations can be accessed at a distinct wavelength (simultaneously or sequentially). Also, the input and output fields of each one of these linear transformations in our framework are spatially encoded in 2D at the input/output FOVs using the same wavelength, rather than being spectrally encoded, as demonstrated in earlier WDM-based designs.71–73 This unique feature allows our diffractive network to all-optically perform a large group of independent linear transformations in parallel by sharing the same 2D input/output FOVs.

    Compared to the previous literature, this paper has various unique aspects: (1) this is the first demonstration of a spatially engineered diffractive system that achieves spectrally multiplexed universal linear transformations; (2) the level of massive multiplexing that is reported through a single wavelength-multiplexed diffractive network (e.g., Nw>180) is significantly larger than that of other multiplexing channels, including polarization diversity,70 and this number can be further increased to Nw≈2000 with more diffractive neurons (N) used in the network design; (3) the deep-learning-based training of the diffractive layers used adaptive spectral weights to equalize the performances of all the linear transformations assigned to the Nw different wavelengths; (4) the capability to perform multiple linear transformations using wavelength multiplexing does not require any wavelength-sensitive optical elements to be added into the diffractive network design, except for wavelength scanning or broadband illumination with demultiplexing filters; and (5) this wavelength-multiplexed diffractive processor can be implemented using various materials with different dispersion properties (including materials with a flat dispersion curve) and is widely applicable to different parts of the electromagnetic spectrum, including the visible band. Furthermore, we would like to emphasize that since each dielectric feature of this wavelength-multiplexed diffractive processor is based on material thickness variations, it simultaneously modulates all the wavelengths within the spectrum of interest. This means that each wavelength channel within the set of Nw unique wavelengths has a different error gradient with respect to the optical transformation that is assigned to it, and therefore, the diffractive layer optimization spanning Nw wavelengths deviates from the ideal optimization path of any individual wavelength. Since the diffractive layers considered here do not possess any spectral selectivity, we used a training loss function that simultaneously takes into account all the wavelength channels, finding a locally optimal intersection among the Nw wavelengths that accurately performs all the desired Nw transformations. This behavior is quite different from the earlier generations of monochromatic diffractive processors69 that optimized the phase profiles of the diffractive layers for only one wavelength assigned to one optical transformation.

    Based on the massive parallelism exhibited by this broadband diffractive network, we believe that this platform and the underlying concepts can be used to develop optical processors operating at different parts of the spectrum with extremely high computing throughput. Its throughput can be further increased by expanding the range and/or the number of encoding wavelengths as well as by combining wavelength multiplexing with other multiplexing schemes such as polarization encoding. The reported framework would be valuable for the development of multicolor and hyperspectral machine-vision systems that perform statistical inference based on the spatial and spectral information of an object or a scene, which may find applications in various fields, including biomedical imaging, remote sensing, analytical chemistry, and material science.

    2 Results

    2.1 Design of Wavelength-Multiplexed Diffractive Optical Networks for Massively Parallel Universal Linear Transformations

    Throughout this paper, the terms “diffractive deep neural network,” “diffractive neural network,” “diffractive optical network,” and “diffractive network” are used interchangeably. Figure 1 illustrates the schematic of our broadband diffractive optical network design for massively parallel, wavelength-multiplexed all-optical computing. The broadband diffractive network, composed of eight successive diffractive layers, contains in total N diffractive neurons with their thickness values as learnable variables, which are jointly trained to perform a group of Nw linear transformations between the input and output FOVs through Nw parallel wavelength channels. More details about this diffractive architecture, its optical forward model, and its training can be found in Sec. 4. To start with, a group of Nw different wavelengths, λ1, λ2, …, λNw, are selected to be used as the wavelength channels for the broadband diffractive processor to encode different input complex fields and perform different target transformations (see Fig. 1). For the implementation of the broadband diffractive designs in this paper, we fixed the mean value λm of this group of wavelengths {λ1, λ2, …, λNw}, i.e., λm = (1/Nw)∑w=1…Nw λw, and assigned these wavelengths to be equally spaced between λ1=0.9125λm and λNw=1.0875λm. Unless otherwise specified, we chose λm to be 0.8 mm in our numerical simulations, as it aligns with the terahertz band that was experimentally used in several of our previous works.50,52,58,59,61,62,66,67 Without loss of generality, the wavelengths used for the design of the broadband diffractive processors can also be selected at other parts of the electromagnetic spectrum, such as the visible band, for which the related simulation results and analyses can be found in Sec. 3. Based on scalar diffraction theory, the broadband optical fields propagating in the diffractive system are simulated at these selected wavelengths using a sampling period of 0.5λm along both the horizontal and vertical directions. We also select 0.5λm as the size of the individual neurons on the diffractive layers. With these selections, we include in our optical forward model all the propagating modes that are transmitted through the diffractive layers.
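
    As a concrete illustration of the wavelength-channel selection described above, the following minimal sketch (assuming NumPy; the function and variable names are ours, not from the paper's code) generates Nw equally spaced wavelengths between 0.9125λm and 1.0875λm, whose mean equals λm by symmetry:

```python
import numpy as np

def wavelength_channels(lambda_m: float, n_w: int) -> np.ndarray:
    """Return n_w equally spaced wavelengths spanning 0.9125*lambda_m
    to 1.0875*lambda_m; their mean equals lambda_m by symmetry."""
    return np.linspace(0.9125 * lambda_m, 1.0875 * lambda_m, n_w)

lam = wavelength_channels(lambda_m=0.8e-3, n_w=8)  # lambda_m = 0.8 mm (terahertz band)
assert np.isclose(lam.mean(), 0.8e-3)
```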


    Figure 1. Schematic of massively parallel, wavelength-multiplexed diffractive optical computing. Optical layout of the wavelength-multiplexed diffractive neural network, where the diffractive layers are jointly trained to perform Nw different arbitrarily selected, complex-valued linear transformations between the input field i and the output field o′ using wavelength multiplexing. The optical fields at the input FOV, i1, i2, …, iNw, are encoded at a predetermined set of distinct wavelengths λ1, λ2, …, λNw, respectively, using a wavelength multiplexing (“MUX”) scheme. At the output FOV of the broadband diffractive network, wavelength demultiplexing (“DEMUX”) is performed to extract the diffractive output fields o1′, o2′, …, oNw′ at the corresponding wavelengths λ1, λ2, …, λNw, respectively, which represent the all-optical estimates of the target output fields o1, o2, …, oNw, corresponding to the target linear transformations (A1, A2, …, ANw). Through this diffractive architecture, Nw different arbitrarily selected complex-valued linear transformations can be all-optically performed at distinct wavelengths, running in parallel channels of the broadband diffractive processor.

    Let i and o be the complex-valued, vectorized versions of the 2D input and output broadband complex fields at the input and output FOVs of the diffractive network, respectively, as shown in Fig. 1. We denote iw and ow as the complex fields generated by sampling the optical fields at the wavelength λw (w ∈ {1, 2, …, Nw}) within the input and output FOVs, respectively, and then vectorizing the resulting 2D matrices in column-major order. According to this notation, iw and ow represent the input and output of the wth wavelength channel in our wavelength-multiplexed diffractive network, respectively. In the following analyses, without loss of generality, the number of pixels at the input and output FOVs is selected to be the same, i.e., Ni=No.
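
    For clarity, the column-major vectorization convention stated above can be expressed in a couple of lines (a sketch, assuming NumPy; `order="F"` gives column-major ordering):

```python
import numpy as np

field_2d = np.random.randn(8, 8) + 1j * np.random.randn(8, 8)  # complex field at one wavelength
i_w = field_2d.flatten(order="F")            # column-major vectorization, length Ni = 64
field_back = i_w.reshape((8, 8), order="F")  # inverse operation at the output FOV
assert np.allclose(field_2d, field_back)
```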

    To implement Nw target linear transformations, we randomly generated Nw complex-valued matrices A1,A2,,ANw, each composed of Ni×No entries, to serve as a group of unique arbitrary linear transformations to be all-optically implemented using a wavelength-multiplexed diffractive processor. All these matrices, A1,A2,,ANw, are generated using unique random seeds to ensure that they are different; we further confirmed the differences between these randomly generated matrices by calculating the cosine similarity values between any two combinations of the matrices in a given set (see e.g., Fig. S1 in the Supplementary Material). For each unique matrix Aw{A1,A2,,ANw}, we randomly generated a total of 70,000 complex-valued input field vectors {iw} and created the corresponding output field vectors {ow} by calculating ow=Awiw. We separated these input–output complex field pairs into three individual sets for training, validation, and testing, each containing 55,000, 5000, and 10,000 samples, respectively. By increasing the size of these training data sets to >100,000 input–output pairs of randomly generated complex fields, it is possible to further improve the transformation accuracy of the trained broadband diffractive networks; since this does not change the general conclusions of this work, it is left as future work. More details on the generation of the training and testing data can be found in Sec. 4.
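
    This data generation procedure can be sketched as follows (our illustrative reconstruction, not the authors' released code; the seed handling and the Gaussian sampling of the matrix entries and fields are assumptions):

```python
import numpy as np

def make_dataset(n_i: int, n_o: int, n_samples: int = 70_000, seed: int = 0):
    """Randomly generate a target complex matrix A_w and matched
    input/output field pairs satisfying o_w = A_w @ i_w."""
    rng = np.random.default_rng(seed)  # a unique seed per wavelength channel
    A_w = rng.standard_normal((n_o, n_i)) + 1j * rng.standard_normal((n_o, n_i))
    I = rng.standard_normal((n_i, n_samples)) + 1j * rng.standard_normal((n_i, n_samples))
    O = A_w @ I                        # ground-truth output fields
    # 55,000 / 5000 / 10,000 split for training / validation / testing
    splits = np.split(np.arange(n_samples), [55_000, 60_000])
    return A_w, [(I[:, s], O[:, s]) for s in splits]

A_1, (train, val, test) = make_dataset(n_i=64, n_o=64, seed=1)
```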

    Based on the notation introduced above, the objective of training our wavelength-multiplexed diffractive processor is that, for any of its wavelength channels operating at λw (w ∈ {1, 2, …, Nw}), the diffractive output fields {ow′} computed from any given inputs {iw} should provide a match to the ground-truth (target) output fields {ow}. If this can be achieved for any arbitrary choice of {iw}, the all-optical transformations Aw′ performed by the trained broadband diffractive system at different wavelength channels constitute an accurate approximation to their ground-truth (target) transformation matrices Aw, where w ∈ {1, 2, …, Nw}.

    As the first step of our analysis, we selected the input/output field size to be Ni=No=8×8=64 and started to train broadband diffractive processors with Nw=2, 4, 8, 16, and 32 wavelength channels. Results and analysis of implementing more wavelength channels (e.g., Nw>100) through a single diffractive processor will be provided in later sections. For this task, we randomly generated a set of 32 different matrices with dimensions of 64×64, i.e., A1, A2, …, A32, the first eight of which are visualized (as examples) in Fig. 2(a) in terms of their amplitude and phase components. Figure S1a in the Supplementary Material also reports the cosine similarity values between these randomly generated 32 matrices, confirming that they are all very close to 0. For each Nw mentioned above, we also trained several broadband diffractive designs with different numbers of trainable diffractive neurons, i.e., N ∈ {3900; 8200; 16,900; 32,800; 64,800; 131,100; 265,000}, all using the same training data sets {(iw, ow)}, randomly generated based on the target transformations {Aw} (w ∈ {1, 2, …, Nw}), and the same number of training epochs.


    Figure 2. All-optical transformation performances of broadband diffractive networks using different numbers of wavelength channels. (a) As examples, we show the amplitude and phase of the first eight matrices in {A1, A2, …, A32} that were randomly generated, serving as the ground truth (target) for the diffractive all-optical transformations. See Fig. S1 in the Supplementary Material for the cosine similarity values calculated between any two combinations of these 32 target linear transformation matrices. (b) The mean values of the normalized MSE between the ground-truth transformation matrices (Aw) and the corresponding all-optical transforms (Aw′) across different wavelength channels are reported as a function of the number of diffractive neurons N. The results of the diffractive networks using different numbers of wavelength channels (Nw) are encoded with different colors, and the space between the simulation data points is linearly interpolated. Nw ∈ {1, 2, 4, 8, 16, 32}, N ∈ {3.9k, 8.2k, 16.9k, 32.8k, 64.8k, 131.1k, 265.0k}, and Ni=No=8². (c) Same as (b) but the cosine similarity values between the all-optical transforms and their ground truth are reported. (d) Same as (b) but the MSE values between the diffractive network output fields and the ground-truth output fields are reported.

    To benchmark the performance of these wavelength-multiplexed diffractive networks for each N, we also trained monochromatic diffractive networks without using any wavelength multiplexing as our baseline, which can approximate only one target linear transformation using a single wavelength (i.e., Nw=1). Here we simply select λm as the operating wavelength of this baseline monochrome diffractive network used for comparison.

    During the training of these diffractive networks, a mean squared error (MSE) loss is calculated per wavelength channel to bring the diffractive output fields as close to the ground-truth (target) fields as possible. However, in the wavelength-multiplexed diffractive models, treating all these channels equally in the final loss function would bias the all-optical transformation accuracies, since longer wavelengths present lower spatial resolution. To address this issue and equalize the all-optical transformation accuracies across all the wavelengths within the selected channel set, we devised a strategy that adaptively adjusts the weight coefficients applied to the loss terms of these channels during the training process (see Sec. 4 for details).
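
    One simple way to realize such channel balancing is sketched below under our own assumptions (the paper's exact update rule is given in its Sec. 4, which we do not reproduce here): each wavelength channel's MSE loss is re-weighted in proportion to its recent error, so that lagging channels receive larger gradients:

```python
import numpy as np

def balanced_loss(channel_mse: np.ndarray, weights: np.ndarray, momentum: float = 0.9):
    """Combine per-wavelength MSE losses with adaptive weights.
    Channels with above-average error are up-weighted; the running
    weights are smoothed with momentum. Illustrative only."""
    target = channel_mse / channel_mse.mean()         # >1 for lagging channels
    weights = momentum * weights + (1 - momentum) * target
    weights = weights / weights.sum() * len(weights)  # keep the mean weight at 1
    return float((weights * channel_mse).sum()), weights

w = np.ones(8)  # Nw = 8 channels, initialized with equal weights
loss, w = balanced_loss(np.array([1.2, 0.9, 1.0, 1.1, 0.8, 1.3, 1.0, 0.7]) * 1e-3, w)
```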

    After the deep-learning-based training of the broadband diffractive designs introduced above is completed, the resulting all-optical diffractive transformations of these models are summarized in Figs. 2(b)–2(d). We quantified the generalization performance of these broadband diffractive networks on the blind testing data set for each transformation using three different metrics: (1) the normalized transformation MSE (MSETransformation), (2) the cosine similarity (CosSim) between the all-optical transforms and the target transforms, and (3) the MSE between the diffractive network output fields and their ground-truth output fields (MSEOutput).53,69 More details about the definitions of these performance metrics are provided in Sec. 4. For the diffractive designs with different numbers of wavelength channels (Nw=1, 2, 4, 8, 16, and 32), we report these performance metrics in Figs. 2(b)–2(d) as a function of the number of trainable diffractive neurons (N). These performance metrics reported in Fig. 2 refer to the mean values calculated across all the wavelength channels, whereas the results of the individual wavelength channels are shown in Fig. 3.
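
    For reference, the three metrics can be written compactly as follows (a sketch, assuming NumPy; the normalization conventions here are our assumptions and may differ in detail from the definitions in the paper's Sec. 4):

```python
import numpy as np

def transformation_mse(A_gt: np.ndarray, A_opt: np.ndarray) -> float:
    """Normalized MSE between target and all-optical transformation matrices."""
    return float(np.mean(np.abs(A_gt - A_opt) ** 2) / np.mean(np.abs(A_gt) ** 2))

def cos_sim(A_gt: np.ndarray, A_opt: np.ndarray) -> float:
    """Cosine similarity between the vectorized complex matrices."""
    a, b = A_gt.ravel(), A_opt.ravel()
    return float(np.abs(np.vdot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b)))

def output_mse(o_gt: np.ndarray, o_opt: np.ndarray) -> float:
    """MSE between ground-truth and diffractive network output fields."""
    return float(np.mean(np.abs(o_gt - o_opt) ** 2))
```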


    Figure 3. All-optical transformation performances of the individual wavelength channels in broadband diffractive network designs with N≈2NwNiNo and Ni=No=8². The output field errors (MSEOutput) for the all-optical linear transforms achieved by the wavelength-multiplexed diffractive network models with (a) 2-channel wavelength multiplexing (Nw=2), N≈4NiNo; (b) 4-channel wavelength multiplexing (Nw=4), N≈8NiNo; (c) 8-channel wavelength multiplexing (Nw=8), N≈16NiNo; (d) 16-channel wavelength multiplexing (Nw=16), N≈32NiNo; and (e) 32-channel wavelength multiplexing (Nw=32), N≈64NiNo. The standard deviations (error bars) of these metrics are calculated across the entire testing data set.

    In Fig. 2(b), it can be seen that the transformation errors of all the trained diffractive models show a monotonic decrease as N increases, which is expected due to the increased degrees of freedom in the diffractive processor. Also, the approximation errors of the regular diffractive networks without wavelength multiplexing, i.e., Nw=1, approach 0 as N approaches 2NiNo≈8200. This observation confirms the conclusion obtained in our previous work,69,70 i.e., a phase-only monochrome diffractive network requires at least 2NiNo diffractive neurons to approximate a target complex-valued linear transformation with negligible error. On the other hand, for the wavelength-multiplexed diffractive models with Nw different wavelength channels that are trained to approximate Nw unique linear transforms, we see in Fig. 2 that the approximation errors approach 0 as N approaches 2NwNiNo. This finding indicates that, compared to a baseline monochrome diffractive model that can only perform a single transform, performing multiple distinct transforms using wavelength multiplexing within a single diffractive network requires the number of trainable neurons N to be increased Nw-fold. This conclusion is further supported by the results of the other two performance metrics, CosSim and MSEOutput, as shown in Figs. 2(c) and 2(d): as N approaches 2NwNiNo, the CosSim and MSEOutput of the wavelength-multiplexed diffractive models approach 1 and 0, respectively.

    To reveal the linear transformation performance of the individual wavelength channels in our wavelength-multiplexed diffractive processors, in Fig. 3, we show the channel-wise output field errors (MSEOutput) of the wavelength-multiplexed diffractive networks with Nw=2, 4, 8, 16, and 32 and N≈2NwNiNo. Figure 3 indicates that the MSEOutput values of these individual channels are very close to each other in all the designs with different Nw, demonstrating no significant performance bias toward any specific wavelength channel or target transform. For comparison, we also show in Fig. S2 in the Supplementary Material the resulting MSEOutput of the diffractive model with Nw=8 and N≈2NwNiNo=16NiNo when our channel balancing training strategy with adaptive weights was not used (see Sec. 4). There is a large variation in the output field errors among the different wavelength channels if adaptive weights are not used during the training; in fact, the channels assigned to longer wavelengths tend to show markedly inferior transformation performance, which highlights the significance of using our balancing strategy during the training process. Stated differently, unless a channel balancing strategy is employed during the training phase, longer wavelengths suffer from relatively lower spatial resolution and face increased all-optical transformation errors compared to the shorter wavelength channels.

    To visually demonstrate the success of our broadband diffractive system in performing a group of linear transformations using wavelength multiplexing, in Fig. 4, we show examples of the ground-truth transformation matrices (i.e., Aw) and their all-optical counterparts (i.e., Aw′) resulting from the diffractive designs with Nw=8 and N ∈ {2NwNiNo=16NiNo=64,800; 4NwNiNo=32NiNo=131,100}. The absolute amplitude and phase errors between the two (Aw and Aw′) are also reported in the same figure. Moreover, in Fig. 5 and Fig. S3 in the Supplementary Material, we present some exemplary complex-valued input–output optical fields from the same set of diffractive designs with N=4NwNiNo=131,100 and N=2NwNiNo=64,800, respectively. These results, summarized in Figs. 4 and 5 and Fig. S3 in the Supplementary Material, reveal that, when N≥2NwNiNo, the all-optical transformation matrices and the output complex fields of all the wavelength channels match their ground-truth targets very well, with negligible error, which is also in line with our earlier observations in Fig. 2.


    Figure 4. All-optical transformation matrices estimated by two different wavelength-multiplexed broadband diffractive networks with Nw=8 and Ni=No=8². The first broadband diffractive network has N≈2NwNiNo=16NiNo=64,800 trainable diffractive neurons. The second broadband diffractive network has N≈4NwNiNo=32NiNo=131,100 trainable diffractive neurons. The absolute differences between these all-optical transformation matrices and the corresponding ground-truth (target) matrices are also shown in each case. The N=131,100 diffractive design achieves a much smaller, negligible absolute error due to the increased degrees of freedom.


    Figure 5. Examples of the input/output complex fields for the ground-truth (target) transformations along with the all-optical output fields resulting from the 8-channel wavelength-multiplexed diffractive design using N≈4NwNiNo=32NiNo=131,100. Absolute errors between the ground-truth output fields and the all-optical diffractive network output fields are negligible. Note that |∠o−∠ô′|π indicates the wrapped phase difference between the ground-truth output field o and the normalized diffractive network output field ô′.

    2.2 Limits of Nw: Scalability of Wavelength-Multiplexing in Diffractive Networks

    We have so far demonstrated that a single broadband diffractive network can be designed to simultaneously perform a group of Nw arbitrary complex-valued linear transformations, with Nw=2, 4, 8, 16, and 32 (Figs. 2 and 3). Next, we explore the feasibility of implementing a significantly larger number of wavelength channels in our system to better understand the limits of Nw. Due to the limited computational resources available, to simulate the behavior of larger Nw values, we selected Ni=No=5×5 and Nw ∈ {1, 2, 4, 8, 16, 32, 64, 128, 184}. Accordingly, we generated a new set of 184 different arbitrarily selected complex-valued matrices with dimensions of 25×25, i.e., A1, A2, …, A184, as the target linear transformations to be all-optically implemented. The cosine similarity values between these randomly generated matrices are reported in Fig. S1b in the Supplementary Material, confirming that they are all very close to 0. We also created training, validation, and testing data sets based on these new target transformation matrices following the same approach as in the previous section: for each transformation matrix, we randomly generated 55,000, 5000, and 10,000 field samples for the training, validation, and testing data sets, respectively. Then, using the training field data sets, we trained broadband diffractive designs with Nw different wavelength channels, where the Nw target transforms were taken from the first Nw matrices in the randomly generated set {A1, A2, …, A184}. For each Nw choice, we also trained diffractive models with different numbers of diffractive neurons, including N=1.5NwNiNo, N=2NwNiNo, and N=3NwNiNo.

    The all-optical transformation performance metrics of the resulting diffractive networks on the testing data sets are shown in Fig. 6 as a function of Nw. Figures 6(a)–6(c) reveal that the all-optical transformations of the diffractive designs with different N show some increased error as Nw increases. For the diffractive models with N=3NwNiNo, the all-optical transformation errors (MSETransformation) at smaller Nw appear to be extremely small and do not exhibit the same performance degradation with increasing Nw; only after Nw>10 do we see an error increase in the all-optical transformations for N=3NwNiNo. By comparing the linear transformation performance of the models with different N, Fig. 6 clearly reveals that adding more diffractive neurons to a broadband diffractive network design can greatly improve its transformation performance, which is especially important for operating at a large Nw.


    Figure 6. Exploration of the limits of the number of wavelength channels (Nw) that can be implemented in a broadband diffractive network. (a) The mean values of the normalized MSE between the ground-truth transformation matrices (Aw) and the all-optical transforms (Aw′) across different wavelength channels are reported as a function of Nw ∈ {1, 2, 4, 8, 16, 32, 64, 128, 184}. The results of the broadband diffractive networks using different numbers of diffractive neurons (N) are presented with different colors: N ∈ {1.5NwNiNo, 2NwNiNo, 3NwNiNo}. Dotted lines are fitted based on the data points whose diffractive networks share the same N. (b) Same as (a) but the cosine similarity values between the all-optical transforms and their ground truth are reported. (c) Same as (a) but the MSE values between the diffractive network output fields and the ground-truth output fields are reported. Ni=No=5².

    By fitting lines to the data points shown in Figs. 6(a) and 6(c), we can extrapolate to larger Nw values and predict an all-optical transformation error bound as a function of Nw. With these fitted (dashed) lines shown in Figs. 6(a) and 6(c), we get a coarse prediction of the linear transformation performance of a broadband diffractive model with a significantly larger number of wavelength channels Nw that is challenging to simulate due to our limited computer memory and speed. Interestingly, these three fitted lines (corresponding to diffractive designs with N=1.5NwNiNo, N=2NwNiNo, and N=3NwNiNo) intersect with each other at a point around Nw=10,000, with an MSETransformation of 0.2 and an MSEOutput of 0.03. This level of transformation error coincides with the error levels observed at the beginning of our training, implying that a broadband diffractive model with Nw=10,000, even after training, would only exhibit a performance level comparable to an untrained model. These analyses indicate that, for a broadband diffractive network trained with N≤3NwNiNo and a training data set of 55,000 optical field pairs, there is an empirical multiplexing upper bound of Nw≈10,000.
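
    The extrapolation described above can be reproduced schematically as below (a sketch under our assumption that the fits are linear in log–log coordinates; the data points in this snippet are placeholders, not the paper's measured values):

```python
import numpy as np

def fit_loglog(nw: np.ndarray, mse: np.ndarray):
    """Fit log10(MSE) = a*log10(Nw) + b and return (a, b)."""
    a, b = np.polyfit(np.log10(nw), np.log10(mse), 1)
    return a, b

def intersection(line1, line2) -> float:
    """Nw at which two fitted log-log lines cross."""
    (a1, b1), (a2, b2) = line1, line2
    return 10 ** ((b2 - b1) / (a1 - a2))

# placeholder error curves for two neuron budgets (illustrative values only)
nw = np.array([8, 16, 32, 64, 128, 184])
line_2x = fit_loglog(nw, 1e-5 * nw ** 1.1)  # N = 2*Nw*Ni*No
line_3x = fit_loglog(nw, 3e-6 * nw ** 1.2)  # N = 3*Nw*Ni*No
print(intersection(line_2x, line_3x))       # predicted crossing point in Nw
```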

    However, before reaching this Nw≈10,000 ultimate limit discussed above, practically the desired level of approximation accuracy will set the actual limit of Nw. For example, based on visual inspection and the calculated peak signal-to-noise ratio (PSNR) values, one can empirically choose a blind testing error of MSEOutput≤10−3 as a threshold for the diffractive network’s all-optical approximation error; this threshold corresponds to a mean PSNR value of ∼20 dB, calculated for the diffractive network output fields against their ground truth (see Fig. S4 in the Supplementary Material). We marked this MSEOutput-based performance threshold in Fig. 6(c) using a black dashed line, which also corresponds to a transformation error (MSETransformation) of ∼9×10−3, marked in Fig. 6(a) with a black dashed line. Based on these empirical performance thresholds set by MSEOutput≤10−3 and PSNR≥20 dB, we can infer that a broadband diffractive processor with N=3NwNiNo can accommodate up to Nw≈2000 wavelength channels, where 2000 different linear transformations can be performed through a single broadband diffractive processor within the performance bounds shown in Figs. 6(a) and 6(c) (see the purple dashed lines). The same analysis reveals a reduced upper bound of Nw≈600 for the diffractive network designs with N=2NwNiNo (see the green dashed lines).

    2.3 Impact of Material Dispersion and Losses on Wavelength-Multiplexed Diffractive Networks

    In the previous section, we showed that a broadband diffractive processor can be designed to implement >180 different target linear transforms simultaneously, and that this number can be further extended to Nw≈2000 based on an all-optical approximation error threshold of MSEOutput≤10−3. In this section, we provide additional analyses on material-related factors that have an impact on the accuracy of wavelength-multiplexed computing through broadband diffractive networks. For example, the selection of materials with different dispersion properties (i.e., the real and imaginary parts of the refractive index as a function of the wavelength) will impact the light–matter interactions at different illumination wavelengths. To numerically explore the impact of material dispersion and related optical losses, we took the broadband diffractive network design shown in Fig. 6 with Nw=128 and N=3NwNiNo and retrained it using different materials. The first material we selected is a lossy polymer that is widely employed as a 3D printing material; this material was used to fabricate diffractive networks that operate at the terahertz part of the spectrum.52,66,67 The dispersion curves of this lossy material are shown in Fig. S5a in the Supplementary Material, which were also used in the design of the diffractive networks reported in the previous sections (with λm=0.8 mm). As a second material choice for comparison, we selected a lossless dielectric material, for which we took N-BK7 glass as an example and used its dispersion to simulate our wavelength-multiplexed diffractive processor design at the visible wavelengths with λm=530 nm; the dispersion curves of this material are reported in Fig. S5b in the Supplementary Material. As a third material choice for comparison, we considered a hypothetical scenario where the material of the diffractive layers has a flat dispersion around λm=0.8 mm, with no absorption and a constant refractive index (n=1.72) across all the selected wavelength channels of interest; see the refractive index curve of this “dispersion-free” material in Fig. S5c in the Supplementary Material.
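
    To make the role of dispersion explicit, the complex transmission coefficient of a single diffractive neuron of thickness h can be sketched as below (our own formulation of the standard thin-element model; we assume, without confirming, that the paper's Sec. 4 forward model follows the usual convention):

```python
import numpy as np

def neuron_transmission(h: float, wavelength: float, n: float, kappa: float = 0.0) -> complex:
    """Thin-element transmission of a neuron of thickness h: a phase delay
    set by the real index n(lambda) relative to air, and an amplitude
    attenuation set by the extinction coefficient kappa(lambda)."""
    phase = 2 * np.pi * (n - 1.0) * h / wavelength
    attenuation = np.exp(-2 * np.pi * kappa * h / wavelength)
    return attenuation * np.exp(1j * phase)

# flat-dispersion example used in the text: n = 1.72, kappa = 0 at all channels
t = neuron_transmission(h=0.5e-3, wavelength=0.8e-3, n=1.72)
```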

    After training the diffractive network models using these different materials selected for comparison, we summarize their all-optical linear transformation performance in Figs. 7(a)–7(c) (see the purple bars). These results reveal that all three diffractive models with different material choices achieved negligible all-optical transformation errors, regardless of their dispersion characteristics. This confirms the feasibility of extending our wavelength-multiplexed diffractive processor designs to other spectral bands with vastly different material dispersion features.


    Figure 7. The impact of material dispersion and losses on the all-optical transformation performance of wavelength-multiplexed broadband diffractive networks. (a) The mean values of the normalized MSE between the ground-truth transformation matrices (Aw) and the all-optical transforms (Aw′) across different wavelength channels are reported as a function of the material of the diffractive layers. The results of the diffractive networks trained with and without the diffraction efficiency penalty are presented in yellow and purple, respectively. Nw=128, N=3NwNiNo, and Ni=No=5². (b) Same as (a) but the cosine similarity values between the all-optical transforms and their ground truth are reported. (c) Same as (a) but the MSE values between the diffractive network output fields and the ground-truth fields are reported. (d) The mean diffraction efficiencies of the presented diffractive models across all the wavelength channels. (e) Diffraction efficiency of the individual wavelength channels for the broadband diffractive network model presented in (a)–(d) that uses the dielectric material without the diffraction efficiency-related penalty term in its loss function. (f) Same as (e), but the diffractive network was trained using a loss function with the diffraction efficiency-related penalty term.

    In addition to the all-optical transformation accuracy, the output diffraction efficiency (η) of these diffractive network models is also practically important. As shown in Fig. 7(d), due to the absorption by the layers, the diffractive network model using the lossy polymer material presents a very poor output diffraction efficiency η compared to the other two diffractive models that used lossless materials. In addition to the absorption of light through the diffractive layers, a wavelength-multiplexed diffractive network also suffers from optical losses due to the propagating waves that leak out of the diffractive processor volume. This second source of optical loss within a diffractive network can be strongly mitigated through the incorporation of diffraction efficiency-related penalty terms52,66,67,69 into the training loss function (see Sec. 4 for details). The results of using such a diffraction-efficiency-related penalty term during training are presented in Figs. 7(a)–7(d) (yellow bars), which indicate that the output diffraction efficiencies of the corresponding models were improved by >589- to 1479-fold compared to their counterparts that were trained without using such a penalty term [see Fig. 7(d)]. We also show in Figs. 7(e) and 7(f) the output diffraction efficiencies of the individual wavelength channels trained without and with the diffraction-efficiency penalty term, respectively. These results also reveal that the diffraction-efficiency-related penalty term used during training not only improved the overall output efficiency of the diffractive processor design but also helped to mitigate the imbalance of diffraction efficiencies among the different wavelength channels [see Figs. 7(e) and 7(f)]. These improvements come at a cost; as shown in Figs. 7(a)–7(c), there is some degradation in the all-optical transformation performance of the diffractive networks that are trained with a diffraction-efficiency-related penalty term. However, this relative degradation in the all-optical transformation performance is still acceptable, since a cosine similarity value of >0.996 (up to 0.998) is maintained in each case [see Fig. 7(b), yellow bars].
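
    A minimal sketch of such a penalty term is given below (our assumption of one common form; the exact definition used by the authors is in Sec. 4 and Refs. 52, 66, 67, and 69). It adds a loss contribution whenever a channel's output diffraction efficiency falls below a target value:

```python
import numpy as np

def efficiency_penalty(out_fields, in_fields, eta_target: float = 0.01, alpha: float = 1.0):
    """Penalize wavelength channels whose output diffraction efficiency
    eta_w = |o'_w|^2 / |i_w|^2 falls below eta_target. Illustrative only."""
    penalty = 0.0
    for o_w, i_w in zip(out_fields, in_fields):
        eta_w = np.sum(np.abs(o_w) ** 2) / np.sum(np.abs(i_w) ** 2)
        penalty += alpha * max(0.0, eta_target - eta_w)  # hinge: inactive above the target
    return penalty
```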

    2.4 Impact of Limited Bit Depth on the Accuracy of Wavelength-Multiplexed Diffractive Networks

    The bit depth of a broadband diffractive network refers to the finite number of thickness levels that each diffractive neuron can have on top of a common base thickness of each diffractive layer. For example, in a broadband diffractive network with a bit depth of 8, its diffractive neurons will be trained to have at most 2⁸=256 different thickness values that are distributed between a predetermined minimum thickness and a maximum thickness value. To mechanically support each diffractive layer, the minimum thickness is always positive, acting as the base thickness of each layer. To analyze the impact of this bit depth on the linear transformation performance and accuracy of our wavelength-multiplexed diffractive networks, we took the Nw=184-channel diffractive design reported in the previous sections (trained using a data format with 32-bit depth) and retrained it from scratch under different bit depths, including 4, 8, and 12. Based on the same test data set, the all-optical linear transformation performance metrics of the resulting diffractive networks are reported in Fig. 8 as a function of N. Figure 8 reveals that a 12-bit depth is practically identical to a 32-bit depth in terms of the all-optical transformation accuracy that can be achieved for the Nw=184 target linear transformations. Furthermore, a bit depth of 8 can also be used for a broadband diffractive network design to maintain its all-optical transformation performance with a relatively small error increase, which can be compensated for with an increase in N, as illustrated in Fig. 8. These observations from Fig. 8 highlight (1) the importance of having a sufficient bit depth in the design and fabrication of a broadband diffractive processor and (2) the importance of N as a way to boost the all-optical transformation performance under a limited diffractive neuron bit depth.
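
    The bit-depth constraint described above can be expressed as a simple quantizer (a sketch, assuming NumPy; h_min, h_max, and the rounding rule are our illustrative choices, not the paper's exact training-time quantization scheme):

```python
import numpy as np

def quantize_thickness(h: np.ndarray, h_min: float, h_max: float, bit_depth: int) -> np.ndarray:
    """Snap each neuron thickness to one of 2**bit_depth levels uniformly
    distributed between h_min (the base thickness) and h_max."""
    levels = 2 ** bit_depth                  # e.g., 256 levels for bit_depth = 8
    step = (h_max - h_min) / (levels - 1)
    h_clipped = np.clip(h, h_min, h_max)
    return h_min + np.round((h_clipped - h_min) / step) * step
```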


    Figure 8. All-optical transformation performance of broadband diffractive network designs with Nw=184, reported as a function of N and the bit depth of the diffractive neurons. (a) The mean values of the normalized MSE between the ground-truth transformation matrices (Aw) and the all-optical transforms (Aw′) across different wavelength channels are reported as a function of N. The results of the diffractive networks using different bit depths of the diffractive neurons, including 4, 8, 12, and 32, are encoded with different colors, and the space between the data points is linearly interpolated. N ∈ {0.5NwNiNo=56,000, NwNiNo=115,000, 2NwNiNo=231,000, 4NwNiNo=461,000}, and Ni=No=5². (b) Same as (a) but the cosine similarity values between the all-optical transforms and their ground truth are reported. (c) Same as (a) but the MSE values between the diffractive network output fields and the ground-truth output fields are reported.

    2.5 Impact of Wavelength Precision or Jitter on the Accuracy of Wavelength-Multiplexed Diffractive Networks

    Another possible factor that may cause systematic errors in our framework is the wavelength precision or jitter. To analyze the wavelength-encoding-related errors, we used the four-channel wavelength-multiplexed diffractive network model with N≈2NwNiNo=8NiNo and Ni=No=8² that was presented in Fig. 3(b). We deliberately shifted the illumination wavelength used for each encoding channel away from the preselected wavelength used during the training (i.e., λ1=0.9125λm, λ2=0.9708λm, λ3=1.0292λm, and λ4=1.0875λm). The resulting linear transformation performance of the Nw=4 channels using different performance metrics is summarized in Figs. 9(a)–9(c) as a function of the illumination wavelength. All of these results in Fig. 9 show that as the illumination wavelengths used for each encoding channel gradually deviate from their designed/assigned wavelengths (used during the training of the wavelength-multiplexed diffractive network), their all-optical transformation accuracy begins to degrade. To shed more light on this, we used the previous performance threshold based on MSEOutput≤10−3 as an empirical criterion to estimate the tolerable range of illumination wavelength errors, which revealed an acceptable bandwidth of ∼0.002λm for each one of the encoding wavelength channels. Stated differently, when a given illumination wavelength is within ±0.001λm of the corresponding preselected wavelength assigned for that spectral channel, the degradation of the linear transformation accuracy at the output of the wavelength-multiplexed diffractive network will satisfy MSEOutput≤10−3. In practical applications, this level of spectral precision can be routinely achieved by using high-performance wavelength scanning sources74,75 (e.g., swept-source lasers) or narrow passband thin-film filters.
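
    A wavelength-jitter test like the one described above can be scripted as follows (a sketch; `diffractive_forward` is a hypothetical function standing in for the trained network's forward model, and the sweep range is our choice):

```python
import numpy as np

def jitter_sweep(diffractive_forward, inputs, targets, lam_design: float, lam_m: float):
    """Evaluate one wavelength channel while detuning the illumination
    wavelength around its design value lam_design."""
    offsets = np.linspace(-0.005, 0.005, 11) * lam_m  # detuning in units of lambda_m
    errors = []
    for d in offsets:
        outputs = [diffractive_forward(i_w, lam_design + d) for i_w in inputs]
        mse = np.mean([np.mean(np.abs(o - t) ** 2) for o, t in zip(outputs, targets)])
        errors.append(mse)  # flag detunings where the error exceeds 1e-3
    return offsets, np.array(errors)
```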


Figure 9. The impact of the encoding wavelength error on the all-optical linear transformation performance of a wavelength-multiplexed broadband diffractive network; $N_w = 4$, $N \approx 2N_wN_iN_o = 8N_iN_o$, and $N_i = N_o = 8^2$. (a) The normalized MSE values between the ground-truth transformation matrices ($A_w$) and the all-optical transforms ($A_w'$) for the four different wavelength channels are reported as a function of the wavelengths used during the testing. The results of the different wavelength channels are shown with different colors, and the space between the simulation data points is linearly interpolated. (b) Same as (a), but the cosine similarity values between the all-optical transforms and their ground truth are reported. (c) Same as (a), but the MSE values between the diffractive network output fields and the ground-truth output fields are reported. The shaded areas indicate the standard deviation values calculated based on all the samples in the testing data set.

    2.6 Permutation-Based Encoding and Decoding Using Wavelength-Multiplexed Diffractive Networks

So far, we have demonstrated the design of wavelength-multiplexed diffractive processors that allow a massive number of unique complex-valued linear transformations to be computed, all in parallel, within a single diffractive optical network. To exemplify some of the potential applications of this broadband diffractive processor design, here we demonstrate permutation matrix-based optical transforms, which have significance for telecommunications (e.g., channel routing and interconnects), information security, and data processing (see Fig. 10). Similar to the approaches introduced earlier, we randomly generated eight permutation matrices, $P_1, P_2, \ldots, P_8$ [see Fig. 10(b)], and trained a wavelength-multiplexed diffractive network with $N_w = 8$ and $N \approx 2N_wN_iN_o = 16N_iN_o = 64{,}800$; this architecture has the same configuration as the one shown in Fig. 3(c) and Fig. 4 (middle column), except that it uses these new permutation matrices as the target transforms. After its training, we show in Fig. 10(a) examples of permutation-based encoding of input images using the trained broadband diffractive network. After being all-optically processed by our wavelength-multiplexed diffractive network design, all the input images ($i_w$) are simultaneously permuted (encoded) according to the permutation matrices assigned to the corresponding wavelength channels, resulting in the output fields $o_w'$, which match their ground truth $o_w$ very well [see Fig. 10(a)]. Stated differently, the trained wavelength-multiplexed diffractive processor can successfully synthesize the correct output field $o_w = P_w i_w$ for all possible input fields $i_w$, since it presents an all-optical approximation of $P_w$ for $w \in \{1, 2, \ldots, 8\}$.
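
To make the permutation-based encoding and decoding concrete, here is a minimal NumPy sketch (ours, not the authors' code) that builds $N_w = 8$ random permutation matrices, as described in Sec. 4.2, applies each to a vectorized input field ($o_w = P_w i_w$), and then inverts the encoding; the $8\times8$ FOV size ($N_i = 64$) follows the configuration above.

```python
import numpy as np

Nw, Ni = 8, 64  # 8 wavelength channels, 8x8 FOV (Ni = No = 64)
rng = np.random.default_rng(0)

# One unique permutation matrix per wavelength channel: permute the
# rows of an identity matrix (cf. Sec. 4.2).
P = [np.eye(Ni)[rng.permutation(Ni)] for _ in range(Nw)]

# Complex input fields with uniform amplitude in [0, 1] and phase in [0, 2*pi).
i_w = rng.uniform(0, 1, (Nw, Ni)) * np.exp(1j * rng.uniform(0, 2 * np.pi, (Nw, Ni)))

# Encoded (permuted) ground-truth outputs and their decoded counterparts.
o_w = np.stack([P[w] @ i_w[w] for w in range(Nw)])
decoded = np.stack([P[w].T @ o_w[w] for w in range(Nw)])  # P^{-1} = P^T
assert np.allclose(decoded, i_w)
```

The transpose in the last step is what the decoding demonstration of Fig. S6 relies on: permutation matrices are orthogonal, so their inverses are again permutation matrices that the same multiplexing scheme can implement.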

Figure 10. An example of a wavelength-multiplexed diffractive network ($N_w = 8$, $N \approx 2N_wN_iN_o = 16N_iN_o = 64{,}800$) that all-optically performs eight different permutation (encoding) operations between its input and output FOVs, with each target permutation matrix assigned to a unique wavelength. (a) Input/output examples. Each one of the $N_w = 8$ wavelength channels in the diffractive processor is assigned to a different permutation matrix $P_w$. The absolute differences between the diffractive network output fields and the ground-truth (target) permuted (encoded) output fields are also shown in the last column. (b) Arbitrarily generated permutation matrices $P_1, P_2, \ldots, P_8$ that serve as the ground truth (target) for the wavelength-multiplexed diffractive permutation transformations shown in (a).

Similarly, we show in Fig. S6 in the Supplementary Material that the same wavelength-multiplexed permutation network can be used to all-optically decode the encoded/permuted patterns. In this case, the encoded input fields are generated by transforming (permuting) the same input images using the inverses of the permutation matrices $P_1, P_2, \ldots, P_8$. The results in Fig. S6 in the Supplementary Material indicate that the wavelength-multiplexed diffractive network can simultaneously decode all of the input images all-optically, matching their ground truth very well.

    2.7 Experimental Validation of a Wavelength-Multiplexed Diffractive Network

Next, we performed a proof-of-concept experimental validation of our diffractive network using wavelength-multiplexed permutation operations. With the frequency-tunable continuous-wave terahertz (THz) setup shown in Fig. 11(a) (see Sec. 4 for its implementation details), we tested a wavelength-multiplexed diffractive network design with $N_w = 2$ and $N_i = N_o = 3^2$, where the two wavelength channels were chosen as $\lambda_1 = 0.667$ mm and $\lambda_2 = 0.698$ mm. Each of these two wavelength channels in this experimental design is assigned to a unique, arbitrarily generated target permutation matrix ($P_1$ and $P_2$; see Fig. S7 in the Supplementary Material), such that any spatially structured pattern at the input FOV can be all-optically permuted by the diffractive optical network to form a different desired pattern at the output FOV, performing $P_1$ and $P_2$ under $\lambda_1$ and $\lambda_2$ illumination, respectively. For this, we used a diffractive network architecture with three diffractive layers, each having $120\times120$ diffractive features with a lateral size of 0.4 mm ($0.59\lambda_m$). The axial spacing between any two adjacent layers (including the input/output planes) in this design was set as 20 mm ($29.3\lambda_m$). During the training process, a total of 55,000 randomly generated input–output field pairs corresponding to the target permutation matrices ($P_1$ and $P_2$) were used to update the thickness values of the diffractive layers. After the training converged, the resulting diffractive layers were fabricated using a 3D printer and mechanically assembled to form a physical wavelength-multiplexed diffractive optical permutation processor, as shown in Figs. 11(b)–11(d).

Figure 11. Experimental validation of a wavelength-multiplexed diffractive network with $N_w = 2$ and $N_i = N_o = 3^2$. (a) Photograph of the experimental setup, including the schematic of the THz setup. (b) The fabricated wavelength-multiplexed diffractive processor. (c) The learned thickness profiles of the diffractive layers. (d) Photographs of the 3D-printed diffractive layers. (e) Experimental results of the diffractive processor for the two wavelength channels $\lambda_1 = 0.667$ mm and $\lambda_2 = 0.698$ mm using the fabricated diffractive layers, which reveal a good agreement with their numerical counterparts and the ground truth. $\lambda_m = (\lambda_1 + \lambda_2)/2 = 0.6825$ mm.

To experimentally test the performance of this 3D-fabricated wavelength-multiplexed diffractive network, different input patterns from the blind testing set (never used in training) were also 3D-printed and used as the input test objects. The experimental results are reported in Fig. 11(e), revealing that the measured output patterns show good agreement with their numerically simulated counterparts and the ground-truth images for all of these test objects. These experimental results further confirm the feasibility of our wavelength-multiplexed diffractive optical transformation networks.

    3 Discussion

We demonstrated wavelength-multiplexed diffractive network designs that can perform massively parallel universal linear transformations through a single diffractive processor. We also quantified the limits of $N_w$ and the impact of material dispersion, bit depth, and wavelength precision/jitter on the all-optical transformation performance of broadband diffractive networks. Beyond these, other factors may limit the performance of broadband diffractive processors, including lateral and axial misalignments of the diffractive layers, surface reflections, and other imperfections introduced during fabrication. To mitigate some of these practical issues, approaches such as high-precision lithography and antireflection coatings can be utilized in the fabrication process of a diffractive network. As demonstrated in our previous work,50,52,61 it is also possible to mitigate the performance degradation resulting from some of these experimental factors by incorporating them as random errors into the physical forward model used during the training process, which is referred to as "vaccination" of the diffractive network.

The reported wavelength-multiplexed diffractive processor represents a milestone in expanding the parallelism of diffractive all-optical computing, simultaneously covering a large group of complex-valued linear transformations. Compared to our previous work,70 where a monochromatic diffractive optical network was integrated with polarization-sensitive elements to multiplex four independent linear transformations, the multiplexing factor ($N_w$) of a wavelength-multiplexed diffractive network is significantly increased to more than 180 and can further reach $N_w \sim 2000$, revealing a major improvement in all-optical processing throughput. Moreover, the physical architecture of this wavelength-multiplexed computing framework is relatively simple, since it does not rely on any additional optical modulation elements, e.g., spectral filters; it solely utilizes the different phase modulation values of the same diffractive layers at different wavelengths of light, and is therefore compatible with materials of various dispersion properties (including flat dispersion, as illustrated in Fig. 7). One could perhaps argue that, as an equivalent to a wavelength-multiplexed diffractive network that uses $N$ trainable diffractive features to compute $N_w$ independent target linear transformations, one could utilize a set of $N_w$ separately optimized monochromatic diffractive networks, each assigned to perform one of the $N_w$ target linear transforms using $N/N_w$ diffractive features. However, such a multipath design involving $N_w$ different monochromatic diffractive networks (one for each target transformation) would require bulky optical routing for fan-in/fan-out, which would introduce additional insertion losses, noise, and misalignment errors into the system, hurting the energy efficiency, performance, and compactness of the optical processor. Given that we covered $N_w > 180$ in this work, using $N_w$ separate monochromatic diffractive networks is not a feasible strategy that can compete with a wavelength-multiplexed design. Furthermore, if additional multiplexing schemes other than the wavelength multiplexing reported here were to be used, such as temporal multiplexing (i.e., switching between different diffractive networks), they would require additional optoelectronic control elements, further increasing the hardware complexity of the system, which would not be feasible for a large $N_w$.

It is worth further emphasizing that, even if multiple separately optimized monochromatic diffractive networks could be trained to individually perform different target linear transforms at different wavelengths, it is not possible to directly combine the converged/optimized layers of these diffractive networks to match the broadband operation of the wavelength-multiplexed diffractive network presented here. Since these monochromatic networks are individually trained using only a single illumination wavelength each, under broadband illumination the modulation optimized for one wavelength would corrupt the fields at the other wavelengths, and the transformation accuracies of all the channels would be collectively hampered. This, once again, highlights the significance of our wavelength-multiplexing scheme: a wavelength-multiplexed diffractive optical network can be realized through the engineering of the surface profiles of dielectric diffractive layers with arbitrary dispersion properties, where these profiles must be designed by simultaneously taking into account all the $N_w$ wavelength channels, whose phase modulation values are mutually coupled to each other.

To the best of our knowledge, there has not been a demonstration of a design for the all-optical implementation of a complex-valued, arbitrary linear transformation using metasurfaces or metamaterials. In principle, placing different diffractive metaunits on the same substrate to perform different transformations at different wavelengths could be attempted as an alternative to the approach presented in this paper. However, such an approach would face severe challenges since (1) at the large spectral multiplexing factors ($N_w \gg 1$) shown in this work, the lateral period of each spectral metadesign would substantially increase per substrate, lowering the accuracy of each transformation; (2) at each illumination wavelength, the other metaunits designed for (assigned to) the other spectral components would also introduce "cross-talk fields" that would severely contaminate the desired responses at each wavelength and cannot be neglected since $N_w \gg 1$; (3) the phase responses of the spectrally encoded metaunits, in general, cover a small angular range, leading to low numerical aperture (NA) solutions compared to the diffractive solutions reported in this work, where NA = 1 (in air); the low NA of metaunits fundamentally limits the space–bandwidth product of each transformation channel; and (4) if multiple layers of metasurfaces were used in a given design, all of the aforementioned sources of error associated with spectral metaunits would accumulate and be amplified through the subsequent field propagation in a cascaded manner, causing severe degradation of the final output fields compared to the desired fields. Perhaps due to these significant challenges, metasurface- or metamaterial-based diffractive designs have not yet been reported as a solution for performing universal linear transformations, neither an arbitrary complex-valued linear transformation nor a group of linear transformations through some form of multiplexing.

As we have shown in Sec. 2, a diffractive neuron number of $N \geq 2N_wN_iN_o$ is required for a wavelength-multiplexed diffractive network to successfully implement $N_w$ different complex-valued linear transforms. Compared to our previous complex-valued monochromatic ($N_w = 1$) diffractive designs,69 the additional factor of 2 in $N$ results from the fact that the only trainable degrees of freedom of a broadband wavelength-multiplexed diffractive design are the thickness values of the diffractive neurons, whereas the $N_w$ different target transformations are all complex-valued. Stated differently, the resulting modulation values at different wavelengths through each diffractive neuron are mutually coupled through the material dispersion and depend on the neuron thickness.
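
As a concrete check of this counting argument (our arithmetic, using the largest multiplexing design reported above, with $N_w = 184$ and $N_i = N_o = 5^2 = 25$):

$$N \geq 2 N_w N_i N_o = 2 \times 184 \times 25 \times 25 = 230{,}000,$$

which closely matches the $N = 231{,}000$ design point of Fig. 8, beyond which the all-optical approximation error becomes negligible.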

Finally, we would like to emphasize that the presented framework can operate at various parts of the electromagnetic spectrum, including the visible band, so that the set of wavelength channels used for the transformation multiplexing can be matched to the light source and/or the spectral signals emitted from or reflected by the objects. In practice, this massively parallel linear transformation capability can be utilized in an optical processor to perform distinct statistical inference tasks using different wavelength channels, bringing additional throughput and parallelism to optical computing. This wavelength-multiplexed diffractive network design might also inspire the development of new multicolor and hyperspectral machine-vision systems, where all-optical information processing is performed simultaneously based on both the spatial and spectral features of the input objects. The resulting hyperspectral or multispectral diffractive output fields can enable new optical visual processing systems that identify or encode input objects with unique spectral properties. As another possibility, novel multispectral display systems can be created using these wavelength-multiplexed diffractive output fields to reconstruct spectroscopic images or light fields from compressed or distorted input spectral signals.62 All these possibilities enabled by wavelength-multiplexed diffractive optical processors can inspire numerous applications in biomedical imaging, remote sensing, analytical chemistry, materials science, and many other fields.

    4 Appendix: Materials and Methods

    4.1 Forward Model of the Broadband Diffractive Neural Network

A wavelength-multiplexed diffractive network consists of successive diffractive layers that collectively modulate the incoming broadband optical fields. In the forward model of our numerical simulations, the diffractive layers are assumed to be thin optical modulation elements, where the mth feature on the kth layer at a spatial location $(x_m, y_m, z_m)$ represents a wavelength-dependent complex-valued transmission coefficient $t^k$ given by

$$t^k(x_m, y_m, z_m, \lambda) = a^k(x_m, y_m, z_m, \lambda)\exp\left(j\phi^k(x_m, y_m, z_m, \lambda)\right), \tag{1}$$

where $a$ and $\phi$ denote the amplitude and phase coefficients, respectively. The diffractive layers are connected to each other by free-space propagation, which is modeled through the Rayleigh–Sommerfeld diffraction equation:22,46

$$f_m^k(x, y, z, \lambda) = \frac{z - z_m}{r^2}\left(\frac{1}{2\pi r} + \frac{1}{j\lambda}\right)\exp\left(\frac{j 2\pi r}{\lambda}\right), \tag{2}$$

where $f_m^k(x, y, z, \lambda)$ is the complex-valued field on the mth pixel of the kth layer at $(x, y, z)$ at a wavelength of $\lambda$, which can be viewed as a secondary wave generated from the source at $(x_m, y_m, z_m)$; $r = \sqrt{(x - x_m)^2 + (y - y_m)^2 + (z - z_m)^2}$ and $j = \sqrt{-1}$. For the kth layer ($k \geq 1$, assuming the input plane is the 0th layer), the modulated optical field $E^k$ at location $(x_m, y_m, z_m)$ is given by

$$E^k(x_m, y_m, z_m, \lambda) = t^k(x_m, y_m, z_m, \lambda) \cdot \sum_{n \in S} E^{k-1}(x_n, y_n, z_n, \lambda)\, f_n^{k-1}(x_m, y_m, z_m, \lambda), \tag{3}$$

where $S$ denotes all the diffractive neurons on the previous diffractive layer.
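
As a hedged, minimal NumPy sketch of this forward model (our illustration, not the authors' code), the snippet below propagates a field between two planes by direct summation of the Rayleigh–Sommerfeld secondary waves of Eq. (2); the field after a diffractive layer is then obtained by multiplying the propagated field with the layer's complex transmittance, per Eq. (3). The grid sizes and coordinates are left to the caller and are illustrative.

```python
import numpy as np

def rs_propagate(E_prev, coords_prev, coords_next, wavelength):
    """Direct Rayleigh-Sommerfeld summation between two planes.

    E_prev:      complex field samples on the previous plane, shape (N_prev,)
    coords_prev: (N_prev, 3) array of (x, y, z) source positions
    coords_next: (N_next, 3) array of (x, y, z) observation positions
    """
    E_next = np.zeros(len(coords_next), dtype=complex)
    for m, (x, y, z) in enumerate(coords_next):
        dx = x - coords_prev[:, 0]
        dy = y - coords_prev[:, 1]
        dz = z - coords_prev[:, 2]
        r = np.sqrt(dx**2 + dy**2 + dz**2)
        # Secondary-wave kernel f of Eq. (2).
        f = (dz / r**2) * (1.0 / (2 * np.pi * r) + 1.0 / (1j * wavelength)) \
            * np.exp(1j * 2 * np.pi * r / wavelength)
        E_next[m] = np.sum(E_prev * f)
    return E_next

# Per Eq. (3), the modulated field on layer k is E_k = t_k * rs_propagate(E_{k-1}, ...).
```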

For the diffractive models used in the numerical analyses, we chose $\lambda_m/2$ as the smallest sampling period for the simulation of the complex optical fields and also used $\lambda_m/2$ as the smallest feature size of the diffractive layers. In the input and output FOVs, $4\times4$ binning is performed on the simulated optical fields, resulting in a pixel size of $2\lambda_m$ for the input/output fields. The axial distance ($d$) between successive layers (including the diffractive layers and the input/output planes) in our diffractive processor designs is empirically selected as $d = 0.5D_{\text{layer}}$, where $D_{\text{layer}}$ represents the lateral size of each diffractive layer.

The thickness value $h$ of each neuron of a diffractive layer is composed of two parts, $h_{\text{learnable}}$ and $h_{\text{base}}$, as follows:

$$h = h_{\text{learnable}} + h_{\text{base}}, \tag{4}$$

where $h_{\text{learnable}}$ denotes the learnable thickness parameter of each diffractive feature and is confined between $h_{\min} = 0$ and $h_{\max} = 1.25\lambda_m$ for all the diffractive models used for the numerical analyses in this paper. When a modulation with q-bit depth is applied to the diffractive model, $h_{\text{learnable}}$ is rounded to the nearest of $2^q$ equally spaced levels within the range $[0, h_{\max}]$. The additional base thickness $h_{\text{base}}$ is a constant, chosen as $0.25\lambda_m$, which serves as the substrate support for the diffractive neurons. To achieve the constraint applied to $h_{\text{learnable}}$, an associated latent trainable variable $h_v$ was defined using the following analytical form:

$$h_{\text{learnable}} = \frac{h_{\max}}{2}\cdot\left(\sin(h_v) + 1\right). \tag{5}$$

Note that, before the training starts, the $h_v$ values of all the diffractive neurons are randomly initialized with a normal distribution (a mean value of 0 and a standard deviation of 1). Based on these definitions, the amplitude and phase components of the complex transmittance of the mth neuron, i.e., $a^k(x_m, y_m, z_m, \lambda)$ and $\phi^k(x_m, y_m, z_m, \lambda)$, can be written as a function of the thickness of each neuron $h_m^k$ and the incident wavelength $\lambda$:

$$a^k(x_m, y_m, z_m, \lambda) = \exp\left(-\frac{2\pi\kappa(\lambda) h_m^k}{\lambda}\right), \tag{6}$$

$$\phi^k(x_m, y_m, z_m, \lambda) = \left(n(\lambda) - n_{\text{air}}\right)\frac{2\pi h_m^k}{\lambda}, \tag{7}$$

where the wavelength-dependent parameters $n(\lambda)$ and $\kappa(\lambda)$ are the refractive index and the extinction coefficient of the diffractive layer material, corresponding to the real and imaginary parts of the complex-valued refractive index $\tilde{n}(\lambda)$, i.e., $\tilde{n}(\lambda) = n(\lambda) + j\kappa(\lambda)$.66 In the numerical analyses of this work, we considered three different materials for the diffractive layers of a broadband diffractive processor: a lossy polymer, a lossless dielectric, and a hypothetical lossless dispersion-free material. Among these, the lossy polymer represents a UV-curable 3D printing material (VeroBlackPlus RGD875, Stratasys Ltd.), which was used in our previous work52,66,67 for 3D printing of diffractive networks. The lossless dielectric material, used for the diffractive models operating at the visible band, represents N-BK7 glass (Schott), ignoring the negligible absorption through thin layers. The dispersion-free material, on the other hand, assumes a lossless material whose refractive index $n(\lambda)$ has a flat distribution over the wavelength range of interest, i.e., $n(\lambda) = 1.72$. The final $n(\lambda)$ and $\kappa(\lambda)$ curves of the different materials used for training the diffractive models reported in this paper are shown in Fig. S5 in the Supplementary Material.
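
The following minimal NumPy sketch (ours; the material constants are illustrative placeholders, not the measured curves of Fig. S5) ties these definitions together, mapping a latent variable $h_v$ to a quantized thickness and then to the neuron's amplitude and phase modulation at a given wavelength:

```python
import numpy as np

H_MAX, H_BASE = 1.25, 0.25  # thickness bounds in units of lambda_m
N_AIR = 1.0

def neuron_transmittance(h_v, wavelength, n, kappa, q=8):
    """Complex transmittance of one diffractive neuron, per Eqs. (4)-(7).

    h_v: latent trainable variable; n, kappa: refractive index and
    extinction coefficient at `wavelength` (illustrative values here).
    """
    # Latent-to-thickness mapping of Eq. (5), confined to [0, H_MAX].
    h_learnable = 0.5 * H_MAX * (np.sin(h_v) + 1.0)
    # q-bit quantization: round to one of 2**q equally spaced levels.
    levels = 2 ** q
    h_learnable = np.round(h_learnable / H_MAX * (levels - 1)) * H_MAX / (levels - 1)
    h = h_learnable + H_BASE
    amplitude = np.exp(-2 * np.pi * kappa * h / wavelength)   # Eq. (6)
    phase = (n - N_AIR) * 2 * np.pi * h / wavelength          # Eq. (7)
    return amplitude * np.exp(1j * phase)

# Example: a lossless, dispersion-free neuron (n = 1.72, kappa = 0).
t = neuron_transmittance(h_v=0.3, wavelength=1.0, n=1.72, kappa=0.0)
```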

    4.2 Preparation of the Linear Transformation Data Sets

In this paper, the input and output FOVs of the diffractive networks are assumed to have the same size, which is set as $8\times8$, $5\times5$, or $3\times3$ pixels based on the assigned linear transformation tasks, i.e., $i_w, o_w \in \mathbb{C}^{8\times8}$, $\mathbb{C}^{5\times5}$, or $\mathbb{C}^{3\times3}$ ($w \in \{1, 2, \ldots, N_w\}$). Accordingly, the size of the target complex-valued transformation matrices $A_w$ is $64\times64$, $25\times25$, or $9\times9$, respectively, i.e., $A_w \in \mathbb{C}^{64\times64}$ ($w \in \{1, 2, \ldots, 32\}$), $A_w \in \mathbb{C}^{25\times25}$ ($w \in \{1, 2, \ldots, 184\}$), or $A_w \in \mathbb{C}^{9\times9}$ ($w \in \{1, 2\}$). For arbitrary linear transformations, the amplitude and phase components of all these target matrices $A_w$ were generated with uniform ($U$) distributions of $U[0, 1]$ and $U[0, 2\pi]$, respectively, using the pseudorandom number generation function random.uniform() built into NumPy. For the arbitrarily selected permutation transformations, all the target matrices $A_w$ (also denoted as $P_w$) were generated by permuting an identity matrix of the same size using the pseudorandom matrix permutation function random.permutation() built into NumPy. Different random seeds were used to generate these transformation matrices to ensure that they were unique. For training a broadband diffractive network with $N_w$ wavelength channels, the amplitude and phase components of the input fields $i_w$ ($w \in \{1, 2, \ldots, N_w\}$) were randomly generated with uniform distributions of $U[0, 1]$ and $U[0, 2\pi]$, respectively. The ground-truth (target) fields $o_w$ ($w \in \{1, 2, \ldots, N_w\}$) were generated by calculating $o_w = A_w i_w$. For each $A_w$, we generated a total of 70,000 input/output complex optical field pairs to form a data set, which was then divided into three parts, training, validation, and testing, containing 55,000, 5000, and 10,000 complex-valued optical field pairs, respectively.
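
A minimal NumPy sketch of this data-generation recipe (our paraphrase; the array sizes follow the $8\times8$ FOV case, and the variable names are ours):

```python
import numpy as np

Nw, Ni, No = 4, 64, 64   # e.g., four channels with 8x8 FOVs
n_samples = 70_000       # split into 55,000 / 5000 / 10,000 below
rng = np.random.default_rng(0)

def random_complex(*shape):
    # Uniform amplitude in [0, 1] and uniform phase in [0, 2*pi).
    return rng.uniform(0, 1, shape) * np.exp(1j * rng.uniform(0, 2 * np.pi, shape))

A = random_complex(Nw, No, Ni)                  # one target transform per channel
inputs = random_complex(Nw, n_samples, Ni)      # vectorized input fields i_w
targets = np.einsum('wok,wsk->wso', A, inputs)  # ground truth o_w = A_w i_w

train, val, test = np.split(inputs, [55_000, 60_000], axis=1)
```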

    4.3 Training Loss Function

For each wavelength channel, the normalized MSE loss function is defined as

$$\mathcal{L}_{\text{MSE},w} = \mathbb{E}\left[\frac{1}{N_o}\sum_{n=1}^{N_o}\left|\hat{o}_w[n] - \hat{o}_w'[n]\right|^2\right] = \mathbb{E}\left[\frac{1}{N_o}\sum_{n=1}^{N_o}\left|\sigma_w o_w[n] - \sigma_w' o_w'[n]\right|^2\right], \tag{8}$$

where $\mathbb{E}[\cdot]$ denotes the average across the current batch, $w$ stands for the wth wavelength channel being accessed, and $[n]$ indicates the nth element of the vector. $\sigma_w$ and $\sigma_w'$ are the coefficients used to normalize the energy of the ground-truth (target) field $o_w$ and the diffractive network output field $o_w'$, respectively, which are given by

$$\sigma_w = \frac{1}{\sqrt{\sum_{n=1}^{N_o}\left|o_w[n]\right|^2}}, \tag{9}$$

$$\sigma_w' = \frac{\sum_{n=1}^{N_o}\sigma_w o_w[n]\, o_w'^{*}[n]}{\sum_{n=1}^{N_o}\left|o_w'[n]\right|^2}. \tag{10}$$
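
A minimal PyTorch sketch (ours) of the normalized MSE of Eqs. (8)–(10), with the batch dimension first:

```python
import torch

def normalized_mse(o_true, o_pred):
    """Normalized MSE of Eq. (8) for one wavelength channel.

    o_true, o_pred: complex tensors of shape (batch, No), holding the
    ground-truth fields o_w and the diffractive outputs o_w'.
    """
    # sigma_w (Eq. 9): normalizes the target field energy to 1.
    sigma = 1.0 / torch.sqrt(torch.sum(torch.abs(o_true) ** 2, dim=-1, keepdim=True))
    # sigma_w' (Eq. 10): least-squares complex scaling of the output field.
    sigma_p = (torch.sum(sigma * o_true * torch.conj(o_pred), dim=-1, keepdim=True)
               / torch.sum(torch.abs(o_pred) ** 2, dim=-1, keepdim=True))
    diff = sigma * o_true - sigma_p * o_pred
    return torch.mean(torch.sum(torch.abs(diff) ** 2, dim=-1) / o_true.shape[-1])
```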

During the training of each broadband diffractive network, all the wavelength channels are simultaneously simulated, and the training data are fed into all the channels at the same time. The wavelength-multiplexed diffractive network is trained based on the loss averaged across the different wavelength channels. The total loss function $L$ that we used can be written as

$$L = \frac{1}{N_w}\sum_{w=1}^{N_w} \alpha_w \mathcal{L}_{\text{MSE},w}, \tag{11}$$

where $\alpha_w$ is the adaptive spectral weight coefficient applied to the loss of the wth wavelength channel, used to balance the performance achieved by the different wavelength channels during the optimization process. The initial values of $\alpha_w$ for all the wavelength channels are set as 1. After the optimization begins, $\alpha_w$ is adaptively updated after each training step using the following equation:

$$\alpha_w \leftarrow \max\left(0.1\times\left(\mathcal{L}_{\text{MSE},w} - \mathcal{L}_{\text{MSE},w_{\text{ref}}}\right) + \alpha_w,\; 0\right), \tag{12}$$

where $\mathcal{L}_{\text{MSE},w_{\text{ref}}}$ represents the MSE loss of the wavelength channel chosen as a reference against which the losses of the other channels are measured. This also means that $\alpha_w$ for the reference wavelength channel remains unchanged at 1. For the trained broadband diffractive models presented in this paper, we chose the middle channel as the reference, i.e., $w_{\text{ref}} = N_w/2$. According to this approach, for a non-reference wavelength channel $w$, when the loss of the channel is small compared to that of the reference channel, $\alpha_w$ automatically decreases to reduce the weight of that channel. Conversely, when the loss of a specific wavelength channel is large compared to that of the reference channel, $\alpha_w$ automatically grows to increase the weight of that channel and thus enhance the subsequent penalty on its performance.

To increase the output diffraction efficiencies of the diffractive networks, we incorporated an additional efficiency penalty term into the loss function of Eq. (11):

$$L = \frac{1}{N_w}\sum_{w=1}^{N_w}\left(\alpha_w \mathcal{L}_{\text{MSE},w} + \beta \mathcal{L}_{\text{eff},w}\right), \tag{13}$$

where $\mathcal{L}_{\text{eff},w}$ represents the diffraction efficiency penalty loss applied to the wth wavelength channel, and $\beta$ represents its weight, empirically set as $10^{-4}$. $\mathcal{L}_{\text{eff},w}$ is defined as

$$\mathcal{L}_{\text{eff},w} = \begin{cases} \eta_{\text{th}} - \eta_w, & \text{if } \eta_{\text{th}} \geq \eta_w \\ 0, & \text{if } \eta_{\text{th}} < \eta_w \end{cases}, \tag{14}$$

where $\eta_w$ represents the mean output diffraction efficiency of the wth wavelength channel of the wavelength-multiplexed diffractive network, and $\eta_{\text{th}}$ refers to a predetermined penalization threshold, taken as $3\times10^{-5}$ (for the diffractive models using the lossy polymer material) or $3\times10^{-4}$ (for the other diffractive models using lossless dielectric or dispersion-free materials). $\eta_w$ is defined as

$$\eta_w = \mathbb{E}\left[\frac{\sum_{n=1}^{N_o}\left|o_w'[n]\right|^2}{\sum_{n=1}^{N_i}\left|i_w[n]\right|^2}\right]. \tag{15}$$
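
Putting Eqs. (11)–(15) together, a hedged PyTorch-style sketch of the total loss and the adaptive-weight update could look as follows (the function and variable names are ours):

```python
import torch

def total_loss_and_weights(mse_per_channel, eta_per_channel, alpha,
                           w_ref, eta_th=3e-4, beta=1e-4):
    """Total loss of Eq. (13) plus the adaptive-weight update of Eq. (12).

    mse_per_channel: tensor (Nw,) of L_MSE,w values (differentiable)
    eta_per_channel: tensor (Nw,) of output diffraction efficiencies eta_w
    alpha:           tensor (Nw,) of current spectral weights (not differentiated)
    """
    # Efficiency penalty of Eq. (14): active only while eta_w <= eta_th.
    l_eff = torch.clamp(eta_th - eta_per_channel, min=0.0)
    loss = torch.mean(alpha * mse_per_channel + beta * l_eff)

    # Adaptive spectral weight update, relative to the reference channel.
    with torch.no_grad():
        delta = mse_per_channel - mse_per_channel[w_ref]
        new_alpha = torch.clamp(0.1 * delta + alpha, min=0.0)
        new_alpha[w_ref] = 1.0  # the reference channel weight stays at 1
    return loss, new_alpha
```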

    4.4 Performance Metrics Used for the Quantification of the All-Optical Transformation Errors

To quantitatively evaluate the transformation results of the wavelength-multiplexed diffractive networks, four different performance metrics were calculated per wavelength channel of the diffractive designs using the blind testing data set: (1) the normalized transformation MSE ($\text{MSE}_{\text{Transformation}}$); (2) the cosine similarity (CosSim) between the all-optical transforms and the target transforms; (3) the normalized MSE between the diffractive network output fields and their ground truth ($\text{MSE}_{\text{Output}}$); and (4) the output diffraction efficiency [Eq. (15)]. The transformation error for the wth wavelength channel, $\text{MSE}_{\text{Transformation},w}$, is defined as

$$\text{MSE}_{\text{Transformation},w} = \frac{1}{N_i N_o}\sum_{n=1}^{N_i N_o}\left|a_w[n] - m_w a_w'[n]\right|^2 = \frac{1}{N_i N_o}\sum_{n=1}^{N_i N_o}\left|a_w[n] - \hat{a}_w'[n]\right|^2, \tag{16}$$

where $a_w$ is the vectorized version of the ground-truth (target) transformation matrix $A_w$ assigned to the wth wavelength channel, i.e., $a_w = \mathrm{vec}(A_w)$; $a_w'$ is the vectorized version of $A_w'$, the all-optical transformation matrix performed by the trained diffractive network; and $m_w$ is a scalar coefficient used to eliminate the effect of the diffraction-efficiency-related scaling mismatch between $A_w$ and $A_w'$, i.e.,

$$m_w = \frac{\sum_{n=1}^{N_i N_o} a_w[n]\, a_w'^{*}[n]}{\sum_{n=1}^{N_i N_o}\left|a_w'[n]\right|^2}. \tag{17}$$

The cosine similarity between the all-optical diffractive transform and its target (ground truth) for the wth wavelength channel, $\text{CosSim}_w$, is defined as

$$\text{CosSim}_w = \frac{\left|a_w^H a_w'\right|}{\sqrt{\sum_{n=1}^{N_i N_o}\left|a_w[n]\right|^2}\ \sqrt{\sum_{n=1}^{N_i N_o}\left|a_w'[n]\right|^2}}. \tag{18}$$

The normalized MSE between the diffractive network outputs and their ground truth for the wth wavelength channel, $\text{MSE}_{\text{Output},w}$, is defined using the same formula as in Eq. (8), except that $\mathbb{E}[\cdot]$ is calculated across the entire testing set.
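
As a hedged NumPy sketch (ours) of the per-channel transformation metrics of Eqs. (16)–(18):

```python
import numpy as np

def transformation_metrics(a_true, a_opt):
    """MSE_Transformation (Eq. 16) and cosine similarity (Eq. 18).

    a_true: vectorized target transform, a_w = vec(A_w)
    a_opt:  vectorized all-optical transform, a_w' = vec(A_w')
    """
    # m_w (Eq. 17) removes the diffraction-efficiency-related scaling mismatch.
    m = np.sum(a_true * np.conj(a_opt)) / np.sum(np.abs(a_opt) ** 2)
    mse = np.mean(np.abs(a_true - m * a_opt) ** 2)
    cos_sim = np.abs(np.vdot(a_true, a_opt)) / (
        np.linalg.norm(a_true) * np.linalg.norm(a_opt))
    return mse, cos_sim
```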

    4.5 Training-Related Details

All the diffractive optical networks used in this work were trained using PyTorch (v1.11.0, Meta Platforms Inc.). We selected the AdamW optimizer76,77 for training all the models; its parameters were taken as the default values and kept identical across models. The batch size was set as 8. The learning rate, starting from an initial value of 0.001, was set to decay at a rate of 0.5 every 10 epochs. The diffractive network models were trained for 50 epochs, and the best models were selected based on the MSE loss calculated on the validation data set. For the training of our diffractive models, we used a workstation with a GeForce RTX 3090 graphics processing unit (Nvidia Inc.), an Intel® Core™ i9-12900F central processing unit (Intel Inc.), and 64 GB of RAM, running the Windows 11 operating system (Microsoft Inc.). The typical time required for training a wavelength-multiplexed diffractive network model with, e.g., $N_w = 128$ and $N = 1.5N_wN_iN_o$ is ∼50 h.
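
The corresponding PyTorch training setup can be sketched as follows (a minimal, hedged reconstruction of the stated hyperparameters; the Linear module is only a hypothetical placeholder for the diffractive network's trainable thickness parameters):

```python
import torch

# Hypothetical placeholder standing in for the diffractive network's
# trainable thickness parameters (for illustration only).
model = torch.nn.Linear(64, 64)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)  # default AdamW settings
# Decay the learning rate by a factor of 0.5 every 10 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(50):
    # ... iterate over training batches of size 8, compute the loss of
    # Eq. (13), call loss.backward() and optimizer.step() per batch ...
    scheduler.step()
```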

    4.6 Experimental Terahertz Setup

The diffractive layers used in our experiments were fabricated using a 3D printer (PR110, CADworks3D). The input test objects and their holders were also 3D-printed (Objet30 Pro, Stratasys). After the printing process, the input objects were coated with aluminum foil to define the light-blocking areas, leaving openings at specific positions to form the transmitted pixels of the input patterns. The designed holder was used to assemble the diffractive layers and the objects, mechanically maintaining their relative spatial positions in line with our numerical design.

To test our fabricated wavelength-multiplexed diffractive network design, we adopted a THz continuous-wave scanning system, whose schematic is presented in Fig. 11(a). A WR2.2 modular amplifier/multiplier chain (AMC) followed by a compatible diagonal horn antenna (Virginia Diode Inc.) was used as the THz source. For each measurement, a 10-dBm RF input signal at 12.500 or 11.944 GHz (fRF1) was fed to the input of the AMC and multiplied 36 times, generating output radiation at 450 or 430 GHz, which corresponds to the illumination wavelengths λ1 = 0.667 mm and λ2 = 0.698 mm used for the two wavelength channels, respectively. A 1-kHz square wave was also generated to modulate the AMC output for lock-in detection. By placing the wavelength-multiplexed diffractive network 600 mm away from the exit aperture of the THz source, an approximately uniform plane wave was created, impinging on the input FOV of the diffractive network. The intensity distribution within the output FOV of the diffractive network was scanned at a step size of 2 mm by a single-pixel mixer/AMC detector (Virginia Diode Inc.), which was mounted on an XY positioning stage formed by combining two linear motorized stages (Thorlabs NRT100). For illumination at λ1 = 0.667 mm or λ2 = 0.698 mm, a 10-dBm sinusoidal signal at 12.472 or 11.917 GHz (fRF2), respectively, was generated as a local oscillator and sent to the detector to downconvert the output signal to 1 GHz. After being amplified by a low-noise amplifier (Mini-Circuits ZRL-1150-LN+) with a gain of 80 dB, the downconverted signal was filtered by a 1-GHz (±10 MHz) bandpass filter (KL Electronics 3C40-1000/T10-O/O) and attenuated by a tunable attenuator (HP 8495B) for linear calibration. This final signal was then measured by a low-noise power detector (Mini-Circuits ZX47-60), whose output voltage was read by a lock-in amplifier (Stanford Research SR830) using the 1-kHz square wave as the reference signal and calibrated to a linear scale. In our postprocessing, cropping and pixel binning were applied to each intensity measurement to match the pixel size and position of the output FOV used in the design phase, resulting in the output measurement images shown in Fig. 11(e).

    Jingxi Li received his BS degree in optoelectronic information science and engineering from Zhejiang University, Hangzhou, Zhejiang, China, in 2018. Currently, he is working toward his PhD in the Electrical and Computer Engineering Department, University of California, Los Angeles, California, United States. His work focuses on optical computing and information processing using diffractive networks and computational optical imaging for biomedical applications.

Tianyi Gan received his BS degree in physics from Peking University, Beijing, China, in 2021. He is currently a PhD student in the Electrical and Computer Engineering Department at the University of California, Los Angeles. His research interests are terahertz sources and imaging.

Bijie Bai received her BS degree in measurement, control technology, and instrumentation from Tsinghua University, Beijing, China, in 2018. She is currently working toward her PhD in the Electrical and Computer Engineering Department, University of California, Los Angeles, CA, USA. Her research focuses on computational imaging for biomedical applications, machine learning, and optics.

    Yi Luo received his BS degree in measurement, control technology, and instrumentation from Tsinghua University, Beijing, China, in 2016. He is currently working toward his PhD in the Bioengineering Department, University of California, Los Angeles, CA, USA. His work focuses on the development of computational imaging and sensing platforms.

Mona Jarrahi is a professor and the Northrop Grumman Endowed Chair in the Electrical and Computer Engineering Department at the University of California, Los Angeles, and the director of the Terahertz Electronics Laboratory. She has made significant contributions to the development of ultrafast electronic and optoelectronic devices and integrated systems for terahertz, infrared, and millimeter-wave sensing, imaging, computing, and communication systems by utilizing innovative materials, nanostructures, and quantum structures, as well as novel plasmonic and optical concepts.

Aydogan Ozcan is the Chancellor's Professor and the Volgenau Chair for Engineering Innovation at UCLA and an HHMI professor at the Howard Hughes Medical Institute. He is also the associate director of the California NanoSystems Institute. He is an elected fellow of the National Academy of Inventors and holds >60 issued/granted patents in microscopy, holography, computational imaging, sensing, mobile diagnostics, nonlinear optics, and fiber-optics. He is also the author of one book and the co-author of >950 peer-reviewed publications in leading scientific journals/conferences. He is an elected fellow of Optica, AAAS, SPIE, IEEE, AIMBE, RSC, APS, and the Guggenheim Foundation, and is a lifetime fellow member of Optica, NAI, AAAS, and SPIE.

    References

    [1] D. R. Solli, B. Jalali. Analog optical computing. Nat. Photonics, 9, 704-706(2015).

    [2] G. Wetzstein et al. Inference in artificial intelligence with deep optics and photonics. Nature, 588, 39-47(2020).

    [3] B. J. Shastri et al. Photonics for artificial intelligence and neuromorphic computing. Nat. Photonics, 15, 102-114(2021).

    [4] H. Zhou et al. Photonic matrix multiplication lights up photonic accelerator and beyond. Light Sci. Appl., 11, 30(2022).

    [5] D. Mengu et al. At the intersection of optics and deep learning: statistical inference, computing, and inverse design. Adv. Opt. Photonics, 14, 209-290(2022).

    [6] L. Cutrona et al. Optical data processing and filtering systems. IRE Trans. Inf. Theory, 6, 386-400(1960).

    [7] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. U. S. A., 79, 2554-2558(1982).

    [8] D. Psaltis, N. Farhat. Optical information processing based on an associative-memory model of neural nets with thresholding and feedback. Opt. Lett., 10, 98-100(1985).

    [9] N. H. Farhat et al. Optical implementation of the Hopfield model. Appl. Opt., 24, 1469-1475(1985).

    [10] K. Wagner, D. Psaltis. Multilayer optical learning networks. Appl. Opt., 26, 5061-5076(1987).

    [11] D. Psaltis et al. Holography in artificial neural networks. Nature, 343, 325-330(1990).

    [12] K. Vandoorne et al. Parallel reservoir computing using optical amplifiers. IEEE Trans. Neural Networks, 22, 1469-1481(2011).

    [13] A. Silva et al. Performing mathematical operations with metamaterials. Science, 343, 160-163(2014).

    [14] K. Vandoorne et al. Experimental demonstration of reservoir computing on a silicon photonics chip. Nat. Commun., 5, 3541(2014).

    [15] J. Carolan et al. Universal linear optics. Science, 349, 711-716(2015).

    [16] J. Chang et al. Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification. Sci. Rep., 8, 12324(2018).

    [17] N. M. Estakhri, B. Edwards, N. Engheta. Inverse-designed metastructures that solve equations. Science, 363, 1333-1338(2019).

    [18] J. Dong et al. Optical reservoir computing using multiple light scattering for chaotic systems prediction. IEEE J. Sel. Top. Quantum Electron., 26, 7701012(2020).

    [19] U. Teğin et al. Scalable optical learning operator. Nat. Comput. Sci., 1, 542-549(2021).

    [20] Y. Shen et al. Deep learning with coherent nanophotonic circuits. Nat. Photonics, 11, 441-446(2017).

    [21] A. N. Tait et al. Neuromorphic photonic networks using silicon photonic weight banks. Sci. Rep., 7, 7430(2017).

    [22] X. Lin et al. All-optical machine learning using diffractive deep neural networks. Science, 361, 1004-1008(2018).

    [23] J. Bueno et al. Reinforcement learning in a large-scale photonic recurrent neural network. Optica, 5, 756-760(2018).

    [24] Y. Zuo et al. All-optical neural network with nonlinear activation functions. Optica, 6, 1132-1137(2019).

    [25] T. W. Hughes et al. Wave physics as an analog recurrent neural network. Sci. Adv., 5, eaay6946(2019).

    [26] J. Feldmann et al. All-optical spiking neurosynaptic networks with self-learning capabilities. Nature, 569, 208-214(2019).

    [27] M. Miscuglio, V. J. Sorger. Photonic tensor cores for machine learning. Appl. Phys. Rev., 7, 031404(2020).

    [28] H. Zhang et al. An optical neural chip for implementing complex-valued neural network. Nat. Commun., 12, 457(2021).

    [29] J. Feldmann et al. Parallel convolutional processing using an integrated photonic tensor core. Nature, 589, 52-58(2021).

    [30] X. Xu et al. 11 TOPS photonic convolutional accelerator for optical neural networks. Nature, 589, 44-51(2021).

    [31] L. G. Wright et al. Deep physical neural networks trained with backpropagation. Nature, 601, 549-555(2022).

    [32] F. Ashtiani, A. J. Geers, F. Aflatouni. An on-chip photonic deep neural network for image classification. Nature, 606, 501-506(2022).

    [33] D. Liu et al. Training deep neural networks for the inverse design of nanophotonic structures. ACS Photonics, 5, 1365-1369(2018).

    [34] W. Ma, F. Cheng, Y. Liu. Deep-learning-enabled on-demand design of chiral metamaterials. ACS Nano, 12, 6326-6334(2018).

    [35] J. Peurifoy et al. Nanophotonic particle simulation and inverse design using artificial neural networks. Sci. Adv., 4, eaar4206(2018).

[36] I. Malkiel et al. Plasmonic nanostructure design and characterization via deep learning. Light Sci. Appl., 7, 60(2018).

    [37] Z. Liu et al. Generative model for the inverse design of metasurfaces. Nano Lett., 18, 6570-6576(2018).

    [38] S. So, J. Rho. Designing nanophotonic structures using conditional deep convolutional generative adversarial networks. Nanophotonics, 8, 1255-1261(2019).

    [39] W. Ma et al. Probabilistic representation and inverse design of metamaterials based on a deep generative model with semi-supervised learning strategy. Adv. Mater., 31, 1901111(2019).

    [40] S. An et al. A deep learning approach for objective-driven all-dielectric metasurface design. ACS Photonics, 6, 3196-3207(2019).

    [41] J. Jiang et al. Free-form diffractive metagrating design based on generative adversarial networks. ACS Nano, 13, 8872-8878(2019).

    [42] C. Qian et al. Deep-learning-enabled self-adaptive microwave cloak without human intervention. Nat. Photonics, 14, 383-390(2020).

    [43] Z. Liu et al. Compounding meta-atoms into metamolecules with hybrid artificial intelligence techniques. Adv. Mater., 32, 1904790(2020).

    [44] H. Ren et al. Three-dimensional vectorial holography based on machine learning inverse design. Sci. Adv., 6, eaaz4261(2020).

    [45] C. Zuo, Q. Chen. Exploiting optical degrees of freedom for information multiplexing in diffractive neural networks. Light Sci. Appl., 11, 208(2022).

    [46] D. Mengu et al. Analysis of diffractive optical neural networks and their integration with electronic neural networks. IEEE J. Sel. Top. Quantum Electron., 26, 3700114(2020).

    [47] J. Li et al. Class-specific differential detection in diffractive optical neural networks improves inference accuracy. Adv. Photonics, 1, 046001(2019).

    [48] T. Yan et al. Fourier-space diffractive deep neural network. Phys. Rev. Lett., 123, 023901(2019).

    [49] D. Mengu, Y. Rivenson, A. Ozcan. Scale-, shift-, and rotation-invariant diffractive optical networks. ACS Photonics, 8, 324-334(2020).

    [50] D. Mengu et al. Misalignment resilient diffractive optical networks. Nanophotonics, 9, 4207-4219(2020).

    [51] M. S. S. Rahman et al. Ensemble learning of diffractive optical networks. Light Sci. Appl., 10, 14(2021).

    [52] J. Li et al. Spectrally encoded single-pixel machine vision using diffractive networks. Sci. Adv., 7, eabd7690(2021).

    [53] O. Kulce et al. All-optical information-processing capacity of diffractive surfaces. Light Sci. Appl., 10, 25(2021).

    [54] T. Zhou et al. Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit. Nat. Photonics, 15, 367-373(2021).

    [55] H. Chen et al. Diffractive deep neural networks at visible wavelengths. Engineering, 7, 1483-1491(2021).

    [56] C. Liu et al. A programmable diffractive deep neural network based on a digital-coding metasurface array. Nat. Electron., 5, 113-122(2022).

    [57] D. Mengu et al. Classification and reconstruction of spatially overlapping phase images using diffractive optical networks. Sci. Rep., 12, 8446(2022).

    [58] Y. Luo et al. Computational imaging without a computer: seeing through random diffusers at the speed of light. eLight, 2, 4(2022).

    [59] D. Mengu et al. Diffractive interconnects: all-optical permutation operation using diffractive networks. Nanophotonics(2022).

    [60] D. Mengu, A. Ozcan. All-optical phase recovery: diffractive computing for quantitative phase imaging. Adv. Opt. Mater., 10, 2200281(2022).

    [61] B. Bai et al. To image, or not to image: class-specific diffractive cameras with all-optical erasure of undesired objects. eLight, 2, 14(2022).

    [62] Ç. Işıl et al. Super-resolution image display using diffractive decoders. Sci. Adv., 8, eadd3433(2022).

    [63] C. Qian et al. Performing optical logic operations by a diffractive neural network. Light Sci. Appl., 9, 59(2020).

    [64] P. Wang et al. Orbital angular momentum mode logical operation using optical diffractive neural network. Photonics Res., 9, 2116-2124(2021).

    [65] Y. Luo, D. Mengu, A. Ozcan. Cascadable all-optical NAND gates using diffractive networks. Sci. Rep., 12, 7121(2022).

    [66] Y. Luo et al. Design of task-specific optical systems using broadband diffractive neural networks. Light Sci. Appl., 8, 112(2019).

    [67] M. Veli et al. Terahertz pulse shaping using diffractive surfaces. Nat. Commun., 12, 37(2021).

    [68] Z. Huang et al. All-optical signal processing of vortex beams with diffractive deep neural networks. Phys. Rev. Appl., 15, 014037(2021).

    [69] O. Kulce et al. All-optical synthesis of an arbitrary linear transformation using diffractive surfaces. Light Sci. Appl., 10, 196(2021).

    [70] J. Li et al. Polarization multiplexed diffractive computing: all-optical implementation of a group of linear transformations through a polarization-encoded diffractive network. Light Sci. Appl., 11, 153(2022).

    [71] T. Ishihara et al. An optical neural network architecture based on highly parallelized WDM-multiplier-accumulator. IEEE/ACM Workshop on Photonics-Opt. Technol. Oriented Networking, Inf. and Comput. Syst. (PHOTONICS), 15-21(2019).

    [72] R. Hamerly et al. Edge computing with optical neural networks via WDM weight broadcasting. Proc. SPIE, 11804, 118041R(2021).

    [73] A. Totovic et al. Programmable photonic neural networks combining WDM with coherent linear optics. Sci. Rep., 12, 5605(2022).

    [74] TSL-570|SANTEC CORPORATION: The photonics pioneer.

    [75] MEMS-VCSEL swept-wavelength laser sources.

    [76] D. P. Kingma, J. Ba. Adam: a method for stochastic optimization(2014).

    [77] I. Loshchilov, F. Hutter. Decoupled weight decay regularization, 18(2019).
