1 Introduction
The exponential growth of information and data processing has created bottlenecks in the continued performance scaling of traditional electronic processors.1 To address this problem, all-optical computing, which uses photons as information carriers, has emerged as a promising solution.2
A diffractive deep neural network (D²NN) is a series of successive diffractive layers designed in a computer using error backpropagation and stochastic gradient descent.11 Unlike machine vision systems that use conventional optics, the diffractive layers of a D²NN consist of two-dimensional passive pixel arrays. Each pixel on a diffractive layer is a computer-learnable parameter that applies an independent complex-valued modulation to the light field. Based on these optical information-processing capabilities, D²NNs have been applied to image recognition.11,26
Diffractive networks implemented in passive optical elements offer fast processing speed and low energy consumption, while also enabling flexible use of the various degrees of freedom of light. For example, illuminating the diffractive network with broadband light instead of monochromatic light enables spectrally encoded machine vision,15,38 parallel computing,39 snapshot multispectral imaging,48 and spatially controlled wavelength multiplexing/demultiplexing.49 In addition, polarization-multiplexed linear transformations can be achieved by exploiting the polarization properties of light in diffractive networks, rather than relying on birefringent or polarization-sensitive materials,50 which demonstrates the classification and computational potential of diffractive networks in complex-valued matrix-vector operations. So far, the phase, amplitude, polarization, and wavelength of light have all been exploited in different diffractive networks to perform specific computational tasks.
As another important property of light, the orbital angular momentum (OAM) modes carried by vortex beams (VBs) are widely used in various fields by virtue of the unique properties brought about by their wavefront structure.51
Here, we report a strategy of OAM-encoded diffractive deep neural networks (OAM-encoded D²NNs), which encode the spatial information of objects into the OAM modes of light by using deep-learning-trained diffractive layers to perform recognition and classification in vortex light multiplexed by different OAM modes. We use a VB that multiplexes 10 OAM modes with different topological charges and equal weights. This beam illuminates handwritten digits, which then pass through the five diffractive layers of the D²NN. The modulated vortex light is obtained at the output, and its OAM spectrum is analyzed. The normalized intensity of each mode in the OAM spectrum is assigned to a digit/class.
(1) First, we demonstrate a single-detector OAM-encoded D²NN for single-task classification. We achieve a blind-test accuracy of 85.43% for the Mixed National Institute of Standards and Technology (MNIST) data set.67 For comparison, spectrally encoded single-pixel machine vision without image reconstruction achieved a blind-test accuracy of 84.02% for the same data set.15 (2) In addition, we show a single-detector OAM-encoded D²NN for multitask classification. To evaluate the discriminative criteria for multi-object classification, we propose the self-defined MNIST array data set and MNIST repeatable array data set (see Sec. 4.4). Most previous multitask classification works performed parallel recognition on several different data sets,16,39 but their accuracies were calculated separately and independently for each data set; few were computed in parallel on the same data set. The MNIST array data set and MNIST repeatable array data set present several digits as a digit array for classification each time. When any one or more digits in the input are inferred incorrectly, the entire digit array is judged incorrect. Thus, there are many cases where correctly inferring all but one digit in an array is still counted as misclassification. We achieved a blind-test accuracy of 64.13% for the MNIST array data set, which in fact contains 45 inferred categories, significantly more than the 10 categories of the MNIST data set. (3) Moreover, we design a multidetector OAM-encoded D²NN for repeatable multitask classification. By measuring multiple OAM spectra of the output beams and comparing their intensities, we achieve parallel classification for two-digit, three-digit, and four-digit MNIST repeatable array data sets.
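The all-or-nothing judgment described above can be sketched in a few lines; `array_accuracy`, `preds`, and `truths` are hypothetical names for lists of inferred and ground-truth label groups, compared as sets because the single-detector scheme carries no digit order:

```python
def array_accuracy(preds, truths):
    """Blind-test criterion sketch: a digit array counts as correct only
    when every digit in it is inferred correctly. For the single-detector
    scheme the label group carries no order, so arrays compare as sets."""
    hits = sum(set(p) == set(t) for p, t in zip(preds, truths))
    return hits / len(truths)
```

Under this criterion, getting one digit of a pair right and the other wrong scores zero for that array, which is why array-level accuracies are necessarily lower than per-digit ones.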
Although using the MNIST array data set and the MNIST repeatable array data set instead of the MNIST data set undoubtedly makes the judgment harder, promoting a single task to multiple tasks highlights the advantages of advanced parallel classification.
As shown in Table 1, this work achieves a breakthrough in parallel classification by exploiting the OAM degree of freedom, compared to other existing designs. We believe that OAM-encoded D²NNs provide a powerful framework for further improving all-optical parallel classification and OAM-based machine vision tasks. In the near future, advances in OAM mode multiplexing/demultiplexing technology may enable OAM combs consisting of hundreds of OAM modes.60 Such advances would make it possible to introduce many more OAM modes into OAM-encoded D²NNs and thus reach a higher degree of parallelism for more complex multitask parallel classification.
| Reference | Degree of freedom | Footprint | Function | Performance | Parallel classification | Single detector |
| --- | --- | --- | --- | --- | --- | --- |
| This work | OAM | — | Image recognition | Accuracy: 85.49% | Yes | Yes |
| — | — | — | Image recognition | Accuracy: 93.39% | No | No |
| — | Wavelength | — | Image recognition | Accuracy: 91.29% (84.02%) | No | Yes |
| — | Wavelength | — | Image recognition | Accuracy: 87.74% | No | Yes |
| — | Wavelength | — | Image recognition | Accuracies of four tasks: 92.8%, 83.0%, 81.0%, and 90.4% | Yes | No |
| — | Wavelength | — | Multispectral imaging | Filter transmission efficiency | — | — |
| — | Wavelength | — | Spectral filters | Processes optical waves over a continuous, wide range of frequencies | — | — |
| — | Polarization | — | Image recognition | Accuracy: 93.75% | Yes | No |
| — | Polarization | — | Linear transformations | Performs multiple complex-valued, arbitrary linear transformations using polarization multiplexing | — | — |
| — | OAM | 3 cm × 3 cm | Logic operation | Proposed an OAM logical operation | — | — |
| — | OAM | 3 cm × 3 cm | Optical communication | Diffraction efficiency, mode conversion purity, and bit error rates | — | — |
| — | OAM | — | Holography | 10 multiplexed OAM modes among five spatial depths in deep multiplexing holography | — | — |
| — | OAM | — | Spectral detection | Ratio of optical to electronic operations | — | — |

Table 1. Comparison with other designs.
2 Results
2.1 Design of OAM-Encoded D²NNs
In this paper, we demonstrate an approach that incorporates OAM into D²NNs, encoding the spatial information of objects into the OAM modes of light. Our approach is based on Fresnel scalar diffraction theory, and we propose three variants of OAM-encoded D²NNs, as shown in Fig. 1. The schematic illustrates the OAM-encoded D²NN structures and highlights their similarities and differences. All three OAM-encoded D²NNs are composed of five diffractive layers, with a constant spacing of 1.55 mm between the input layer and the first diffractive layer, between successive diffractive layers, and between the last diffractive layer and the output layer. This distance is determined by the validity conditions of Fresnel scalar diffraction theory. The number of diffractive units per layer is . These diffractive networks are trained to run independently, without coupling to the other networks, although they share the same number of layers and neurons. At the input, an OAM mode is generated using a Laguerre–Gaussian (LG) beam operating at 1550 nm, with a waist radius of . Ten OAM modes with are selected, each corresponding to one of the 10 categories of handwritten digits in the MNIST data set. The to OAM modes represent digits 0 to 4, while the to OAM modes represent digits 5 to 9. A VB multiplexes the 10 OAM modes with equal weights to illuminate the handwritten digits. The equation we employed for multiplexing LG beams carrying different OAM modes can be expressed as follows:
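As a hedged sketch (the normalization and symbols below are standard conventions, assumed rather than copied from the paper), an equal-weight superposition of zero-radial-order LG modes over a set $L$ of topological charges can be written as

```latex
E(r,\varphi) \;=\; \frac{1}{\sqrt{N}} \sum_{\ell \in L} \mathrm{LG}_{0,\ell}(r)\, e^{i\ell\varphi},
\qquad N = |L| = 10,
```

where $\mathrm{LG}_{0,\ell}(r)$ is the radial envelope of the LG mode with the stated waist radius and topological charge $\ell$, and the $1/\sqrt{N}$ factor enforces the equal weights mentioned in the text.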
Figure 1. Schematic diagrams of the three types of OAM-encoded D²NNs.
The first scheme, the OAM-encoded D²NN for single-task classification, encodes a single digit into an OAM mode and transmits it through the diffractive layers. The OAM beam generated in the output plane corresponds to the handwritten digit at the input, as shown in Fig. 1(a). The OAM-encoded D²NN was then used for parallel image recognition. As shown in Fig. 1(b), two digits of different categories were positioned at separate spatial locations, encoded into OAM modes, and transmitted simultaneously through the diffractive network. The result was an independent multiplexed OAM beam at the output, with OAM modes corresponding to the two input digit categories. However, using a single detector for parallel detection made it impossible to distinguish identical digits, since the single-detector OAM-encoded D²NN cannot detect sequential information. To address this issue, a multidetector OAM-encoded D²NN was used to discriminate repeating digits [see Fig. 1(c)]. Compared to the single-detector OAM-encoded D²NN, the ability of multiple detectors to encode the sequential information of repeating digits allows them to recognize identical digits, further increasing the parallel classification power of the diffractive network.
2.2 Single-Detector OAM-Encoded D²NN for Single-Task Classification
Here, we demonstrate the recognition of an OAM-encoded digit “1” using the single-detector (not single-pixel) OAM-encoded D²NN. A multiplexed OAM beam illuminates the MNIST handwritten digit “1” and then passes through the diffractive layers, producing a modulated OAM beam at the output receiver plane [see Fig. 2(b)]. The optical field distribution of the input OAM-encoded digit “1” at each layer, after modulation by the trained diffractive network, is shown in Fig. 2(a). The input digit “1” exhibits a residual pattern caused by the uneven intensity distribution of the mixed OAM beam. Our comparison shows that this type of illumination does not affect the blind-test recognition accuracy. After modulation by the diffractive layers, a second-order OAM beam is reconstructed at the output, showing that our diffractive network performs the given task relatively well. Although the output light contains more than a single OAM mode, owing to modulation limitations and diffraction effects, the classification can still be inferred from the intensity distribution among the different OAM modes. We obtained the normalized intensity of each OAM mode by analyzing the OAM spectrum of the output beam (see Sec. 4.3). The category of the inferred digit is determined by the OAM mode with the highest normalized intensity. As shown in Fig. 2(c), the intensity of the OAM mode with +2, corresponding to the digit “1,” is 79.37%, significantly higher than that of the other OAM modes, demonstrating effective filtering of the vortex light carrying other OAM modes.
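The spectrum-analysis step can be sketched numerically: project each radial ring of the sampled output field onto the spiral harmonics $e^{i\ell\varphi}$ and accumulate the ring powers. This is a simplified illustration, not the interferometric detector of Sec. 4.3; the function name, grid, and bin count are our assumptions.

```python
import numpy as np

def oam_spectrum(field, ls, nbins=48):
    """Estimate the normalized OAM spectrum of a sampled complex field by
    projecting each radial ring onto the spiral harmonics exp(i*l*phi).
    Assumes Cartesian sampling centred on the beam axis."""
    n = field.shape[0]
    y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
    phi = np.arctan2(y, x)
    ring = np.digitize(np.hypot(x, y), np.linspace(0, n // 2, nbins + 1))
    powers = np.zeros(len(ls))
    for k, l in enumerate(ls):
        proj = field * np.exp(-1j * l * phi)   # integrand of a_l(r)
        for b in range(1, nbins + 1):
            m = ring == b
            if m.any():
                # accumulate |a_l(r)|^2 ring by ring
                powers[k] += np.abs(proj[m].sum()) ** 2 / m.sum()
    return powers / powers.sum()
```

For a pure vortex field the spectrum peaks at its topological charge, so taking the argmax over the ten candidate modes mirrors the highest-normalized-intensity criterion used for classification.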
Figure 2. (a) The amplitude and phase distributions of the OAM beams at the input plane, the diffractive layers, and the output plane. The input image is a handwritten digit “1” encoded as an OAM beam with the +2 mode. (b) Schematic of the modulation of the light field by the single-detector OAM-encoded D²NN.
During training, the single-detector OAM-encoded D²NN reduces the loss value by continuously updating the phase and amplitude distributions of the diffractive layers. The loss and accuracy curves for the training and testing phases are shown in Fig. 2(d), where the dashed lines represent individual runs and the solid lines the average over three runs. The mean curves show that the loss of the single-detector OAM-encoded D²NN drops sharply at the beginning of the iterative process and then stabilizes after a few iterations. In addition, the test accuracy is slightly higher than the training accuracy, and the loss fluctuates more smoothly during the test phase. The blind-test accuracy of the single-detector OAM-encoded D²NN on the MNIST data set is 85.49% [as shown in Fig. 2(e)]. The accuracy of D²NNs using OAM encoding is thus essentially the same as that of wavelength-encoded D²NNs, as compared with spectrally encoded single-pixel machine vision using diffractive networks that do not reconstruct images.15 Note that the single detector of the OAM-encoded D²NN is not a single-pixel detector, but rather a single interferometer-like detector (see Sec. 4.3). This shows that the single-detector OAM-encoded D²NN design can efficiently perform a single-digit recognition task.
2.3 Single-Detector OAM-Encoded D²NN for Multitask Classification
Following our demonstration of single-image classification using an OAM-encoded D²NN, we present a more challenging application of the same framework: a single-detector OAM-encoded D²NN for multitask classification. In Fig. 3(b), two digits of different categories, “7” and “0,” with independent spatial distributions, simultaneously illuminate the input layer as an array; OAM beams are generated at the center of the output layer, multiplexing the −3 and +1 OAM modes corresponding to the two digits. The OAM-encoded D²NN multiplexes the spatial information of both digits into the same OAM beams, effectively exploiting the orthogonality of the OAM modes. However, if the two input digits share the same label, the highest-normalized-intensity criterion may lead to indistinguishable outcomes. For example, whether we input two digits “2” as an array or one digit “2” combined with other digits, a single-detector OAM-encoded D²NN cannot determine how many digits “2” are present at the input, because only the highest intensity is used as the judgment criterion, which can lead to a large error in the network. To address this issue, we use a modified MNIST array data set that excludes arrays containing digits with the same label (see Sec. 4.4). In Fig. 3(a), the two input digits are modulated by the diffractive layers to produce an optical field with the expected OAM modes in the output plane. By detecting the OAM spectra of the output beams, the two OAM modes with the highest normalized intensities represent the classes of the presumed digits [see Fig. 3(c)]. The normalized intensity of the −3 OAM mode, corresponding to the digit “7,” is 38.97%, and that of the +1 mode, corresponding to the digit “0,” is 35.57%, far exceeding the other modes.
Although OAM modes with equal intensity weights should theoretically be obtained at the output, differences in intensity between the two OAM modes are inevitable owing to the limited modulation capability of the diffractive network. However, this uneven intensity distribution only slightly affects the inference accuracy (in our tests, the accuracy error it causes does not exceed 1%).
Figure 3. (a) The amplitude and phase distributions of the OAM beams at the input plane, diffractive layers, and output plane. The input handwritten digits are “7” and “0,” which correspond to multiplexed OAM beams carrying the −3 and +1 OAM modes. (b) Schematic of the light field modulation by the single-detector OAM-encoded D²NN.
After iterative training converged, our single-detector OAM-encoded D²NN for multitask classification achieves a blind-test accuracy of 64.13% [see Fig. 3(d)]. The test results indicate that the accuracy of the single-detector OAM-encoded D²NN performing parallel recognition of multiple digits is lower than that of previously reported D²NNs. By our accuracy criterion, the OAM-encoded D²NN must correctly recognize all digits in the input array. As the confusion matrix shows, there are actually 45 categories to recognize in the MNIST array data set, significantly more than the 10 categories of the MNIST data set [see Fig. 3(e)]. It is this substantial increase in task complexity that causes the sharp drop in blind-test accuracy for multitask classification relative to single-task classification.
2.4 Multidetector OAM-Encoded D²NN for Repeatable Multitask Classification
Next, for OAM-encoded D²NNs to perform parallel recognition of large batches of images, the sequence of digits must be loaded into the light field. We use multiple detectors to simultaneously measure the OAM spectra of the multiplexed OAM beams at the output plane, which cannot be realized with a single detector. By separating the OAM beams at the output and using multiple detectors for OAM detection, we can enhance the capability of the OAM-encoded D²NN to process multiple images and introduce multiple digits at the input for multitask classification. Moreover, the positional information of the different detectors can encode the sequential information of identical digits in an array, enabling parallel recognition of repeatable-digit tasks.
Therefore, we propose a multidetector OAM-encoded D²NN for repeatable multitask classification, which encodes the order of repeatable digits using spatial information to enhance the network's ability to process more complex information in parallel. Unlike the first two schemes, which generate a single multiplexed OAM beam at the central location, this scheme generates multiple OAM beams at discrete spatial locations in the output plane. The number of generated OAM beams equals the number of digits in the input array, facilitating the use of multiple detectors for identification and classification. Figure 4(b) shows a schematic of the four-detector OAM-encoded D²NN. When the four digits are modulated by the diffractive layers, they produce OAM beams with the corresponding OAM modes at the specified spatial locations in the output layer. Figure 4(a) shows the amplitude and phase of two, three, and four input digits at different positions in the input layer, diffractive layers, and output layer, respectively. The intensities of the different output OAM beams are not uniformly distributed, a problem similar to that encountered in the single-detector OAM-encoded D²NN and likewise caused by the limited modulation capability of the diffractive network. In addition, Fig. 4(a) shows that there is only a logical correspondence between our input and output layers for digit recognition, and no direct correspondence in the optical path propagation. When the digits “6” and “0” are input, the intensities of the generated OAM modes corresponding to their digit classes account for 46.55% and 69.77% of the respective OAM beams. When the array “6,” “1,” and “3” is input, the normalized intensities of the corresponding OAM modes are 51.78%, 40.98%, and 45.20% of the output, respectively. The OAM modes corresponding to the array containing the repeatable digits “2,” “1,” “7,” and “3” account for 46.77%, 42.27%, 38.84%, and 34.73% of the total intensity, respectively. These proportions exceed those of the other OAM modes [see Fig. 4(c)]. Thus, the multidetector OAM-encoded D²NN handles the parallel recognition task excellently when spatially separated OAM beams are generated at the output and jointly detected by the same number of detectors.
Figure 4. (a) From top to bottom, the multidetector OAM-encoded D²NN.
The accuracy curves obtained from successive iterative tests show that the multidetector OAM-encoded D²NN achieves blind-test accuracies of 70.94%, 52.41%, and 40.13% for the two-digit, three-digit, and four-digit MNIST repeatable array data sets, respectively [see Fig. 5(a)]. Facing the same challenge as the single-detector OAM-encoded D²NN for multitask classification, the rapid growth in the number of labels in the repeatable array data set further degrades the network's blind-test accuracy. The two-digit, three-digit, and four-digit data sets have 100, 1000, and 10,000 labels, respectively. The difficulty is much higher than that of the original MNIST data set because every digit in the array must be classified correctly. For the three-detector and four-detector OAM-encoded D²NNs, there are too many labels composed of different digits to display a pixel map of that size within the limited space of the inset, yet capturing only a portion of the confusion matrix would sacrifice completeness. Therefore, we show a scaled-down version of the confusion matrix in the inset, together with a localized zoom [Fig. 5(b)]. In addition, the results of the multidetector OAM-encoded D²NN for repeatable multitask classification show that classifying more digits in parallel within the same array further decreases the classification accuracy. The ability of the OAM-encoded D²NN to handle more digits can be improved by, for example, increasing the size of the diffractive layers and expanding the number of neurons used for recognition.
Figure 5. (a) The loss function and accuracy function of the two-detector, three-detector, and four-detector OAM-encoded D²NNs.
3 Discussion and Conclusions
Experimental implementations of D²NNs typically use a spatial light modulator to modulate the light source and 3D printing to fabricate the computer-designed metasurfaces. Limited by the feature size of 3D printing, this fabrication method is generally only available for terahertz bands. There are two main challenges in building OAM-encoded D²NNs experimentally: sample fabrication and experimental measurement. Here, the OAM-encoded D²NN operates at a wavelength of 1550 nm, which corresponds to pixel sizes of . The diffractive layers of the OAM-encoded D²NN can be fabricated by micro/nanofabrication technology compatible with CMOS processes, as current state-of-the-art e-beam lithography reaches a fabrication resolution of only a few nanometers. However, challenges remain in fabricating the on-chip multilayer structures, including overlay, alignment, and other issues68,69 that must be solved with improved technology.
The spectrum of the output OAM beam can be analyzed using interferometric, diffractive, and other detection methods.60,61,67 For measuring the diffractive network, here we take the interferometric method as an example. This method can detect the OAM spectra of multiplexed OAM beams, not only a single OAM mode. The measurement details of the detector are outlined in Sec. 4.3. For the MNIST data set and the MNIST array data set, a single detector at the output plane of the diffractive network suffices for OAM spectrum analysis. For the MNIST repeatable array data set, however, multiple detectors are needed to simultaneously detect the different OAM modes corresponding to the different classified digits.
The OAM-encoded D²NNs also require an interferometric detector with a high signal-to-noise ratio and high sensitivity. Considering reflection, material absorption, scattering, and other losses, we can attempt to relax the sensitivity and robustness requirements of the detector. One approach is to increase the intensity of the optical signal reaching the detector, for example, by reducing the number of layers to minimize absorption and reflection losses. Note that there is always a trade-off between classification accuracy and output efficiency. Since we are dealing with an optical classification network, the detected optical signal only needs to meet the minimal requirements for classification. Despite the difficulties, we believe this OAM-encoded D²NN scheme has great potential for realization as technology develops.
In summary, we have proposed and investigated all-optical parallel classification using OAM-mode-encoded diffractive networks, which encode the spatial information of multiple objects into the OAM modes of a VB. We then use OAM spectra to analyze the normalized intensity distribution of the OAM modes for multitask optical classification. If the inference accuracy of the existing OAM-encoded D²NNs can be further improved, they can be extended from target recognition to other deep-learning tasks, such as multilabel classification and dynamic image recognition. We also envision introducing more OAM modes (which may require a more advanced multimode OAM comb as the light source60) to solve more complex tasks. Finally, we expect that OAM-encoded D²NNs can provide a new, feasible pathway for all-optical parallel classification and OAM-based machine vision.
4 Appendix: Materials and Methods
4.1 Forward Propagation Model of the OAM-Encoded D²NN
Traditional deep neural networks rely on forward propagation, backward propagation, and gradient descent algorithms for brain-like electronic computation by continuously adjusting the weights of electronic neurons. The diffraction of light during propagation is very similar to the way neurons are connected in deep neural networks. Based on Rayleigh–Sommerfeld diffraction,70 the field at each diffractive unit/neuron can be regarded as a coherent superposition of the light propagating from every diffractive unit/neuron in the preceding diffractive layer; each unit can also be seen as the source of a secondary wave fully connected to the subsequent layer. The equation of light propagation between diffractive layers is given as
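As a hedged reconstruction of the standard Rayleigh–Sommerfeld secondary-wave kernel commonly used in D²NN work (the symbols follow the usual convention and are our assumption, not copied from the paper):

```latex
w_i^{l}(x,y,z) \;=\; \frac{z - z_i}{r^{2}} \left( \frac{1}{2\pi r} + \frac{1}{j\lambda} \right) \exp\!\left( \frac{j 2\pi r}{\lambda} \right),
\qquad r \;=\; \sqrt{(x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2},
```

where the emitting neuron of layer $l$ sits at $(x_i, y_i, z_i)$ and $\lambda$ is the illumination wavelength; the field at any point of the next layer is the coherent sum of these secondary waves over all neurons of the current layer.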
Because solving the conventional model with the Rayleigh–Sommerfeld formula carries a significant computational burden, using Fresnel scalar diffraction theory can effectively reduce the computational effort; under the layer spacings we use, it reproduces the results of the Rayleigh–Sommerfeld formula. Here, we use Fresnel scalar diffraction theory to construct the forward propagation model of the OAM-encoded diffractive neural networks. The complex amplitude of the OAM beam at a given neuron of a given layer can be considered as
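A minimal numerical sketch of this forward model: one Fresnel transfer-function propagation step between layers. The grid size, pixel pitch, and function name are our assumptions; the 1550 nm wavelength and 1.55 mm spacing follow the text. In a full model, each diffractive layer would multiply the field elementwise by its learned complex transmittance between propagation steps.

```python
import numpy as np

def fresnel_propagate(u0, wavelength, dx, z):
    """Propagate a sampled complex field u0 over a distance z using the
    Fresnel transfer-function method (sketch). dx is the pixel pitch."""
    n = u0.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    fxx, fyy = np.meshgrid(fx, fx)
    # Fresnel approximation of the angular-spectrum transfer function
    h = np.exp(1j * 2 * np.pi * z / wavelength) \
        * np.exp(-1j * np.pi * wavelength * z * (fxx ** 2 + fyy ** 2))
    return np.fft.ifft2(np.fft.fft2(u0) * h)
```

Because the transfer function is a pure phase factor, the step is unitary: total power is conserved between layers, which matches the lossless propagation assumed in the simulation.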
4.2 Error Analysis of OAM-Encoded D²NNs
In the main text, the OAM-encoded D²NNs are based entirely on the ideal case with fixed parameters. In experiments, factors such as fabrication size errors, optical alignment errors, and material absorption may affect the performance of the diffractive network. Here, we present a systematic analysis of the various types of errors that OAM-encoded D²NNs may encounter.
4.2.1 Deviation analysis of the pixel size and the layer spacing
According to Fresnel scalar diffraction theory, the spacing between layers of the diffractive network should be at least 10 times larger than the size of the entire layer. We therefore grouped the pixel-size and full-size optical errors together for analysis. We assumed a deviation of in the manufacturing dimensions, which is much larger than the fabrication error of the CMOS machining process.68,69 We considered an error range from 0.8 times to 1.2 times the pixel size with the correspondingly scaled layer spacing. As shown in Fig. 6(a), the accuracy of the OAM-encoded D²NNs varies within 1% over this error range. Therefore, we believe that errors in pixel size and layer spacing introduced during fabrication do not affect the OAM-encoded D²NNs.
Figure 6. The different colored curves represent different diffractive networks, as indicated in the diagram in the lower left corner. (a) The deviation of the pixel size and the layer spacing; the horizontal coordinate spans from 0.8 to 1.2 times the pixel size with the correspondingly scaled layer spacing. (b) The deviation analysis of object misalignment in the horizontal and vertical directions. (c) The deviation analysis of layer misalignment; the left image corresponds to a random misalignment error of 5% per layer, and the right image to 10% per layer.
4.2.2 Deviation analysis of the object misalignment
First, we consider possible object misalignment between the incident OAM beam and the digit mask. We introduced deviations of 2%, 4%, 6%, 8%, and 10% in both the horizontal and vertical directions of the object. For each of these misalignment errors, we tested all five diffractive networks mentioned in the main text. As shown in Fig. 6(b), when the object misalignment is within 5% in both directions, the accuracy of all OAM-encoded D²NNs except the S-OAM-encoded D²NN-M (see Table 2 for the nomenclature) fluctuates within 1%. Therefore, our diffractive networks can ensure that the deviation of the incident beam from the digit mask does not exceed 5%, which is smaller than the range of fabrication error.68,69
| Network | Training time (h) | Training loss | Training accuracy (%) | Test loss | Test accuracy (%) |
| --- | --- | --- | --- | --- | --- |
| S-OAM-encoded D²NN (single task) | 12.74 | 0.402 | 84.30 | 0.343 | 85.43 |
| S-OAM-encoded D²NN-M (multitask) | 5.69 | 0.708 | 57.42 | 0.667 | 64.13 |
| M-OAM-encoded D²NN (two-detector) | 6.04 | 0.820 | 67.69 | 0.772 | 70.94 |
| M-OAM-encoded D²NN (three-detector) | 4.09 | 1.345 | 48.94 | 1.238 | 52.41 |
| M-OAM-encoded D²NN (four-detector) | 3.19 | 1.970 | 36.25 | 1.932 | 40.13 |

Table 2. Various indices for the single-detector and multidetector OAM-encoded D²NNs.
In addition, we observed an interesting phenomenon for the three-detector and four-detector OAM-encoded D²NNs: surprisingly, their accuracy seems to increase when the object misalignment error is around 5%. We hypothesize that this may be caused by misidentification of certain digits when the incident beam deviates (e.g., when the OAM beam shifts horizontally to the right, the light intensity distribution of the digit “8” can come to resemble that of the digit “3” because of the nonuniform intensity distribution of the multiplexed OAM beams).
4.2.3 Deviation analysis of layer misalignment
Here, we selected two values for the layer misalignment error: 5% and 10%, meaning that the layers are displaced by 5% or 10% in random directions. In Fig. 6(c), the horizontal coordinates represent the index of the diffractive layer where the corresponding misalignment occurred. The OAM-encoded D²NNs prove highly robust against layer alignment errors, with minimal impact on accuracy. In addition, to explore the limit of the OAM-encoded D²NN's sensitivity to layer alignment errors, we conducted additional tests on the single-detector OAM-encoded D²NN for single-task classification with a 20% misalignment error (see Fig. 6). Under these conditions, the accuracy begins to exhibit a slight decline of 1%. Consequently, we conclude that the performance of the diffractive network can be reliably maintained as long as the alignment error between layers remains within 20% during sample processing and experimental testing.
4.2.4 Absorption error analysis of materials
As for the absorption effect, the material we used for the diffractive layers is silicon nitride, whose extinction coefficient at a wavelength of 1550 nm is negligible, so no absorption was included in the simulation. Considering that the fabricated silicon nitride may exhibit a small extinction coefficient during experimental testing, we assumed an extinction coefficient of 0.05 and incorporated it into the updated diffractive network for testing. After testing, the loss is . This may be because the thickness of the diffractive network is only about , which produces almost no absorption.
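The near-zero absorption can be sanity-checked with the Beer–Lambert relation. A minimal sketch, assuming the extinction coefficient k = 0.05 at λ = 1550 nm and a purely hypothetical layer thickness of 1 µm (the paper's actual thickness value is not reproduced here):

```python
import math

def intensity_transmittance(k, wavelength_m, thickness_m):
    """Intensity transmittance through an absorbing slab,
    T = exp(-4*pi*k*d/lambda), ignoring interface reflections."""
    alpha = 4.0 * math.pi * k / wavelength_m  # absorption coefficient (1/m)
    return math.exp(-alpha * thickness_m)

# Assumed values: k = 0.05, lambda = 1550 nm, d = 1 um (hypothetical)
T = intensity_transmittance(0.05, 1550e-9, 1e-6)
```

For a thickness well below a wavelength, the exponent is small and T stays close to 1, consistent with the negligible absorption reported above.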
4.2.5 Reflection error analysis of diffractive layers
The loss of the whole OAM-encoded D²NN system is mainly due to reflection from the diffractive layers. Assuming the beam is normally incident on a diffractive layer, the transmittance can be calculated as
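The missing expression is presumably the Fresnel result. A reconstruction sketch, assuming a single air–silicon-nitride interface at normal incidence with refractive indices \(n_1\) (air) and \(n_2\) (silicon nitride); the symbols are our own notation:

```latex
% Fresnel intensity transmittance at normal incidence
T = 1 - \left( \frac{n_1 - n_2}{n_1 + n_2} \right)^2
  = \frac{4\, n_1 n_2}{(n_1 + n_2)^2}
```

If one assumes \(n_1 = 1\) and \(n_2 \approx 2.0\) for silicon nitride near 1550 nm, this gives roughly \(T \approx 0.89\) per surface.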
4.3 OAM Spectrum Analysis
Multiple OAM states can coexist in the same beam; a beam is not limited to a single OAM mode. Analogous to the optical spectrum, which gives the intensity weights of different frequencies or wavelengths, the intensity weights of the different OAM channels of a beam are called its OAM spectrum. The spiral harmonic is the eigenfunction of OAM, and the beam can be expanded in spiral harmonics in cylindrical coordinates as
Since the value is independent of the parameter , the relative intensity of such a helical harmonic is
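A reconstruction sketch of the standard spiral-harmonic decomposition referred to above (the symbols are our own notation, assuming a scalar field \(\psi(r,\varphi,z)\) in cylindrical coordinates):

```latex
\psi(r,\varphi,z) = \frac{1}{\sqrt{2\pi}} \sum_{l=-\infty}^{\infty} a_l(r,z)\, e^{i l \varphi},
\qquad
a_l(r,z) = \frac{1}{\sqrt{2\pi}} \int_0^{2\pi} \psi(r,\varphi,z)\, e^{-i l \varphi}\, \mathrm{d}\varphi,
```

and the power in channel \(l\) together with its relative intensity (the OAM spectrum) would then read

```latex
C_l = \int_0^{\infty} \left| a_l(r,z) \right|^2 r\, \mathrm{d}r,
\qquad
P_l = \frac{C_l}{\sum_{q=-\infty}^{\infty} C_q}.
```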
4.4 Preparation of Data Sets
The MNIST array data set and the MNIST repeatable array data set are used in this study to evaluate the discriminative criteria for multi-object classification in the proposed OAM-encoded D²NN.
MNIST array data set: the digits in the MNIST data set are divided into 10 classes according to their labels, and the number of digits in each class is recorded. Two class labels are randomly selected using the shuffle function and combined into a label group of two labels. An image corresponding to each label is then drawn from the data set, and the two images are stitched together into a new array. New arrays and label groups are generated iteratively until all digits in a given class have been selected. It is worth noting that the order of the digits also carries information: for example, the digit pair "0", "1" produces a different light field distribution than the pair "1", "0." The resulting MNIST array data set contains to 28,000 training samples and 4400 to 4500 test samples; the distribution of digits across classes in MNIST is not uniform, which affects these counts. The MNIST array data set is regenerated after each round of the iterative process, and data discarded in one round may be selected in subsequent rounds, so as the number of training rounds increases, the probability of each digit appearing in the MNIST array data set gradually approaches uniformity.
MNIST repeatable array data set: this data set builds on the MNIST array data set. Unlike the MNIST array data set, identical digits may appear when an array is formed from randomly chosen digits. The introduction of identical digits also requires encoding the order of the combinations in the array. Because digits may repeat within an array, the MNIST repeatable array data set does not require discarding digits.
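The pairing procedures for both data sets can be sketched as follows. This is a minimal illustration (the function name and the `repeatable` flag are our own, not the paper's code), assuming images are stored per label and that stitching simply places two images side by side:

```python
import random

def make_array_dataset(images_by_label, n_samples, repeatable=False):
    """Build (ordered image pair, ordered label group) samples.

    images_by_label: dict mapping a digit label to a list of its images.
    repeatable=False -> two distinct classes per array (MNIST array set);
    repeatable=True  -> identical digits may repeat (repeatable array set).
    Order matters: ("0", "1") and ("1", "0") are distinct samples.
    """
    labels = list(images_by_label)
    dataset = []
    for _ in range(n_samples):
        if repeatable:
            a, b = random.choices(labels, k=2)   # repeats allowed
        else:
            a, b = random.sample(labels, 2)      # two distinct classes
        pair = (random.choice(images_by_label[a]),
                random.choice(images_by_label[b]))
        dataset.append((pair, (a, b)))           # ordered label group
    return dataset
```

Regenerating the data set each round, as described above, corresponds to simply calling this function again with a fresh random state.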
4.5 Loss Function of OAM-Encoded D²NN
We define the classical mean square error (MSE) loss function to calculate the difference between the predicted output and the ground truth target , which can be expressed as
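For reference, the classical MSE loss over K output values can be written as follows (a standard form; the symbols \(o_k\) for the prediction and \(g_k\) for the ground truth are our own notation):

```latex
\mathcal{L}_{\mathrm{MSE}} = \frac{1}{K} \sum_{k=1}^{K} \left| o_k - g_k \right|^2
```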
In traditional training, the softmax cross-entropy (SCE) loss function is often used in addition to the MSE loss function. The SCE loss function quantifies the degree of difference between two different probability distributions of the same random variable, which in diffractive networks is expressed as the difference between the true and predicted probability distributions. The smaller the value of the cross-entropy, the better the model prediction. The function can be expressed as
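In the same notation, a standard softmax cross-entropy form (a sketch, not necessarily the paper's exact normalization):

```latex
\mathcal{L}_{\mathrm{SCE}} = - \sum_{k=1}^{K} g_k \,
\log \frac{e^{\,o_k}}{\sum_{j=1}^{K} e^{\,o_j}}
```

The softmax maps the raw detector outputs \(o_k\) to a predicted probability distribution, which the cross-entropy then compares against the true distribution \(g_k\).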
Table 2 shows the relevant performance parameters of our different network models. The models were trained on a server [GeForce RTX 3080 Ti graphics processing unit (GPU, Nvidia Inc.), Intel(R) Core(TM) i9-10900K @ 3.70 GHz central processing unit (CPU, Intel Inc.), and 64 GB of RAM, running the Windows 10 operating system (Microsoft)] with Python (v3.9.13) and PyTorch (1.11.0+cu113) for the simulation computations. All models were trained for 50 epochs using the built-in Adam optimizer with a learning rate of 0.01.
4.6 Optical Demonstration of OAM-Encoded D²NN
Optically simulating the entire OAM-encoded D²NN model is challenging to realize. Taking COMSOL Multiphysics as an example, the size of the diffractive layer of the OAM-encoded D²NN is , and the total length of the model is . The mesh resolution required in COMSOL calculations ranges from one-quarter to one-sixth of a wavelength (i.e., between 0.2583 and ). To simulate the full OAM-encoded D²NN, the required computer memory would be astronomical and unattainable. To show the consistency of our theoretical results in Python with COMSOL Multiphysics, we used COMSOL to build a five-layer structure with for model demonstration, as well as a single-layer structure with for simulation. Figure 7(b) shows the light field distribution on the input side for the digit "9" when illuminated by a multiplexed OAM beam. Figure 7(c) shows the light field distribution modulated by the diffractive layer at the output plane. The simulation results from COMSOL Multiphysics are highly consistent with the theoretical results obtained from Python. We believe these simulation results can provide support and guidance for the experiments.
Figure 7. (a) The left figure shows the geometrical model of the five-layer
References
[1] M. M. Waldrop. The chips are down for Moore’s law. Nature, 530, 144-147(2016).
[3] D. Solli, B. Jalali. Analog optical computing. Nat. Photonics, 9, 704-706(2015).
[44] M. Veli et al. Terahertz pulse shaping using diffractive surfaces. Nat. Commun., 12, 37(2021).
[70] L. Mandel, E. Wolf. Some properties of coherent light. J. Opt. Soc. Am., 51, 815-819(1961).