1 Introduction
The exponential growth of information and data processing has created bottlenecks in the continued performance scaling of traditional electronic processors.1 To address this problem, all-optical computing, which uses photons as information carriers, has emerged as a promising solution.2
A diffractive deep neural network (D²NN) is a series of successive diffractive layers designed in a computer using error backpropagation and stochastic gradient descent.11 Unlike machine vision systems that use conventional optics, the diffractive layers of a D²NN consist of two-dimensional passive pixel arrays. Each pixel on a diffractive layer is a computer-learnable parameter that applies an independent complex-valued modulation to the light field. Based on these optical information-processing capabilities, D²NNs have been applied to image recognition.11,26
Diffractive networks implemented in passive optical elements offer fast processing speed and low energy consumption, while also enabling flexible use of the various degrees of freedom of light. For example, illuminating the diffractive network with broadband light instead of monochromatic light enables spectrally encoded machine vision,15,38 parallel computing,39 snapshot multispectral imaging,48 and spatially controlled wavelength multiplexing/demultiplexing.49 In addition, polarization-multiplexed linear transformations can be achieved by exploiting the polarization properties of light in diffractive networks, rather than relying on birefringent or polarization-sensitive materials,50 which demonstrates the classification and computational potential of diffractive networks in complex-valued matrix-vector operations. So far, the phase, amplitude, polarization, and wavelength of light have all been exploited in different diffractive networks to perform specific computational tasks.
As another important property of light, the orbital angular momentum (OAM) modes carried by vortex beams (VBs) are widely used in various fields by virtue of the unique properties brought about by their wavefront structure.51
Here, we report a strategy of OAM-encoded diffractive deep neural networks (OAM-encoded D²NNs), which encode the spatial information of objects into the OAM modes of light by using deep-learning-trained diffractive layers to perform recognition and classification in vortex light multiplexed by different OAM modes. We use a VB that multiplexes 10 OAM modes with different topological charges and equal weights. This beam illuminates handwritten digits, which then pass through the five diffractive layers of the D²NN. The modulated vortex light is obtained at the output, and its OAM spectrum is analyzed. The normalized intensity of each mode in the OAM spectrum is assigned to a digit/class.
(1) First, we demonstrate a single-detector OAM-encoded D²NN for single-task classification. We achieve a blind-test accuracy of 85.43% for the Mixed National Institute of Standards and Technology (MNIST) data set.67 For comparison, spectrally encoded single-pixel machine vision without image reconstruction achieved a blind-test accuracy of 84.02% for the same data set.15 (2) In addition, we show a single-detector OAM-encoded D²NN for multitask classification. To evaluate the discriminative criteria for multi-object classification, we propose the self-defined MNIST array data set and MNIST repeatable array data set (see Sec. 4.4). Most previous multitask classification works performed parallel recognition on several different data sets,16,39 but their accuracies were calculated separately and independently for each data set; few were computed in parallel on the same data set. The MNIST array data set and MNIST repeatable array data set present several digits as a digit array for classification each time. When any one or more digits in the input are inferred incorrectly, the entire digit array is judged incorrect. Thus, there are many cases where correctly inferring all but one digit in an array is still counted as misclassification. We achieved a blind-test accuracy of 64.13% for the MNIST array data set, which in fact contains 45 inferred categories, significantly more than the 10 categories of the MNIST data set. (3) Moreover, we design a multidetector OAM-encoded D²NN for repeatable multitask classification. By measuring multiple OAM spectra of the output beams and comparing their intensities, we achieve parallel classification for two-digit, three-digit, and four-digit MNIST repeatable array data sets.
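The all-or-nothing judgment described above can be sketched in a few lines; `array_accuracy`, `preds`, and `truths` are hypothetical names for lists of inferred and ground-truth label groups, compared as sets because the single-detector scheme carries no digit order:

```python
def array_accuracy(preds, truths):
    """Blind-test criterion sketch: a digit array counts as correct only
    when every digit in it is inferred correctly. For the single-detector
    scheme the label group carries no order, so arrays compare as sets."""
    hits = sum(set(p) == set(t) for p, t in zip(preds, truths))
    return hits / len(truths)
```

Under this criterion, getting one digit of a pair right and the other wrong scores zero for that array, which is why array-level accuracies are necessarily lower than per-digit ones.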
Although using the MNIST array data set and the MNIST repeatable array data set instead of the MNIST data set undoubtedly makes the judgment harder, promoting a single task to multiple tasks highlights the advantages of advanced parallel classification.
As shown in Table 1, this work achieves a breakthrough in parallel classification by exploiting the OAM degree of freedom, compared to other existing designs. We believe that OAM-encoded D²NNs provide a powerful framework for further improving all-optical parallel classification and OAM-based machine vision tasks. In the near future, advances in OAM mode multiplexing/demultiplexing technology may enable OAM combs consisting of hundreds of OAM modes.60 Such advances would make it possible to introduce many more OAM modes into OAM-encoded D²NNs and thus reach a higher degree of parallelism for more complex multitask parallel classification.
| Reference | Degree of freedom | Footprint | Function | Performance | Parallel classification | Single detector |
| --- | --- | --- | --- | --- | --- | --- |
| This work | OAM | — | Image recognition | Accuracy: 85.49% | Yes | Yes |
| — | — | — | Image recognition | Accuracy: 93.39% | No | No |
| — | Wavelength | — | Image recognition | Accuracy: 91.29% (84.02%) | No | Yes |
| — | Wavelength | — | Image recognition | Accuracy: 87.74% | No | Yes |
| — | Wavelength | — | Image recognition | Accuracies of four tasks: 92.8%, 83.0%, 81.0%, and 90.4% | Yes | No |
| — | Wavelength | — | Multispectral imaging | Filter transmission efficiency | — | — |
| — | Wavelength | — | Spectral filters | Processes optical waves over a continuous, wide range of frequencies | — | — |
| — | Polarization | — | Image recognition | Accuracy: 93.75% | Yes | No |
| — | Polarization | — | Linear transformations | Performs multiple complex-valued, arbitrary linear transformations using polarization multiplexing | — | — |
| — | OAM | 3 cm × 3 cm | Logic operation | Proposed an OAM logical operation | — | — |
| — | OAM | 3 cm × 3 cm | Optical communication | Diffraction efficiency, mode conversion purity, and bit error rates | — | — |
| — | OAM | — | Holography | 10 multiplexed OAM modes among five spatial depths in deep multiplexing holography | — | — |
| — | OAM | — | Spectral detection | Ratio of optical to electronic operations | — | — |

Table 1. Comparison with other designs.
2 Results
2.1 Design of OAM-Encoded D²NNs
In this paper, we demonstrate an approach that incorporates OAM into D²NNs, encoding the spatial information of objects into the OAM modes of light. Our approach is based on Fresnel scalar diffraction theory, and we propose three variants of OAM-encoded D²NNs, as shown in Fig. 1. The schematic illustrates the OAM-encoded D²NN structures and highlights their similarities and differences. All three OAM-encoded D²NNs are composed of five diffractive layers, with a constant spacing of 1.55 mm between the input layer and the first diffractive layer, between successive diffractive layers, and between the last diffractive layer and the output layer. This distance is determined by the validity conditions of Fresnel scalar diffraction theory. The number of diffractive units per layer is . These diffractive networks are trained to run independently, without coupling to the other networks, although they share the same number of layers and neurons. At the input, an OAM mode is generated using a Laguerre–Gaussian (LG) beam operating at 1550 nm, with a waist radius of . Ten OAM modes with are selected, each corresponding to one of the 10 categories of handwritten digits in the MNIST data set. The to OAM modes represent digits 0 to 4, while the to OAM modes represent digits 5 to 9. A VB multiplexes the 10 OAM modes with equal weights to illuminate the handwritten digits. The equation we employed for multiplexing LG beams carrying different OAM modes can be expressed as follows:
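As a hedged sketch (the normalization and symbols below are standard conventions, assumed rather than copied from the paper), an equal-weight superposition of zero-radial-order LG modes over a set $L$ of topological charges can be written as

```latex
E(r,\varphi) \;=\; \frac{1}{\sqrt{N}} \sum_{\ell \in L} \mathrm{LG}_{0,\ell}(r)\, e^{i\ell\varphi},
\qquad N = |L| = 10,
```

where $\mathrm{LG}_{0,\ell}(r)$ is the radial envelope of the LG mode with the stated waist radius and topological charge $\ell$, and the $1/\sqrt{N}$ factor enforces the equal weights mentioned in the text.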
Figure 1. Schematic diagrams of the three types of OAM-encoded D²NNs.
The first scheme, the OAM-encoded D²NN for single-task classification, encodes a single digit into an OAM mode and transmits it through the diffractive layers. The OAM beam generated in the output plane corresponds to the handwritten digit at the input, as shown in Fig. 1(a). The OAM-encoded D²NN was then used for parallel image recognition. As shown in Fig. 1(b), two digits of different categories were positioned at separate spatial locations, encoded into OAM modes, and transmitted simultaneously through the diffractive network. The result was an independent multiplexed OAM beam at the output, with OAM modes corresponding to the two input digit categories. However, using a single detector for parallel detection made it impossible to distinguish identical digits, since the single-detector OAM-encoded D²NN cannot detect sequential information. To address this issue, a multidetector OAM-encoded D²NN was used to discriminate repeating digits [see Fig. 1(c)]. Compared to the single-detector OAM-encoded D²NN, the ability of multiple detectors to encode the sequential information of repeating digits allows them to recognize identical digits, further increasing the parallel classification power of the diffractive network.
2.2 Single-Detector OAM-Encoded D²NN for Single-Task Classification
Here, we demonstrate the recognition of an OAM-encoded digit “1” using the single-detector (not single-pixel) OAM-encoded D²NN. A multiplexed OAM beam illuminates the MNIST handwritten digit “1” and then passes through the diffractive layers, producing a modulated OAM beam at the output receiver plane [see Fig. 2(b)]. The optical field distribution of the input OAM-encoded digit “1” at each layer, after modulation by the trained diffractive network, is shown in Fig. 2(a). The input digit “1” exhibits a residual pattern caused by the uneven intensity distribution of the mixed OAM beam. Our comparison shows that this type of illumination does not affect the blind-test recognition accuracy. After modulation by the diffractive layers, a second-order OAM beam is reconstructed at the output, showing that our diffractive network performs the given task relatively well. Although the output light contains more than a single OAM mode, owing to modulation limitations and diffraction effects, the classification can still be inferred from the intensity distribution among the different OAM modes. We obtained the normalized intensity of each OAM mode by analyzing the OAM spectrum of the output beam (see Sec. 4.3). The category of the inferred digit is determined by the OAM mode with the highest normalized intensity. As shown in Fig. 2(c), the intensity of the OAM mode with +2, corresponding to the digit “1,” is 79.37%, significantly higher than that of the other OAM modes, demonstrating effective filtering of the vortex light carrying other OAM modes.
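The spectrum-analysis step can be sketched numerically: project each radial ring of the sampled output field onto the spiral harmonics $e^{i\ell\varphi}$ and accumulate the ring powers. This is a simplified illustration, not the interferometric detector of Sec. 4.3; the function name, grid, and bin count are our assumptions.

```python
import numpy as np

def oam_spectrum(field, ls, nbins=48):
    """Estimate the normalized OAM spectrum of a sampled complex field by
    projecting each radial ring onto the spiral harmonics exp(i*l*phi).
    Assumes Cartesian sampling centred on the beam axis."""
    n = field.shape[0]
    y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
    phi = np.arctan2(y, x)
    ring = np.digitize(np.hypot(x, y), np.linspace(0, n // 2, nbins + 1))
    powers = np.zeros(len(ls))
    for k, l in enumerate(ls):
        proj = field * np.exp(-1j * l * phi)   # integrand of a_l(r)
        for b in range(1, nbins + 1):
            m = ring == b
            if m.any():
                # accumulate |a_l(r)|^2 ring by ring
                powers[k] += np.abs(proj[m].sum()) ** 2 / m.sum()
    return powers / powers.sum()
```

For a pure vortex field the spectrum peaks at its topological charge, so taking the argmax over the ten candidate modes mirrors the highest-normalized-intensity criterion used for classification.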
Figure 2. (a) The amplitude and phase distributions of the OAM beams at the input plane, the diffractive layers, and the output plane. The input image is a handwritten digit “1” encoded as an OAM beam with the +2 mode. (b) Schematic of the modulation of the light field by the single-detector OAM-encoded D²NN.
During training, the single-detector OAM-encoded D²NN reduces the loss value by continuously updating the phase and amplitude distributions of the diffractive layers. The loss and accuracy curves for the training and testing phases are shown in Fig. 2(d), where the dashed lines represent individual runs and the solid lines the average over three runs. The mean curves show that the loss of the single-detector OAM-encoded D²NN drops sharply at the beginning of the iterative process and then stabilizes after a few iterations. In addition, the test accuracy is slightly higher than the training accuracy, and the loss fluctuates more smoothly during the test phase. The blind-test accuracy of the single-detector OAM-encoded D²NN on the MNIST data set is 85.49% [as shown in Fig. 2(e)]. The accuracy of D²NNs using OAM encoding is thus essentially the same as that of wavelength-encoded D²NNs, as compared with spectrally encoded single-pixel machine vision using diffractive networks that do not reconstruct images.15 Note that the single detector of the OAM-encoded D²NN is not a single-pixel detector, but rather a single interferometer-like detector (see Sec. 4.3). This shows that the single-detector OAM-encoded D²NN design can efficiently perform a single-digit recognition task.
2.3 Single-Detector OAM-Encoded D²NN for Multitask Classification
Following our demonstration of single-image classification using an OAM-encoded D²NN, we present a more challenging application of the same framework: a single-detector OAM-encoded D²NN for multitask classification. In Fig. 3(b), two digits of different categories, “7” and “0,” with independent spatial distributions, simultaneously illuminate the input layer as an array; OAM beams are generated at the center of the output layer, multiplexing the −3 and +1 OAM modes corresponding to the two digits. The OAM-encoded D²NN multiplexes the spatial information of both digits into the same OAM beams, effectively exploiting the orthogonality of the OAM modes. However, if the two input digits share the same label, the highest-normalized-intensity criterion may lead to indistinguishable outcomes. For example, whether we input two digits “2” as an array or one digit “2” combined with other digits, a single-detector OAM-encoded D²NN cannot determine how many digits “2” are present at the input, because only the highest intensity is used as the judgment criterion, which can lead to a large error in the network. To address this issue, we use a modified MNIST array data set that excludes arrays containing digits with the same label (see Sec. 4.4). In Fig. 3(a), the two input digits are modulated by the diffractive layers to produce an optical field with the expected OAM modes in the output plane. By detecting the OAM spectra of the output beams, the two OAM modes with the highest normalized intensities represent the classes of the presumed digits [see Fig. 3(c)]. The normalized intensity of the −3 OAM mode, corresponding to the digit “7,” is 38.97%, and that of the +1 mode, corresponding to the digit “0,” is 35.57%, far exceeding the other modes.
Although OAM modes with equal intensity weights should theoretically be obtained at the output, differences in intensity between the two OAM modes are inevitable owing to the limited modulation capability of the diffractive network. However, this uneven intensity distribution only slightly affects the inference accuracy (in our tests, the accuracy error it causes does not exceed 1%).
Figure 3. (a) The amplitude and phase distributions of the OAM beams at the input plane, diffractive layers, and output plane. The input handwritten digits are “7” and “0,” which correspond to multiplexed OAM beams carrying the −3 and +1 OAM modes. (b) Schematic of the light field modulation by the single-detector OAM-encoded D²NN.
After iterative training converged, our single-detector OAM-encoded D²NN for multitask classification achieves a blind-test accuracy of 64.13% [see Fig. 3(d)]. The test results indicate that the accuracy of the single-detector OAM-encoded D²NN performing parallel recognition of multiple digits is lower than that of previously reported D²NNs. By our accuracy criterion, the OAM-encoded D²NN must correctly recognize all digits in the input array. As the confusion matrix shows, there are actually 45 categories to recognize in the MNIST array data set, significantly more than the 10 categories of the MNIST data set [see Fig. 3(e)]. It is this substantial increase in task complexity that causes the sharp drop in blind-test accuracy for multitask classification relative to single-task classification.
2.4 Multidetector OAM-Encoded D²NN for Repeatable Multitask Classification
Next, for OAM-encoded D²NNs to perform parallel recognition of large batches of images, the sequence of digits must be loaded into the light field. We use multiple detectors to simultaneously measure the OAM spectra of the multiplexed OAM beams at the output plane, which cannot be realized with a single detector. By separating the OAM beams at the output and using multiple detectors for OAM detection, we can enhance the capability of the OAM-encoded D²NN to process multiple images and introduce multiple digits at the input for multitask classification. Moreover, the positional information of the different detectors can encode the sequential information of identical digits in an array, enabling parallel recognition of repeatable-digit tasks.
Therefore, we propose a multidetector OAM-encoded D²NN for repeatable multitask classification, which encodes the order of repeatable digits using spatial information to enhance the network's ability to process more complex information in parallel. Unlike the first two schemes, which generate a single multiplexed OAM beam at the central location, this scheme generates multiple OAM beams at discrete spatial locations in the output plane. The number of generated OAM beams equals the number of digits in the input array, facilitating the use of multiple detectors for identification and classification. Figure 4(b) shows a schematic of the four-detector OAM-encoded D²NN. When the four digits are modulated by the diffractive layers, they produce OAM beams with the corresponding OAM modes at the specified spatial locations in the output layer. Figure 4(a) shows the amplitude and phase of two, three, and four input digits at different positions in the input layer, diffractive layers, and output layer, respectively. The intensities of the different output OAM beams are not uniformly distributed, a problem similar to that encountered in the single-detector OAM-encoded D²NN and likewise caused by the limited modulation capability of the diffractive network. In addition, Fig. 4(a) shows that there is only a logical correspondence between our input and output layers for digit recognition, and no direct correspondence in the optical path propagation. When the digits “6” and “0” are input, the intensities of the generated OAM modes corresponding to their digit classes account for 46.55% and 69.77% of the respective OAM beams. When the array “6,” “1,” and “3” is input, the normalized intensities of the corresponding OAM modes are 51.78%, 40.98%, and 45.20% of the output, respectively. The OAM modes corresponding to the array containing the repeatable digits “2,” “1,” “7,” and “3” account for 46.77%, 42.27%, 38.84%, and 34.73% of the total intensity, respectively. These proportions exceed those of the other OAM modes [see Fig. 4(c)]. Thus, the multidetector OAM-encoded D²NN handles the parallel recognition task excellently when spatially separated OAM beams are generated at the output and jointly detected by the same number of detectors.
Figure 4. (a) From top to bottom, the multidetector OAM-encoded D²NN.
The accuracy curves obtained from successive iterative tests show that the multidetector OAM-encoded D²NN achieves blind-test accuracies of 70.94%, 52.41%, and 40.13% for the two-digit, three-digit, and four-digit MNIST repeatable array data sets, respectively [see Fig. 5(a)]. Facing the same challenge as the single-detector OAM-encoded D²NN for multitask classification, the rapid growth in the number of labels in the repeatable array data set further degrades the network's blind-test accuracy. The two-digit, three-digit, and four-digit data sets have 100, 1000, and 10,000 labels, respectively. The difficulty is much higher than that of the original MNIST data set because every digit in the array must be classified correctly. For the three-detector and four-detector OAM-encoded D²NNs, there are too many labels composed of different digits to display a pixel map of that size within the limited space of the inset, yet capturing only a portion of the confusion matrix would sacrifice completeness. Therefore, we show a scaled-down version of the confusion matrix in the inset, together with a localized zoom [Fig. 5(b)]. In addition, the results of the multidetector OAM-encoded D²NN for repeatable multitask classification show that classifying more digits in parallel within the same array further decreases the classification accuracy. The ability of the OAM-encoded D²NN to handle more digits can be improved by, for example, increasing the size of the diffractive layers and expanding the number of neurons used for recognition.
Figure 5. (a) The loss function and accuracy function of the two-detector, three-detector, and four-detector OAM-encoded D²NNs.
3 Discussion and Conclusions
Experimental implementations of D²NNs typically use a spatial light modulator to modulate the light source and 3D printing to fabricate the computer-designed metasurfaces. Limited by the feature size of 3D printing, this fabrication method is generally only available for terahertz bands. There are two main challenges in building OAM-encoded D²NNs experimentally: sample fabrication and experimental measurement. Here, the OAM-encoded D²NN operates at a wavelength of 1550 nm, which corresponds to pixel sizes of . The diffractive layers of the OAM-encoded D²NN can be fabricated by micro/nanofabrication technology compatible with CMOS processes, as current state-of-the-art e-beam lithography reaches a fabrication resolution of only a few nanometers. However, challenges remain in fabricating the on-chip multilayer structures, including overlay, alignment, and other issues68,69 that must be solved with improved technology.
The spectrum of the output OAM beam can be analyzed using interferometric, diffractive, and other detection methods.60,61,67 For measuring the diffractive network, here we take the interferometric method as an example. This method can detect the OAM spectra of multiplexed OAM beams, not only a single OAM mode. The measurement details of the detector are outlined in Sec. 4.3. For the MNIST data set and the MNIST array data set, a single detector at the output plane of the diffractive network suffices for OAM spectrum analysis. For the MNIST repeatable array data set, however, multiple detectors are needed to simultaneously detect the different OAM modes corresponding to the different classified digits.
The OAM-encoded D²NNs also require an interferometric detector with a high signal-to-noise ratio and high sensitivity. Considering reflection, material absorption, scattering, and other losses, we can attempt to relax the sensitivity and robustness requirements of the detector. One approach is to increase the intensity of the optical signal reaching the detector, for example, by reducing the number of layers to minimize absorption and reflection losses. Note that there is always a trade-off between classification accuracy and output efficiency. Since we are dealing with an optical classification network, the detected optical signal only needs to meet the minimal requirements for classification. Despite the difficulties, we believe this OAM-encoded D²NN scheme has great potential for realization as technology develops.
In summary, we have proposed and investigated all-optical parallel classification using OAM-mode-encoded diffractive networks, which encode the spatial information of multiple objects into the OAM modes of a VB. We then use OAM spectra to analyze the normalized intensity distribution of the OAM modes for multitask optical classification. If the inference accuracy of the existing OAM-encoded D²NNs can be further improved, they can be extended from target recognition to other deep-learning tasks, such as multilabel classification and dynamic image recognition. We also envision introducing more OAM modes (which may require a more advanced multimode OAM comb as the light source60) to solve more complex tasks. Finally, we expect that OAM-encoded D²NNs can provide a new, feasible pathway for all-optical parallel classification and OAM-based machine vision.
4 Appendix: Materials and Methods
4.1 Forward Propagation Model of the OAM-Encoded D²NN
Traditional deep neural networks rely on forward propagation, backward propagation, and gradient descent algorithms for brain-like electronic computation by continuously adjusting the weights of electronic neurons. The diffraction of light during propagation is very similar to the way neurons are connected in deep neural networks. Based on Rayleigh–Sommerfeld diffraction,70 the field at each diffractive unit/neuron can be regarded as a coherent superposition of the light propagating from every diffractive unit/neuron in the preceding diffractive layer; each unit can also be seen as the source of a secondary wave fully connected to the subsequent layer. The equation of light propagation between diffractive layers is given as
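As a hedged reconstruction of the standard Rayleigh–Sommerfeld secondary-wave kernel commonly used in D²NN work (the symbols follow the usual convention and are our assumption, not copied from the paper):

```latex
w_i^{l}(x,y,z) \;=\; \frac{z - z_i}{r^{2}} \left( \frac{1}{2\pi r} + \frac{1}{j\lambda} \right) \exp\!\left( \frac{j 2\pi r}{\lambda} \right),
\qquad r \;=\; \sqrt{(x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2},
```

where the emitting neuron of layer $l$ sits at $(x_i, y_i, z_i)$ and $\lambda$ is the illumination wavelength; the field at any point of the next layer is the coherent sum of these secondary waves over all neurons of the current layer.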
Because solving the conventional model with the Rayleigh–Sommerfeld formula carries a significant computational burden, using Fresnel scalar diffraction theory can effectively reduce the computational effort; under the layer spacings we use, it reproduces the results of the Rayleigh–Sommerfeld formula. Here, we use Fresnel scalar diffraction theory to construct the forward propagation model of the OAM-encoded diffractive neural networks. The complex amplitude of the OAM beam at a given neuron of a given layer can be considered as
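A minimal numerical sketch of this forward model: one Fresnel transfer-function propagation step between layers. The grid size, pixel pitch, and function name are our assumptions; the 1550 nm wavelength and 1.55 mm spacing follow the text. In a full model, each diffractive layer would multiply the field elementwise by its learned complex transmittance between propagation steps.

```python
import numpy as np

def fresnel_propagate(u0, wavelength, dx, z):
    """Propagate a sampled complex field u0 over a distance z using the
    Fresnel transfer-function method (sketch). dx is the pixel pitch."""
    n = u0.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    fxx, fyy = np.meshgrid(fx, fx)
    # Fresnel approximation of the angular-spectrum transfer function
    h = np.exp(1j * 2 * np.pi * z / wavelength) \
        * np.exp(-1j * np.pi * wavelength * z * (fxx ** 2 + fyy ** 2))
    return np.fft.ifft2(np.fft.fft2(u0) * h)
```

Because the transfer function is a pure phase factor, the step is unitary: total power is conserved between layers, which matches the lossless propagation assumed in the simulation.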
4.2 Error Analysis of OAM-Encoded D²NNs
In the main text, the OAM-encoded D²NNs are based entirely on the ideal case with fixed parameters. In experiments, factors such as fabrication size errors, optical alignment errors, and material absorption may affect the performance of the diffractive network. Here, we present a systematic analysis of the various types of errors that OAM-encoded D²NNs may encounter.
4.2.1 Deviation analysis of the pixel size and the layer spacing
According to Fresnel scalar diffraction theory, the spacing between layers of the diffractive network should be at least 10 times larger than the size of the entire layer. We therefore grouped the pixel-size and full-size optical errors together for analysis. We assumed a deviation of in the manufacturing dimensions, which is much larger than the fabrication error of the CMOS machining process.68,69 We considered an error range from 0.8 times to 1.2 times the pixel size with the correspondingly scaled layer spacing. As shown in Fig. 6(a), the accuracy of the OAM-encoded D²NNs varies within 1% over this error range. Therefore, we believe that errors in pixel size and layer spacing introduced during fabrication do not affect the OAM-encoded D²NNs.
Figure 6. The different colored curves represent different diffractive networks, as indicated in the diagram in the lower left corner. (a) The deviation of the pixel size and the layer spacing; the horizontal coordinate spans from 0.8 to 1.2 times the pixel size with the correspondingly scaled layer spacing. (b) The deviation analysis of object misalignment in the horizontal and vertical directions. (c) The deviation analysis of layer misalignment; the left image corresponds to a random misalignment error of 5% per layer, and the right image to 10% per layer.
4.2.2 Deviation analysis of the object misalignment
First, we consider possible object misalignment between the incident OAM beam and the digit mask. We introduced deviations of 2%, 4%, 6%, 8%, and 10% in both the horizontal and vertical directions of the object. For each of these misalignment errors, we tested all five diffractive networks mentioned in the main text. As shown in Fig. 6(b), when the object misalignment is within 5% in both directions, the accuracy of all OAM-encoded D²NNs except the S-OAM-encoded D²NN-M (see Table 2 for the nomenclature) fluctuates within 1%. Therefore, our diffractive networks can ensure that the deviation of the incident beam from the digit mask does not exceed 5%, which is smaller than the range of fabrication error.68,69
| Network | Training time (h) | Training loss | Training accuracy (%) | Test loss | Test accuracy (%) |
| --- | --- | --- | --- | --- | --- |
| S-OAM-encoded D²NN (single task) | 12.74 | 0.402 | 84.30 | 0.343 | 85.43 |
| S-OAM-encoded D²NN-M (multitask) | 5.69 | 0.708 | 57.42 | 0.667 | 64.13 |
| M-OAM-encoded D²NN (two-detector) | 6.04 | 0.820 | 67.69 | 0.772 | 70.94 |
| M-OAM-encoded D²NN (three-detector) | 4.09 | 1.345 | 48.94 | 1.238 | 52.41 |
| M-OAM-encoded D²NN (four-detector) | 3.19 | 1.970 | 36.25 | 1.932 | 40.13 |

Table 2. Various indices for the single-detector and multidetector OAM-encoded D²NNs.
In addition, we observed an interesting phenomenon for the three-detector and four-detector OAM-encoded D²NNs: surprisingly, their accuracy seems to increase when the object misalignment error is around 5%. We hypothesize that this may be caused by misidentification of certain digits when the incident beam deviates (e.g., when the OAM beam shifts horizontally to the right, the light intensity distribution of the digit “8” can come to resemble that of the digit “3” because of the nonuniform intensity distribution of the multiplexed OAM beams).
4.2.3 Deviation analysis of layer misalignment
Here, we selected two values for the layer misalignment error: 5% and 10%, meaning that the layers are displaced by 5% or 10% in random directions. In Fig. 6(c), the horizontal coordinates represent the index of the diffractive layer where the corresponding misalignment occurred. The OAM-encoded D²NNs prove highly robust against layer alignment errors, with minimal impact on accuracy. In addition, to explore the limit of the OAM-encoded D²NN's sensitivity to layer alignment errors, we conducted additional tests on the single-detector OAM-encoded D²NN for single-task classification with a 20% misalignment error (see Fig. 6). Under these conditions, the accuracy begins to exhibit a slight decline of 1%. Consequently, we conclude that the performance of the diffractive network can be reliably maintained as long as the alignment error between layers remains within 20% during sample processing and experimental testing.
4.2.4 Absorption error analysis of materials
As for the absorption effect, the material we used for the diffractive layers is silicon nitride, whose extinction coefficient at a wavelength of 1550 nm is negligible, so no absorption was included in the simulation. Considering that the fabricated silicon nitride may exhibit a small extinction coefficient during experimental testing, we assumed an extinction coefficient of 0.05 and incorporated it into the updated diffractive network for testing. After testing, the loss is . This may be because the thickness of the diffractive network is only about , which produces almost no absorption.
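The near-zero absorption can be sanity-checked with the Beer–Lambert relation. A minimal sketch, assuming the extinction coefficient k = 0.05 at λ = 1550 nm and a purely hypothetical layer thickness of 1 µm (the paper's actual thickness value is not reproduced here):

```python
import math

def intensity_transmittance(k, wavelength_m, thickness_m):
    """Intensity transmittance through an absorbing slab,
    T = exp(-4*pi*k*d/lambda), ignoring interface reflections."""
    alpha = 4.0 * math.pi * k / wavelength_m  # absorption coefficient (1/m)
    return math.exp(-alpha * thickness_m)

# Assumed values: k = 0.05, lambda = 1550 nm, d = 1 um (hypothetical)
T = intensity_transmittance(0.05, 1550e-9, 1e-6)
```

For a thickness well below a wavelength, the exponent is small and T stays close to 1, consistent with the negligible absorption reported above.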
4.2.5 Reflection error analysis of diffractive layers
The loss of the whole OAM-encoded D²NN system is mainly due to reflection from the diffractive layers. Assuming the beam is normally incident on a diffractive layer, the transmittance can be calculated as
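The missing expression is presumably the Fresnel result. A reconstruction sketch, assuming a single air–silicon-nitride interface at normal incidence with refractive indices \(n_1\) (air) and \(n_2\) (silicon nitride); the symbols are our own notation:

```latex
% Fresnel intensity transmittance at normal incidence
T = 1 - \left( \frac{n_1 - n_2}{n_1 + n_2} \right)^2
  = \frac{4\, n_1 n_2}{(n_1 + n_2)^2}
```

If one assumes \(n_1 = 1\) and \(n_2 \approx 2.0\) for silicon nitride near 1550 nm, this gives roughly \(T \approx 0.89\) per surface.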
4.3 OAM Spectrum Analysis
Multiple OAM states can coexist in the same beam; a beam is not limited to a single OAM mode. Analogous to the optical spectrum, which gives the intensity weights of different frequencies or wavelengths, the intensity weights of the different OAM channels of a beam are called its OAM spectrum. The spiral harmonic is the eigenfunction of OAM, and the beam can be expanded in spiral harmonics in cylindrical coordinates as
Since the value is independent of the parameter , the relative intensity of such a helical harmonic is
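A reconstruction sketch of the standard spiral-harmonic decomposition referred to above (the symbols are our own notation, assuming a scalar field \(\psi(r,\varphi,z)\) in cylindrical coordinates):

```latex
\psi(r,\varphi,z) = \frac{1}{\sqrt{2\pi}} \sum_{l=-\infty}^{\infty} a_l(r,z)\, e^{i l \varphi},
\qquad
a_l(r,z) = \frac{1}{\sqrt{2\pi}} \int_0^{2\pi} \psi(r,\varphi,z)\, e^{-i l \varphi}\, \mathrm{d}\varphi,
```

and the power in channel \(l\) together with its relative intensity (the OAM spectrum) would then read

```latex
C_l = \int_0^{\infty} \left| a_l(r,z) \right|^2 r\, \mathrm{d}r,
\qquad
P_l = \frac{C_l}{\sum_{q=-\infty}^{\infty} C_q}.
```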
4.4 Preparation of Data Sets
The MNIST array data set and the MNIST repeatable array data set are used in this study to evaluate the discriminative criteria for multi-object classification in the proposed OAM-encoded D²NN.
MNIST array data set: the digits in the MNIST data set are divided into 10 classes according to their labels, and the number of digits in each class is recorded. Two class labels are randomly selected using the shuffle function and combined into a label group of two labels. An image corresponding to each label is then drawn from the data set, and the two images are stitched together into a new array. New arrays and label groups are generated iteratively until all digits in a given class have been selected. It is worth noting that the order of the digits also carries information: for example, the digit pair "0", "1" produces a different light field distribution than the pair "1", "0." The resulting MNIST array data set contains to 28,000 training samples and 4400 to 4500 test samples; the distribution of digits across classes in MNIST is not uniform, which affects these counts. The MNIST array data set is regenerated after each round of the iterative process, and data discarded in one round may be selected in subsequent rounds, so as the number of training rounds increases, the probability of each digit appearing in the MNIST array data set gradually approaches uniformity.
MNIST repeatable array data set: this data set builds on the MNIST array data set. Unlike the MNIST array data set, identical digits may appear when an array is formed from randomly chosen digits. The introduction of identical digits also requires encoding the order of the combinations in the array. Because digits may repeat within an array, the MNIST repeatable array data set does not require discarding digits.
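The pairing procedures for both data sets can be sketched as follows. This is a minimal illustration (the function name and the `repeatable` flag are our own, not the paper's code), assuming images are stored per label and that stitching simply places two images side by side:

```python
import random

def make_array_dataset(images_by_label, n_samples, repeatable=False):
    """Build (ordered image pair, ordered label group) samples.

    images_by_label: dict mapping a digit label to a list of its images.
    repeatable=False -> two distinct classes per array (MNIST array set);
    repeatable=True  -> identical digits may repeat (repeatable array set).
    Order matters: ("0", "1") and ("1", "0") are distinct samples.
    """
    labels = list(images_by_label)
    dataset = []
    for _ in range(n_samples):
        if repeatable:
            a, b = random.choices(labels, k=2)   # repeats allowed
        else:
            a, b = random.sample(labels, 2)      # two distinct classes
        pair = (random.choice(images_by_label[a]),
                random.choice(images_by_label[b]))
        dataset.append((pair, (a, b)))           # ordered label group
    return dataset
```

Regenerating the data set each round, as described above, corresponds to simply calling this function again with a fresh random state.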
4.5 Loss Function of OAM-Encoded D²NN
We define the classical mean square error (MSE) loss function to calculate the difference between the predicted output and the ground truth target , which can be expressed as
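For reference, the classical MSE loss over K output values can be written as follows (a standard form; the symbols \(o_k\) for the prediction and \(g_k\) for the ground truth are our own notation):

```latex
\mathcal{L}_{\mathrm{MSE}} = \frac{1}{K} \sum_{k=1}^{K} \left| o_k - g_k \right|^2
```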
In traditional training, the softmax cross-entropy (SCE) loss function is often used in addition to the MSE loss function. The SCE loss function quantifies the degree of difference between two different probability distributions of the same random variable, which in diffractive networks is expressed as the difference between the true and predicted probability distributions. The smaller the value of the cross-entropy, the better the model prediction. The function can be expressed as
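In the same notation, a standard softmax cross-entropy form (a sketch, not necessarily the paper's exact normalization):

```latex
\mathcal{L}_{\mathrm{SCE}} = - \sum_{k=1}^{K} g_k \,
\log \frac{e^{\,o_k}}{\sum_{j=1}^{K} e^{\,o_j}}
```

The softmax maps the raw detector outputs \(o_k\) to a predicted probability distribution, which the cross-entropy then compares against the true distribution \(g_k\).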
Table 2 shows the relevant performance parameters of our different network models. The models were trained on a server [GeForce RTX 3080 Ti graphics processing unit (GPU, Nvidia Inc.), Intel(R) Core(TM) i9-10900K @ 3.70 GHz central processing unit (CPU, Intel Inc.), and 64 GB of RAM, running the Windows 10 operating system (Microsoft)] with Python (v3.9.13) and PyTorch (1.11.0+cu113) for the simulation computations. All models were trained for 50 epochs using the built-in Adam optimizer with a learning rate of 0.01.
4.6 Optical Demonstration of OAM-Encoded D²NN
Optically simulating the entire OAM-encoded D²NN model is challenging to realize. Taking COMSOL Multiphysics as an example, the size of the diffractive layer of the OAM-encoded D²NN is , and the total length of the model is . The mesh resolution required in COMSOL calculations ranges from one-quarter to one-sixth of a wavelength (i.e., between 0.2583 and ). To simulate the full OAM-encoded D²NN, the required computer memory would be astronomical and unattainable. To show the consistency of our theoretical results in Python with COMSOL Multiphysics, we used COMSOL to build a five-layer structure with for model demonstration, as well as a single-layer structure with for simulation. Figure 7(b) shows the light field distribution on the input side for the digit "9" when illuminated by a multiplexed OAM beam. Figure 7(c) shows the light field distribution modulated by the diffractive layer at the output plane. The simulation results from COMSOL Multiphysics are highly consistent with the theoretical results obtained from Python. We believe these simulation results can provide support and guidance for the experiments.
Figure 7. (a) The left figure shows the geometrical model of the five-layer
References
[1] M. M. Waldrop. The chips are down for Moore’s law. Nature, 530, 144-147(2016).
[3] D. Solli, B. Jalali. Analog optical computing. Nat. Photonics, 9, 704-706(2015).
[44] M. Veli et al. Terahertz pulse shaping using diffractive surfaces. Nat. Commun., 12, 37(2021).
[70] L. Mandel, E. Wolf. Some properties of coherent light. J. Opt. Soc. Am., 51, 815-819(1961).