Advanced Photonics Nexus, Vol. 3, Issue 4, 046003 (2024)
Jumin Qiu1, Shuyuan Xiao2,3, Lujun Huang4,*, Andrey Miroshnichenko5, Dejian Zhang1, Tingting Liu2,3,*, and Tianbao Yu1,*
Author Affiliations
  • 1Nanchang University, School of Physics and Materials Science, Nanchang, China
  • 2Nanchang University, School of Information Engineering, Nanchang, China
  • 3Nanchang University, Institute for Advanced Study, Nanchang, China
  • 4East China Normal University, School of Physics and Electronic Science, Shanghai, China
  • 5University of New South Wales Canberra, School of Engineering and Information Technology, Canberra, Australia
DOI: 10.1117/1.APN.3.4.046003
Jumin Qiu, Shuyuan Xiao, Lujun Huang, Andrey Miroshnichenko, Dejian Zhang, Tingting Liu, Tianbao Yu, "Decision-making and control with diffractive optical networks," Adv. Photon. Nexus 3, 046003 (2024)
Fig. 1. DON for decision-making and control. (a)–(c) The proposed network plays the video game Super Mario Bros. in a human-like manner. In the network architecture, an input layer captures continuous, high-dimensional game snapshots (seeing), a series of diffractive layers chooses a particular action through a learned control policy for each situation faced (making a decision), and an output layer maps the intensity distribution onto preset action regions to generate the control signals in the game (controlling). (d) Training framework of the policy and the network. In deep reinforcement learning, an agent interacts with a simulated environment to find a near-optimal control policy represented by a CNN, which is then employed as the ground truth to update the DON through an error backpropagation algorithm. (e) Experimental setup of the DON for decision-making and control. (f) Building block of the DON.
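To make the training framework of panel (d) concrete, the sketch below shows how a CNN policy, assumed to have been obtained beforehand by deep reinforcement learning, could serve as the ground truth for updating a differentiable model of the DON by error backpropagation. This is not the authors' code: the layer sizes, the angular-spectrum propagation model, the action-region layout, and the cross-entropy distillation loss are all illustrative assumptions.

```python
# Minimal sketch of the Fig. 1(d) framework: a CNN teacher policy supplies
# target actions; a differentiable DON model is trained to reproduce them.
# All dimensions and the stand-in teacher are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffractiveLayer(nn.Module):
    """One trainable phase mask followed by angular-spectrum free-space propagation."""
    def __init__(self, n=64, wavelength=532e-9, pixel=8e-6, distance=0.05):
        super().__init__()
        self.phase = nn.Parameter(torch.zeros(n, n))            # learnable phase profile
        fx = torch.fft.fftfreq(n, d=pixel)
        fxx, fyy = torch.meshgrid(fx, fx, indexing="ij")
        k = 2 * torch.pi / wavelength
        kz = torch.sqrt(torch.clamp(k**2 - (2 * torch.pi * fxx)**2 - (2 * torch.pi * fyy)**2, min=0.0))
        self.register_buffer("H", torch.exp(1j * kz * distance))  # propagation transfer function

    def forward(self, field):
        field = field * torch.exp(1j * self.phase)               # modulation by the phase mask
        return torch.fft.ifft2(torch.fft.fft2(field) * self.H)   # propagate to the next layer

class DON(nn.Module):
    """Input field -> cascaded diffractive layers -> intensities in preset action regions."""
    def __init__(self, n=64, layers=3, num_actions=4):
        super().__init__()
        self.layers = nn.ModuleList(DiffractiveLayer(n) for _ in range(layers))
        # Hypothetical action regions: split the output plane into vertical strips.
        self.regions = torch.chunk(torch.arange(n), num_actions)

    def forward(self, img):                                      # img: (B, n, n) game snapshot
        field = img.to(torch.complex64)                          # amplitude-encoded input
        for layer in self.layers:
            field = layer(field)
        intensity = field.abs() ** 2
        # Integrate intensity inside each action region -> one score per action.
        return torch.stack([intensity[:, :, r].sum(dim=(1, 2)) for r in self.regions], dim=1)

don = DON()
cnn_teacher = lambda x: torch.randint(0, 4, (x.shape[0],))       # stand-in for the RL-trained CNN policy
opt = torch.optim.Adam(don.parameters(), lr=1e-2)

for _ in range(10):                                              # distillation loop on toy data
    frames = torch.rand(8, 64, 64)                               # simulated game snapshots
    target_actions = cnn_teacher(frames)                         # CNN decision used as ground truth
    loss = F.cross_entropy(don(frames), target_actions)          # match DON decision to CNN decision
    opt.zero_grad()
    loss.backward()                                              # error backpropagation updates the phase masks
    opt.step()
```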
Fig. 2. Playing tic-tac-toe. (a) Schematic illustration of the DON, composed of an input layer, hidden layers of three cascaded diffractive blocks, and an output layer, for playing tic-tac-toe. (b) and (c) Sequential control of the DON in performing gameplay tasks for X and O. (d) Accuracy of playing tic-tac-toe. A collection of 87 games is used for predicting X, yielding 81 wins and 6 draws. In the remaining 583 games, O obtains 454 wins, 74 draws, and 21 losses. When the predicted position at a turn has already been occupied by a previous move, the case is counted as a playing error; this occurs 34 times. (e) Dependence of the prediction accuracy on the number of hidden layers.
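The move-selection rule described in panel (d) can be illustrated with a short sketch: the output plane is divided into a 3×3 grid of action regions, the brightest region gives the predicted move, and a prediction landing on an already occupied square counts as a playing error. The region layout and array sizes below are assumptions for illustration, not the authors' implementation.

```python
# Sketch of turning a DON output intensity pattern into a tic-tac-toe move.
import numpy as np

def predict_move(intensity, board):
    """intensity: (H, W) output-plane intensity; board: 3x3 array, 0 = empty square."""
    h, w = intensity.shape
    # Integrate the intensity within each of the nine preset regions.
    scores = intensity.reshape(3, h // 3, 3, w // 3).sum(axis=(1, 3))
    row, col = np.unravel_index(np.argmax(scores), (3, 3))
    playing_error = board[row, col] != 0        # predicted square already occupied
    return (int(row), int(col)), bool(playing_error)

# Toy usage: a bright spot in the centre region predicts the centre move.
intensity = np.zeros((90, 90))
intensity[40:50, 40:50] = 1.0
board = np.zeros((3, 3), dtype=int)
print(predict_move(intensity, board))           # ((1, 1), False)
```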
Fig. 3. Playing Super Mario Bros. (a) Layout of the designed network for playing Super Mario Bros. (b) and (c) Snapshots of Mario's jump and crouch actions, selected by comparing the output intensities of the actions. The output intensity of the jump is maximal at the 201st frame, so the predicted action is jump and Mario is controlled accordingly, as shown in panel (b). A similar prediction-and-control sequence for the crouch action is shown in panel (c). (d) The inverse prediction result. Because the crouch predicted at the current state is crucial for updating Mario's action, we use the maximized output intensity of the crouch as the input, ignoring the simultaneous outputs of the other actions (Video 1, MP4, 19.8 MB [URL: https://doi.org/10.1117/1.APN.3.4.046003.s1]).
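The per-frame see/decide/control cycle described for panels (b) and (c) can be summarized in a short sketch. Here `don`, `get_frame`, and `press` are hypothetical stand-ins for the trained network and the game interface, and the action set is illustrative.

```python
# Sketch of one frame of the "see, decide, control" loop for Super Mario Bros.
ACTIONS = ["right", "jump", "crouch", "idle"]   # illustrative action set

def control_step(don, get_frame, press):
    frame = get_frame()                          # seeing: capture the game snapshot
    intensities = don(frame)                     # deciding: one output intensity per action region
    action = ACTIONS[max(range(len(ACTIONS)), key=lambda i: intensities[i])]
    press(action)                                # controlling: emit the key press to the game
    return action

# Toy stand-ins to illustrate the call pattern: jump has the largest intensity.
dummy_don = lambda frame: [0.1, 0.9, 0.2, 0.05]
print(control_step(dummy_don, lambda: None, lambda a: None))   # "jump"
```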
Fig. 4. Playing Car Racing. (a) Layout of the designed network for playing Car Racing. (b) Control of the car's steering direction and angle as a function of the difference between the output intensities at the current state, normalized between −1 and 1. (c)–(f) Snapshots of controlling the car's steering. When the car faces a left-turn track in panel (c), the output intensity on the left remains greater than that on the right, enabling continuous control in updating the rotation angle of the left-turn action. A similar control process is performed for the right-turn track in panel (e). In addition, the robustness of the network against disturbances is validated by introducing (d) Gaussian blur and (f) Gaussian noise into the game images (Video 2, MP4, 8.36 MB [URL: https://doi.org/10.1117/1.APN.3.4.046003.s2]; Video 3, MP4, 6.78 MB [URL: https://doi.org/10.1117/1.APN.3.4.046003.s3]; Video 4, MP4, 16.8 MB [URL: https://doi.org/10.1117/1.APN.3.4.046003.s4]).
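The continuous steering rule of panel (b) can be sketched as follows: the normalized difference between the left and right output intensities sets the steering direction through its sign and the steering angle through its magnitude. The normalization by the total intensity and the maximum steering angle are assumptions for illustration, not the authors' exact mapping.

```python
# Sketch of mapping two output-region intensities to a continuous steering command.
def steering_command(left_intensity, right_intensity, max_angle_deg=30.0):
    total = left_intensity + right_intensity
    if total == 0:
        return 0.0
    diff = (left_intensity - right_intensity) / total   # normalized to [-1, 1]
    return diff * max_angle_deg                          # > 0 steers left, < 0 steers right

print(steering_command(0.8, 0.2))   # 18.0 deg toward the left turn
print(steering_command(0.3, 0.7))   # -12.0 deg toward the right turn
```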
Fig. 5. Experimental demonstration of the DON for tic-tac-toe. (a) Photograph of the experimental system, in which the unlabeled devices are lenses; a spatial filter removes the unwanted multiple-order energy peaks, and a filter is mounted on the camera. (b) Output of the first layer of the sample in Fig. 2(a); the red arrows represent the polarization direction of the incident light. (c) and (d) Sequential control of the DON in playing the same two games as in Figs. 2(b) and 2(c), respectively. The experimental results are normalized based on the simulation results. Sim., simulation result; Exp., experimental result.