Advanced Photonics Nexus, Vol. 3, Issue 4, 046003 (2024)
Jumin Qiu1, Shuyuan Xiao2,3, Lujun Huang4,*, Andrey Miroshnichenko5, Dejian Zhang1, Tingting Liu2,3,*, and Tianbao Yu1,*
Author Affiliations
  • 1Nanchang University, School of Physics and Materials Science, Nanchang, China
  • 2Nanchang University, School of Information Engineering, Nanchang, China
  • 3Nanchang University, Institute for Advanced Study, Nanchang, China
  • 4East China Normal University, School of Physics and Electronic Science, Shanghai, China
  • 5University of New South Wales Canberra, School of Engineering and Information Technology, Canberra, Australia
DOI: 10.1117/1.APN.3.4.046003
Jumin Qiu, Shuyuan Xiao, Lujun Huang, Andrey Miroshnichenko, Dejian Zhang, Tingting Liu, Tianbao Yu, "Decision-making and control with diffractive optical networks," Adv. Photon. Nexus 3, 046003 (2024)
Fig. 1. DON for decision-making and control. (a)–(c) The proposed network plays the video game Super Mario Bros. in a human-like manner. In the network architecture, an input layer captures continuous, high-dimensional game snapshots (seeing), a series of diffractive layers chooses a particular action through a learned control policy for each situation faced (making a decision), and an output layer maps the intensity distribution onto preset action regions to generate the control signals in the game (controlling). (d) Training framework of the policy and the network. In deep reinforcement learning, an agent interacts with a simulated environment to find a near-optimal control policy represented by a CNN, which is then employed as the ground truth to update the DON through an error backpropagation algorithm. (e) Experimental setup of the DON for decision-making and control. (f) Building block of the DON.
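To make the training framework of panel (d) concrete, the sketch below shows how a CNN policy, assumed to have been obtained beforehand by deep reinforcement learning, could serve as the ground truth for updating a differentiable model of the DON by error backpropagation. This is not the authors' code: the layer sizes, the angular-spectrum propagation model, the action-region layout, and the cross-entropy distillation loss are all illustrative assumptions.

```python
# Minimal sketch of the Fig. 1(d) framework: a CNN teacher policy supplies
# target actions; a differentiable DON model is trained to reproduce them.
# All dimensions and the stand-in teacher are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffractiveLayer(nn.Module):
    """One trainable phase mask followed by angular-spectrum free-space propagation."""
    def __init__(self, n=64, wavelength=532e-9, pixel=8e-6, distance=0.05):
        super().__init__()
        self.phase = nn.Parameter(torch.zeros(n, n))            # learnable phase profile
        fx = torch.fft.fftfreq(n, d=pixel)
        fxx, fyy = torch.meshgrid(fx, fx, indexing="ij")
        k = 2 * torch.pi / wavelength
        kz = torch.sqrt(torch.clamp(k**2 - (2 * torch.pi * fxx)**2 - (2 * torch.pi * fyy)**2, min=0.0))
        self.register_buffer("H", torch.exp(1j * kz * distance))  # propagation transfer function

    def forward(self, field):
        field = field * torch.exp(1j * self.phase)               # modulation by the phase mask
        return torch.fft.ifft2(torch.fft.fft2(field) * self.H)   # propagate to the next layer

class DON(nn.Module):
    """Input field -> cascaded diffractive layers -> intensities in preset action regions."""
    def __init__(self, n=64, layers=3, num_actions=4):
        super().__init__()
        self.layers = nn.ModuleList(DiffractiveLayer(n) for _ in range(layers))
        # Hypothetical action regions: split the output plane into vertical strips.
        self.regions = torch.chunk(torch.arange(n), num_actions)

    def forward(self, img):                                      # img: (B, n, n) game snapshot
        field = img.to(torch.complex64)                          # amplitude-encoded input
        for layer in self.layers:
            field = layer(field)
        intensity = field.abs() ** 2
        # Integrate intensity inside each action region -> one score per action.
        return torch.stack([intensity[:, :, r].sum(dim=(1, 2)) for r in self.regions], dim=1)

don = DON()
cnn_teacher = lambda x: torch.randint(0, 4, (x.shape[0],))       # stand-in for the RL-trained CNN policy
opt = torch.optim.Adam(don.parameters(), lr=1e-2)

for _ in range(10):                                              # distillation loop on toy data
    frames = torch.rand(8, 64, 64)                               # simulated game snapshots
    target_actions = cnn_teacher(frames)                         # CNN decision used as ground truth
    loss = F.cross_entropy(don(frames), target_actions)          # match DON decision to CNN decision
    opt.zero_grad()
    loss.backward()                                              # error backpropagation updates the phase masks
    opt.step()
```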
Fig. 2. Playing tic-tac-toe. (a) Schematic illustration of the DON, composed of an input layer, hidden layers of three cascaded diffractive blocks, and an output layer, for playing tic-tac-toe. (b) and (c) Sequential control of the DON in performing gameplay tasks for X and O. (d) Accuracy of playing tic-tac-toe. A collection of 87 games is used for predicting X, yielding 81 wins and 6 draws. In the remaining 583 games, O obtains 454 wins, 74 draws, and 21 losses. When the predicted position at a turn has already been occupied by a previous move, the case is counted as a playing error; this occurs 34 times. (e) Dependence of the prediction accuracy on the number of hidden layers.
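The move-selection rule described in panel (d) can be illustrated with a short sketch: the output plane is divided into a 3×3 grid of action regions, the brightest region gives the predicted move, and a prediction landing on an already occupied square counts as a playing error. The region layout and array sizes below are assumptions for illustration, not the authors' implementation.

```python
# Sketch of turning a DON output intensity pattern into a tic-tac-toe move.
import numpy as np

def predict_move(intensity, board):
    """intensity: (H, W) output-plane intensity; board: 3x3 array, 0 = empty square."""
    h, w = intensity.shape
    # Integrate the intensity within each of the nine preset regions.
    scores = intensity.reshape(3, h // 3, 3, w // 3).sum(axis=(1, 3))
    row, col = np.unravel_index(np.argmax(scores), (3, 3))
    playing_error = board[row, col] != 0        # predicted square already occupied
    return (int(row), int(col)), bool(playing_error)

# Toy usage: a bright spot in the centre region predicts the centre move.
intensity = np.zeros((90, 90))
intensity[40:50, 40:50] = 1.0
board = np.zeros((3, 3), dtype=int)
print(predict_move(intensity, board))           # ((1, 1), False)
```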
Fig. 3. Playing Super Mario Bros. (a) Layout of the designed network for playing Super Mario Bros. (b) and (c) Snapshots of Mario's jump and crouch actions, selected by comparing the output intensities of the actions. The output intensity of the jump is maximal at the 201st frame, so the predicted action is jump and Mario is controlled accordingly, as shown in panel (b). A similar prediction-and-control sequence for the crouch action is shown in panel (c). (d) The inverse prediction result. Because the crouch predicted at the current state is crucial for updating Mario's action, we use the maximized output intensity of the crouch as the input, ignoring the simultaneous outputs of the other actions (Video 1, MP4, 19.8 MB [URL: https://doi.org/10.1117/1.APN.3.4.046003.s1]).
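The per-frame see/decide/control cycle described for panels (b) and (c) can be summarized in a short sketch. Here `don`, `get_frame`, and `press` are hypothetical stand-ins for the trained network and the game interface, and the action set is illustrative.

```python
# Sketch of one frame of the "see, decide, control" loop for Super Mario Bros.
ACTIONS = ["right", "jump", "crouch", "idle"]   # illustrative action set

def control_step(don, get_frame, press):
    frame = get_frame()                          # seeing: capture the game snapshot
    intensities = don(frame)                     # deciding: one output intensity per action region
    action = ACTIONS[max(range(len(ACTIONS)), key=lambda i: intensities[i])]
    press(action)                                # controlling: emit the key press to the game
    return action

# Toy stand-ins to illustrate the call pattern: jump has the largest intensity.
dummy_don = lambda frame: [0.1, 0.9, 0.2, 0.05]
print(control_step(dummy_don, lambda: None, lambda a: None))   # "jump"
```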
Fig. 4. Playing Car Racing. (a) Layout of the designed network for playing Car Racing. (b) Control of the car's steering direction and angle as a function of the difference between the output intensities at the current state, normalized between −1 and 1. (c)–(f) Snapshots of controlling the car's steering. When the car faces a left-turn track in panel (c), the output intensity on the left remains greater than that on the right, enabling continuous control in updating the rotation angle of the left-turn action. A similar control process is performed for the right-turn track in panel (e). In addition, the robustness of the network against disturbances is validated by introducing (d) Gaussian blur and (f) Gaussian noise into the game images (Video 2, MP4, 8.36 MB [URL: https://doi.org/10.1117/1.APN.3.4.046003.s2]; Video 3, MP4, 6.78 MB [URL: https://doi.org/10.1117/1.APN.3.4.046003.s3]; Video 4, MP4, 16.8 MB [URL: https://doi.org/10.1117/1.APN.3.4.046003.s4]).
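The continuous steering rule of panel (b) can be sketched as follows: the normalized difference between the left and right output intensities sets the steering direction through its sign and the steering angle through its magnitude. The normalization by the total intensity and the maximum steering angle are assumptions for illustration, not the authors' exact mapping.

```python
# Sketch of mapping two output-region intensities to a continuous steering command.
def steering_command(left_intensity, right_intensity, max_angle_deg=30.0):
    total = left_intensity + right_intensity
    if total == 0:
        return 0.0
    diff = (left_intensity - right_intensity) / total   # normalized to [-1, 1]
    return diff * max_angle_deg                          # > 0 steers left, < 0 steers right

print(steering_command(0.8, 0.2))   # 18.0 deg toward the left turn
print(steering_command(0.3, 0.7))   # -12.0 deg toward the right turn
```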
Fig. 5. Experimental demonstration of the DON for tic-tac-toe. (a) Photograph of the experimental system, in which the unlabeled devices are lenses; a spatial filter removes the unwanted multiple-order energy peaks, and a filter is mounted on the camera. (b) Output of the first layer of the sample in Fig. 2(a); the red arrows represent the polarization direction of the incident light. (c) and (d) Sequential control of the DON in playing the same two games as in Figs. 2(b) and 2(c), respectively. The experimental results are normalized based on the simulation results. Sim., simulation result; Exp., experimental result.