Advanced Photonics, Vol. 7, Issue 1, 016004 (2025)
James Spall¹,²,†, Xianxin Guo¹,²,*, and Alexander I. Lvovsky¹,²,*
Author Affiliations
  • 1University of Oxford, Clarendon Laboratory, Oxford, United Kingdom
  • 2Lumai Ltd., Wood Centre for Innovation, Oxford, United Kingdom
DOI: 10.1117/1.AP.7.1.016004
James Spall, Xianxin Guo, Alexander I. Lvovsky, "Training neural networks with end-to-end optical backpropagation," Adv. Photon. 7, 016004 (2025)
Fig. 1. Illustration of optical training. (a) Network architecture of the ONN used in this work, which consists of two fully connected linear layers and a hidden layer. (b) Simplified experimental schematic of the ONN. Each linear layer performs optical MVM with a cylindrical lens and an SLM that encodes the weight matrix. Hidden layer activations are computed using SA in an atomic vapor cell. Light propagates in both directions during optical training. (c) Working principle of SA activation. The forward beam (pump) is shown by solid red arrows and the backward beam (probe) by purple wavy arrows. The probe transmission depends on the strength of the pump and approximates the gradient of the SA function. For high forward intensity (top panel), a large fraction of the atoms is excited to the upper level; stimulated emission from these atoms largely compensates for the absorption by atoms in the ground state. For a weak pump (bottom panel), the excited-state population is small and the absorption is significant. (d) NN training procedure. (e) Optical training procedure. Signal and error propagation in the two directions are both implemented fully optically; loss function calculation and parameter update are handled electronically, without interrupting the optical information flow.
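The pump–probe picture in Fig. 1(c) can be captured in a few lines of numerics. Below is a minimal sketch, assuming a thin two-level saturable absorber whose absorption coefficient falls as α(I) = α₀/(1 + I/I_sat); the constants OD and I_SAT are illustrative, not the paper's experimental values. The forward pump defines the activation f(z), and a weak backward probe sees the pump-saturated absorption, so its transmission tracks f′(z):

```python
import numpy as np

OD = 1.0      # optical depth alpha0 * L (illustrative value)
I_SAT = 1.0   # saturation intensity (illustrative value)

def activation(z):
    """SA activation f(z): pump transmitted through the cell (thin-cell approximation)."""
    return z * np.exp(-OD / (1.0 + z / I_SAT))

def probe_transmission(z):
    """Weak backward probe transmission through the pump-saturated cell."""
    return np.exp(-OD / (1.0 + z / I_SAT))

z = np.linspace(0.0, 5.0, 501)
grad_true = np.gradient(activation(z), z)   # numerical f'(z)
grad_optical = probe_transmission(z)        # optically measured proxy

# For moderate optical depth the probe transmission tracks f'(z),
# which is what lets the backward pass reuse the same vapor cell.
print(f"max |f'(z) - T_probe(z)| = {np.max(np.abs(grad_true - grad_optical)):.3f}")
```

In this toy model the two curves agree only approximately, which matches the caption's hedged wording: the probe transmission *approximates* the gradient of the SA function.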
Fig. 2. Multilayer ONN characterization. (a) Scatterplots of measured-against-theory results for MVM-1 (first layer, forward), MVM-2a (second layer, forward), and MVM-2b (second layer, backward). All three MVM results are taken simultaneously. Histograms of the signal and noise error for each MVM are displayed underneath. (b) First-layer activations a_meas^(1) measured after the vapor cell, plotted against the theoretically expected linear MVM-1 output z_theory^(1) before the cell. The green line is a best-fit curve of the theoretical SA nonlinear function. (c) Amplitude of a weak probe of constant input amplitude, passed backward through the vapor cell, as a function of the pump z_theory^(1). Measurements for both forward and backward beams are taken simultaneously.
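The green best-fit curve in Fig. 2(b) amounts to a standard nonlinear regression of an SA transmission model against the measured activations. A sketch of that step, where the amplitude-in/amplitude-out form of sa_curve, its parameter values, and the synthetic stand-in data are all assumptions for illustration, not the paper's:

```python
import numpy as np
from scipy.optimize import curve_fit

def sa_curve(z, od, z_sat):
    """Assumed SA activation model, amplitude in / amplitude out."""
    return z * np.exp(-od / (1.0 + (z / z_sat) ** 2))

rng = np.random.default_rng(0)
z_theory = np.linspace(0.0, 2.0, 200)     # expected linear MVM-1 output (a.u.)
a_meas = sa_curve(z_theory, 1.2, 0.8) + rng.normal(0.0, 0.01, z_theory.size)

# Fit the theoretical SA function to the (z_theory, a_meas) pairs
(od_fit, z_sat_fit), _ = curve_fit(sa_curve, z_theory, a_meas, p0=[1.0, 1.0])
print(f"fitted optical depth: {od_fit:.2f}, saturation amplitude: {z_sat_fit:.2f}")
```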
Fig. 3. Optical training performance. (a) Decision boundary charts of the ONN inference output for three different classification tasks, after the ONN has been trained optically (top) or in silico (bottom). (b) Learning curves of the ONN for classification of the "Rings" dataset, showing the mean and standard deviation of the validation loss and accuracy, averaged over five repeated training runs. Shown above are decision boundary charts of the ONN output for the test set after different epochs. (c) Evolution of output neuron values and output errors for the training set inputs of the two classes. (d) Comparison between optically measured and digitally calculated gradients. Each panel shows the gradients for one of the 10 weight matrix elements.
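The "digitally calculated" reference gradients in Fig. 3(d) follow from ordinary backpropagation through the two-layer network. A self-contained sketch for a single input, where the weights, input, target, and SA activation model are assumed for illustration while the chain-rule structure is standard:

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.5, size=(5, 2))   # first linear layer (MVM-1)
W2 = rng.normal(scale=0.5, size=(2, 5))   # second linear layer (MVM-2a)
x = rng.normal(size=2)                    # one training input
t = np.array([1.0, 0.0])                  # one-hot target for a 2-class task

OD = 1.0                                                # assumed optical depth
f = lambda z: z * np.exp(-OD / (1.0 + z**2))            # assumed SA activation
df = lambda z: np.exp(-OD / (1.0 + z**2)) * (1.0 + 2.0 * OD * z**2 / (1.0 + z**2)**2)

# Forward pass
z1 = W1 @ x            # MVM-1
a1 = f(z1)             # SA activation (vapor cell)
y = W2 @ a1            # MVM-2a: network output

# Backward pass, for a mean-squared-error loss
delta2 = y - t                      # output error
delta1 = (W2.T @ delta2) * df(z1)   # MVM-2b, then modulation by the SA gradient
grad_W2 = np.outer(delta2, a1)      # 2x5 matrix: 10 gradient elements
grad_W1 = np.outer(delta1, x)       # 5x2 matrix: 10 gradient elements
print(grad_W1)
```

Either weight matrix of the 2–5–2 network has 10 elements, consistent with the 10 per-element panels of Fig. 3(d); the optical gradients are benchmarked against exactly this kind of digital calculation.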
| Dataset | Input neurons | Hidden neurons | Output neurons | Learning rate | Epochs | Batches per epoch | Batch size |
|---|---|---|---|---|---|---|---|
| Rings | 2 | 5 | 2 | 0.01 | 16 | 20 | 20 |
| XOR | 2 | 5 | 2 | 0.005 | 30 | 20 | 20 |
| Arches | 2 | 5 | 2 | 0.01 | 25 | 20 | 20 |
    Table 1. Summary of network architecture and hyperparameters used in both optical and digital training.
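The "Rings" row of Table 1 maps directly onto a standard minibatch-SGD loop. A runnable in-silico sketch using those hyperparameters, in which the two-rings data generator, the SA activation model, and the weight initialization are assumptions; only the 2–5–2 architecture and the hyperparameters come from the table:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_rings(n):
    """Two concentric noisy rings, one per class (assumed stand-in for 'Rings')."""
    labels = rng.integers(0, 2, n)
    radius = np.where(labels == 0, 0.5, 1.0) + rng.normal(0.0, 0.05, n)
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    X = np.stack([radius * np.cos(theta), radius * np.sin(theta)], axis=1)
    return X, np.eye(2)[labels]               # inputs and one-hot targets

OD = 1.0                                                # assumed optical depth
f = lambda z: z * np.exp(-OD / (1.0 + z**2))            # assumed SA activation
df = lambda z: np.exp(-OD / (1.0 + z**2)) * (1.0 + 2.0 * OD * z**2 / (1.0 + z**2)**2)

W1 = rng.normal(scale=0.5, size=(5, 2))                 # 2 -> 5 hidden neurons
W2 = rng.normal(scale=0.5, size=(2, 5))                 # 5 -> 2 output neurons

lr, epochs, batches, batch_size = 0.01, 16, 20, 20      # Table 1, "Rings" row
for _ in range(epochs):
    for _ in range(batches):
        X, T = make_rings(batch_size)
        Z1 = X @ W1.T                   # MVM-1
        A1 = f(Z1)                      # SA activation
        Y = A1 @ W2.T                   # MVM-2a
        D2 = (Y - T) / batch_size       # output error (MSE loss)
        D1 = (D2 @ W2) * df(Z1)         # MVM-2b + SA gradient
        W2 -= lr * D2.T @ A1            # electronic parameter update
        W1 -= lr * D1.T @ X
print(f"final batch MSE: {np.mean((Y - T) ** 2):.4f}")
```

In the optical training procedure of Fig. 1(e), the four matrix products in the inner loop are carried out by light; only the error calculation and the weight update remain electronic, as the paper emphasizes.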
| Network layer | Function | Implementation example |
|---|---|---|
| Linear layer | MVM | Free-space optical multiplier and photonic crossbar array |
| Linear layer | Diffraction | Programmable optical mask |
| Linear layer | Convolution | Lens Fourier transform |
| Nonlinear layer | SA | Atomic vapor cell, semiconductor absorber, and graphene |
| Nonlinear layer | Saturable gain | EDFA, SOA, and Raman amplifier |
    Table 2. Generalization of the optical training scheme.