• Photonics Research
  • Vol. 9, Issue 3, B71 (2021)
Backpropagation through nonlinear units for the all-optical training of neural networks
Xianxin Guo1,2,3,5,†,*, Thomas D. Barrett2,6,†,*, Zhiming M. Wang1,7,*, and A. I. Lvovsky2,4,8,*
Author Affiliations
  • 1Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
  • 2Clarendon Laboratory, University of Oxford, Oxford OX1 3PU, UK
  • 3Institute for Quantum Science and Technology, University of Calgary, Calgary, Alberta T2N 1N4, Canada
  • 4Russian Quantum Center, Skolkovo 143025, Moscow, Russia
  • 5e-mail: xianxin.guo@physics.ox.ac.uk
  • 6e-mail: thomas.barrett@physics.ox.ac.uk
  • 7e-mail: zhmwang@uestc.edu.cn
  • 8e-mail: alex.lvovsky@physics.ox.ac.uk
    DOI: 10.1364/PRJ.411104
    Xianxin Guo, Thomas D. Barrett, Zhiming M. Wang, A. I. Lvovsky. Backpropagation through nonlinear units for the all-optical training of neural networks[J]. Photonics Research, 2021, 9(3): B71
    Fig. 1. ONN with all-optical forward- and backward-propagation. (a) A single ONN layer that consists of weighted interconnections and an SA nonlinear activation function. The forward- (red) and backward-propagating (orange) optical signals, whose amplitudes are proportional to the neuron activations, a(l−1), and errors, δ(l), respectively, are tapped off by beam splitters, measured by heterodyne detection and multiplied to determine the weight matrix update in Eq. (2). This multiplication can also be implemented optically, as discussed in the text. The final update of the weights, as well as the preparation of network input, is implemented electronically. (b) Error calculation at the output layer performed optically or digitally, as described in the text.
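The paper's Eq. (2) is not reproduced on this page. For orientation, the quantities named in the caption combine in the standard backpropagation update sketched below in textbook notation; the learning rate η and the pre-activation z(l) are assumed symbols, not taken verbatim from the paper.

```latex
% Textbook backpropagation update assembled from the tapped-off signals;
% a sketch for orientation, not a verbatim copy of the paper's Eq. (2).
\Delta W^{(l)} \propto -\,\eta\,\delta^{(l)}\bigl(a^{(l-1)}\bigr)^{\mathsf T},
\qquad
\delta^{(l)} = \Bigl[\bigl(W^{(l+1)}\bigr)^{\mathsf T}\delta^{(l+1)}\Bigr]\odot g'\!\bigl(z^{(l)}\bigr).
```

In this form, multiplying the backward-propagating error by the forward-propagating activation yields the outer product that sets the weight update, consistent with the multiplication described in the caption.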
    Fig. 2. Saturable absorber response. (a) The transmission and (b) transmission derivative of an SA unit with optical depths of 1 (left) and 30 (right), as defined by Eqs. (4) and (6), respectively. Also shown in panel (b) are the actual probe transmissions given by Eq. (5), which approximate the derivatives, with and without the rescaling. The scaling factors are 1.2 (left) and 2.5 (right). In the amplitude region (i), the SA behaves as a linear absorber for weak input but then exhibits strong nonlinearity when the pump intensity approaches the saturation threshold. Region (ii) corresponds to strong saturation: the ground-state population is depleted, and the absorber is rendered transparent.
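Eqs. (4)-(6) defining the SA transmission are not reproduced here. As a stand-in, the sketch below numerically solves the textbook implicit relation for a two-level saturable absorber, ln T = −α0 + s(1 − T) with s = I_in/I_sat, which reproduces the qualitative behaviour of panel (a): linear absorption at low input, transparency deep in saturation.

```python
# Hedged numerical sketch of a saturable-absorber transmission curve.
# Uses the textbook intensity relation ln T = -alpha0 + s*(1 - T), s = I_in/I_sat,
# as a stand-in for the paper's Eqs. (4)-(6), which are not reproduced on this page.
import numpy as np
from scipy.optimize import brentq

def sa_transmission(s, alpha0):
    """Intensity transmission T of an SA cell with optical depth alpha0."""
    f = lambda T: np.log(T) + alpha0 - s * (1.0 - T)
    # f is monotone in T, negative at T = exp(-alpha0 - 1) and equal to alpha0 > 0
    # at T = 1, so the root is always bracketed.
    return brentq(f, np.exp(-alpha0 - 1.0), 1.0)

s_in = np.linspace(0.01, 50.0, 500)        # normalized input intensity I_in / I_sat
for alpha0 in (1.0, 30.0):                 # the two optical depths plotted in Fig. 2
    T = np.array([sa_transmission(s, alpha0) for s in s_in])
    dT = np.gradient(T, s_in)              # finite-difference transmission derivative, cf. panel (b)
    print(f"alpha0 = {alpha0:g}: T rises from {T[0]:.2e} to {T[-1]:.3f}")
```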
    Fig. 3. Effects of imperfect approximation of the activation function derivative. (a) Feed-forward neural network architecture using a single hidden layer of 128 neurons. (b) Distribution of neuron inputs (EP,in(1)≡z(1)), which is concentrated in the unsaturated region (1) of the SA activation function, g(·). As a result, the approximation error in the linear region (2) is less impactful on the training. (c) The transmission of an SA unit with α0=10, along with the exact and (rescaled for easier comparison) optically approximated transmission derivatives. (d) Performance loss associated with approximating activation function derivatives g′(·) with random functions, plotted as a function of the approximation error, for α0=10 (see Appendix B for details). (e) Average error of the derivative approximation in Eq. (5) as a function of the optical depth of an SA nonlinearity.
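Panel (d) studies training when the backward pass uses an imperfect derivative. A compact way to emulate this numerically is a custom autograd function whose forward pass applies the activation but whose backward pass uses a deliberately perturbed derivative. The sketch below uses tanh as a placeholder activation and additive Gaussian noise as the perturbation; neither is the paper's SA nonlinearity or its specific error model.

```python
# Hedged sketch: training with an approximate activation derivative, in the
# spirit of Fig. 3(d). tanh and the additive noise are illustrative placeholders.
import torch

class ApproxBackwardActivation(torch.autograd.Function):
    @staticmethod
    def forward(ctx, z):
        ctx.save_for_backward(z)
        return torch.tanh(z)                        # placeholder for the SA activation g(z)

    @staticmethod
    def backward(ctx, grad_output):
        (z,) = ctx.saved_tensors
        exact = 1.0 - torch.tanh(z) ** 2            # exact g'(z) for the placeholder
        approx = exact + 0.1 * torch.randn_like(z)  # controlled approximation error
        return grad_output * approx                 # gradients flow through the approximate g'

z = torch.randn(4, 128, requires_grad=True)         # a batch entering the 128-neuron hidden layer
ApproxBackwardActivation.apply(z).sum().backward()
print(z.grad.shape)                                 # torch.Size([4, 128])
```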
    Fig. 4. Performance on image classification. (a) (i) The fully connected network architecture. (ii) Learning curves for the SA [with either exact derivatives in Eq. (6) of the activation function or their approximation in Eq. (5)] and benchmark ReLU networks. (iii) The final classification accuracy achieved as a function of the optical depth, α0, of the SA cell. (b) (i) The convolutional network architecture. Sequential convolution layers of 32 and 64 channels convert a 28×28 pixel image into a 1024-dimensional feature vector, which is then classified (into NC=10 classes for MNIST and KMNIST, and NC=47 classes for EMNIST) by fully connected layers. Pooling layers are not shown for simplicity. (ii) Classification accuracy of convolutional networks when using various activation functions. The same deep network architecture is applied to all data sets, but the SA networks use mean-pooling, while the benchmark networks use max-pooling. The last row shows the performance of a simple linear classifier as a baseline.
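The layer sizes stated in the captions pin down most of the two architectures, but kernel and pooling sizes are not given on this page. The PyTorch sketch below is one consistent reconstruction: 5×5 convolutions with 2×2 mean-pooling reproduce the stated 1024-dimensional feature vector, and the 128-neuron hidden layer is carried over from Fig. 3(a). ReLU stands in for the benchmark networks; the SA activation would take its place in the optical version.

```python
# Hedged PyTorch sketch of the Fig. 4 architectures. Kernel sizes, pooling sizes,
# and the reuse of the 128-neuron hidden layer from Fig. 3(a) are assumptions.
import torch.nn as nn

def fully_connected(n_classes: int = 10) -> nn.Sequential:
    # (a)(i): 28x28 input -> single hidden layer -> classifier
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 128), nn.ReLU(),      # ReLU benchmark; SA activation in the ONN
        nn.Linear(128, n_classes),
    )

def convolutional(n_classes: int = 10) -> nn.Sequential:
    # (b)(i): conv layers of 32 and 64 channels -> 1024-d feature vector -> classifier
    return nn.Sequential(
        nn.Conv2d(1, 32, kernel_size=5), nn.ReLU(),
        nn.AvgPool2d(2),                         # mean-pooling (SA nets); benchmarks use max-pooling
        nn.Conv2d(32, 64, kernel_size=5), nn.ReLU(),
        nn.AvgPool2d(2),                         # spatial size: 28 -> 24 -> 12 -> 8 -> 4
        nn.Flatten(),                            # 64 * 4 * 4 = 1024 features
        nn.Linear(1024, n_classes),              # n_classes = 10 (MNIST/KMNIST) or 47 (EMNIST)
    )
```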
    Fig. 5. Optical backpropagation through saturable gain (SG) nonlinearity. (a) Fully connected network architecture, which is the same as Fig. 4(a) except for the nonlinearity. (b) Transmission and transmission derivatives of the SG unit with gain factor g0=3. (c) Learning curves for the SG-based ONN and benchmark ReLU networks. (d) The final classification accuracy achieved as a function of the gain.
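The SG transfer characteristic is not given in closed form on this page. A textbook saturable-gain model that produces the qualitative shape of panel (b) is sketched below; it is a stand-in rather than the paper's definition, with g0 read as the integrated small-signal gain and I_sat the saturation intensity (both assumed symbols).

```latex
% Textbook saturable-gain propagation and the resulting implicit transmission relation;
% g_0 = gL is the integrated small-signal gain, I_sat the saturation intensity (assumed).
\frac{\mathrm{d}I}{\mathrm{d}z} = \frac{g\,I}{1 + I/I_{\mathrm{sat}}}
\quad\Longrightarrow\quad
\ln T = g_0 - \frac{I_{\mathrm{in}}}{I_{\mathrm{sat}}}\,(T - 1),
\qquad T = \frac{I_{\mathrm{out}}}{I_{\mathrm{in}}} .
```

In this model, the transmission falls from e^{g0} for weak input toward 1 as the gain saturates, matching the monotonically decreasing transfer curve described for the SG unit.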