Iterative photonic processor for fast complex-valued matrix inversion

Minjia Chen; Qixiang Cheng; Masafumi Ayata; Mark Holm; Richard Penty

doi:10.1364/PRJ.468097

Photonics Research
Vol. 10, Issue 11, 2488 (2022)

Minjia Chen¹, Qixiang Cheng¹,*, Masafumi Ayata², Mark Holm², and Richard Penty¹

Author Affiliations

¹Department of Engineering, Centre for Photonic Systems, Electrical Engineering Division, University of Cambridge, Cambridge CB3 0FA, UK

²Huawei Technologies (Sweden) AB, 164 40 Kista, Sweden

Minjia Chen, Qixiang Cheng, Masafumi Ayata, Mark Holm, Richard Penty. Iterative photonic processor for fast complex-valued matrix inversion[J]. Photonics Research, 2022, 10(11): 2488

Fig. 1. Graphical explanation of the Min-Max algorithm for a 4×4 processor. Circles of the same color in (a)–(d) represent the same eigenvalue. (a) Eigenvalues of $A$ shown in the polar coordinate system (all the eigenvalues lie in a half-complex plane, indicating that the RICH method converges for a certain $ω$); (b) all the eigenvalues of $A$ rotated into the right half-plane; (c) eigenvalues of $ω_{\mathrm{opt}} A$; (d) eigenvalues of $I_{N} - ω_{\mathrm{opt}} A$. The convergence condition $ρ(I_{N} - ω_{\mathrm{opt}} A) < 1$ is now satisfied and the fastest convergence rate is reached.
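
As a rough numerical illustration of the idea in this caption, the sketch below picks a complex $ω$ that minimises the spectral radius of $I_{N} - ω A$ and then runs the RICH update from Table 2 in matrix form, $X^{(k+1)} = (I_{N} - ω A) X^{(k)} + ω I_{N}$. The brute-force grid search, the function names `choose_omega` and `rich_inverse`, and the test matrix are assumptions made for illustration; they are not the authors' Min-Max algorithm.

```python
import numpy as np

def choose_omega(A, n_mag=200, n_phase=180):
    """Grid-search a complex omega minimising max_i |1 - omega * lambda_i|.

    Illustrative stand-in for the Min-Max step, not the authors' algorithm.
    """
    lam = np.linalg.eigvals(A)
    # |1 - omega*lambda| < 1 requires |omega*lambda| < 2, which bounds |omega|.
    mags = np.linspace(0.0, 2.0 / np.max(np.abs(lam)), n_mag + 1)[1:]
    phases = np.linspace(-np.pi, np.pi, n_phase, endpoint=False)
    omega = mags[:, None] * np.exp(1j * phases[None, :])
    # Spectral radius of I - omega*A for every candidate omega.
    rho = np.max(np.abs(1.0 - omega[..., None] * lam), axis=-1)
    best = np.unravel_index(np.argmin(rho), rho.shape)
    return omega[best], rho[best]

def rich_inverse(A, omega, tol=1e-12, max_iter=10_000):
    """Iterate X <- (I - omega*A) X + omega*I until the update is below tol."""
    n = A.shape[0]
    identity = np.eye(n, dtype=complex)
    M = identity - omega * A
    X = np.zeros((n, n), dtype=complex)
    for k in range(1, max_iter + 1):
        X_next = M @ X + omega * identity
        if np.linalg.norm(X_next - X) < tol:
            return X_next, k
        X = X_next
    raise RuntimeError("RICH iteration did not converge")

# Hypothetical 2x2 example whose eigenvalues are rotated out of the
# right half-plane, so a real positive omega would not converge.
A = np.exp(2.5j) * np.array([[2 + 1j, 0.5], [0.3j, 1.5 - 0.5j]])
omega_opt, rho_opt = choose_omega(A)
A_inv, n_iter = rich_inverse(A, omega_opt)
print(f"omega = {omega_opt:.4f}, spectral radius = {rho_opt:.4f}, iterations = {n_iter}")
```

The phase of the selected $ω$ plays the role of the rotation in panel (b), and its magnitude plays the role of the scaling in panel (c).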

Fig. 2. System architecture of the proposed N×N iterative photonic processor for complex-valued matrix inversion. (a) Workflow of the iterative photonic processing system. The computation includes four main steps: 1) weights loading; 2) gain setting; 3) computation activation; and 4) results readout. VDAC, voltage digital-to-analog converter; IDAC, current digital-to-analog converter. (b) Architecture of an N×N iterative photonic processor. It consists of nine key photonic blocks: Laser, Summation 1, Input Vectors Fan-Out, Weight Bank, Summation 2, Amplification, Filtering, Detection, and Recirculating Loop. AWG, arrayed waveguide grating.

Fig. 3. Models of (a) 1-to-N Fan-Out block, (b) Summation block, (c) Weight Bank block, (d) Laser block, (e) Amplification and Filtering blocks, (f) Detection block, and (g) electronic peripherals.

Fig. 4. (a) Typical signal amplitude changes during computation without filtering; (b) plot of the sine integral function; (c) typical signal amplitude changes during computation after filtering.

Fig. 5. Conceptual figure of an integrated 4×4 inverter (without wavelength multiplexing) in which the LDs, BPFs, SOAs, and BPDs are monolithically integrated on-chip. TIAs and digital signal processing (DSP) are used for results readout. One column of the inverse matrix is computed at a time by turning on one of the LDs; the complete result is obtained either by turning on each LD in turn or by using multiple copies of the unit shown here.

Fig. 6. (a), (b) Net computing speed of different-sized N×N photonic RIPs on SOI, $\mathrm{Si}_3\mathrm{N}_4$, and IMOS platforms. The light propagation speed is estimated from the effective indices of the waveguides, while the computing speed is estimated from the light propagation speed, the loop length, and the number of iterations together. (a) Inversion rate in terms of GInv/s and (b) processing speed in terms of TMAC/s are shown. (c) Power efficiency of different-sized N×N photonic RIPs.

Fig. 7. Matrix weight encoding error for (a)–(e) different DAC bit resolutions and (f) a 20 nm wavelength span. A 16-bit DAC is enough to guarantee <0.1% relative weight encoding error. The encoding error due to wavelength multiplexing is around 3%. (g) ASE noise powers of different-sized processors when cascading different numbers of SOA stages. Red circles highlight the minimal achievable ASE powers for different-sized processors. $P_{\mathrm{in,sat}}$ at the optimal number of stages for each processor size is indicated by the “+” sign. SNR of coherent detection when (h) both thermal noise and shot noise are considered, (i) only thermal noise is considered, and (j) only shot noise is considered. Thermal noise is dominant when signal power is low, while shot noise is dominant when signal power is high.

Fig. 8. (a) Inversion accuracy of different-sized photonic RIPs for different input signal powers (optical filter BW = 64.5 MHz). Values in blue indicate the number of iterations required for convergence. A high input signal power (>1 dBm) is necessary to ensure an accuracy of >90% when using the wavelength multiplexing technique. (b) Fitted relationship between inversion accuracy and optical filter BW (input signal power is 16.6 dBm) for processor sizes ranging from 2×2 to 64×64; (c) error breakdown of different-sized photonic RIPs (input signal power is 16.6 dBm).

Method	Constraints on $A$	Key Steps	Complexity
GE	None	1) $A x = e \to U x = y$ 2) Back substitution	$\sim O (N^{3})$
LUD	None	1) $A = L U$ 2) $U x = y$ , $L y = b$ 3) Forward substitution: $y$ 4) Back substitution: $x$	$\sim O (N^{3})$
CD	Positive definite	1) $A = U^{*} U$ 2) $U x = y$ , $U^{*} y = b$ 3) Forward substitution: $y$ 4) Back substitution: $x$	$\sim O (N^{3})$
QRD	None	1) $A = Q R$ 2) $R x = y$ , $Q y = b$ 3) $y = Q^{*} b$ 4) Back substitution	$\sim O (N^{3})$
SVD	None	1) $A = P Σ Q^{*}$ 2) $A^{- 1} = Q Σ^{- 1} P^{*}$	$\sim O (N^{3})$

Table 1. Summary of Main Direct Inversion Methods
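
To make the key steps in Table 1 concrete, the following sketch walks through the QRD row for a small complex matrix: factor $A = Q R$, form $y = Q^{*} e_j$ for each column $e_j$ of the identity, and back-substitute through $R$. It relies on NumPy's `np.linalg.qr`; the helper names `back_substitution` and `invert_via_qr` and the example matrix are hypothetical and chosen only for illustration.

```python
import numpy as np

def back_substitution(R, y):
    """Solve R x = y for an upper-triangular, non-singular R."""
    n = R.shape[0]
    x = np.zeros(n, dtype=complex)
    for i in range(n - 1, -1, -1):
        # Subtract the already-solved unknowns, then divide by the pivot.
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

def invert_via_qr(A):
    """Invert A column by column using the QRD steps listed in Table 1."""
    n = A.shape[0]
    Q, R = np.linalg.qr(A)      # step 1: A = Q R
    Y = Q.conj().T              # step 3: y = Q^* e_j, stacked for all j
    columns = [back_substitution(R, Y[:, j]) for j in range(n)]  # step 4
    return np.column_stack(columns)

# Hypothetical complex-valued test matrix
A = np.array([[2 + 1j, 1 - 1j, 0.5],
              [0.3j, 1.5, -1j],
              [1, 0.2 + 0.4j, 3 - 2j]])
A_inv = invert_via_qr(A)
print(np.round(A @ A_inv, 10))
```

The back substitution alone costs $O(N^{2})$ per column, so the $N$ columns together account for the $O(N^{3})$ scaling quoted in the table.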

Method	Convergence Condition	Iterative Relationship	Complexity	Convergence Rate
JC	Positive definite	$x^{(k + 1)} = (I_{N} - D^{- 1} A) x^{(k)} + D^{- 1} e$	$\sim O (N^{3})$	Slow
GS	Positive definite	$x^{(k + 1)} = (I_{N} - {(D - E)}^{- 1} A) x^{(k)} + {(D - E)}^{- 1} e$	$\sim O (N^{3})$	Faster than JC
SOR	Positive definite, $0 < ω < 2$	$x^{(k + 1)} = (I_{N} - ω {(D - ω E)}^{- 1} A) x^{(k)} + ω {(D - ω E)}^{- 1} e$	$\sim O (N^{3})$	$ω > 1$ : accelerate; $ω = 1$ : GS; $ω < 1$ : slow down
RICH	Eigenvalues lie in a half-complex plane	$x^{(k + 1)} = (I_{N} - ω A) x^{(k)} + ω e$	$\sim O (N^{3})$	Depends on the choice of $ω$
SD	Positive definite	1) $r^{(0)} = e - A x^{(0)}$ 2) $p^{(k)} = r^{(k)}$ 3) $α_{k} = \frac{p^{(k) T} r^{(k)}}{p^{(k) T} A p^{(k)}}$ 4) $x^{(k + 1)} = x^{(k)} + α_{k} p^{(k)}$ 5) $r^{(k + 1)} = r^{(k)} - α_{k} A p^{(k)}$	$\sim O (N^{3})$	As slow as JC; can be accelerated with preconditioning
CG	Positive definite	1) $r^{(0)} = e - A x^{(0)}$ 2) $p^{(0)} = r^{(0)}$ 3) $α_{k} = \frac{r^{(k) T} r^{(k)}}{p^{(k) T} A p^{(k)}}$ 4) $x^{(k + 1)} = x^{(k)} + α_{k} p^{(k)}$ 5) $r^{(k + 1)} = r^{(k)} - α_{k} A p^{(k)}$ 6) $β_{k} = \frac{r^{(k + 1) T} r^{(k + 1)}}{r^{(k) T} r^{(k)}}$ 7) $p^{(k + 1)} = r^{(k + 1)} + β_{k} p^{(k)}$	$\sim O (N^{3})$	Slightly faster than SD; faster than SOR with preconditioning

Table 2. Summary of Main Iterative Inversion Methods
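
The CG row of Table 2 can be followed step by step in a few lines of NumPy. The sketch below assumes a real symmetric positive-definite $A$, so the transposes in the table apply directly; a complex Hermitian matrix would need conjugated inner products instead. The function name `conjugate_gradient`, the stopping tolerance, and the 3×3 test matrix are illustrative assumptions.

```python
import numpy as np

def conjugate_gradient(A, e, tol=1e-12, max_iter=1000):
    """Solve A x = e for real symmetric positive-definite A (CG row, Table 2)."""
    x = np.zeros_like(e, dtype=float)
    r = e - A @ x                      # 1) initial residual
    p = r.copy()                       # 2) initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:      # residual norm small enough: stop
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)      # 3) step length
        x = x + alpha * p              # 4) update the solution
        r = r - alpha * Ap             # 5) update the residual
        rs_new = r @ r
        beta = rs_new / rs_old         # 6) direction correction factor
        p = r + beta * p               # 7) next search direction
        rs_old = rs_new
    return x

# Hypothetical symmetric positive-definite test matrix
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
identity = np.eye(3)
# Inversion solves A x = e_j once per column e_j of the identity.
A_inv = np.column_stack([conjugate_gradient(A, identity[:, j]) for j in range(3)])
print(np.round(A @ A_inv, 10))
```

Solving one right-hand side per column mirrors the use of $e$ in the iterative relationships of the table.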

Photonic Blocks	Components	Functionality
Laser	CW LDs	Input signal
Summation 1	Single-stage 50:50 $2 \times 2 / 1 \times 2$ MMI coupler	1) Couple initial input; 2) add $ω I_{N}$ in each iteration
Input Vectors Fan-out	Cascaded 50:50 $1 \times 2$ MMI couplers	Split looped-back signals
Weight Bank	Push-pull MZIs	Encode elements of complex-valued matrix $M$
Summation 2	Cascaded 50:50 $2 \times 2 / 1 \times 2$ MMI couplers	Add signals up during matrix multiplication $M X^{(k)}$
Amplification	Cascaded SOAs	Compensate for on-chip losses
Filtering	AWGs and BPFs	Reduce the ASE noise from SOAs
Detection	Coherent detectors	Inversion results readout
Recirculating Loop	Phase-sensitive waveguides	Provide connections for iterative computation

Table 3. Correspondence between Key Photonic Blocks and Computational Functionalities

Method	Flip-Chip Bonding	Wafer/Die Bonding	μTP	Hetero-epitaxy
Integration density	Low	Medium	High	High
Efficiency of III-V material use	Medium	Medium	High	Very High
Alignment accuracy	High	High	Medium	High
Throughput	Medium	High	High	High
Cost	High	Medium	Low	Low
Maturity	Mature	Mature	R&D	R&D

Table 4. Comparison of III-V-on-Si Integration Methods

Component	SOI (μm)	$\mathrm{Si}_3\mathrm{N}_4$ (μm)	IMOS (μm)
Summation 1	20	240	47
Input Vectors Fan-out	(2N–1)·72	(2N–1)·180	(2N–1)·80
Weight Bank	100	1100	200
Summation 2	(2N–1)·90	(2N–1)·300	(2N–1)·120
Amplification	$2.2 \log_{2} N \cdot 176$ [36]	$2.2 \log_{2} N \cdot 246$ [37]	$2.2 \log_{2} N \cdot 63$ [38]
Filtering	128	130	200

Table 5. Length Estimation of an N×N Iterative Photonic Processor on Photonic Integration Platforms
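
The entries of Table 5 can be combined into a per-iteration loop length and, given an effective index, a round-trip time. The sketch below assumes that the loop length is simply the sum of the six listed blocks and that the Amplification entry means $2.2 \log_{2} N$ multiplied by the per-stage length; both are readings of the table, not statements from the paper. The `effective_index` and `iterations` values in the example call are placeholders, not values from the paper.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum

# Per-block lengths in micrometres, copied from Table 5.
# "fanout" and "sum2" scale with (2N - 1); "soa" scales with 2.2 * log2(N).
BLOCK_UM = {
    "SOI":   {"sum1": 20,  "fanout": 72,  "weight": 100,  "sum2": 90,  "soa": 176, "filter": 128},
    "Si3N4": {"sum1": 240, "fanout": 180, "weight": 1100, "sum2": 300, "soa": 246, "filter": 130},
    "IMOS":  {"sum1": 47,  "fanout": 80,  "weight": 200,  "sum2": 120, "soa": 63,  "filter": 200},
}

def loop_length_um(n, platform):
    """Sum of the Table 5 entries for an N x N processor (assumed loop length)."""
    b = BLOCK_UM[platform]
    return (b["sum1"]
            + (2 * n - 1) * b["fanout"]
            + b["weight"]
            + (2 * n - 1) * b["sum2"]
            + 2.2 * math.log2(n) * b["soa"]
            + b["filter"])

def round_trip_time_s(n, platform, effective_index):
    """Time for light to traverse the loop once at speed c / effective_index."""
    return loop_length_um(n, platform) * 1e-6 * effective_index / C_M_PER_S

def runs_per_second(n, platform, effective_index, iterations):
    """Iterative runs per second if each run needs `iterations` round trips."""
    return 1.0 / (iterations * round_trip_time_s(n, platform, effective_index))

# Placeholder inputs for illustration only (not taken from the paper).
rate = runs_per_second(4, "SOI", effective_index=4.0, iterations=50)
print(f"4x4 SOI loop length: {loop_length_um(4, 'SOI'):.1f} um, rate: {rate:.3e} runs/s")
```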

Component	Number	Unit Power (mW)	Total Power (mW)
Laser	$N$	69 [45]	69 N
TOPS	$2 N^{2}$	0.49 [46]	$0.98 N^{2}$
SOA	$x N^{2}$ ^a	50 [36]	$50 x N^{2}$
DAC	$2 N^{2}$	0.045 [47]	$0.09 N^{2}$
ADC	$2 N^{2}$	0.46 [48]	$0.92 N^{2}$

Table 6. Power Estimation of an N×N Iterative Photonic Processor
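
Summing the Total Power column of Table 6 gives a simple closed-form estimate as a function of $N$ and the SOA factor $x$. In the sketch below $x$ is left as a free parameter because its definition (footnote a) is not included here; the function name `total_power_mw` and the example values are illustrative assumptions.

```python
# Unit powers in mW, copied from Table 6.
UNIT_POWER_MW = {"laser": 69.0, "tops": 0.49, "soa": 50.0, "dac": 0.045, "adc": 0.46}

def total_power_mw(n, x):
    """Total electrical power (mW) of an N x N processor per Table 6.

    `x` is the SOA count factor from the table; its value is an input here.
    """
    counts = {
        "laser": n,
        "tops": 2 * n**2,
        "soa": x * n**2,
        "dac": 2 * n**2,
        "adc": 2 * n**2,
    }
    return sum(counts[name] * UNIT_POWER_MW[name] for name in counts)

# Example with a placeholder x = 1 (not a value from the paper).
print(f"4x4 processor, x = 1: {total_power_mw(4, 1):.2f} mW")
```

For any non-trivial $x$ the $50 x N^{2}$ SOA term dominates the sum, since the other per-element contributions add up to about 2 mW per matrix element.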

Parameter	Value
Processor size	$2 \times 2$ to $64 \times 64$
Number of random matrix instances	500 per processor size
Half-wave voltage of MZI	4.36 V [53]
DAC resolution	16 bits
SOA NF	3.8 dB [52]
BW of the optical BPF	64.5 MHz [54]
IL of the optical BPF	0.2 dB [54]
IL of an MMI coupler	0.2 dB [29]
IL of a waveguide crossing	0.019 dB [55]
Center frequency	193.6 THz
WDM channel spacing	0.1 nm [28]
Electron charge	$1.6 \times 10^{-19}\ \mathrm{C}$
Planck’s constant	$6.626 \times 10^{-34}\ \mathrm{J \cdot s}$
BW of the electronic filter	32.25 MHz
Boltzmann constant	$1.38 \times 10^{-23}\ \mathrm{J/K}$
Temperature	300 K
Electronic resistance	50 Ω

Table 7. Parameters Used in Accuracy Analyses of the Iterative Photonic Processor
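
The receiver-side constants in Table 7 are enough to compare thermal and shot noise using the standard textbook expressions $σ_{\mathrm{th}}^{2} = 4 k_{B} T B / R$ and $σ_{\mathrm{sh}}^{2} = 2 q I B$. The sketch below is only that textbook comparison and is not necessarily the noise model used in the paper; the photocurrent values passed in are hypothetical.

```python
# Constants copied from Table 7.
Q_E = 1.6e-19         # electron charge, C
K_B = 1.38e-23        # Boltzmann constant, J/K
TEMPERATURE = 300.0   # K
RESISTANCE = 50.0     # ohm
BANDWIDTH = 32.25e6   # electronic filter bandwidth, Hz

def thermal_noise_variance():
    """Thermal (Johnson) noise current variance in A^2: 4 k T B / R."""
    return 4.0 * K_B * TEMPERATURE * BANDWIDTH / RESISTANCE

def shot_noise_variance(photocurrent_a):
    """Shot noise current variance in A^2 for a given average photocurrent."""
    return 2.0 * Q_E * photocurrent_a * BANDWIDTH

def crossover_current():
    """Photocurrent at which shot noise equals thermal noise: 2 k T / (q R)."""
    return 2.0 * K_B * TEMPERATURE / (Q_E * RESISTANCE)

# Hypothetical photocurrents on either side of the crossover.
for current in (1e-5, 1e-2):
    dominant = "shot" if shot_noise_variance(current) > thermal_noise_variance() else "thermal"
    print(f"I = {current:.0e} A: {dominant} noise dominates")
print(f"crossover current = {crossover_current() * 1e3:.3f} mA")
```

With these constants the two contributions are equal at a photocurrent of roughly 1 mA, which is consistent with the statement in the Fig. 7 caption that thermal noise dominates at low signal power and shot noise at high signal power.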
