• Photonics Research
  • Vol. 10, Issue 11, 2488 (2022)
Minjia Chen1, Qixiang Cheng1,*, Masafumi Ayata2, Mark Holm2, and Richard Penty1
Author Affiliations
  • 1Department of Engineering, Centre for Photonic Systems, Electrical Engineering Division, University of Cambridge, Cambridge CB3 0FA, UK
  • 2Huawei Technologies (Sweden) AB, 164 40 Kista, Sweden
    DOI: 10.1364/PRJ.468097
    Minjia Chen, Qixiang Cheng, Masafumi Ayata, Mark Holm, Richard Penty. Iterative photonic processor for fast complex-valued matrix inversion[J]. Photonics Research, 2022, 10(11): 2488
    Fig. 1. Graphical explanation of the Min-Max algorithm for a 4×4 processor. Circles of the same color in (a)–(d) represent the same eigenvalue. (a) Eigenvalues of A shown in the polar coordinate system; all of them lie in one half of the complex plane, so the RICH method converges for some ω. (b) All eigenvalues of A rotated into the right half-plane. (c) Eigenvalues of $\omega_{\mathrm{opt}}A$. (d) Eigenvalues of $I_N-\omega_{\mathrm{opt}}A$. The convergence condition $\rho(I_N-\omega_{\mathrm{opt}}A)<1$ is now satisfied, and the fastest convergence rate is reached.
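The condition in Fig. 1 can be checked numerically once the eigenvalues of A are known. The sketch below is a brute-force grid search over complex ω that minimizes max|1 − ωλ| over the eigenvalues λ. It is a stand-in for illustration only, not the paper's Min-Max algorithm, whose steps are not reproduced in this excerpt. The function names and grid sizes are assumptions.

```python
import cmath
import math


def spectral_radius(omega, eigenvalues):
    """rho(I_N - omega*A), computed from the eigenvalues of A."""
    return max(abs(1 - omega * lam) for lam in eigenvalues)


def grid_search_omega(eigenvalues, n_phase=360, n_mag=200):
    """Brute-force search for a complex omega minimizing rho(I_N - omega*A).

    Assumes every eigenvalue is nonzero (A is invertible). Any omega with
    rho < 1 must satisfy |omega| < 2 / max|lambda|, which bounds the grid.
    """
    max_mag = 2.0 / max(abs(lam) for lam in eigenvalues)
    best_omega, best_rho = 0j, 1.0  # omega = 0 gives rho = 1: no convergence
    for i in range(n_phase):
        phase = 2 * math.pi * i / n_phase
        for j in range(1, n_mag + 1):
            omega = cmath.rect(max_mag * j / n_mag, phase)
            rho = spectral_radius(omega, eigenvalues)
            if rho < best_rho:
                best_omega, best_rho = omega, rho
    return best_omega, best_rho


# Illustrative eigenvalues, all in the left half-plane (hypothetical values).
omega, rho = grid_search_omega([-1 + 0j, -2 + 1j, -1.5 - 0.5j, -3 + 0j])
```

If the eigenvalues do not fit in any half-plane through the origin, for example {1, −1}, no ω gives ρ < 1, and the search returns ω = 0 with ρ = 1.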
    Fig. 2. System architecture of the proposed N×N iterative photonic processor for complex-valued matrix inversion. (a) Workflow of the iterative photonic processing system. The computation includes four main steps: 1) weights loading; 2) gain setting; 3) computation activation; and 4) results readout. VDAC, voltage digital-to-analog converter; IDAC, current digital-to-analog converter. (b) Architecture of an N×N iterative photonic processor. It consists of nine key photonic blocks: Laser, Summation 1, Input Vectors Fan-Out, Weight Bank, Summation 2, Amplification, Filtering, Detection, and Recirculating Loop. AWG, arrayed waveguide grating.
    Fig. 3. Models of (a) 1-to-N Fan-Out block, (b) Summation block, (c) Weight Bank block, (d) Laser block, (e) Amplification and Filtering blocks, (f) Detection block, and (g) electronic peripherals.
    Fig. 4. (a) Typical signal amplitude changes during computation without filtering; (b) plot of the sine integral function; (c) typical signal amplitude changes during computation after filtering.
    Fig. 5. Conceptual figure of an integrated 4×4 inverter (without wavelength multiplexing) where the LDs, BPFs, SOAs, and BPDs are monolithically integrated on-chip. TIAs and digital signal processing (DSP) are used for results readout. One column of the inverse matrix is computed at a time by turning on one of the LDs; the complete inverse is obtained by turning on each LD in turn or by using multiple copies of the unit shown here.
    Fig. 6. (a), (b) Net computing speed of different-sized N×N photonic RIPs on SOI, Si₃N₄, and IMOS platforms. The light propagation speed is estimated from the effective indices of the waveguides, and the computing speed is estimated from the light propagation speed, the loop length, and the number of iterations together. (a) Inversion rate in GInv/s and (b) processing speed in TMAC/s are shown. (c) Power efficiency of different-sized N×N photonic RIPs.
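The estimate described in the Fig. 6 caption can be sketched as follows: one iteration takes one loop round trip, and a solve takes a given number of iterations. This is a minimal sketch under that assumption. The function names and the numbers in the usage line are hypothetical, and the paper's exact definition of GInv/s (for instance, how wavelength multiplexing is counted) is not reproduced here.

```python
C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum


def round_trip_time_s(loop_length_m, n_eff):
    """Time for light to travel once around the recirculating loop.

    Uses the effective index as the propagation index, as the caption does.
    """
    return loop_length_m * n_eff / C_VACUUM_M_PER_S


def solves_per_second(loop_length_m, n_eff, iterations):
    """Iterative solves per second when each iteration costs one round trip."""
    return 1.0 / (iterations * round_trip_time_s(loop_length_m, n_eff))


# Hypothetical example: 1 cm loop, effective index 2.0, 50 iterations.
rate = solves_per_second(loop_length_m=0.01, n_eff=2.0, iterations=50)
```

Because the rate scales as 1/(loop length × effective index × iterations), halving the loop length or the iteration count doubles the estimated rate.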
    Fig. 7. Matrix weights encoding error for (a)–(e) different DAC bit resolutions and (f) 20 nm wavelength span. Using a 16-bit DAC is enough to guarantee <0.1% relative weight encoding error. The encoding error due to wavelength multiplexing is around 3%. (g) ASE noise powers of different-sized processors when cascading different numbers of SOA stages. Red circles highlight the minimal achievable ASE powers for different-sized processors. $P_{\mathrm{in,sat}}$ at optimal stages of different-sized processors are indicated by the “+” sign. SNR of coherent detection when (h) both thermal noise and shot noise are considered, (i) only thermal noise is considered, and (j) only shot noise is considered. Thermal noise is dominant when signal power is low, while shot noise is dominant when signal power is high.
    Fig. 8. (a) Inversion accuracy of different-sized photonic RIPs at different input signal powers (optical filter BW = 64.5 MHz). Values in blue indicate the number of iterations required for convergence. High input signal power (>1 dBm) is necessary to ensure an accuracy of >90% when using the wavelength multiplexing technique. (b) Fitted relationship between inversion accuracy and optical filter BW (input signal power is 16.6 dBm) for processor sizes ranging from 2×2 to 64×64; (c) error breakdown of different-sized photonic RIPs (input signal power is 16.6 dBm).
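    | Method | Constraints on A | Key Steps | Complexity |
    |---|---|---|---|
    | GE | None | 1) $Ax=e \rightarrow Ux=y$; 2) Back substitution | $O(N^3)$ |
    | LUD | None | 1) $A=LU$; 2) $Ux=y$, $Ly=b$; 3) Forward substitution: $y$; 4) Back substitution: $x$ | $O(N^3)$ |
    | CD | Positive definite | 1) $A=U^{*}U$; 2) $Ux=y$, $U^{*}y=b$; 3) Forward substitution: $y$; 4) Back substitution: $x$ | $O(N^3)$ |
    | QRD | None | 1) $A=QR$; 2) $Rx=y$, $Qy=b$; 3) $y=Q^{*}b$; 4) Back substitution | $O(N^3)$ |
    | SVD | None | 1) $A=P\Sigma Q^{*}$; 2) $A^{-1}=Q\Sigma^{-1}P^{*}$ | $O(N^3)$ |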
    Table 1. Summary of Main Direct Inversion Methods
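    | Method | Convergence Condition | Iterative Relationship | Complexity | Convergence Rate |
    |---|---|---|---|---|
    | JC | Positive definite | $x^{(k+1)}=(I_N-D^{-1}A)x^{(k)}+D^{-1}e$ | $O(N^3)$ | Slow |
    | GS | Positive definite | $x^{(k+1)}=(I_N-(D-E)^{-1}A)x^{(k)}+(D-E)^{-1}e$ | $O(N^3)$ | Faster than JC |
    | SOR | Positive definite, $0<\omega<2$ | $x^{(k+1)}=(I_N-\omega(D-\omega E)^{-1}A)x^{(k)}+\omega(D-\omega E)^{-1}e$ | $O(N^3)$ | $\omega>1$: accelerates; $\omega=1$: GS; $\omega<1$: slows down |
    | RICH | Eigenvalues lie in a half-complex plane | $x^{(k+1)}=(I_N-\omega A)x^{(k)}+\omega e$ | $O(N^3)$ | Depends on the choice of $\omega$ |
    | SD | Positive definite | 1) $r^{(0)}=e-Ax^{(0)}$; 2) $p^{(k)}=r^{(k)}$; 3) $\alpha_k=\dfrac{(p^{(k)})^{T}r^{(k)}}{(p^{(k)})^{T}Ap^{(k)}}$; 4) $x^{(k+1)}=x^{(k)}+\alpha_k p^{(k)}$; 5) $r^{(k+1)}=r^{(k)}-\alpha_k Ap^{(k)}$ | $O(N^3)$ | As slow as JC; can be accelerated with preconditioning |
    | CG | Positive definite | 1) $r^{(0)}=e-Ax^{(0)}$; 2) $p^{(0)}=r^{(0)}$; 3) $\alpha_k=\dfrac{(r^{(k)})^{T}r^{(k)}}{(p^{(k)})^{T}Ap^{(k)}}$; 4) $x^{(k+1)}=x^{(k)}+\alpha_k p^{(k)}$; 5) $r^{(k+1)}=r^{(k)}-\alpha_k Ap^{(k)}$; 6) $\beta_k=\dfrac{(r^{(k+1)})^{T}r^{(k+1)}}{(r^{(k)})^{T}r^{(k)}}$; 7) $p^{(k+1)}=r^{(k+1)}+\beta_k p^{(k)}$ | $O(N^3)$ | Slightly faster than SD; faster than SOR with preconditioning |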
    Table 2. Summary of Main Iterative Inversion Methods
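The RICH row of Table 2, $x^{(k+1)}=(I_N-\omega A)x^{(k)}+\omega e$, can be illustrated in software. The sketch below solves for one column of the inverse per run by setting e to a unit vector, which mirrors the column-at-a-time readout described for Fig. 5. It is a pure-Python illustration with assumed function names, a zero starting vector, and an assumed stopping tolerance. It says nothing about how the photonic hardware carries out the update.

```python
def richardson_solve(A, e, omega, tol=1e-10, max_iter=10_000):
    """Iterate x <- (I_N - omega*A) x + omega*e until the update is below tol.

    Returns the solution vector and the number of iterations used.
    """
    n = len(A)
    x = [0j] * n  # assumed starting point x(0) = 0
    for k in range(1, max_iter + 1):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        x_new = [x[i] - omega * Ax[i] + omega * e[i] for i in range(n)]
        step = max(abs(x_new[i] - x[i]) for i in range(n))
        x = x_new
        if step < tol:
            return x, k
    raise RuntimeError("Richardson iteration did not converge")


def richardson_inverse(A, omega):
    """Build the inverse of A column by column, one unit vector e at a time."""
    n = len(A)
    columns = []
    for c in range(n):
        e = [1.0 if i == c else 0.0 for i in range(n)]
        x, _ = richardson_solve(A, e, omega)
        columns.append(x)
    # columns[c][r] holds element (r, c) of the inverse.
    return [[columns[c][r] for c in range(n)] for r in range(n)]


# Illustrative complex matrix with eigenvalues 2 and 1; omega = 2/3 gives
# rho(I_N - omega*A) = 1/3, so the iteration converges.
A = [[2, 1j], [0, 1]]
A_inv = richardson_inverse(A, 2 / 3)
```

The iteration converges only when ρ(I_N − ωA) < 1, so a poorly chosen ω makes `richardson_solve` raise after `max_iter` steps.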
    | Photonic Blocks | Components | Functionality |
    |---|---|---|
    | Laser | CW LDs | Input signal |
    | Summation 1 | Single-stage 50:50 2×2/1×2 MMI coupler | 1) Couple initial input; 2) add $\omega I_N$ in each iteration |
    | Input Vectors Fan-out | Cascaded 50:50 1×2 MMI couplers | Split looped-back signals |
    | Weight Bank | Push-pull MZIs | Encode elements of complex-valued matrix $M$ |
    | Summation 2 | Cascaded 50:50 2×2/1×2 MMI couplers | Add signals up during matrix multiplication $MX^{(k)}$ |
    | Amplification | Cascaded SOAs | Compensate for on-chip losses |
    | Filtering | AWGs and BPFs | Reduce the ASE noise from SOAs |
    | Detection | Coherent detectors | Inversion results readout |
    | Recirculating Loop | Phase-sensitive waveguides | Provide connections for iterative computation |
    Table 3. Correspondence between Key Photonic Blocks and Computational Functionalities
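    | Method | Flip-Chip Bonding | Wafer/Die Bonding | μTP | Hetero-epitaxy |
    |---|---|---|---|---|
    | Integration density | Low | Medium | High | High |
    | Efficiency of III-V material use | Medium | Medium | High | Very High |
    | Alignment accuracy | High | High | Medium | High |
    | Throughput | Medium | High | High | High |
    | Cost | High | Medium | Low | Low |
    | Maturity | Mature | Mature | R&D | R&D |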
    Table 4. Comparison of III-V-on-Si Integration Methods
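    | Component | SOI (μm) | Si₃N₄ (μm) | IMOS (μm) |
    |---|---|---|---|
    | Summation 1 | 20 | 240 | 47 |
    | Input Vectors Fan-out | (2N–1)·72 | (2N–1)·180 | (2N–1)·80 |
    | Weight Bank | 100 | 1100 | 200 |
    | Summation 2 | (2N–1)·90 | (2N–1)·300 | (2N–1)·120 |
    | Amplification | $2.2\log_2 N\cdot 176$ [36] | $2.2\log_2 N\cdot 246$ [37] | $2.2\log_2 N\cdot 63$ [38] |
    | Filtering | 128 | 130 | 200 |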
    Table 5. Length Estimation of an N×N Iterative Photonic Processor on Photonic Integration Platforms
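    | Component | Number | Unit Power (mW) | Total Power (mW) |
    |---|---|---|---|
    | Laser | $N$ | 69 [45] | $69N$ |
    | TOPS | $2N^2$ | 0.49 [46] | $0.98N^2$ |
    | SOA | $xN^2$ᵃ | 50 [36] | $50xN^2$ |
    | DAC | $2N^2$ | 0.045 [47] | $0.09N^2$ |
    | ADC | $2N^2$ | 0.46 [48] | $0.92N^2$ |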
    Table 6. Power Estimation of an N×N Iterative Photonic Processor
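The totals in Table 6 can be summed into a single estimate for a given processor size. The sketch below adds up the "Total Power" column. The multiplier x in the SOA row is not defined in this excerpt because its footnote is not reproduced, so it is treated here as a caller-supplied value named `soa_stages`, which is an assumption. The function name and the example inputs are also hypothetical.

```python
def processor_power_mw(n, soa_stages):
    """Sum the Table 6 'Total Power' entries (in mW) for an n x n processor.

    soa_stages stands in for the table's undefined multiplier x (assumption).
    """
    breakdown = {
        "laser": 69 * n,                  # N lasers at 69 mW each
        "tops": 0.98 * n**2,              # 2N^2 units at 0.49 mW each
        "soa": 50 * soa_stages * n**2,    # x*N^2 SOAs at 50 mW each
        "dac": 0.09 * n**2,               # 2N^2 DACs at 0.045 mW each
        "adc": 0.92 * n**2,               # 2N^2 ADCs at 0.46 mW each
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Hypothetical example: 4x4 processor with x = 1.
power = processor_power_mw(4, soa_stages=1)
```

With these unit powers the SOA term dominates the N² contributions, so the estimate is most sensitive to the value chosen for x.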
    | Parameter | Value |
    |---|---|
    | Processor size | 2×2 to 64×64 |
    | Number of random matrix instances | 500 per processor size |
    | Half-wave voltage of MZI | 4.36 V [53] |
    | DAC resolution | 16 bits |
    | SOA NF | 3.8 dB [52] |
    | BW of the optical BPF | 64.5 MHz [54] |
    | IL of the optical BPF | 0.2 dB [54] |
    | IL of an MMI coupler | 0.2 dB [29] |
    | IL of a waveguide crossing | 0.019 dB [55] |
    | Center frequency | 193.6 THz |
    | WDM channel spacing | 0.1 nm [28] |
    | Electron charge | $1.6\times10^{-19}$ C |
    | Planck’s constant | $6.626\times10^{-34}$ J·s |
    | BW of the electronic filter | 32.25 MHz |
    | Boltzmann constant | $1.38\times10^{-23}$ J/K |
    | Temperature | 300 K |
    | Electronic resistance | 50 Ω |
    Table 7. Parameters Used in Accuracy Analyses of the Iterative Photonic Processor
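Several Table 7 entries (electron charge, Boltzmann constant, temperature, resistance, electronic filter bandwidth) are the inputs to the standard thermal-noise and shot-noise expressions. The sketch below evaluates the textbook current-noise variances $4kTB/R$ and $2qIB$ with those values, and finds the photocurrent at which the two are equal. It is a generic illustration with assumed function names, not the paper's coherent-detection SNR model, which is not reproduced in this excerpt.

```python
Q_E = 1.6e-19      # electron charge, C (Table 7)
K_B = 1.38e-23     # Boltzmann constant, J/K (Table 7)
T_K = 300.0        # temperature, K (Table 7)
R_OHM = 50.0       # electronic resistance, ohm (Table 7)
B_HZ = 32.25e6     # electronic filter bandwidth, Hz (Table 7)


def thermal_noise_var(bandwidth=B_HZ, temperature=T_K, resistance=R_OHM):
    """Thermal (Johnson) current-noise variance in A^2: 4kTB/R."""
    return 4 * K_B * temperature * bandwidth / resistance


def shot_noise_var(photocurrent, bandwidth=B_HZ):
    """Shot-noise current variance in A^2 for a DC photocurrent: 2qIB."""
    return 2 * Q_E * photocurrent * bandwidth


def crossover_current_a(temperature=T_K, resistance=R_OHM):
    """Photocurrent at which shot noise equals thermal noise: 2kT/(qR)."""
    return 2 * K_B * temperature / (Q_E * resistance)


i_cross = crossover_current_a()
```

Below this crossover current the thermal term is the larger of the two, and above it the shot term is larger, which matches the trend stated in the Fig. 7 caption.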