## **PHOTONICS** Research

# Experimental demonstration of robust nanophotonic devices optimized by topological inverse design with energy constraint

GUOWU ZHANG,<sup>1,\*</sup> DAN-XIA XU,<sup>2</sup> VURI GRINBERG,<sup>2</sup> AND ODILE LIBOIRON-LADOUCEUR<sup>1</sup>

<sup>1</sup>Department of Electrical and Computer Engineering, McGill University, Montréal, Quebec H3A 0E9, Canada <sup>2</sup>National Research Council Canada, Ottawa, Ontario K1A 0R6, Canada \*Corresponding author: guowu.zhang@mail.mcgill.ca

Received 25 February 2022; revised 5 May 2022; accepted 13 May 2022; posted 16 May 2022 (Doc. ID 457066); published 30 June 2022

In this paper, we present the experimental results for integrated photonic devices optimized with an energyconstrained inverse design method. When this constraint is applied, optimizations are directed to solutions that contain the optical field inside the waveguide core medium, leading to more robust designs with relatively larger minimum feature size. We optimize three components: a mode converter (MC), a 1310 nm/1550 nm wavelength duplexer, and a three-channel C-band wavelength demultiplexer for coarse wavelength division multiplexing (CWDM) application with 50 nm channel spacing. The energy constraint leads to nearly binarized structures without applying independent binarization stage. It also reduces the appearance of small features. In the MC, well-binarized design, improved insertion loss, and cross talk are obtained as a result. Furthermore, the proposed constraint improves the robustness to fabrication imperfections as shown in the duplexer design. With energy constraint optimization, the corresponding spectrum shifts for the duplexer under  $\pm 10$  nm dimensional variations are reduced from 105 nm to 55 nm and from 72 nm to 60 nm for the 1310 nm and 1550 nm channel, respectively. In the CWDM demultiplexer, robustness toward  $\pm 10$  nm fabrication error is improved by a factor of 2. The introduction of the energy constraint into topological optimization demonstrates computational gain with better-performing designs. © 2022 Chinese Laser Press

https://doi.org/10.1364/PRJ.457066

#### **1. INTRODUCTION**

Nanophotonic computational design methods have been investigated intensively over the past decades [1-15]. Heuristic methods, such as the genetic algorithm (GA) and particle swarm optimization (PSO) were first investigated and demonstrated in nanophotonic design [16-18]. Later, the adjoint method, which enables efficient gradient calculation with only two optical simulations, was introduced into computational methods for nanophotonic design [19–21]. At the same time, the introduction of level-set [22-26] and density topology [27–30] provided a systematic way to represent design possibilities and, thus, enabled the gradient-based algorithm to further explore the design parameter space. In the level-set method, only the boundary of the design is allowed to be optimized, while in the density topology-based optimization, the permittivity values of the design can be changed continuously providing an enormous design parameter space for the optimization to explore. With such a large possible design space to explore, density topology-based optimization typically converges to a local minimum quickly with remarkable device performance. However, allowing a continuous permittivity poses another problem as intermediate permittivity values do not usually correspond to real materials. To address this problem, several methods are proposed such as adding a binarization process during which the intermediate permittivity values are gradually removed [31], neighbor-biasing [32], and adding an explicit objective function for measuring the binarization of the design [33].

Topologically optimized designs typically include minimum size features that violate the lithography limitations of current nanofabrication technologies. To ensure the optimized designs meet the minimum feature size requirement for a given foundry, several techniques have been proposed. For example, pixel-by-pixel optimization has been proposed and demonstrated where the dimension of a pixel is set to be larger than the predefined minimum feature size [34–36]. Besides, adding minimum length scale control during the optimization process has also been proposed to guarantee the final design's minimum feature size larger than the predefined value [25,37,38].

Another shortcoming is that the robustness toward fabrication error of the topologically optimized designs is difficult to explicitly optimize. Indeed, the performance of the fabricated devices often deviates from the optimized results. One existing solution is to take into account possible fabrication errors during the optimization process so that the optimized designs still operate well over a certain range of their realizations [39]. However, this inevitably increases the required computational resources.

In our previous work [40], we proposed a novel approach of applying an energy constraint in the optimization process that maximizes the optical energy residing in the silicon regions, or equivalently minimizes the energy inside the surrounding silicon dioxide cladding. By applying energy constraint, the optimization is naturally guided to a nearly binarized device. As most of the energy is confined inside the silicon, less energy will interact with the boundaries. Thus, robustness toward fabrication errors of the final optimized designs is improved. In addition, this constraint inhibits the generation of small features during optimization. Our previously published theoretical work was based on a 2D simulation to illustrate the principle, while in this paper the energy constraint is applied to 3D optimization for experimental device implementation. We demonstrate the benefits of applying energy constraint to the topological optimization of three types of devices: a mode converter (MC), an O-band/C-band duplexer, and a C-band threechannel wavelength division multiplexing (WDM) demultiplexer with 50 nm channel spacing. Specifically, we show how applying energy constraint benefits the binarization of the design during the optimization process, reducing the presence of small features and improving the robustness toward fabrication error. The results of a collection of 15 designs from different inverse design initialization for an O-band/C-band duplexer demonstrate the statistical generality of these improvements. Then, the optimized designs with our proposed energy constraint are fabricated using a commercial electron beam lithography (EBL), and the experimental characterization of the device performance is presented to validate the improvement.

#### 2. OPTIMIZATION RESULTS AND COMPARISON

The proposed devices are designed for fabrication on a siliconon-insulator (SOI) chip with an optical waveguide thickness of 220 nm. The theoretical basis and 2D algorithm implementation are presented in Ref. [40]. The 3D implementation details are discussed in Appendix A so that we can focus on the optimization and experimental results in the main text. Three different functional devices are optimized in this work, that is an MC, a 1310 nm/1550 nm wavelength duplexer, and a threechannel coarse wavelength division multiplexing (CWDM) demultiplexer with 50 nm channel spacing.

#### A. Mode Converter

#### 1. Optimization Results for Mode Converter

MCs offer essential optical functions in mode division multiplexing (MDM) systems and are widely used as design prototypical examples for inverse design in the literature [15,30]. In this section, we design an MC between the fundamental transverse electric field (TE0) and the second-order transverse electrical field (TE1). We show how applying energy constraint benefits the binarization of the design during the optimization process, thereby reducing the computation cost.

In our optimization configuration, the fundamental mode (TE0) is excited at the input waveguide, with the goal of generating the next high-order mode (TE1) at the output waveguide. The dimensions of the input and output waveguides, the design region, the cross section of the waveguides, and the position of the monitor (marked by the green dashed line) are shown in Fig. 1. To benchmark our method, the dimension of the design region is selected to be 3.2  $\mu$ m × 1.52  $\mu$ m similar to Refs. [41,42]. Accordingly, the electromagnetic objective function is defined as the mode overlap integral to the desired mode at the output waveguide. Assuming there is no reflected wave at the output waveguide, the mode matching integral,  $\eta$ , is defined by Eq. (1) following Ref. [43], where the surface integral is normalized to the source optical power  $P_{\rm src}$  and the optical power of the desired mode  $P_m$ ,

$$\eta(\mathbf{E}, \mathbf{H}_m) = \frac{1}{4P_m P_{\rm src}} \left| \iint_A \mathbf{E} \times \mathbf{H}_m^* \cdot dA \right|^2,$$
(1)

where **E** is the simulated electric field at the monitor position and  $\mathbf{H}_m$  is the magnetic field distribution of the desired mode mat the output waveguide. According to the desired input and output, the electromagnetic objective function for the optimization is defined as Eq. (2) for a minimization problem. In the objective function, only the transmission to the desired mode is included as the cross talk performance is sufficiently low as illustrated in the following optimization results. Note that the cross talk penalty can be added to the objective function to potentially further improve the performance as illustrated in Ref. [41]. The optimization is carried out at three wavelength points equally spaced within the wavelength range from 1500 nm to 1600 nm to achieve broadband operation,

$$F_{\rm FM}(\mathbf{E}) = -\eta(\mathbf{E}, \mathbf{H}_{\rm TE1}).$$
 (2)

For comparison purposes, we investigate three different optimization strategies: gray scale only, baseline, and energy constraint. Each of these optimization strategies is detailed in Appendix A. Fixed computational resources (in this work, it refers to the number of quasi-Newton algorithm iterations, which is 300 for this device) are used for all three optimization strategies. For baseline optimization, the continuous stage and binarization stage take 200 and  $5 \times 20$  iterations, respectively. For energy constraint optimization, we modify the objective



**Fig. 1.** Illustration of the dimensions of the input and output waveguides (in red) and the design region (in yellow) for the TE0 to TE1 mode converter.

function to include both energy constraint and electromagnetic performance requirements.

The optimization results for the three optimization strategies are shown in Fig. 2. Compared with the gray scale only optimization, applying the energy constraint improves the design binarization. This is observed from the reduction of the intermediate permittivity distribution in the final designs shown in Fig. 2(a). Indeed, the binarization level b (discussed in Appendix A.5) increases with the energy constraint coefficient  $w_c$  [Fig. 2(b)] when  $w_c$  is smaller than 0.3 indicating the benefit of applying energy constraint to binarize the device. When the energy constraint coefficient is in the range  $0.3 < w_c < 0.5$ , the binarization level converges to an optimum design as the device is almost perfectly binarized, and the insertion loss (IL) performance degradation is trivial. For  $w_c > 0.5$ , increasing the energy constraint coefficient does not improve the binarization level while the IL performance degrades by approximately 0.15 dB as the optimization emphasizes more on the energy constraint. The backreflections obtained in the simulation are below -21 dB and -27 dB for baseline and  $w_c = 0.3$ , respectively. The radiation downward toward the substrate is less than -24 dB and -30 dB for baseline and  $w_c = 0.3$ , respectively. For this MC device, energy constraint benefits the inverse design process by guiding the design to a well-binarized device.

Designs from the baseline and energy constraint  $w_c = 0.3$  are selected for fabrication since the gray scale only optimization generates designs with intermediate permittivity values and the performance degradation is nontrivial after converting the design onto a GDS layout file. The final layout is generated using a direct binarization method described by Eq. (A15) in Appendix A.

#### 2. Experimental Results for Mode Converter

The devices are fabricated using 100 keV EBL at Applied Nanotools (ANT) [44]. The silicon device layer patterning

is followed by an inductively coupled plasma-induced reactive ion etching (ICP-RIE) process. A 2.2  $\mu$ m cladding oxide layer is deposited onto the device using a plasma-enhanced chemical vapor deposition (PECVD) process to protect the silicon layer. The recommended minimum features are specified as 60 nm. The mean and standard width variations for a 500 nm waveguide are found to be 6.295 nm and 1.132 nm, respectively [45]. The correlation length for this variation is found to be 12.23 mm.

The IL and cross talk performance for the fabricated MCs are characterized first. Figures 3(a) and 3(b) show the optical image of the fabricated MC test structures. Figure 3(c) shows the SEM images of the fabricated MC for baseline and energy constraint scenarios. To investigate the IL performance, two, four, six, and eight MCs are cascaded to accumulate enough loss. The IL performance is then calculated from these measurement results using the least squares (LS) regression. The cross talk performance of the MCs is measured using an adiabatic directional coupler (ADC)-based mode multiplexer and demultiplexer to generate the required TE1 modes, a device we previously demonstrated [41]. The measurements are conducted with a testbed comprising a C-band tunable laser (Keysight 8164B), a polarization controller, a 12-channel fiber array, fiber alignment stages, and an optical powermeter. A pair of grating couplers is used to receive (transmit) the light from the singlemode fibers (the chip) onto the chip (the single-mode fiber). All continuous wave (CW) measurements are normalized to a loopback structure that connects two grating couplers with a short waveguide. The measured IL for the loopback structure is approximately 16 dB at 1550 nm.

Figure 3(d) shows the normalized measured transmission spectrum for the MCs. The results show that the devices optimized with energy constraint perform slightly better than the baseline case. Specifically, the IL at 1550 nm for energy-constrained MC is 0.18 dB whereas 0.34 dB for the baseline case. The corresponding cross talk is -26.7 dB and -23.0 dB.



**Fig. 2.** (a) Initial and optimized mode converters for gray scale optimization strategies (note that  $w_c = 0$  corresponds to a gray scale optimization), baseline, and with energy constraint coefficients  $w_c$  of 0.1, 0.3, 0.5, 0.7, and 0.9; (b) insertion loss and binarization level as a function of the energy constraint coefficient parameter  $w_c$  (horizontal purple dashed line, IL performance of the baseline design; horizontal light blue dashed line, binarization level of the baseline design); energy intensity distribution for the optimized designs (c) without and (d) with energy constraint.

**Research Article** 



**Fig. 3.** Optical image of the test structures for characterizing the mode converter in terms of (a) IL performance and (b) cross talk performance; (c) SEM images for the fabricated design of the baseline (left) and energy constraint strategies (right); (d) measured IL and XT for the mode converter designed by the baseline (solid line) and energy constraint strategies (dashed line); right part shows the zoom in version of the measured IL performance.

For MCs that have relatively simple shapes, the improvement over the baseline case is limited because the energy is already well-confined inside silicon without much interaction with the boundary in the baseline case. To compare our experimental results with the performance of other inverse-designed MCs/ exchangers, we summarize the performance of our device and previous reported works in Table 1.

## B. Broadband 1310 nm/1550 nm Wavelength Duplexer

### 1. Optimization Results for 1310 nm/1550 nm Wavelength Duplexer

In this section, a broadband TE mode 1310 nm/1550 nm wavelength duplexer, an important device in WDM systems, is optimized. These wavelength sensitive devices are ideal for quantifying the robustness toward fabrication error since fabrication variations lead to detrimental spectral response shifts [32,40]. The dimensions of the input and output waveguide, the design region, and the position of the monitor (marked using green dashed lines) are shown in Fig. 4(a). The design region is set to  $3.2 \ \mu m \times 3 \ \mu m$  according to a previous publication [2]. The distance between the two output waveguides is set to be  $2 \ \mu m$  to ensure lower enough coupling between them. Here the optimization seeks to maximize the transmission of 1310 nm (1550 nm) light into CH1 (CH2) while minimizing

the transmission of the 1550 nm (1310 nm) input into CH1 (CH2). The corresponding electromagnetic objective function is defined in Eq. (3). Note that the "1" and "0" in Eq. (3) represent the target transmission. The optimization is carried out at eight wavelength points with four equally spaced points from 1275 nm to 1365 nm and another four equally spaced points from 1515 nm to 1605 nm, as shown in Fig. 4(b),

$$F_{\text{EM}}(\mathbf{E}_{1310}, \mathbf{E}_{1550}) = |\eta(\mathbf{E}_{1310}, \mathbf{H}_{1310,\text{CH1}}) - 1|^2 + |\eta(\mathbf{E}_{1550}, \mathbf{H}_{1550,\text{CH2}}) - 1|^2 + |\eta(\mathbf{E}_{1310}, \mathbf{H}_{1310,\text{CH2}}) - 0|^2 + |\eta(\mathbf{E}_{1550}, \mathbf{H}_{1550,\text{CH1}}) - 0|^2.$$
(3)

As was the case for the MC, the gray scale only, baseline, and energy constraint optimizations are investigated. We use 450 quasi-Newton algorithm iterations for all three optimization strategies for this device. For baseline optimization, the continuous stage and binarization stage take 200 and  $5 \times 50$  iterations, respectively. Each complete 3D optimization takes 12 h on our GPU cluster.

The optimization results for the three optimization strategies are shown in Fig. 5(a). Similar to the observations made about the MC optimization, applying energy constraint improves the binarization of the design. The field energy

 Table 1. Comparison of the Performance of Mode Converters with the Previous Reported Inversely Designed Mode

 Converters/Exchangers

| Ref.                    | Footprint (µm <sup>2</sup> ) | BW (nm) | XT (dB) | IL (dB) | Time (h) | Hardware                  |
|-------------------------|------------------------------|---------|---------|---------|----------|---------------------------|
| [41]                    | $4 \times 4$                 | 100     | -22.0   | 0.34    | 36       | Intel Core i7-8700 CPU    |
| [42]                    | $4 \times 1.6$               | 40      | -9.1    | 1.34    | 50       | Intel Xeon E5 2697 v3 CPU |
| This work (baseline)    | $3.2 \times 1.52$            | 100     | -23.0   | 0.34    | 4        | NVidia V100 GPU           |
| This work $(w_c = 0.3)$ | 3.2 × 1.52                   | 100     | -26.7   | 0.18    | 4        | NVidia V100 GPU           |



**Fig. 4.** (a) Illustration of the dimensions of the input and output waveguides (in red) and the design region (in yellow) for the 1310 nm/1550 nm wavelength duplexer; (b) illustration of the wavelength points at which the optimization is conducted for the 1310 nm and 1550 nm channels, respectively.

distribution in Fig. 5(b) for the baseline (left) and energy constraint optimization (right) strategies shows how the energy is well-confined inside silicon after applying energy constraints and has less interaction with the device boundary. This can be confirmed by the 1550 nm energy (red in the figure) that goes into CH2 passing through several holes (circled using blue dashed line) for the baseline case whereas none was encountered in the energy constraint case. The energy constraint objective function  $F_{\text{energy}}$  values after the optimizations are 0.055 and 0.090 for the cases with and without (baseline) energy constraint, correspondingly. We can also observe that the baseline optimization strategy produces smaller features compared with energy constraint optimization. One explanation is that small features lead to weak field confinement or strong coupling, which is not favored by the energy constraint. The simulated transmission spectra for the duplexers optimized with the baseline and energy constraint strategies are shown in Figs. 5(c) and 5(d), respectively. The results show the duplexer has an IL of 0.4 dB over a 124 nm 0.5 dB bandwidth and a cross talk of at most -22.5 dB for the baseline case. For the energy constraint optimization, the IL obtained is 0.35 dB, and the cross talk is at most -16.8 dB over a 0.5 dB bandwidth of 122 nm. Simulation results show that most of the lost power is radiated out-of-plane upward and downward from the devices. The simulated backreflections are below -17 dB and -23 dB for baseline and  $w_c = 0.3$  at the operating wavelengths, respectively. The radiation downward toward the substrate is below -26 dB and -27 dB for baseline and  $w_c = 0.3$ , respectively. As the devices include multiple optical channels, we present the performance (i.e., worst IL, cross talk, backreflection, radiation downward the substrate, and optical bandwidth) defined by the worst case among all channels throughout this paper.

We then investigate through simulations the device's robustness toward fabrication error under different optimization strategies. Specifically, the boundary is biased to +10 nm, 0 nm, and -10 nm, and then transmission spectra under these biases are simulated using commercial software, Ansys Lumerical FDTD [46]. The simulated spectra shown in Fig. 6 indicate that the spectrum shift caused by the specified boundary changes decreases from 80 nm to 60 nm after applying energy constraint optimization, as characterized by the shifts of the cross point of transmission curves for CH1 and CH2. The results validate the improvement of the robustness toward fabrication imperfections.

To validate the generality of the above conclusion, another 15 devices from different random initial parameters for baseline and energy constraint strategies are optimized, collected, and analyzed. First, the reduction of small features after applying energy constraint is investigated. The histograms of the minimum feature sizes for the 15 optimizations are shown in Figs. 7(a) and 7(b) for the baseline and energy constraint scenarios. The result shows that these 15 optimizations from the baseline strategy all include minimum features less than



**Fig. 5.** (a) Initial and optimized 1310 nm/1550 nm wavelength duplexer layouts for different strategies (note that  $w_c = 0$  corresponds to the gray scale optimization); (b) energy intensity distribution inside the optimized designs for baseline (left) and the case with energy constraint coefficient  $w_c = 0.3$  (right) (red, 1550 nm; cyan, 1310 nm); simulated transmission spectrum results from (c) baseline optimization and (d) the case with energy constraint coefficient  $w_c = 0.3$ .



**Fig. 6.** Simulated transmission spectrum for (a) baseline optimization and (b) the case with energy constraint coefficient  $w_c = 0.3$  under  $\pm 10$  nm fabrication error. For clarity of the figure, the data for the nominal designs are not shown here.

10 nm, which is challenging for today's fabrication foundry, whereas only a third of them include minimum features less than 10 nm after applying energy constraint. Note that, even though applying energy constraint prevents generating small features, it does this implicitly, which means it cannot guarantee that the minimum feature size of the final design satisfies the fabrication foundry's requirements. In this case, we believe the optimized design from our energy constraint optimization can be used as a good initial state for further optimization (e.g., length scale optimization [25,37,38]) to meet fabrication requirements. Then, the improvement of the robustness toward fabrication imperfection by applying energy constraint is investigated. This is done by simulating the transmission spectrum of the 15 optimized designs with boundary biased  $\pm 10$  nm to mimic the fabrication error (illustrated in Fig. 18 of the Appendix A). The simulated transmission spectra for all these 15 optimized designs are plotted in one figure and shown in Figs. 7(c) and 7(d) for the baseline and energy constraint  $(w_c = 0.3)$ , respectively. The spectrum shifts are reduced from 90 nm to 65 nm on average after applying energy constraint. More interestingly, despite these 15 designs being from different random start parameters, the optimized transmission spectrum behaves uniformly within the desired bandwidth as shown in Figs. 7(c) and 7(d).

After optimization, these designs from the baseline and energy constraint are converted to a GDS file and sent for fabrication. As the fabrication error varies from foundry to foundry or even from different runs of the same foundry, devices with different boundary biases are fabricated. Specifically, to investigate robustness toward fabrication error, for each optimized design, three different versions are included: (1) biased with an over-etching of 10 nm, (2) nominal design, and (3) biased with an under-etching of 10 nm. The measured spectrum shifts for these devices with the intentional fabrication biases illustrate how applying energy constraint improves the robustness of fabricated designs.

## 2. Experimental Results for 1310 nm/1550 nm Wavelength Duplexer

To characterize the performance of the 1310 nm/1550 nm wavelength duplexer, three tunable lasers (Keysight O-band laser 8164B, Santec ES-band laser TSL-550, Keysight C-band laser 8164B) are used to cover the whole spectrum range from 1260 nm to 1600 nm. Specifically, the available lasers cover 1260 nm to 1350 nm, 1355 nm to 1480 nm, and 1490 nm to 1600 nm, respectively, leaving small gaps in the spectral coverage. Two different grating couplers are used. One covers the O-band, and the other one covers the E-, S-, and C-bands. The fiber array is tilted to the proper angle to maximize the transmission for the given wavelength range. Both baseline and energy constraint optimization are investigated here. The robustness toward fabrication is characterized using the spectrum shifts under certain fabrication errors. Figure 8 provides the randomly chosen SEM images of eight fabricated devices for baseline (top row) and energy constraint (bottom row) scenarios. The structures at the left of the SEM images are used to calibrate the dimension of the device after fabrication.

The measured transmission spectrum of the optimized devices from a uniform initial distribution is shown in Fig. 9 for both baseline and energy constraint scenarios. Figures 9(a) and 9(b) show the measured spectrum normalized by the transmission of the loopback structure. As one may note, the spectrum shows Fabry-Perot prominent fringes between 1310 nm and 1380 nm and for wavelengths larger than 1580 nm. These fringes that have approximately 2 nm free spectral range (FSR) come from the cavity generated by in-waveguide reflection between the two grating couplers [47]. To obtain a relatively smooth spectrum for the device and remove the impact of the Fabry-Perot fringes, a moving average smooth filter with a 2 nm window is employed. With this filtering, the spectrum shifts can be quantified more easily. The measured, normalized, and smoothed transmission spectra are shown in Figs. 9(c) and 9(d). The results clearly show how the spectrum shift is reduced



**Fig. 7.** Histogram of minimum feature size ranges for (a) 15 baseline optimization runs and (b) 15 runs with an energy constraint coefficient of  $w_c = 0.3$ ; overlap of all 15 simulated transmission spectra for (c) baseline optimization and (d) with energy constraint coefficient  $w_c = 0.3$  under  $\pm 10$  nm fabrication imperfection.



Fig. 8. SEM images of eight fabricated 1310 nm/1550 nm wavelength duplexers optimized from different random start parameters. Top row, devices from baseline optimization strategy; bottom row, corresponding devices from energy constraint optimization strategy.



**Fig. 9.** Measured and normalized transmission spectrum of the fabricated 1310 nm/1550 nm wavelength duplexer under  $\pm 10$  nm and 0 nm boundary bias for (a) baseline, (b) energy constraint optimization; filtered transmission spectrum of the results presented in (a) and (b) for (c) baseline, (d) energy constraint optimization.

by adding energy constraint. Compared with the baseline case, the spectrum shifts of the energy-constrained devices are reduced from 92 nm to 60 nm. The measured IL from the nominal baseline optimized design is 1 dB and 1.5 dB at 1310 nm and 1550 nm, respectively. The corresponding cross talk from another channel is -18.0 dB and -20.0 dB respectively. The measured IL from nominal energy-constrained design is 0.75 dB and 1.5 dB at 1310 nm and 1550 nm. The corresponding cross talk from another channel is -23.5 dB and

-26.1 dB, respectively. Note that the experimental cross talk performance is better than the simulation results, which seems to contradict the usual expectations. However, it can be explained by an observed over-etching effect during the fabrication process, indicated by the spectrum blueshift of the grating coupler (GC) between simulation and experiment. Also, the simulation results in Fig. 6 indicate that certain biases can cause improvement in cross talk performance. This opens another door for pre-compensating the fabrication error of the optimized design so that the fabricated designs can have a better performance after fabrication. In addition, the transmission shows a flat passband profile for both the 1310 nm channel and 1550 nm channel, which is a desired property for WDM application. Another interesting finding is that the same device is duplicated on several positions at an 8 mm × 8 mm die, and the measured results for these designs at different chip locations are quite uniform. To compare our experimental results with the performance of other inverse designed 1310 nm/ 1550 nm wavelength duplexers, we summarize the performance of our device and previous reported works in Table 2.

The generality of the improvement in robustness toward  $\pm 10$  nm fabrication error is validated by experimental results of different 1310 nm/1550 nm wavelength duplexers optimized from 15 random initializations. The measured, normalized, and smoothed transmission spectra from all these designs with  $\pm 10$  nm boundary bias are plotted in one figure and shown in Figs. 10(a) and 10(b) for the baseline and energy constraint optimization strategies, respectively. On average, the spectrum shifts are reduced by 34% by applying energy constraint (from 100 nm to 66 nm). More interestingly, even for these 15 designs resulting from different random initializations, the measured transmission spectra show certain uniformity within the desired bandwidth (Fig. 10). Further, the measured passband shape also shows a flattop in the experimental results.

 Table 2.
 Comparison of the Performance of the 1310 nm/1550 nm Wavelength Duplexer with Previous Reported

 Inversely Designed 1310 nm/1550 nm Wavelength Duplexers

| Ref.                      | Footprint<br>(µm <sup>2</sup> ) | 3-dB BW<br>O/C Band (nm) | XT (dB) | IL<br>O/C Band (dB) | Time (h) | Hardware             |
|---------------------------|---------------------------------|--------------------------|---------|---------------------|----------|----------------------|
| [2]                       | $2.8 \times 2.8$                | 100/170                  | -11     | 1.8/2.4             | 36       | NVidia GTX Titan GPU |
| This work (baseline)      | 3.2 × 3                         | 150/170                  | -15     | 1/1.5               | 12       | NVidia V100 GPU      |
| This work ( $w_c = 0.3$ ) | 3.2 × 3                         | 120/150                  | -15     | 0.75/1.5            | 12       | NVidia V100 GPU      |



**Fig. 10.** Measured, normalized, and smoothed transmission spectra of the fabricated 1310 nm/1550 nm wavelength duplexers under different  $\pm 10$  nm boundary bias for the 15 designs optimized from different random initial parameters for (a) baseline and (b) energy constraint optimization.



**Fig. 11.** (a) Illustration of the dimensions of the input and output waveguides (in red) and the design region (in yellow) for the CWDM demultiplexer. (b) Illustration of the wavelength points at which the optimization is conducted for the CH1, CH2, and CH3, respectively.

#### C. C-Band CWDM Demultiplexer

1. Optimization Results for Three-Channel Demultiplexer In this section, we optimize a TE mode three-channel C-band CWDM demultiplexer with 50 nm channel spacing which is more complex compared to the MC and the 1310 nm/ 1550 nm wavelength duplexer [32]. The dimension of the input and output waveguides, the channel specification, and the design region size are shown in Fig. 11(a). The design region is set to 6.2  $\mu$ m × 5.4  $\mu$ m similar to a previous publication [32] which is sufficient for this device as validated by the following optimization results. The distance between adjacent output waveguides is set to be 2  $\mu$ m to ensure low coupling between them. Here the optimization seeks to maximize the transmission of 1500 nm/1550 nm/1600 nm light into CH1/CH2/CH3 while minimizing the transmission from other wavelength channels into the corresponding channel. The electromagnetic objective function is defined accordingly as Eq. (4). To achieve a flattop passband, the optimization is carried out at six wavelength points with two wavelength points for CH1, CH2, and CH3, respectively, as shown in Fig. 11(b),

**Research Article** 

$$F_{\rm EM}(\mathbf{E}_{1500}, \mathbf{E}_{1550}, \mathbf{E}_{1600})$$

$$= |\eta(\mathbf{E}_{1500}, \mathbf{H}_{1500, {\rm CH1}}) - 1|^{2} + |\eta(\mathbf{E}_{1550}, \mathbf{H}_{1550, {\rm CH2}}) - 1|^{2}$$

$$+ |\eta(\mathbf{E}_{1600}, \mathbf{H}_{1600, {\rm CH3}}) - 1|^{2}$$

$$+ |\eta(\mathbf{E}_{1500}, \mathbf{H}_{1500, {\rm CH2}}) - 0|^{2} + |\eta(\mathbf{E}_{1500}, \mathbf{H}_{1500, {\rm CH3}}) - 0|^{2}$$

$$+ |\eta(\mathbf{E}_{1550}, \mathbf{H}_{1550, {\rm CH1}}) - 0|^{2}$$

$$+ |\eta(\mathbf{E}_{1550}, \mathbf{H}_{1550, {\rm CH3}}) - 0|^{2} + |\eta(\mathbf{E}_{1600}, \mathbf{H}_{1600, {\rm CH1}}) - 0|^{2}$$

$$+ |\eta(\mathbf{E}_{1600}, \mathbf{H}_{1600, {\rm CH2}}) - 0|^{2}.$$
(4)

In this device, only two optimization strategies are considered: gray scale only optimization and energy constraint optimization. The baseline strategy is not included here as it does not converge to a binarized design within the allocated computation resources. Further increasing the iterations in the binarization stage may help, but it will add additional computation resources. Instead of fixing the energy constraint coefficient  $w_c$  to 0.3 as in Section 2.B, here we vary the energy constraint coefficient from 0 to 0.9 to investigate the improvements. Note that gray scale only optimization can be viewed as a special case for energy constraint (when the energy constraint



**Fig. 12.** (a) Optimized CWDM demultiplexer layouts for different strategies (note that  $w_c = 0$  corresponds to the gray scale only optimization); (b) corresponding energy distribution inside the optimized designs with different energy constraint coefficients; (c) corresponding simulated transmission spectrum for the optimized designs with different parameters  $w_c$ .

coefficient is zero). Each of the optimizations takes approximately 48 h on our GPU cluster.

The optimization results for the energy constraint with different energy constraint coefficients are shown in Fig. 12. Compared with gray scale only optimization ( $w_c = 0$ ), applying energy constraint clearly shows improvement to binarization of the designs. The corresponding energy distributions inside the design region are shown in Fig. 12(b) (blue, 1500 nm; green, 1550 nm; red, 1600 nm). From the results, one can see how applying energy constraint promotes binarization for the optimization as well as confining energy inside the silicon with less interaction with the core/cladding boundary. Another finding is that the island and holes inside the device are becoming more connected as the energy constraint coefficient increases. This can be understood as a well-connected device allowing energy to exist inside the silicon and not go outside of the core medium. The simulated transmission spectra after optimization are shown in Fig. 12(c), where increasing the energy constraint coefficient slightly degrades the optimized device's optical performance. Note that the results are obtained by simulating the gray scale permittivity distribution. Specifically, when the energy constraint coefficient increases from 0 to 0.9, the IL degrades by 0.8 dB from 0.2 dB to 1 dB, and the cross talk degrades by 11.0 dB from -23.0 dB to -12.0 dB. Interestingly, however, the 0.5 dB bandwidth seems to increase slightly from 26 nm to 30 nm as the energy constraint coefficient is increased from 0 to 0.9. The backreflections obtained in the simulation are at most -20 dB for any  $w_c$  at the operating wavelengths, and the corresponding radiation downward to the substrate is at most -22 dB. As mentioned, the performance is quantified by the worst performance, including the maximum IL, cross talk, back-reflection, radiation downward toward the substrate, and smallest bandwidth among the three channels.

The optimized devices near binarization are converted to GDS files. For this type of device, we only present three scenarios with energy constraint coefficients  $w_c = 0.5$ , 0.7, and 0.9 as these devices are close to being binarized at the place where the energy exists so that we expect less performance degradation during the GDS transferring process. Then robustness toward  $\pm 10$  nm fabrication error is first investigated using simulation. As shown in the simulation results (Fig. 13), the spectrum shifts are reduced with the increase of the energy constraint coefficient. Specifically, the spectrum shifts are decreased from 42 nm to 19 nm, from 50 nm to 21 nm, and from 35 nm to 23 nm for channels CH1 (1500 nm), CH2 (1550 nm), and CH3 (1600 nm), respectively. The results are summarized in Table 3 as a comparison with the experimental results.



**Fig. 13.** Simulated transmission spectrum of the optimized CWDM demultiplexers under different  $\pm 10$  nm and 0 nm boundary biases optimized with different energy constraint coefficients for CH1 (left), CH2 (middle) and CH3 (right), (a)  $w_c = 0.5$ , (b)  $w_c = 0.7$ , and (c)  $w_c = 0.9$ , respectively.

| Channels | Spectrum Shifts (nm) |             |               |               |             |               |  |  |
|----------|----------------------|-------------|---------------|---------------|-------------|---------------|--|--|
|          |                      | Simulated   |               | Measured      |             |               |  |  |
|          | $w_{c} = 0.5$        | $w_c = 0.7$ | $w_{c} = 0.9$ | $w_{c} = 0.5$ | $w_c = 0.7$ | $w_{c} = 0.9$ |  |  |
| CH1      | 42                   | 28          | 19            | 40            | 27          | 20            |  |  |
| CH2      | 50                   | 34          | 21            | 60            | 32          | 21            |  |  |
| CH3      | 35                   | 30          | 23            | 36            | 30          | 21            |  |  |





**Fig. 14.** (a) SEM images of the fabricated CWDM demultiplexer for  $w_c = 0.5, 0.7, 0.9$ ; (b) zoom-in SEM images with the overlap of the desired boundary; (c)–(e) measured transmission spectrum of the nominal demultiplexers optimized with energy constraint coefficients  $w_c = 0.5, 0.7, 0.9$ .

## 2. Experimental Results for Three-Channel C-band Demultiplexer

To characterize the performance of the TE mode three-channel C-band CWDM demultiplexer, two tunable lasers (Santec ESband laser TSL-550, Keysight C-band laser 8164B) are used to cover the spectrum range from 1450 nm to 1650 nm. Moreover, the spectrum shifts under  $\pm 10$  nm fabrication biases are used to quantify the robustness toward fabrication error. For this device, optimized designs with energy constraint coefficients  $w_c$  of 0.5, 0.7, and 0.9 are investigated as nearly binarized designs are obtained at the end of the optimization. Figure 14(a) shows the SEM images of the fabricated CWDM wavelength demultiplexer. In addition, we show the overlap of the desired boundary and fabricated boundary in Fig. 14(b) to investigate fabrication errors. The results show that the fabrication error is nonuniform within the device footprint.

Figure 14(c) shows the normalized transmission spectrum of the optimized CWDM demultiplexers with energy constraint coefficients 0.5, 0.7, and 0.9. Note that the gap in the transmission spectrum from 1480 nm to 1490 nm is due to the limitation of the tunable bandwidth range for the two tunable lasers, which also applies to all the measurements throughout this paper. The measured ILs are 1.6 dB, 1.5 dB, and 1.8 dB for  $w_c = 0.5, 0.7, 0.9$ , respectively. The corresponding cross talk is -19.1 dB, -16.3 dB, and -11.5 dB. One may notice that the center wavelength shows a blueshift compared with simulation results in Fig. 12. This is because the fabrication process has an over-etching fabrication error. To compare our experimental

 Table 4.
 Comparison of the Performance of the CWDM Demultiplexers with Previous Reported Inversely Designed

 CWDM Demultiplexers
 CWDM Demultiplexers

|                           | Footprint          | 1-dB/3-dB    |         |         | Channel      |          |                      |
|---------------------------|--------------------|--------------|---------|---------|--------------|----------|----------------------|
| Ref.                      | (μm <sup>2</sup> ) | BW (nm)      | XT (dB) | IL (dB) | Spacing (nm) | Time (h) | Hardware             |
| [32]                      | 5.5 × 4.5          | Not reported | -11.0   | 2.82    | 40           | 60       | NVidia GTX Titan GPU |
| This work ( $w_c = 0.5$ ) | $6.2 \times 5.4$   | 19/32        | -19.1   | 1.9     | 50           | 48       | NVidia V100 GPU      |
| This work ( $w_c = 0.7$ ) | $6.2 \times 5.4$   | 30/43        | -16.3   | 1.2     | 50           | 48       | NVidia V100 GPU      |
| This work $(w_c = 0.9)$   | $6.2 \times 5.4$   | 30/47        | -11.5   | 1.8     | 50           | 48       | NVidia V100 GPU      |



**Fig. 15.** Measured transmission spectrum of the fabricated CWDM demultiplexers under different  $\pm 10$  nm and 0 nm boundary biases optimized with different energy constraint coefficients for CH1 (left), CH2 (middle), and CH3 (right), respectively, (a)  $w_c = 0.5$ , (b)  $w_c = 0.7$ , and (c)  $w_c = 0.9$ .

results with the performance of other inverse designed CWDM demultiplexers in the literature, we summarize the performance of our device and previous reported works in Table 4.

The robustness toward fabrication error is investigated next. Figure 15 shows the measured and normalized transmission spectrum for the CWDM demultiplexer under  $\pm 10$  nm different fabrication biases for optimized designs with energy constraint coefficients 0.5, 0.7, and 0.9. The results clearly



**Fig. 16.** Measured spectrum shifts due to  $\pm 10$  nm etching changes as a function of the energy constraint coefficient  $w_c$  for CH1 (blue), CH2 (green), and CH3 (red).

show how increasing the energy constraint coefficient improves the robustness toward fabrication variations. Specifically, the measured spectrum shifts decrease from 40 nm to 20 nm, from 60 nm to 21 nm, and from 36 nm to 21 nm for CH1, CH2, and CH3 channel, respectively. These results closely follow the simulated predictions shown in Fig. 13. To clearly illustrate the improvement of the robustness, we plot the spectrum shifts as a function of energy constraint coefficients in Fig. 16.

#### 3. CONCLUSION

In summary, we present the experimental validation of the benefits of applying energy constraint in topological inverse design. Both the theoretical implementation details and the experimental results are presented. By applying this constraint, most of the energy stays inside silicon regions providing several advantages. First, it promotes binarization in the optimization process, thereby reducing the required computation resources. Second, it generally increases the size of the features. Finally, it improves robustness to fabrication imperfections due to process variations. Three components are optimized to validate the improvement: (1) a broadband TE0 to TE1 MC, (2) a TE mode 1310 nm/1550 nm wavelength duplexer, and (3) a TE mode three-channel CWDM demultiplexer with 50 nm channel spacing. In the MC example, we demonstrate that the applied energy constraint helps binarize the structure eliminating the additional binarization stage and reducing the number of small features. For the 1310 nm/1550 nm wavelength duplexer, aside from the benefits mentioned above, we also illustrate the improvements to robustness toward fabrication errors when energy constraint is applied. Finally, using the CWDM demultiplexer, we show how robustness toward fabrication imperfections can be further improved by increasing the energy constraint coefficient. We believe our work experimentally validates the benefits of simultaneously optimizing the device's optical performance as well as the field distribution inside the design region to achieve more practical designs.

#### APPENDIX A: PRINCIPLES OF INVERSE DESIGN METHOD WITH ENERGY CONSTRAINT

The formalism of the baseline method and the implementation details of the 3D optimization method with our energy constraint are given in this appendix, building on the 2D formulation published previously [40]. In general, nanophotonic device design/optimization is equivalent to changing the permittivity distribution within the design region to achieve predefined device performance. Following this logic, we describe how the permittivity is constructed from the design variables and then how the optimization of the design variables is conducted.

#### **1. Design Space Parameterization**

In this paper, the density-based topological method [27–30] is employed as it provides a larger possible design space to explore compared to the level-set method. The design space parameterization process is shown in Fig. 17. In this method, a normalized design variable  $\rho$  (0< $\rho$ <1) is employed to parameterize the permittivity distribution inside the predefined design region *D*, as enclosed by the blue dashed line in Fig. 17(a). The design variable  $\rho$  is first filtered to generate  $\tilde{\rho}$ . Then,  $\tilde{\rho}$ is transformed into a physically feasible distribution  $\bar{\rho}$  through a processing step called projection. The relations of  $\bar{\rho}$ , the filtered field  $\tilde{\rho}$ , and the design variable  $\rho$  are defined as follows:



**Fig. 17.** Top view illustration of filtering and projection on a predefined design region (dash blue line): (a) design variable of material distribution  $\rho$  ranging between 0 and 1, (b) the variable of material distribution  $\tilde{\rho}$  after filtering using a hat-shaped filter h(x, y) shown in the insert, (c) the variable of material distribution  $\bar{\rho}$ , which is the projection of the filtered material distribution  $\tilde{\rho}$ . The inset for projection shows how changing  $\beta$  will help binarize the structure.

$$\tilde{\rho}_i = \frac{\sum_{j \in D} h_{ij} \rho_j}{\sum_{j \in D} h_{ij}},$$
(A1)

$$b_{ij} = \begin{cases} R - ||\mathbf{r}_i - \mathbf{r}_j||, \ ||\mathbf{r}_i - \mathbf{r}_j|| \le R\\ 0, \ \text{otherwise} \end{cases},$$
(A2)

$$\bar{\rho}_i = \frac{\tanh(\beta\gamma) + \tanh(\beta(\bar{\rho}_i - \gamma))}{\tanh(\beta\gamma) + \tanh(\beta(1 - \gamma))},$$
(A3)

$$\varepsilon_i = \varepsilon_{\text{SiO}_2} + (\varepsilon_{\text{Si}} - \varepsilon_{\text{SiO}_2})\bar{\rho}_i,$$
 (A4)

where h is the 2D linear hat-shaped filter [27] with its coefficients defined in Eq. (A2).  $||\mathbf{r}_i - \mathbf{r}_i||$  defines the distance between two points in the design region, i and j, where **r** is the Euclidean vector from the origin of a corresponding point. According to Eqs. (A1) and (A2), the *i*th value of the filtered distribution  $\tilde{\rho}$  is the weighted summation of the elements that lie within the radius R, where R is the radius of the linear hatshaped filter and defines the strength of the weighting in Eq. (A2). We choose the filter radius R to be 200 nm. Gaussian blurring is another commonly used technique for achieving similar goals and producing comparable results [30,33]. We show the continuous distribution of the hat-shaped filter h(x, y) in the inset of Fig. 17.  $\beta$  in Eq. (A3) is the parameter that controls the strength of the projection, and  $\gamma$  is a parameter between 0 and 1 that defines the midpoint of the projection (set to 0.5 and referred to as the projection threshold). Increasing the parameter  $\beta$  corresponds to strengthening the projection as shown in the inset between Figs. 17(b) and 17(c). The permittivity distribution  $\varepsilon_i$  inside the design region is generated as Eq. (A4), where  $\varepsilon_{SiO_2}$  and  $\varepsilon_{Si}$  represent the permittivity of silicon and silicon dioxide, respectively.

#### 2. Gray Scale and Baseline Optimization Methods for Nanophotonic Devices

In the conventional optimization of nanophotonic devices, the objective function describes the desired performance of the device and is often related to the pre-defined function of electromagnetic field distributions. After defining this performance objective function  $F_{\rm obj}(\mathbf{E})$ , the optimization can be summarized as Eqs. (A5a)–(A5c). To be consistent, the optimization of nanophotonic devices is set as a minimization problem of the objective function  $F_{\rm obj}(\mathbf{E})$  with respect to the design variable  $\rho$  throughout this paper,

$$\min_{\mathbf{a}} F_{\mathrm{obj}}(\mathbf{E}) = F_{\mathrm{EM}}(\mathbf{E}), \qquad (A5a)$$

subject to Maxwell's equation in the frequency domain excited by the source  $\mathbf{J}$ ,

$$\nabla \times \frac{1}{\mu_r \mu_0} \nabla \times \mathbf{E} - \omega^2 \varepsilon_0 \boldsymbol{\varepsilon}_r(\rho) \mathbf{E} = -i\omega \mathbf{J},$$
 (A5b)

$$0 < \rho < 1.$$
 (A5c)

The optimization problem in Eqs. (A5a)–(A5c) is solved using an iterative quasi-Newton method, limited-memory Broyden–Fletcher–Goldfarb–Shanno with a box constraint

1799

(L-BFGS-B) algorithm. However, solving the optimization problem above, which is referred to as gray scale optimization in this paper, does not generate a binarized design that is ready for fabrication [31]. A prevalent solution proposed in Ref. [31] is to divide the optimization into two steps, which is also used in our paper and referred to as a baseline optimization strategy. In the first step, the optimization runs with a fixed small binarization parameter  $\beta$ , whereas the parameter  $\beta$  increases gradually in the second step, which slowly binarizes the design by removing the intermediate permittivity values. In this work, a projection strength parameter  $\beta$  of 20 is used for gray scale only optimization. For the baseline optimization, the optimization runs with a projection strength parameter  $\beta$  of 20 for a preset number of iterations. The optimized device is then used as the initial design for the binarization process, which can be further divided into five sub-optimizations to gradually binarize the optimized design. After each sub-optimization, the projection strength parameter  $\beta$  increases following Eq. (A6), where  $\beta_k$  represents  $\beta$  of the sub-optimization k from 0 to 4:

$$\beta_{k+1} = 4\beta_k$$
 (k = 0, 1, 2, 3, 4). (A6)

#### 3. Density Topology Optimization with Energy Constraint

Conventional method described above has a few shortcomings. As mentioned, the final designs may have intermediate permittivity. Then, the final designs typically include small features. Besides, the robustness toward fabrication variations is not guaranteed. To tackle these problems, we proposed to apply energy constraint during the optimization process [40]. With this constraint, not only is the device performance related to the electromagnetic distribution at the output waveguide optimized but the energy of the electromagnetic distribution inside the design domain is also optimized. We seek to increase the energy inside silicon or equivalently decrease the energy inside silicon dioxide during the optimization process. Several advantages are observed by applying energy constraint. First, the final designs are typically binarized with energy constraint. Second, there is a significant reduction in the presence of small features as they often come with less field confinement, which is disfavored by energy constraint. Finally, the robustness toward boundary shifts is improved as less energy crosses the boundary between the core and cladding materials.

According to our motivation above, the objective function for the energy constraint  $F_{\text{energy}}$  is defined by Eq. (A7). The theoretical detail can be found in our previous work [40],

$$F_{\text{energy}} = \frac{\sum_{i \in D} \frac{1}{2} (1 - \bar{\rho}_i) \varepsilon_i |E_i|^2}{\sum_{i \in D} \frac{1}{2} \varepsilon_i |E_i|^2},$$
 (A7)

where the numerator represents the energy stored in the electrical field inside the silicon dioxide region while the denominator represents the energy in all the design domain. During the optimization process, this objective function is minimized, having the following impacts. First, the energy prefers to exist in the material region being as close to silicon as possible (e.g.,  $\bar{\rho}_i \rightarrow 1$ ) as it has less contribution toward  $F_{\rm energy}$ . Second, this constraint encourages us to maximize  $\bar{\rho}_i$ , which is equivalent to binarizing the design. Finally, this constraint can be applied throughout the optimization without the need

to add another binarization process by gradually increasing the parameter  $\beta$ . For the energy constraint optimization strategy, the projection strength  $\beta$  is fixed to 20.

With the energy constraint objective term  $F_{\text{energy}}$  from Eq. (A7), the overall optimization is formulated as [6,40]

$$\underset{\rho}{\min} F_{\text{obj}}(\mathbf{E}) = (1 - w_c) \times F_{\text{EM}}(\mathbf{E}) + w_c \times F_{\text{energy}}(\rho, \mathbf{E}),$$
(A8)

subject to Eqs. (A5b) and (A5c), where  $w_c$  is a weight factor that controls the contribution of  $F_{energy}$  to the overall optimization objective function. Larger  $w_c$  means that the algorithm emphasizes more the energy constraint objective during the optimization.  $F_{\rm EM}(\mathbf{E})$  represents the electromagnetic (EM) objective function, which is often related to the pre-defined function with respect to the device performance, such as transmission or spectral profile. We refer to  $w_c$  as the energy constraint coefficient in this paper. Instead of adding the energy constraint as a penalty term in the objective function, it can be directly applied in the constraints together with Maxwell's equations. In this case, the overall optimization problem can be solved using the method of moving asymptotes (MMA) [48–50].

#### 4. Implementation Detail of the 3D Optimization

In this section, we present the implementation details of the proposed energy constraint for nanophotonic devices optimization where the EM simulations are performed in 3D. Specifically, the optimization problems described by Eqs. (A5a)-(A5c) and (A8) are solved using an iterative quasi-Newton method, the L-BFGS-B algorithm. In each iteration, the gradient of the objective function over design parameters  $\rho$ , which is still a 2D distribution in the SOI platform, is first calculated. Then the L-BFGS-B optimizer updates the design parameters according to the gradient information from the current iteration and the estimated second-order information of the objective function according to the gradient information from previous iterations. Note that we have shown the calculation of gradient using either automatic differentiation or adjoint method in our previous work for 2D optimization scenarios [40]. For the 3D case, the overall problem to solve is too large. Indeed, using only the automatic differentiation is impractical to calculate the gradient; thus, we use a hybrid method in which the automatic differentiation and adjoint method are combined to calculate the gradient. In our previous work [40], we show that the gradient of the objective function with energy constraint can be expressed as

$$\frac{\mathrm{d}F_{\mathrm{obj}}}{\mathrm{d}\rho} = \sum_{i} \frac{\mathrm{d}F_{\mathrm{obj}}}{\mathrm{d}\varepsilon_{i}} \frac{\mathrm{d}\varepsilon_{i}}{\mathrm{d}\rho},\tag{A9}$$

$$\frac{\mathrm{d}F_{\mathrm{obj}}}{\mathrm{d}\varepsilon_{i}} = \frac{\partial F_{\mathrm{obj}}}{\partial\varepsilon_{i}} + 2\mathrm{Re}\left[\frac{\partial F_{\mathrm{obj}}}{\partial\mathbf{E}} \left(\nabla \times \frac{1}{\mu_{r}\mu_{0}} \nabla \times -\omega^{2}\varepsilon_{0}\varepsilon_{r}(\rho)\right)^{-1} \omega^{2}\varepsilon_{0}\frac{\mathrm{d}\varepsilon_{r}}{\mathrm{d}\varepsilon_{i}}\mathbf{E}\right],$$
(A10)

where **E** is the simulated field distribution, also called forward field [1-15]. These expressions are valid for either 2D or 3D implementation. To calculate the gradient of the objective

**Research Article** 



**Fig. 18.** (a) Illustration of continuous distribution from optimization; (b) illustration of the GDS transferring process using Eq. (A15); (c) illustration of boundary bias erosion (-10 nm), normal (0 nm), and dilation (+10 nm) at the cross section position; (d) and (e) zoom-in illustration for part of the design boundary.

function with respect to the design parameters  $dF_{obj}/d\rho$ , it is decomposed into two terms using the chain rule as shown in Eq. (A9). The gradient of permittivity  $dF_{obj}/d\varepsilon_i$  can be expressed as Eq. (A10) according to Ref. [40]. By defining a new field distribution  $\mathbf{E}_{adjoint}$  as shown in Eq. (A11), Eq. (A10) can be further simplified as Eq. (A12). Note that  $\mathbf{E}_{adjoint}$  is referred to as the adjoint field [1–15],

$$\mathbf{E}_{\text{adjoint}}^{T} = \frac{\partial F_{\text{obj}}}{\partial \mathbf{E}} \left( \nabla \times \frac{1}{\mu_{r} \mu_{0}} \nabla \times -\omega^{2} \varepsilon_{0} \boldsymbol{\varepsilon}_{r}(\rho) \right)^{-1}, \quad \text{(A11)}$$

$$\frac{\mathrm{d}F_{\mathrm{obj}}}{\mathrm{d}\varepsilon_i} = \frac{\partial F_{\mathrm{obj}}}{\partial\varepsilon_i} + 2\mathrm{Re}\left[\mathbf{E}_{\mathrm{adjoint}}^T \boldsymbol{\omega}^2 \varepsilon_0 \frac{\mathrm{d}\boldsymbol{\varepsilon}_r}{\mathrm{d}\varepsilon_i} \mathbf{E}\right].$$
 (A12)

The gradient of the objective function with respect to permittivity is calculated following Eq. (A12). The first term  $\partial F_{\rm obj}/\partial \varepsilon_i$ , which is the partial derivative of the objective function with respect to the permittivity, is calculated using automatic differentiation provided by Deep Learning Toolbox of MATLAB [51]. The forward and adjoint fields are solved using an open-source package Maxwell FDFD in MATLAB [52]. In this work, a uniform grid spacing of 40 nm is selected for the simulation. The same grid spacing is also used to evaluate the performance in Ansys Lumerical FDTD after the optimization. Solving Maxwell's equations for forward and adjoint problems is done using one GPU node of the Tesla NVIDIA V100 GPU [53]. Note that the first term  $\partial F_{obj}/\partial \varepsilon_i$  is zero for the baseline case as the objective function is only related to the EM field at the output position, whereas it is nonzero for the energy constraint case. The overall gradient is then calculated according to Eq. (A12). After the gradient is calculated, the optimization is carried out using MATLAB built-in L-BFGS-B algorithm.

#### 5. Characterization Method for Optimization Results

As density topology-based optimization is used, intermediate permittivity often exists in the final optimized designs. To quantify how well the optimized designs are binarized, we define

$$b = 1 - \frac{\sum_{i \in D} 4\bar{\rho}_i (1 - \bar{\rho}_i)}{N},$$
 (A13)

where N is the total number of design parameters in the design region D. The parameter b indicates how well the final design is binarized and varies between 0 and 1. When b equals 1, the design is fully binarized, which means the design region consists of either silicon or silicon dioxide, whereas, when b equals 0, the design is the least binarized, meaning that the permittivity distribution inside the design domain is exactly in the middle between silicon and silicon dioxide. We refer to b as the binarization level throughout this paper.

In this work, IL is used to characterize the performance of the optimized andabricated devices. It is defined as the average IL over the whole operating wavelength band,

IL = 
$$-\frac{1}{n} \sum_{i=1}^{n} 10 \log_{10}(P^{i}_{\text{desired}_{\text{mode}}}/P^{i}_{\text{input}}),$$
 (A14)

where *i* represents the *i*th wavelength point,  $P_{input}^{i}$  represents the input power,  $P_{desired\_mode}^{i}$  is the output power on the desired output mode at the desired output port, and *n* represents the overall simulated wavelength points. For MC, *n* is 3. For the 1310 nm/1550 nm wavelength duplexer and CWDM demultiplexer, *n* is 8 and 6, respectively.

To evaluate the tolerance to fabrication imperfections, the final optimized designs are transferred to a layout file (e.g., GDS file) using direct binarization shown in Eq. (A15) below. Then, the device is biased to mimic the fabrication imperfections. Specifically, the boundary is biased by  $-\Delta$  (in nm) (erosion/over-etching  $\Delta$ ), 0 nm (normal-etching), and  $+\Delta$  (dilation/under-etching  $\Delta$ ). We evaluate the performance using a commercial software Ansys Lumerical FDTD [46]. Figure 18 illustrates the boundary bias of an optimized design,

$$\boldsymbol{\varepsilon}(x,y) = \begin{cases} \varepsilon_{\mathrm{Si}} &, \ \tilde{\rho}(x,y) \ge 0.5\\ \varepsilon_{\mathrm{SiO}_2} &, \ \tilde{\rho}(x,y) < 0.5 \end{cases}$$
(A15)

**Funding.** National Research Council Canada (AI for Design and HTSN Challenge Programs); China Scholarship Council.

**Disclosures.** The authors declare that there are no conflicts of interest related to this paper.

**Data Availability.** Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

#### REFERENCES

- J. Lu and J. Vučković, "Nanophotonic computational design," Opt. Express 21, 13351–13367 (2013).
- A. Y. Piggott, J. Lu, K. G. Lagoudakis, J. Petykiewicz, T. M. Babinec, and J. Vučković, "Inverse design and demonstration of a compact and broadband on-chip wavelength demultiplexer," Nat. Photonics 9, 374– 377 (2015).
- S. Molesky, Z. Lin, A. Y. Piggott, W. Jin, J. Vucković, and A. W. Rodriguez, "Inverse design in nanophotonics," Nat. Photonics 12, 659–670 (2018).
- D. Melati, Y. Grinberg, M. K. Dezfouli, S. Janz, P. Cheben, J. H. Schmid, and D. X. Xu, "Mapping the global design space of nanophotonic components using machine learning pattern recognition," Nat. Commun. **10**, 4775 (2019).
- A. Y. Piggott, E. Y. Ma, L. Su, G. H. Ahn, N. V. Sapra, D. Vercruysse, and J. Vuckovic, "Inverse-designed photonics for semiconductor foundries," ACS Photon. 7, 569–575 (2020).
- L. Su, D. Vercruysse, J. Skarda, N. V. Sapra, J. A. Petykiewicz, and J. Vučković, "Nanophotonic inverse design with SPINS: software architecture and practical considerations," Appl. Phys. Rev. 7, 011407 (2020).
- 7. B. Shen, P. Wang, R. Polson, and R. Menon, "An integrated-nanophotonics polarization beamsplitter with 2.4  $\times$  2.4  $\mu m^2$  footprint," Nat. Photonics **9**, 378–382 (2015).
- W. Chang, S. Xu, M. Cheng, D. Liu, and M. Zhang, "Inverse design of a single-step-etched ultracompact silicon polarization rotator," Opt. Express 28, 28343–28351 (2020).
- M. P. Bendsoe and O. Sigmund, *Topology Optimization: Theory,* Methods, and Applications (Springer, 2013).
- J. S. Jensen and O. Sigmund, "Topology optimization for nanophotonics," Laser Photon. Rev. 5, 308–321 (2011).
- S. Yang, H. Jia, L. Zhang, J. Dai, X. Fu, T. Zhou, G. Zhang, and L. Yang, "Gradient-probability-driven discrete search algorithm for on-chip photonics inverse design," Opt. Express 29, 28751–28766 (2021).
- L. Cheng, S. Mao, Z. Chen, Y. Wang, C. Zhao, and H. Y. Fu, "Ultra-compact dual-mode mode-size converter for silicon photonic few-mode fiber interfaces," Opt. Express 29, 33728–33740 (2021).
- N. Zhao, S. Boutami, and S. Fan, "Efficient method for accelerating line searches in adjoint optimization of photonic devices by combining Schur complement domain decomposition and Born series expansions," Opt. Express 30, 6413–6424 (2022).
- S. Boutami, N. Zhao, and S. Fan, "Determining the optimal learning rate in gradient-based electromagnetic optimization using the Shanks transformation in the Lippmann–Schwinger formalism," Opt. Lett. 45, 595–598 (2020).
- N. Zhao, S. Boutami, and S. Fan, "Accelerating adjoint variable method based photonic optimization with Schur complement domain decomposition," Opt. Express 27, 20711–20719 (2019).
- Y. Zhang, S. Yang, A. E. J. Lim, G. Q. Lo, C. Galland, T. Baehr-Jones, and M. Hochberg, "A compact and low loss Y-junction for submicron silicon waveguide," Opt. Express 21, 1310–1316 (2013).
- P. Sanchis, P. Villalba, F. Cuesta, A. Håkansson, A. Griol, J. V. Galán, A. Brimont, and J. Martí, "Highly efficient crossing structure for silicon-on-insulator waveguides," Opt. Lett. 34, 2760–2762 (2009).
- Y. Zhang, S. Yang, E. J. Lim, G. Lo, T. Baehr-Jones, and M. Hochberg, "A CMOS-compatible, low-loss and low-crosstalk silicon waveguide crossing," IEEE Photon. Technol. Lett. 25, 422–425 (2013).

- P. I. Borel, A. Harpøth, L. H. Frandsen, M. Kristensen, P. Shi, J. S. Jensen, and O. Sigmund, "Topology optimization and fabrication of photonic crystal structures," Opt. Express 12, 1996–2001 (2004).
- J. S. Jensen and O. Sigmund, "Systematic design of photonic crystal structures using topology optimization: low-loss waveguide bends," Appl. Phys. Lett. 84, 2022–2024 (2004).
- M. Gerken and D. A. Miller, "Multilayer thin-film structures with high spatial dispersion," Appl. Opt. 42, 1330–1345 (2003).
- C. M. Lalau-Keraly, S. Bhargava, O. D. Miller, and E. Yablonovitch, "Adjoint shape optimization applied to electromagnetic design," Opt. Express 21, 21693–21701 (2013).
- O. D. Miller, "Photonic design: from fundamental solar cell physics to computational inverse design," Ph.D. thesis (University of California at Berkeley, 2012).
- A. Y. Piggott, J. Petykiewicz, L. Su, and J. Vučković, "Fabricationconstrained nanophotonic inverse design," Sci. Rep. 7, 1786 (2017).
- D. Vercruysse, N. V. Sapra, L. Su, R. Trivedi, and J. Vučković, "Analytical level set fabrication constraints for inverse design," Sci. Rep. 9, 8999 (2019).
- N. V. Sapra, D. Vercruysse, L. Su, K. Y. Yang, J. Skarda, A. Y. Piggott, and J. Vučković, "Inverse design and demonstration of broadband grating couplers," IEEE J. Sel. Top. Quantum Electron. 25, 6100207 (2019).
- L. F. Frellsen, Y. Ding, O. Sigmund, and L. H. Frandsen, "Topology optimized mode multiplexing in silicon-on-insulator photonic wire waveguides," Opt. Express 24, 16866–16873 (2016).
- J. L. P. Ruiz, A. A. Amad, L. H. Gabrielli, and A. A. Novotny, "Optimization of the electromagnetic scattering problem based on the topological derivative method," Opt. Express 27, 33586–33605 (2019).
- J. Huang, J. Yang, D. Chen, X. He, Y. Han, J. Zhang, and Z. Zhang, "Ultra-compact broadband polarization beam splitter with strong expansibility," Photon. Res. 6, 574–578 (2018).
- Y. Augenstein and C. Rockstuhl, "Inverse design of nanophotonic devices with structural integrity," ACS Photon. 7, 2190–2196 (2020).
- F. Wang, B. S. Lazarov, and O. Sigmund, "On projection methods, convergence and robust formulations in topology optimization," Struct. Multidiscip. Optim. 43, 767–784 (2011).
- L. Su, A. Y. Piggott, N. V. Sapra, J. Petykiewicz, and J. Vuckovic, "Inverse design and demonstration of a compact on-chip narrowband three-channel wavelength demultiplexer," ACS Photon. 5, 301–305 (2018).
- O. Sigmund, "Morphology-based black and white filters for topology optimization," Struct. Multidiscip. Optim. 33, 401–424 (2007).
- S. Boutami and S. Fan, "Efficient pixel-by-pixel optimization of photonic devices utilizing the Dyson's equation in a Green's function formalism. Part I. Implementation with the method of discrete dipole approximation," J. Opt. Soc. Am. B 36, 2378–2386 (2019).
- S. Boutami and S. Fan, "Efficient pixel-by-pixel optimization of photonic devices utilizing the Dyson's equation in a Green's function formalism. Part II. Implementation using standard electromagnetic solvers," J. Opt. Soc. Am. B 36, 2387–2394 (2019).
- S. Boutami, K. Hassan, C. Dupré, L. Baud, and S. Fan, "Experimental demonstration of silicon photonic devices optimized by a flexible and deterministic pixel-by-pixel technique," Appl. Phys. Lett. **117**, 071104 (2020).
- E. Khoram, X. Qian, M. Yuan, and Z. Yu, "Controlling the minimal feature sizes in adjoint optimization of nanophotonic devices using bspline surfaces," Opt. Express 28, 7060–7069 (2020).
- M. Zhou, B. S. Lazarov, F. Wang, and O. Sigmund, "Minimum length scale in topology optimization by geometric constraints," Comp. Methods Appl. Mech. Eng. 293, 266–282 (2015).
- M. Schevenels, B. S. Lazarov, and O. Sigmund, "Robust topology optimization accounting for spatially varying manufacturing errors," Comp. Methods Appl. Mech. Eng. 200, 3613–3627 (2011).
- G. Zhang, D.-X. Xu, Y. Grinberg, and O. Liboiron-Ladouceur, "Topological inverse design of nanophotonic devices with energy constraint," Opt. Express 29, 12681–12695 (2021).
- G. Zhang and O. Liboiron-Ladouceur, "Scalable and low crosstalk silicon mode exchanger for mode division multiplexing system enabled by inverse design," IEEE Photon. J. 13, 6601013 (2021).

- H. Jia, T. Zhou, X. Fu, J. Ding, and L. Yang, "Inverse-design and demonstration of ultracompact silicon meta-structure mode exchange device," ACS Photon. 5, 1833–1838 (2018).
- A. Michael and E. Yablonovitch, "Leveraging continuous material averaging for inverse electromagnetic design," Opt. Express 26, 31717–31737 (2018).
- 44. Applied Nanotools Inc., https://www.appliednt.com.
- J. Jhoja, Z. Lu, J. Pond, and L. Chrostowski, "Efficient layout-aware statistical analysis for photonic integrated circuits," Opt. Express 28, 7799–7816 (2020).
- 46. Lumerical, https://www.lumerical.com/.
- 47. T. V. Vaerenbergh, P. Sun, S. Hooten, M. Jain, Q. Wilmart, A. Seyedi, and R. Beausoleil, "Wafer-level testing of inverse-designed and

adjoint-inspired vertical grating coupler designs compatible with DUV lithography," Opt. Express **29**, 37021–37036 (2021).

- O. Sigmund and K. Maute, "Topology optimization approaches," Struct. Multidisc. Optim. 48, 1031–1055 (2013).
- J. P. Groen, C. R. Thomsen, and O. Sigmund, "Multi-scale topology optimization for stiffness and de-homogenization using implicit geometry modeling," Struct. Multidiscip. Optim. 63, 2919–2934 (2021).
- K. Svanberg, "A class of globally convergent optimization methods based on conservative convex separable approximations," SIAM J. Optim. 12, 555–573 (2002).
- 51. "MATLAB," https://www.mathworks.com/.
- 52. "MaxwellFDFD," https://github.com/wsshin/maxwellfdfd.
- 53. "Trixie," https://github.com/ai4d-iasc/trixie.