Adaptive gradient-based source and mask co-optimization with process awareness

Yijiang Shen; Fei Peng; Xiaoyan Huang; Zhenrong Zhang

doi:10.3788/COL201917.121102

Abstract

We develop a source and mask co-optimization framework that incorporates the minimization of edge placement error (EPE) and process variability band (PV Band) into the cost function to compensate simultaneously for image distortion and increasingly pronounced lithographic process variations. Explicit differentiable functions of the EPE and the PV Band are presented, and adaptive gradient methods are applied to break symmetry and escape suboptimal local minima. Dependence on the initial mask conditions is also investigated. Simulation results demonstrate the efficacy of the proposed source and mask optimization approach in improving pattern fidelity and enhancing process robustness, with performance that is almost unaffected by random initial masks.

Optical microlithography is increasingly challenging with the ever-growing integration density of semiconductor devices in the sub-22-nm technology node and the low-$k_{1}$ regime. Resolution enhancement techniques (RETs)^[1,2], including modified illumination schemes and rule-based and model-based optical proximity correction (OPC)^[3], have therefore become essential for printing a good-quality wafer image. Moving beyond model-based OPC, the inverse lithography technique (ILT)^[4,5] inverts the imaging model and attempts to directly synthesize the optimized mask pattern. With the development of pixelated sources^[6], source and mask optimization (SMO) has become an integral part of ILT, improving the imaging performance by jointly optimizing the illumination and mask shapes and thereby expanding the solution space of the source and mask^[7,8].

Various computational strategies, including pixelated patterns^[7–10], pupil and mask topology compensation^[11], Zernike source representations^[12], wavefront modulation^[13–15], and compressive sensing^[16,17], have been incorporated into the SMO framework, which is readily solved by gradient-based methods^[18–21]. Special attention has been paid to dose sensitivity^[18], defocus^[19], and the dose-focus matrix^[22]. However, the process variability band (PV Band), an important criterion of process manufacturability that physically represents the sensitivity of the layout to process variations, is too complicated to be explicitly incorporated into the cost functions. Similarly, the edge placement error (EPE), which evaluates the printed image contour under nominal conditions, is often excluded for lack of a differentiable formulation.

Gao et al.^[23] developed objective formulations of EPE and PV Band with a scalar lithographic imaging model. In practice, selecting the step size in gradient-based methods poses a dilemma: too small a step size leads to slow convergence, whereas too large a step size causes fluctuation around the minimum or even divergence. Besides, for sparse source and mask patterns whose features occur with very different frequencies, updating all of them to the same extent is not appropriate, since rarely occurring features should receive larger updates. Accordingly, adaptive gradient methods such as AdaGrad perform smaller updates for frequently occurring features and larger updates for infrequent ones, and adaptive moment estimation (Adam) computes adaptive learning rates by keeping exponentially decaying averages of past squared gradients and of past gradients (momentum). Stability and the ability to escape suboptimal minima can therefore be expected in the updating process.

This Letter focuses on the application of adaptive gradient methods, including Adam and AdaGrad, to lithographic SMO, which simultaneously considers pattern design in terms of pattern error (PE), EPE, and process window. We present explicit formulations of differentiable functions for EPE and PV Band, whose closed-form gradients are subsequently developed with a vector imaging model. Source patterns, which are usually sparser, and mask patterns are updated with the AdaGrad and Adam methods, respectively. We also investigate the stability of the optimization process and the ability to escape suboptimal local minima when random initial masks are applied. Simulations show that the proposed SMO approach improves pattern fidelity and the process window with enhanced stability, and that its performance is largely unaffected by the initial conditions.

The wafer imaging process $T$ can be divided into two functional blocks, namely the projection optics effects (coupling image formation) in Fig. 1 and the resist effects. For a point source $(α_{s}, β_{s})$ emitting a polarized electric field, the coupling image $I_{c}$ can be described as^[3,20] $I_{c} = \frac{1}{J_{sum}} \sum_{α_{s}, β_{s}} J (α_{s}, β_{s}) \sum_{p = x, y, z} {‖ H_{p} (α_{s}, β_{s}) \otimes B (α_{s}, β_{s}) ⊙ M ‖}^{2},$ (1) where $J$ is an $N_{s} \times N_{s}$ scalar matrix representing the source pattern distribution, $J_{sum}$ is the sum of nonzero source intensities, $H_{p} (α_{s}, β_{s}), p = x, y, z$, are referred to as the equivalent filters, $B (α_{s}, β_{s})$ is the diffraction matrix that approximates the mask near-field, and ${‖ \cdot ‖}^{2}$ denotes the pixel-wise squared amplitude. The resist effect can be approximated using a logarithmic sigmoid function $sig (x) = \frac{1}{1 + e^{- a (x - t_{r})}}$, with $a$ being the steepness of the sigmoid function and $t_{r}$ being the threshold. Therefore, the wafer image formation $T (\cdot)$ is described as $I = T (J, M) = sig (I_{c})$.
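As a rough illustration of Eq. (1) followed by the resist sigmoid, the sketch below sums pixel-wise squared amplitudes over source points and then thresholds the result. It is a scalar simplification under stated assumptions: one hypothetical filter per source point stands in for the three $H_{p}$, the diffraction matrix $B$ is taken as all-ones, and convolution is circular (FFT-based). The function names and the toy inputs are illustrative only.

```python
import numpy as np

def resist_sigmoid(ic, a=80.0, tr=0.25):
    """Resist model sig(x) = 1 / (1 + exp(-a (x - tr)))."""
    return 1.0 / (1.0 + np.exp(-a * (ic - tr)))

def coupling_image(source, filters, mask):
    """Simplified Eq. (1): source-weighted sum of |H (*) M|^2.

    source  : dict {(alpha, beta): intensity J(alpha, beta)}
    filters : dict {(alpha, beta): N x N spatial filter} (hypothetical)
    mask    : N x N mask pattern M
    """
    j_sum = sum(source.values())
    mask_f = np.fft.fft2(mask)
    ic = np.zeros(mask.shape)
    for point, weight in source.items():
        # Circular convolution via FFT (assumption); B is taken as all-ones.
        field = np.fft.ifft2(np.fft.fft2(filters[point]) * mask_f)
        ic += weight * np.abs(field) ** 2
    return ic / j_sum

n = 8
delta = np.zeros((n, n))
delta[0, 0] = 1.0                      # identity filter: image equals mask
mask = np.zeros((n, n))
mask[2:6, 2:6] = 1.0
source = {(0, 0): 1.0, (1, 0): 0.5}
filters = {(0, 0): delta, (1, 0): delta}

aerial = coupling_image(source, filters, mask)
printed = resist_sigmoid(aerial)
```

With identity filters the coupling image reproduces the binary mask, so the sigmoid drives pixels inside the feature toward one and the background toward zero.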

Figure 1. (a) Schematic of forward lithography. (b) Reflection from and transmission through a stratified medium.

Given a target pattern $I_{0} \in R^{N \times N}$, the goal of SMO is to find the optimal source $\hat{J} \in R^{N_{s} \times N_{s}}$ and mask pattern $\hat{M} \in R^{N \times N}$ that minimize the measured dissimilarity, or “score $(S)$”, between $T (\cdot)$ and $I_{0}$, namely, $(\hat{J}, \hat{M}) = \arg \min_{J \in R^{N_{s} \times N_{s}}, M \in R^{N \times N}} S \{T (J, M), I_{0}\},$ (2) in which $S$ is defined in this work as $S = γ_{pe} S_{pe} \{I_{0}, I\} + γ_{epe} S_{epe} \{I_{0}, I\} + γ_{pv} S_{pv} \{I_{0}, I\},$ (3) where $S_{pe}$, $S_{epe}$, and $S_{pv}$ account for pattern fidelity, the EPE, and the PV Band, respectively, and are weighted by the predefined weight parameter $γ = \{γ_{pe}, γ_{epe}, γ_{pv}\}$. The parametric transformations $M = 0.5 \times [1 + \cos (ω)]$ and $J = 0.5 \times [1 + \cos (θ)]$, with $ω \in R^{N \times N}$ and $θ \in R^{N_{s} \times N_{s}}$, are applied to reduce the binary-constrained optimization problems to unconstrained ones in the updating procedure.
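The cosine parametrization and the weighted score of Eq. (3) can be sketched as follows. The helper names are hypothetical; the default weights are the set $γ_{1} = \{0.6, 0.3, 0.1\}$ used in the simulations of this Letter, and the derivative $-0.5 \sin (\cdot)$ is the chain-rule factor that appears in the gradients.

```python
import numpy as np

def to_pattern(phi):
    """Map an unconstrained parameter to [0, 1]: 0.5 * (1 + cos(phi))."""
    return 0.5 * (1.0 + np.cos(phi))

def pattern_derivative(phi):
    """d(pattern)/d(phi) = -0.5 * sin(phi), the chain-rule factor."""
    return -0.5 * np.sin(phi)

def total_score(s_pe, s_epe, s_pv, gamma=(0.6, 0.3, 0.1)):
    """Eq. (3): weighted sum of the three cost terms."""
    g_pe, g_epe, g_pv = gamma
    return g_pe * s_pe + g_epe * s_epe + g_pv * s_pv

omega = np.array([0.0, np.pi / 3, np.pi / 2, np.pi])
mask = to_pattern(omega)          # stays in [0, 1] for any real omega

# Weighted score for the unoptimized I_01 values reported in this Letter
baseline = total_score(4494, 1158, 2347)
```

Because any real-valued $ω$ or $θ$ maps into $[0, 1]$, the optimizer can update the parameters freely without a projection step.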

$S_{pe}$ measures the sum of mismatches between $I$ and the desired pattern $I_{0}$ over all locations. For mathematical convenience, the square of the $l_{2}$ norm is frequently used in SMO, leading to the minimization of $S_{pe} (J, M) = 0.5 \times {‖ T (J, M) - I_{0} ‖}^{2} .$ (4)

The gradients of $S_{pe}$ with respect to $ω$ and $θ$ are $\frac{\partial S_{pe}}{\partial ω} = - \frac{a \sin ω}{2 J_{sum}} ⊙ \sum_{α_{s}, β_{s}} J \sum_{p = x, y, z} \mathrm{Real} \{B^{*} ⊙ \{{H_{p}}^{* \circ} \otimes [E_{p} ⊙ (I - I_{0}) ⊙ I ⊙ (1 - I)]\}\},$ (5) $\frac{\partial S_{pe}}{\partial θ} = - \frac{\sin θ}{2} \sum_{x, y} [a \cdot (I - I_{0}) ⊙ I ⊙ (1 - I) ⊙ \frac{\sum_{p = x, y, z} {| E_{p} |}^{2} - I_{c}}{J_{sum}}],$ (6) where $⊙$ is entry-by-entry multiplication, $*$ is the conjugate operation, $\circ$ rotates the matrix in the argument by $180 °$ in both the horizontal and vertical directions, $\otimes$ is the convolution operation, $1 \in R^{N \times N}$ is the all-ones matrix, and $E_{p} (α_{s}, β_{s}) = H_{p} (α_{s}, β_{s}) \otimes B (α_{s}, β_{s}) ⊙ M$.
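The factor $a \cdot (I - I_{0}) ⊙ I ⊙ (1 - I)$ shared by both gradients is the derivative of $S_{pe}$ with respect to the coupling image $I_{c}$. The sketch below checks that factor against central finite differences; it covers only the sigmoid stage, not the optical part of Eqs. (5) and (6), and the test arrays are arbitrary.

```python
import numpy as np

def sig(x, a=80.0, tr=0.25):
    """Resist sigmoid."""
    return 1.0 / (1.0 + np.exp(-a * (x - tr)))

def s_pe(ic, target, a=80.0, tr=0.25):
    """Eq. (4): 0.5 * ||sig(I_c) - I_0||^2."""
    return 0.5 * np.sum((sig(ic, a, tr) - target) ** 2)

def ds_pe_dic(ic, target, a=80.0, tr=0.25):
    """Analytic derivative w.r.t. I_c: a * (I - I_0) * I * (1 - I)."""
    img = sig(ic, a, tr)
    return a * (img - target) * img * (1.0 - img)

rng = np.random.default_rng(0)
ic = rng.uniform(0.2, 0.3, size=(4, 4))          # values near the threshold
target = (rng.uniform(size=(4, 4)) > 0.5).astype(float)
analytic = ds_pe_dic(ic, target)

# Central finite differences as an independent check
numeric = np.zeros_like(ic)
h = 1e-6
for idx in np.ndindex(ic.shape):
    step = np.zeros_like(ic)
    step[idx] = h
    numeric[idx] = (s_pe(ic + step, target) - s_pe(ic - step, target)) / (2 * h)
```

Agreement between `analytic` and `numeric` confirms the sigmoid factor; the remaining terms in Eqs. (5) and (6) come from differentiating $I_{c}$ with respect to the mask and source parameters.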

$S_{epe}$ measures the geometrical distance of the image contour between $I_{0}$ and $I$. However, the lack of an analytic formulation of a differentiable $S_{epe}$ often complicates the explicit incorporation of EPE minimization. To this end, we formulate EPE as illustrated in Fig. 2(a) to include the image difference $D_{sum}$ in the horizontal and vertical inner image and outer image edges from sampled points on horizontal edges (HS) and vertical edges (VS). An EPE violation takes the value one when $D_{sum} \geq t_{e}$, with $t_{e}$ being a predefined threshold, and zero otherwise. $D_{sum}$ is computed for samples on vertical and horizontal edges within $LH$ and $LV$, the horizontal and vertical tolerable EPE segments depicted in Fig. 2(b). $LH$ and $LV$ are calculated according to the pattern edge set (PES) in Fig. 2(c) enwrapping the target pattern edge (TPE) in Fig. 2(d), under possible exposure latitude^[1] describing the tolerable target pattern linewidth. Subsequently, $D_{sum}$ is calculated as $D_{sum} (i, j) = \begin{cases} \sum_{k = j - \frac{LV}{2}}^{j + \frac{LV}{2}} S_{pe} (i, k), & \text{if } (i, j) \in HS \\ \sum_{k = j - \frac{LH}{2}}^{j + \frac{LH}{2}} S_{pe} (k, j), & \text{if } (i, j) \in VS \end{cases}$ (7) where $S_{pe} (i, k)$ is the image difference between sampled points on HS with horizontal coordinate $i$ and points in $LH$ with horizontal coordinate $i$ and vertical coordinate $k$. With $S_{pe}$ defined in Eq. (4), $S_{pe} (k, j)$ is similarly defined. $S_{epe}$ is defined to be the summation of EPE violations (EPEVs) for all samples on HS and VS as $S_{epe} = \sum_{(HS, VS) \in TPE} EPEVs .$ (8)

Figure 2. (a) EPE measurement illustration. (b) Numerical superposition region. (c) Pattern edge set (PES). (d) Edges of target pattern $I_{02}$ in Fig. 4(c).

To make $S_{epe}$ differentiable, another sigmoid function ${sig}_{e} (x) = \frac{1}{1 + e^{- a_{e} (x - t_{e})}}$ is applied to $D_{sum}$, removing the binary-value constraint on EPE, with $a_{e}$ being the steepness and $t_{e}$ being the threshold of ${sig}_{e}$. Consequently, the gradients of $S_{epe}$ with respect to $ω$ and $θ$ are calculated as $\frac{\partial S_{epe}}{\partial ϕ} = \sum_{(HS, VS) \in TPE} \sum_{(i, j) \in HS \text{ or } VS} \frac{\partial {sig}_{e} (D_{sum} (i, j))}{\partial ϕ},$ (9) with $ϕ = ω$ or $θ$ and $\frac{\partial {sig}_{e} (D_{sum})}{\partial ϕ} = a_{e} {sig}_{e} (D_{sum}) [1 - {sig}_{e} (D_{sum})] \sum_{k = j - \frac{LV}{2}}^{j + \frac{LV}{2}} \frac{\partial S_{pe}}{\partial ϕ},$ (10) in which $\frac{\partial S_{pe}}{\partial ϕ}$ is defined in Eqs. (5) and (6).
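A minimal sketch of the relaxed EPE term is given below. Several choices are assumptions rather than values from this Letter: the per-pixel error is taken as $0.5 (I - I_{0})^{2}$, the steepness $a_{e} = 10$ and threshold $t_{e} = 1$ are placeholders, the window for vertical-edge samples is centered on the first index (one reading of Eq. (7)), and the sample lists are hand-picked for a toy square.

```python
import numpy as np

def sig_e(d, a_e=10.0, t_e=1.0):
    """Soft EPE-violation indicator; a_e and t_e are placeholder values."""
    return 1.0 / (1.0 + np.exp(-a_e * (d - t_e)))

def d_sum(err, i, j, length, on_hs):
    """Window sum of per-pixel error around sample (i, j), after Eq. (7)."""
    half = length // 2
    if on_hs:   # sample on a horizontal edge: sweep the second index
        lo, hi = max(j - half, 0), min(j + half, err.shape[1] - 1)
        return err[i, lo:hi + 1].sum()
    # sample on a vertical edge: sweep the first index
    lo, hi = max(i - half, 0), min(i + half, err.shape[0] - 1)
    return err[lo:hi + 1, j].sum()

def soft_epe(img, target, hs, vs, lv=4, lh=4):
    """Differentiable S_epe: sum of sig_e(D_sum) over all edge samples."""
    err = 0.5 * (img - target) ** 2   # per-pixel error (assumed form)
    total = sum(sig_e(d_sum(err, i, j, lv, True)) for i, j in hs)
    total += sum(sig_e(d_sum(err, i, j, lh, False)) for i, j in vs)
    return total

target = np.zeros((12, 12))
target[4:8, 4:8] = 1.0
hs = [(4, 5), (7, 6)]   # samples on horizontal edges
vs = [(5, 4), (6, 7)]   # samples on vertical edges

good = soft_epe(target, target, hs, vs)        # perfect print
bad = soft_epe(1.0 - target, target, hs, vs)   # fully inverted print
```

A perfect print yields a score near zero, while a fully inverted print yields a score near the number of samples, which mirrors the hard violation count of Eq. (8) while remaining differentiable.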

PV Band is a set of edges between the fix-printability areas (FPAs) and non-printability areas (NPAs) under possible process conditions, representing the robustness of process manufacturing. As illustrated in Fig. 3, the formulation of the PV Band in Fig. 3(d) requires a series of Boolean operations to extract the edge placement through all possible printed images from Figs. 3(a) to 3(c), which is extremely cumbersome and difficult to calculate. The red boxes present extracted edges of the target contact pattern, and the gray areas are the printed patterns with the extracted pattern edges in blue. $S_{pv}$ in Eq. (3) is defined as $S_{pv} = (I_{1} \cup I_{2} \cup \dots \cup I_{N_{p} - 1} \cup I_{N_{p}}) \setminus (I_{1} \cap I_{2} \cap \dots \cap I_{N_{p} - 1} \cap I_{N_{p}}),$ (11) where $I_{1}, I_{2}, \dots, I_{N_{p} - 1}, I_{N_{p}}$ are printed images under $N_{p}$ process conditions, $\cup$ and $\cap$ are union and intersection operations, and the operation $\setminus$ denotes the complement set of $FPA$ in $(1 - NPA)$. Noting that $FPA \subset I_{k}, k = 1, 2, \dots, N_{p}$, we have $S_{pv} = (I_{1} \setminus FPA) \cup (I_{2} \setminus FPA) \cup \dots \cup (I_{N_{p}} \setminus FPA) .$ (12)
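The Boolean definition of Eq. (11) can be illustrated on toy images. In the sketch below the 0.5 binarization threshold, the helper names, and the three nested squares standing in for prints under different process conditions are all assumptions made for illustration.

```python
import numpy as np

def pv_band(images, threshold=0.5):
    """Eq. (11): pixels printed under some, but not all, process conditions."""
    binary = np.stack([img > threshold for img in images])
    union = binary.any(axis=0)          # printed under at least one condition
    intersection = binary.all(axis=0)   # printed under every condition
    return union & ~intersection

def square(n, lo, hi):
    """Toy printed image: a filled square on an n x n grid."""
    img = np.zeros((n, n))
    img[lo:hi, lo:hi] = 1.0
    return img

# Nominal, over-exposed, and under-exposed prints (illustrative only)
images = [square(10, 3, 7), square(10, 2, 8), square(10, 4, 6)]
band = pv_band(images)
band_area = int(band.sum())
```

The band is the ring between the largest and the smallest print; its pixel count is the quantity that the non-differentiable form of $S_{pv}$ would report.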

Figure 3. PV Band demonstration. (a)–(c) Printed images under different process conditions. (d) Computed PV Band. (e) PV Band of the printed images with $I_{02}$ in Fig. 4(c) illuminated by the annular source in Fig. 4(a).

Assuming that the edge of the printed pattern is close enough to the desired pattern edge when $S_{epe}$ is incorporated in the cost function, and replacing $FPA$ with the target pattern $I_{0}$, $S_{pv}$ reduces to the average over the process conditions of the squared $l_{2}$ norm of the image differences, $S_{pv} = \frac{1}{N_{p}} \sum_{k = 1}^{N_{p}} {‖ I_{k} - I_{0} ‖}^{2} = \frac{1}{N_{p}} \sum_{k = 1}^{N_{p}} S_{{pe}_{k}},$ (13) with $S_{{pe}_{k}}$ being the image difference under the $k$th process condition and $S_{pe}$ defined in Eq. (4). Figure 3(e) shows the PV Band calculated using $FPA = 0$ and $M = I_{0}$. Therefore, the gradients of $S_{pv}$ with respect to $ω$ and $θ$ can be routinely calculated according to Eqs. (5) and (6) as $\frac{\partial S_{pv}}{\partial ϕ} = \frac{1}{N_{p}} \sum_{k = 1}^{N_{p}} \frac{\partial S_{{pe}_{k}}}{\partial ϕ} .$ (14)
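The relaxed PV Band term of Eq. (13) and the gradient averaging of Eq. (14) reduce to simple means over process conditions. The sketch below uses hypothetical helper names and toy square images in place of simulated prints.

```python
import numpy as np

def s_pv(images, target):
    """Eq. (13): mean over process conditions of ||I_k - I_0||^2."""
    return float(np.mean([np.sum((img - target) ** 2) for img in images]))

def grad_s_pv(per_condition_grads):
    """Eq. (14): average of the per-condition gradients of S_pe."""
    return np.mean(np.stack(per_condition_grads), axis=0)

def square(n, lo, hi):
    """Toy image: a filled square on an n x n grid."""
    img = np.zeros((n, n))
    img[lo:hi, lo:hi] = 1.0
    return img

target = square(10, 3, 7)
images = [square(10, 3, 7), square(10, 2, 8), square(10, 4, 6)]
score = s_pv(images, target)

# Placeholder gradients, one array per process condition
grads = [np.full((10, 10), 1.0), np.full((10, 10), 2.0), np.full((10, 10), 3.0)]
avg_grad = grad_s_pv(grads)
```

Since each term is an ordinary pattern error under one process condition, the existing gradient routine can be reused for every condition and the results averaged.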

Gradient-based searching such as steepest gradient descent (SGD) has been a preferred algorithm for the minimization of $S$ in Eq. (3). However, SGD is sensitive to the step-size $η$: it is prone to becoming trapped in unwanted local minima when $η$ is small and to divergence when $η$ is too large. Moreover, the sparsity of $\frac{\partial S}{\partial ϕ}$, $ϕ = ω$ or $θ$, aggravates the dilemma of $η$ selection. The Adam method combines the merits of the AdaGrad and RMSProp methods; it works well with sparse gradients and naturally performs adaptive adjustments of $η$. Therefore, in this Letter, the AdaGrad and Adam methods are applied to updating the source and mask parameters $θ$ and $ω$, respectively. In the Adam method, $ϕ = ω$ or $θ$ at time-step $t + 1$ is updated as $ϕ_{t + 1} = ϕ_{t} - η Δ ϕ_{t} = ϕ_{t} - η \cdot {\hat{m}}_{t} / (\sqrt{{\hat{v}}_{t}} + ϵ),$ (15) where $ϵ = 10^{- 8}$ is the smoothing term to avoid division by zero, and ${\hat{m}}_{t} = m_{t} / (1 - β_{1}^{t})$ and ${\hat{v}}_{t} = v_{t} / (1 - β_{2}^{t})$ are the bias-corrected estimates of the first moment $m_{t} = β_{1} \cdot m_{t - 1} + (1 - β_{1}) \cdot g_{t}$ and the second moment $v_{t} = β_{2} \cdot v_{t - 1} + (1 - β_{2}) \cdot g_{t}^{2}$, respectively, with $g_{t} = \frac{\partial S}{\partial ϕ}$, $g_{t}^{2} = g_{t} \cdot g_{t}$, and $β_{1}$, $β_{2}$ being the decay rates.
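The update of Eq. (15) can be sketched together with an AdaGrad step. The AdaGrad rule is written in its commonly used form (accumulated squared gradients in the denominator), which is an assumption since the Letter does not spell it out; the quadratic toy objective and the function names are also illustrative. The step size 0.1 and decay rates 0.9 and 0.999 are the values used in the simulations.

```python
import numpy as np

def adam_step(phi, grad, m, v, t, eta=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update following Eq. (15); t starts at 1."""
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    m_hat = m / (1.0 - beta1 ** t)      # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)      # bias-corrected second moment
    return phi - eta * m_hat / (np.sqrt(v_hat) + eps), m, v

def adagrad_step(phi, grad, acc, eta=0.1, eps=1e-8):
    """One AdaGrad update (standard form, assumed here)."""
    acc = acc + grad ** 2
    return phi - eta * grad / (np.sqrt(acc) + eps), acc

def toy_grad(phi):
    """Gradient of the toy objective sum((phi - 1)^2)."""
    return 2.0 * (phi - 1.0)

# Mask-like parameter updated with Adam
omega = np.full(3, 3.0)
m, v = np.zeros(3), np.zeros(3)
omega_first = None
for t in range(1, 501):
    omega, m, v = adam_step(omega, toy_grad(omega), m, v, t)
    if t == 1:
        omega_first = omega.copy()

# Source-like parameter updated with AdaGrad (single step shown)
theta_first, acc = adagrad_step(np.full(3, 3.0), toy_grad(np.full(3, 3.0)), np.zeros(3))
```

In both methods the very first step has magnitude close to $η$ regardless of the gradient scale, which is what removes much of the step-size tuning that plain gradient descent requires.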

Assume that, after the initial optimization (IO) of $ϕ$, which accumulates $m_{t}$ and $v_{t}$, $ϕ$ reaches a local minimum point at $t = t_{1}$, where SGD cannot break symmetry, with $g_{t_{1}} \approx 0$ and $m_{t}, v_{t} ≫ g_{t}$. Then $Δ ϕ_{t}$ at $t = t_{2}$ can be calculated as $| Δ ϕ_{t} | = \frac{\prod_{t = t_{1}}^{t_{2}} | β_{1} / (1 - β_{1}^{t + 1}) | \cdot | m_{t_{2}} |}{\prod_{t = t_{1}}^{t_{2}} {| β_{2} / (1 - β_{2}^{t + 1}) |}^{1 / 2} \cdot {| v_{t_{2}} |}^{1 / 2}} = \prod_{t = t_{1}}^{t_{2}} | v_{t} | \cdot | Δ ϕ_{t_{2}} |,$ (16) in which $β_{1}^{t}$, $β_{2}^{t}$, and $v_{t}$ are regarded as the attenuation factors of $m_{t_{2}}$, $v_{t_{2}}$. It is therefore concluded that, after the IO procedure of accumulating $m_{t}$ and $v_{t}$, the attenuation factors gradually reduce $m_{t}$ and $v_{t}$ until they are close to zero; we name this stage the first-phase optimization (FPO).

Subsequently, we investigate the absolute value of $Δ ϕ_{t}$ at the end of the FPO, $t = t_{2}$, as $| Δ ϕ_{t} | = \frac{| [β_{1} \cdot m_{t} + (1 - β_{1}) \cdot g_{t}] / (1 - β_{1}^{t}) |}{{[β_{2} \cdot v_{t} + (1 - β_{2}) \cdot g_{t}^{2}] / (1 - β_{2}^{t})}^{1 / 2} + ϵ} = \frac{| ρ m_{t} | \cdot | g_{t} |}{{| ρ v_{t} \cdot g_{t}^{2} |}^{1 / 2} + ϵ},$ (17) where ${ρ m}_{t} = (1 - β_{1}) / (1 - β_{1}^{t})$ and ${ρ v}_{t} = (1 - β_{2}) / (1 - β_{2}^{t})$ are amplification factors with respect to $g_{t}$ and $g_{t}^{2}$. With $m_{0}$, $v_{0}$, and $g_{0}$ close to 0, $| Δ ϕ_{t} | \approx 0.5$ when the smoothing term $ϵ = 10^{- 8}$ is taken into account. At $t = t_{2} + 1$, if $g_{1}$ is close to 0, then $m_{1} \approx 0$ and $v_{1} \approx 0$, and the iteration acts similarly to the iteration at $t = t_{2}$; the same holds for the following iterations until $g_{t}$ deviates significantly from zero. We name the above procedure the second-phase optimization (SPO), at the end of which $| Δ ϕ_{t} |$ is big enough to drive the updating of $ϕ$ out of the SPO and into the IO, escaping the local minimum point.
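The right-hand side of Eq. (17) can be evaluated directly once the previous moments are assumed to be exactly zero. The sketch below does this with a hypothetical helper. At $t = 1$ both amplification factors equal one, so the step reduces to $| g | / (| g | + ϵ)$, which is about 0.5 when $| g |$ is comparable to $ϵ$ and approaches 1 when $| g | ≫ ϵ$. This is offered as one possible reading of the value quoted above, not as the authors' derivation.

```python
import numpy as np

def step_from_zero_moments(g, t, beta1=0.9, beta2=0.999, eps=1e-8):
    """|delta phi_t| from Eq. (17), assuming the previous moments are zero."""
    rho_m = (1.0 - beta1) / (1.0 - beta1 ** t)   # amplification of g
    rho_v = (1.0 - beta2) / (1.0 - beta2 ** t)   # amplification of g^2
    return rho_m * abs(g) / (np.sqrt(rho_v * g ** 2) + eps)

tiny = step_from_zero_moments(1e-8, t=1)    # gradient comparable to eps
small = step_from_zero_moments(1e-3, t=1)   # gradient much larger than eps
```

The normalized step therefore stays of order one even for very small gradients, which is the mechanism by which the update can leave a flat region where plain gradient descent stalls.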

Numerical simulations are performed on a lithography imaging system with wavelength $λ = 193 nm$, $NA = 1.35$, and spatial resolution $Δ x = Δ y = 4 nm / pixel$; the steepness and the threshold of the sigmoid function are $a = 80$ and $t_{r} = 0.25$. The system is initially illuminated by the annular source $J_{0}$ with $σ_{in} = 0.6$ and $σ_{out} = 0.9$ in Fig. 4(a), with target patterns $I_{01}$, $I_{02}$ in Figs. 4(b) and 4(c). The ranges of process conditions including dose, defocus, and linewidth tolerance are $\pm 2$, $\pm 50 nm$, and $\pm 10 \%$, respectively. $H_{p} (α_{s}, β_{s})$ is calculated according to the parameters of the wafer stack given in Table 1. The corresponding $I$, EPE, and PV Band images when printing $I_{01}$ and $I_{02}$ on the wafer illuminated by $J_{0}$ are given in Figs. 5(a)–5(c) and Figs. 5(d)–5(f), respectively. Severe distortions are observed, with $S_{pe} = 4494$ and $5193$ and $S_{epe} = 1158$ and $1512$ for $I_{01}$ and $I_{02}$, respectively. Violations of linewidth tolerance are also detected, with $S_{pv} = 2347$ and $3965$ in Figs. 5(c) and 5(f), which have to be compensated for by radical computational techniques. When updating $ϕ = ω$ or $θ$ at time-step $t + 1$ using the SGD method with $ϕ_{t + 1} = ϕ_{t} - η_{s} \cdot g_{t},$ (18) where $g_{t} = \frac{\partial S}{\partial ϕ}$, the step-size $η_{s}$ is set to 0.3, a value chosen after repeated convergence tests; when the proposed approach is applied, $η$ in Eq. (15) and the decay rates $β_{1}$ and $β_{2}$ are set to 0.1, 0.9, and 0.999, respectively.
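The step-size sensitivity of the plain update in Eq. (18) can be seen on a one-dimensional quadratic. The toy objective $ϕ^{2}$ and the second step size of 1.1 are assumptions chosen only to show divergence; they are unrelated to the lithography cost.

```python
def sgd_minimize(grad_fn, phi0, eta_s, steps):
    """Repeated application of Eq. (18): phi <- phi - eta_s * g."""
    phi = phi0
    for _ in range(steps):
        phi = phi - eta_s * grad_fn(phi)
    return phi

def toy_grad(phi):
    """Gradient of the toy objective phi^2."""
    return 2.0 * phi

converged = sgd_minimize(toy_grad, 1.0, eta_s=0.3, steps=20)   # contracts each step
diverged = sgd_minimize(toy_grad, 1.0, eta_s=1.1, steps=20)    # grows each step
```

On this objective each step multiplies $ϕ$ by $1 - 2 η_{s}$, so $η_{s} = 0.3$ contracts toward the minimum while $η_{s} = 1.1$ grows without bound, which is why the step size has to be tested repeatedly in the SGD runs.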

Layer	Index	Thickness (nm)
Incident medium	(1.45, 0)
Top anti-reflection	(1.55, 0.0)	35
Photoresist	(1.8, 0.02)	100
Bottom anti-reflection	(1.72, 0.33)	87
Substrate	(0.833, 2.778)

Table 1. Wafer Stack Parameters

Figure 4. (a) Annular source $J_{0}$ with $σ_{in} = 0.6$ and $σ_{out} = 0.9$. (b), (c) The desired target patterns $I_{01}$, $I_{02}$.

Figure 5. Printed wafer images with (a) PE 4494 and (d) PE 5193, EPE images with (b) EPE 1158 and (e) EPE 1512, PV Band images with (c) PV Band 2347 and (f) PV Band 3965 with respect to target patterns $I_{01}$ and $I_{02}$ illuminated by the annular source in Fig. 4(a).

In Fig. 6, where the proposed method and the SGD method are applied to the simulation, the columns represent the optimized source pattern $\hat{J}$, the optimized mask pattern $\hat{M}$, the EPE images, and the PV Band images simulated with the optimized $\hat{M}$ illuminated by the optimized $\hat{J}$. Two sets of weight parameters, $γ_{1} = \{0.6, 0.3, 0.1\}$ and $γ_{2} = \{0.6, 0.1, 0.3\}$, are used to emphasize EPE and PV Band minimization, respectively. With $I_{01}$ as the target pattern, Figs. 6(a) and 6(b) show the results of the proposed algorithm weighted by $γ_{1}$ and $γ_{2}$, and Figs. 6(c) and 6(d) show the results of the SGD method weighted by $γ_{1}$ and $γ_{2}$. The values of $S_{pe}$, $S_{epe}$, and $S_{pv}$ of the simulations in row $I_{01}$ of Fig. 5 and Figs. 6(a)–6(d) are recorded in Table 2. Improvements in PE, EPE, and PV Band are observed: $S_{pe}$ is reduced from 4494, $S_{epe}$ from 1158, and $S_{pv}$ from 2347 in Figs. 5(a)–5(c) to $S_{pe} = 614, 540, 586, 490$, $S_{epe} = 172, 175, 174, 143$, and $S_{pv} = 2246, 1834, 2211, 1885$ in Figs. 6(a)–6(d), respectively.

	Fig. 5	Fig. 6
	row I01	(a)	(b)	(c)	(d)
Spe	4494	614	540	586	490
Sepe	1158	172	175	174	143
Spv	2347	2246	1834	2211	1885

Table 2. Spe, Sepe, and Spv of the Simulations in Figs. 5 and 6
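The relative improvements implied by Table 2 can be computed directly from the tabulated values. In the sketch below the dictionary keys and the helper name are illustrative; the row labels follow the Fig. 6 caption.

```python
baseline = {"S_pe": 4494, "S_epe": 1158, "S_pv": 2347}   # Fig. 5, row I_01

results = {
    "(a) proposed, gamma_1": {"S_pe": 614, "S_epe": 172, "S_pv": 2246},
    "(b) proposed, gamma_2": {"S_pe": 540, "S_epe": 175, "S_pv": 1834},
    "(c) SGD, gamma_1": {"S_pe": 586, "S_epe": 174, "S_pv": 2211},
    "(d) SGD, gamma_2": {"S_pe": 490, "S_epe": 143, "S_pv": 1885},
}

def relative_reduction(before, after):
    """Percentage reduction from the unoptimized value."""
    return 100.0 * (before - after) / before

reductions = {
    label: {key: relative_reduction(baseline[key], value) for key, value in row.items()}
    for label, row in results.items()
}

for label, row in reductions.items():
    print(label, {key: round(value, 1) for key, value in row.items()})
```

The pattern-error and EPE terms fall by well over 80% in every case, whereas the PV Band term falls by roughly a fifth under $γ_{2}$ and only a few percent under $γ_{1}$, consistent with $γ_{2}$ placing more weight on $S_{pv}$.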

Figure 6. Simulation results with $I_{01}$ as the target pattern. Columns from left to right: the synthesized source pattern $\hat{J}$, the synthesized mask pattern $\hat{M}$, the EPE images, and the PV Band images obtained by illuminating $\hat{M}$ with $\hat{J}$. Rows: proposed approach (a) with $γ_{1}$ and (b) with $γ_{2}$, SGD (c) with $γ_{1}$ and (d) with $γ_{2}$.

The initial mask $ω_{0}$ in the simulations in Fig. 6 is defined as an $N \times N$ matrix with each element equal to $π / 3$, which proves feasible for both the proposed approach and the SGD method. However, the initialization value $ω_{0} = π / 3$ and the step-size $η_{s} = 0.3$ were settled on only after many time-consuming experiments, which greatly increases the workload of the simulations. Alternatively, with the random initial masks in Fig. 7, another set of simulations is performed in Fig. 8 with weight parameter $γ_{2}$ to show the impact of initial masks on the optimization process. The columns present $\hat{J}$, $\hat{M}$, the EPE images, and the PV Band images simulated with $\hat{M}$ illuminated by $\hat{J}$. The two random initial masks $ω_{1}$ and $ω_{2}$ in Figs. 7(c) and 7(d) are applied with target patterns $I_{01}$ and $I_{02}$, respectively; following the caption of Fig. 8 and the layout of Table 3, Figs. 8(a) and 8(c) use the proposed approach, and Figs. 8(b) and 8(d) use the SGD method. The values of $S_{pe}$, $S_{epe}$, and $S_{pv}$ of the simulations in rows $I_{01}$ and $I_{02}$ of Fig. 5 and in Figs. 8(a)–8(d) are recorded in Table 3, where n.a. stands for not available. It is observed that, for the random initial masks $ω_{1}$ and $ω_{2}$, the proposed approach still reaches a satisfactory local minimum, whereas the SGD method starting from $ω_{1}$ and $ω_{2}$ struggles to break symmetry and escape an unwanted local minimum, resulting in poor OPC performance and showing the strong dependence of the SGD method on initial conditions.

	Fig. 5	Fig. 8		Fig. 5	Fig. 8
	row I01	(a)	(b)	row I02	(c)	(d)
Spe	4494	567	n.a.	5193	468	n.a.
Sepe	1158	178	n.a.	1512	96	n.a.
Spv	2347	1867	n.a.	3965	2472	n.a.

Table 3. Spe, Sepe, and Spv of the Simulations in Figs. 5 and 8

Figure 7. Randomly initialized masks within the range $(0, 1)$; (a) $M_{01}$ and (b) $M_{02}$. (c) $ω_{1}$ and (d) $ω_{2}$ are the transformed parameters.
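A random initial mask in $(0, 1)$ can be converted to its parameter by inverting $M = 0.5 \times [1 + \cos (ω)]$, which gives $ω = \arccos (2 M - 1)$. The sketch below is illustrative only; the grid size and random seed are arbitrary choices, not the values used for Fig. 7.

```python
import numpy as np

rng = np.random.default_rng(seed=0)           # arbitrary seed
m0 = rng.uniform(0.0, 1.0, size=(8, 8))       # random initial mask in [0, 1)

# Inverse of M = 0.5 * (1 + cos(omega)), taking the branch omega in [0, pi]
omega0 = np.arccos(2.0 * m0 - 1.0)

# Round trip back to the mask domain
m_back = 0.5 * (1.0 + np.cos(omega0))
```

The round trip recovers the original mask, so the optimizer starts from exactly the random pattern that was drawn.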

Figure 8. Simulation results with $I_{01}$ and $I_{02}$ as the target pattern and weight $γ_{2}$. Rows: (a) and (c) proposed approach with $ω_{1}$ and $ω_{2}$, (b) and (d) SGD with $ω_{1}$ and $ω_{2}$ as initial masks.

The convergence of $S$ and $S_{pe}$ in the simulations in Fig. 8 is plotted in Figs. 9(a) and 9(b). Figures 9(c) and 9(d) take a closer look at the convergence of $S_{pe}$ when the initial masks $M_{01}$ and $M_{02}$ in Figs. 7(a) and 7(b) are applied to the simulations in Figs. 8(a) and 8(b) and in Figs. 8(c) and 8(d), respectively, with the proposed approach and the SGD method. In Figs. 9(c) and 9(d), with the SGD method, a small $η_{s}$ yields very small values of $η_{s} \cdot g_{t}$ with the random initial masks $ω_{1}$ and $ω_{2}$ and inhibits the update of $ϕ_{t}$ from breaking symmetry when the optimization of $S_{pe}$ hits a local minimum, giving very poor convergence, while a bigger $η_{s}$ leads to divergence in later iterations. In contrast, the proposed algorithm uses the bias-corrected first and second moment estimates ${\hat{m}}_{t}$, ${\hat{v}}_{t}$ to constrain the gradients of the objective functions. When the updating process reaches a local minimum, the moments ${\hat{m}}_{t}$, ${\hat{v}}_{t}$ accumulated during the IO are attenuated in the FPO until they are close to 0, after which the update breaks symmetry by entering the SPO. This succession of IO, FPO, and SPO in the updating of $ϕ$ can be observed in Figs. 9(c) and 9(d), showing the ability of the proposed approach to escape unwanted local minima when random initial masks are applied. It should also be mentioned that the simulations in Fig. 8 present similar results for $S_{epe}$ and $S_{pv}$ with weight $γ_{1}$, showing the generality of the proposed approach.

Figure 9. Convergence of (a) $S$, (b) $S_{pe}$ of the simulations in Fig. 8, (c) $S_{pe}$ of the simulations in Figs. 8(a) and 8(b), and (d) $S_{pe}$ of the simulations in Figs. 8(c) and 8(d).