• Photonics Research
  • Vol. 13, Issue 6, 1469 (2025)
Qiarong Xiao, Chen Ding, Tengji Xu, Chester Shu, and Chaoran Huang*
Author Affiliations
  • Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
    DOI: 10.1364/PRJ.551798
    Qiarong Xiao, Chen Ding, Tengji Xu, Chester Shu, Chaoran Huang, "Concept and experimental demonstration of physics-guided end-to-end learning for optical communication systems," Photonics Res. 13, 1469 (2025)

    Abstract

    Driven by advancements in artificial intelligence, end-to-end learning has become a key method for system optimization in various fields, including communications. However, applying learning algorithms such as backpropagation directly to communication systems is challenging due to their non-differentiable nature. Existing methods typically require developing a precise differentiable digital model of the physical system, which is computationally complex and can cause significant performance loss after deployment. In response, we propose a novel end-to-end learning framework called physics-guided learning. This approach performs the forward pass through the actual transmission channel while using a simplified white-box channel model for the backward pass. Despite this simplicity, both experimental and simulation results show that our method significantly outperforms other learning approaches for digital pre-distortion applications in coherent optical fiber systems. It enhances training speed and accuracy, reducing the number of training iterations by more than 80%. It improves transmission quality and noise resilience and offers superior generalization to varying transmission link conditions such as link losses, modulation formats, and scenarios with different transmission distances and optical amplification. Furthermore, our new end-to-end learning framework shows promise for broader applications in optimizing future communication systems, paving the way for more flexible and intelligent network designs.

    1. INTRODUCTION

    End-to-end (E2E) learning is an emerging approach based on deep learning that offers new solutions to complex problems [1]. By using a single neural network (NN) to represent an entire target system, E2E learning bypasses the intermediate steps typically required in traditional methods, thereby simplifying the learning process. For example, in computer vision, E2E learning can simultaneously optimize both the encoder NN for input image compression and the decoder NN for output recovery, achieving better overall system performance compared to optimizing these components separately. E2E learning has become a widely adopted optimization strategy, with applications across various domains, including computer vision, natural language processing, autonomous driving, robot control, computational imaging, and optical computing [2–7].

    The rise of E2E learning has also impacted communication technologies since 2017 [8]. As communication systems grow more complex [9–15], traditional block-wise optimization methods often fall short in ensuring optimal overall performance [16]. E2E learning has emerged as a promising solution. The process of signal generation and recovery in communication systems is analogous to image compression and reconstruction in computer vision, as both aim to recover input messages at the system output. This similarity makes it natural to apply the E2E learning concept to communication systems by treating the entire system—including the transmitter, receiver, and transmission channel—as an autoencoder, as illustrated in Fig. 1(a). In this framework, the transmitter (Tx) and receiver (Rx) are represented as an encoder and a decoder, respectively, through two separate NN blocks that are trained jointly to learn an intermediate representation robust to channel impairments.

    Figure 1.(a) A communication system utilizing end-to-end (E2E) learning can be represented as an autoencoder, which consists of three main components: an encoder (transmitter), a decoder (receiver), and the actual transmission channel. The transmission channel can be various communication systems. (b) Conventional E2E learning method involves accurately modeling the transmission channel first, followed by performing the backpropagation (BP) algorithm in the digital domain. Channel modeling can be achieved using either physics-based approaches, known as “white-box” models, or pure data-driven methods, referred to as “black-box” models. (c) Proposed physics-guided learning: executing the forward pass of backpropagation on the actual transmission channel, while the backward pass estimates the gradient using a simplified white-box model.

    In theory, E2E learning enables overall system optimization for any practical channel without the need for extensive analytic evaluation of the physical system, thereby allowing the best end-to-end performance to be pursued [16]. Despite these potential advantages, implementing E2E learning effectively in real-world communication systems remains a significant challenge [17,18]. The main obstacle is that, to use effective training algorithms such as backpropagation (BP), E2E learning requires the entire transmission system to be differentiable. However, the physical transmission channel is inherently non-differentiable, which complicates the direct application of E2E learning.

    Most solutions to this challenge construct a differentiable digital model that approximates the actual physical channel, enabling the learning process to occur in the digital domain [16,17,19–39], as illustrated in Fig. 1(b). Differentiable channel models are typically derived through two main approaches. The first approach relies on physical laws, such as using the split-step Fourier method to model nonlinear fiber channels [21,22]. The second approach utilizes data-driven algorithms, where NN models, including generative adversarial networks (GANs) [17,40] and other NN architectures [34,36], approximate the channel. While these methods facilitate E2E learning, they also impose stringent demands on modeling accuracy, leading to high training costs and complexity. For example, in order to obtain an accurate channel approximation across different channel conditions, data-driven approaches can be both data-intensive and time-consuming [17]. Even with these efforts, discrepancies between the digital model and the actual physical system remain inevitable, resulting in performance degradation in real-world deployments [41]. Additionally, model-free methods, such as reinforcement learning [41–43] and cubature Kalman filters [44], come with their own challenges, including high training complexity and slow convergence [28]. Therefore, achieving efficient E2E learning with fast, low-cost training while maintaining system performance and signal quality remains a significant challenge.

    To address these challenges, we propose a new framework called physics-guided learning, which incorporates the actual physical channel into the training process. By doing so, our method enables a simple and approximate white-box model to outperform a complicated digital model in both training and implementation performance, while also becoming more resilient to system noise. As shown in Fig. 1(c), in our approach, the physical channel is used during the forward pass to generate the output, while the white-box model is employed to compute gradients for backpropagation. Our method is analogous to hardware-in-the-loop optimization, a concept often used to validate simulations of complex systems [45,46]. Our method simultaneously enhances training speed, generalization ability, and signal quality.

    We experimentally demonstrate the effectiveness of our approach in short-reach coherent optical fiber systems. The goal is to train an NN-based digital pre-distortion (DPD) module to mitigate impairments from both the transmitter and receiver. Verified through both experiments and simulations, our approach outperforms other mainstream methods during both the training and deployment stages [1618]. During training, our method improves efficiency by reducing the number of training iterations by more than 80%. In deployment, our approach demonstrates the strongest capability in addressing nonlinear distortion, shows the highest resilience to noise, and exhibits superior generalization to different link losses, modulation formats, and transmission scenarios. Furthermore, our method offers a brand-new framework for enabling accurate and efficient E2E learning in communication systems, which holds significant potential for broader applications, including long-haul optical systems and hybrid RF/optical wireless systems.

    2. WORKING PRINCIPLE OF PHYSICS-GUIDED LEARNING

    The principle and process of the proposed physics-guided learning are illustrated in Fig. 1(c). In this framework, the input message s is encoded and transmitted through a transmission channel. The receiver, along with the subsequent digital signal processing (DSP), functions as the decoder, providing the output ŝ. The overall goal of the communication system is to reconstruct the message s at the output with minimal error. Therefore, the training objective is to minimize the error s − ŝ by simultaneously optimizing the NNs at both the encoder and decoder using backpropagation.

    In a standard training procedure using backpropagation, the process involves a forward pass, error calculation, backpropagation of errors through the NN to compute gradients, and, finally, updating the NN parameters. This process requires a precise mathematical model of the entire transmission system to compute gradients accurately. However, deriving such an accurate model is challenging and computationally intensive, especially for complex, noisy systems.

    To address this problem, our approach uses the actual physical transmission system for the forward pass and error measurement, while employing a differentiable proxy model for backpropagation. A key difference between our method and standard digital-domain learning is that our approach does not require the actual system and the proxy model to be identical. The tolerance for discrepancies is large—the training process decreases the error as long as the angle between the gradient obtained from the proxy model and the gradient of the actual system is less than 90° [47,48]. This allows the proxy model to be significantly simplified to a rough white box, reducing computational complexity while maintaining high system performance.
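    The descent-direction condition above can be checked numerically. The following toy sketch (hypothetical 2×2 matrices, not the paper's channel) runs gradient descent on the loss ||Aw − t||², using the true channel Jacobian A for the forward pass but a crude identity proxy B for the backward pass; the loss still converges to zero because the proxy gradient keeps a positive inner product (i.e., an angle below 90°) with the true gradient at every step.

```python
import numpy as np

A = np.array([[1.0, 0.3], [0.2, 0.8]])   # "actual channel" Jacobian (forward pass)
B = np.eye(2)                            # crude white-box proxy (backward pass)
t = np.array([1.0, -0.5])                # target output
w = np.zeros(2)                          # parameters to learn

for step in range(200):
    e = A @ w - t                        # error measured from the "actual" system
    g_true = 2 * A.T @ e                 # gradient of the real loss
    g_proxy = 2 * B.T @ e                # gradient estimated via the proxy
    cos = g_true @ g_proxy / (np.linalg.norm(g_true) * np.linalg.norm(g_proxy) + 1e-12)
    assert cos > 0                       # proxy stays within 90 deg of the true gradient
    w -= 0.1 * g_proxy                   # descend along the proxy gradient

final_loss = np.sum((A @ w - t) ** 2)    # converges close to zero
```

Despite the mismatch between A and B, every proxy update still reduces the true loss, which is the essence of the 90° tolerance.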

    The specific training process is as follows. Before training, we first establish a simple, differentiable proxy model for the physical channel based on physical laws—this model is referred to as a “white box.” During the training process, the output s^ is measured from the actual physical system, and the error is calculated. The gradient of the loss function with respect to the parameters in the encoder and decoder is then derived using the white-box model through the chain rule. Finally, all parameters are updated according to the calculated gradient. This training loop is repeated until the objective function converges. (See Appendix A for the general formulation.)
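    The training loop above can be sketched in code. In this minimal illustration, the "actual channel" is a hypothetical stand-in for the physical link (saturation, quantization, and noise, so it is non-differentiable), the encoder is a single learnable pre-distortion gain, and the backward pass uses a deliberately crude unit-gain white-box proxy; all numbers are illustrative assumptions, not the paper's system.

```python
import numpy as np

rng = np.random.default_rng(1)

def actual_channel(x):
    """Stand-in for the physical link: saturation, 8-bit quantization, noise."""
    y = np.tanh(1.2 * x)                     # nonlinear device response
    y = np.round(y * 127) / 127              # quantization (non-differentiable)
    return y + 0.001 * rng.standard_normal(x.shape)

s = rng.uniform(-0.5, 0.5, 256)              # message to reproduce at the output
w = 1.0                                      # single pre-distortion gain to learn

for it in range(300):
    y = actual_channel(w * s)                # forward pass on the real system
    e = y - s                                # measured error
    g_x = 2 * e / len(s)                     # dL/dx via the unit-gain white-box proxy
    g_w = g_x @ s                            # chain rule through the encoder
    w -= 0.5 * g_w                           # parameter update

mse = np.mean((actual_channel(w * s) - s) ** 2)
baseline = np.mean((actual_channel(s) - s) ** 2)   # no pre-distortion
```

Even though the proxy ignores the saturation, quantization, and noise entirely, the loop converges and the final MSE falls well below the no-pre-distortion baseline in this toy setting.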

    This training flow benefits E2E learning in two significant ways. First, executing the forward pass through the actual physical channel ensures that the output incorporates all real information. Implicit impairments and features, such as noise and residual effects post-DSP, are automatically included through forward propagation. Non-differentiable operations in the actual system, such as quantization, which typically limits E2E learning [41], are also included. Consequently, our method enhances the deployment accuracy.

    Second, grounding the process in real data reduces the need for a precise digital model for gradient estimation, thereby accelerating the training process. Our method can achieve necessary accuracy using a simple white-box model as the proxy channel. Compared to purely data-driven neural network models, known as “black boxes,” white-box models offer greater generalization ability and higher noise resilience.

    The following section will demonstrate an example of applying our method for digital pre-distortion in short-reach coherent fiber systems, highlighting its significant advantages at both the training and deployment stages.

    3. APPLICATION IN COHERENT OPTICAL SYSTEM

    A. Digital Pre-Distortion For Short-Reach Transmission

    Coherent optical systems have dominated long-haul transmission over the past decade [15]. Recently, coherent technologies have opened up new development opportunities for short-reach transmission, especially for beyond-Tb/s data-center interconnects and passive optical networks (PONs), to meet the growing demands on data traffic [49–54]. At such high data rates, transmission performance becomes increasingly susceptible to imperfections in cost-effective transceiver components rather than impairments from the fiber [55]. To tackle these issues, digital pre-distortion (DPD) is employed at the Tx side to pre-compensate for signal distortions caused by transceiver devices [25,42,55–63]. Recent advances have demonstrated the use of NNs to develop DPD modules, with ongoing research focused on efficiently learning DPD parameters while maintaining low costs and high performance [42,55,59,60].

    We show that our physics-guided learning approach can effectively optimize the parameters of a DPD module, yielding superior performance compared to mainstream training methods [16–18]. The operation scheme of our method adapted for DPD training is illustrated in Fig. 2(a). During training, the encoder NN, represented by the green block, functions as the DPD module. The actual physical channel is used for the forward pass to generate the output y of the entire system. The error e is calculated as s − y, where s is the desired signal. This error is then backpropagated through a simplified white-box model to compute the gradient. In the short-reach coherent systems under study, the white-box model we use is simply a combination of mathematical models for RF drivers, an in-phase and quadrature (IQ) modulator, a low-pass filter, and a matched filter, while other device imperfections and DSP modules are ignored (see Fig. 3). Additionally, all physical parameters involved in the channel model are simply extracted from datasheets without additional experimental characterization. Finally, the parameters of the DPD module are updated based on the estimated gradient. The training loop is repeated until the error converges and no longer decreases.

    Figure 2.Schematic diagrams of methods adapted for DPD training. (a) Proposed physics-guided learning method. (b) Prior E2E learning methods. (b-i) Method 1: hybrid-domain learning with a data-driven model [18]. (b-ii) Method 2: digital-domain learning with a complicated physics-based model [16,19–24]. (b-iii) Method 3: digital-domain learning with a data-driven model (implemented by alternating training) [17,33,34].

    Figure 3.Experimental setup for our physics-guided learning method, configured as either an amplifier-less or an 80-km transmission system. The physical transmission system and the Rx-DSP compose the actual channel. The simplified white-box digital channel employs a series of physical models that cover only several components. DPD, digital pre-distortion; AWG, arbitrary waveform generator; IQ-MOD, IQ modulator; VOA, variable optical attenuator; EDFA, erbium-doped fiber amplifier; SSMF, standard single-mode fiber; LO, local oscillator; OSC, oscilloscope; LPF, low-pass filter.

    In our experiments, we use an adaptive finite impulse response (FIR) filter as a post-equalizer at the Rx, whose coefficients adapt to changes in the DPD. We optimize its coefficients jointly with the DPD using gradient descent, analogous to training an NN. The function of the FIR is equivalent to a single-layer NN [22]. Similar single-layer NNs have been adopted in prior E2E learning frameworks [22,34,36,37]. While we do not use an NN-based post-equalizer, the method of training the FIR filter in our experiment is also applicable to training an NN-based post-equalizer. Although including an NN-based post-equalizer would further improve the performance (see Appendix C), we do not include it, as it significantly increases DSP complexity, which is not acceptable for the short-reach system we are targeting.
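    The equivalence between an adaptive FIR equalizer and a single-layer linear NN can be made concrete with a small sketch: the taps are updated by gradient descent on the MSE (i.e., the LMS rule) to undo a short, hypothetical channel response. The channel, filter length, and step size are illustrative, not the experiment's values.

```python
import numpy as np

rng = np.random.default_rng(2)
h = np.array([0.1, 1.0, 0.25])          # toy channel impulse response (introduces ISI)
s = rng.choice([-1.0, 1.0], 2000)       # BPSK-like symbol stream
r = np.convolve(s, h, mode="same")      # received, ISI-distorted signal

taps = np.zeros(7)
taps[3] = 1.0                           # center-spike initialization
mu = 0.1                                # gradient-descent (LMS) step size

for epoch in range(200):
    y = np.convolve(r, taps, mode="same")   # single-layer linear "NN" forward pass
    e = y - s
    g = np.empty_like(taps)
    for k in range(len(taps)):
        # d mean(e^2) / d taps[k] = 2 * correlation of e with r at the tap's lag
        g[k] = 2 * np.mean(e * np.roll(r, k - len(taps) // 2))
    taps -= mu * g

mse = np.mean((np.convolve(r, taps, mode="same") - s) ** 2)
```

After convergence the residual MSE drops far below the unequalized level, illustrating why the same gradient-descent machinery used for NNs applies directly to the FIR coefficients.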

    To demonstrate the effectiveness of our method, we compare it comprehensively with three prior E2E learning methods. The first method, illustrated in Fig. 2(b-i), also incorporates the actual physical system during training but relies on a digital NN obtained through a data-driven approach for the backward pass (referred to as Method 1) [18]. This method requires alternating training of two NNs, one for the channel model and the other for the DPD module. Since training is performed directly on the real system, updates to the DPD significantly change the statistics of the transmitted signals, which in turn causes notable changes in the actual channel response. Therefore, the NN for channel modeling needs to be retrained periodically to approach the true response. This will be demonstrated in Section 3.C.

    The second and third methods, shown in Figs. 2(b-ii) and 2(b-iii), are standard model-based E2E learning approaches (referred to as Methods 2 and 3, respectively). Both approaches require a highly accurate digital model. In the method illustrated in Fig. 2(b-ii), the digital model is based on complicated digital representations of physical laws, such as the split-step Fourier method [21,22], to simulate the actual channel response. The third method, depicted in Fig. 2(b-iii), uses a data-driven approach to derive an NN model for the digital representation. There are two common strategies to implement Method 3: one follows alternating training similar to Method 1 [17,33,34], while the other fully pre-trains the channel model first and then fixes the pre-trained channel model when training transceiver modules [36–39]. Alternating training allows the channel model to track the changes in system response during learning but needs frequent data acquisition from the real channel. In contrast, fully pre-training the channel model enables offline learning but demands a large, diverse dataset to ensure accurate modeling [35]. Therefore, both strategies require a considerable amount of data and time to train the channel model. In our demonstration, we use alternating training for Method 3 in order to draw a direct comparison with Method 1.

    B. System Setup

    Figure 3 shows the experimental setup for short-reach coherent fiber transmission and the DPD learning flow using our proposed method. To cover diverse short-reach scenarios [49], we examine two systems: an optical-amplifier-less system, representative of data-center links, and an 80-km transmission system with optical amplification, suitable for access and metro links where amplification is acceptable. The DPD module is trained on the amplifier-less system but evaluated on both systems, showcasing its adaptability to longer transmission scenarios.

    Initially, a sequence of transmitted symbols s is up-sampled and pulse-shaped using a root-raised cosine filter. The shaped digital waveforms are then processed by the DPD module. These waveforms are resampled and fed into an arbitrary waveform generator (AWG), which converts them into electrical signals. The signals are subsequently amplified by RF drivers and converted to the optical domain using an IQ modulator. An external cavity laser provides the optical carrier for modulation. The optical signal operates at a symbol rate of 50 Gbaud. In the amplifier-less system, optical signals are sent directly to the coherent receiver and mixed with a local oscillator (LO). The resulting electrical waveforms are sampled by a real-time oscilloscope. The digitized signals undergo Rx-DSP, including resampling, frame synchronization, frequency recovery, matched filtering, equalization, and phase recovery. Finally, the sequence of recovered symbols y is obtained. In the 80-km transmission system, the modulated optical signals are first boosted by an erbium-doped fiber amplifier (EDFA) and filtered by an optical filter to suppress out-of-band noise. After transmission through 80 km of standard single-mode fiber (SSMF), a second EDFA and optical filter are used to compensate for the fiber loss and further filter the signals. The received optical signals are processed similarly to the amplifier-less system but include additional chromatic dispersion (CD) compensation, as indicated by the dashed gray block in Fig. 3. Detailed device parameters are listed in Table 1.
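    The pulse-shaping and matched-filtering steps above can be illustrated numerically: a root-raised-cosine (RRC) filter, built here in the frequency domain as the square root of a raised-cosine spectrum, cascades with an identical matched filter into a raised-cosine (Nyquist) pulse with essentially no intersymbol interference (ISI) at symbol instants. The roll-off factor, oversampling ratio, and FFT size below are illustrative choices, not the experiment's settings.

```python
import numpy as np

sps = 4                  # samples per symbol (oversampling)
beta = 0.1               # roll-off factor
n = 1024                 # FFT size
f = np.fft.fftfreq(n, d=1.0 / sps)       # frequency in units of the symbol rate

# Raised-cosine spectrum, then its square root as the RRC response
H = np.zeros(n)
flat = np.abs(f) <= (1 - beta) / 2
roll = (np.abs(f) > (1 - beta) / 2) & (np.abs(f) <= (1 + beta) / 2)
H[flat] = 1.0
H[roll] = 0.5 * (1 + np.cos(np.pi / beta * (np.abs(f[roll]) - (1 - beta) / 2)))
rrc = np.fft.fftshift(np.fft.ifft(np.sqrt(H)).real)   # RRC taps (time domain)

# Pulse shaping at the Tx followed by matched filtering at the Rx
rc = np.convolve(rrc, rrc)                             # raised-cosine cascade
center = int(np.argmax(rc))
vals = rc[center + np.arange(-100, 101) * sps]         # samples at symbol instants
peak = vals[100]
isi = np.max(np.abs(np.delete(vals, 100))) / peak      # worst-case residual ISI
```

The residual ISI at symbol instants is negligible, which is why splitting the Nyquist response into matched RRC halves at the Tx and Rx is the standard choice here.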

    Table 1. Parameters for Experimental Systems

    Parameter                            Value
    AWG (DAC resolution, bandwidth)      8 bits, 45 GHz
    RF driver (gain, Vsat, bandwidth)    17 dB, 7.8 V, 40 GHz
    IQ modulator (Vπ, bandwidth)         3.5 V, 22 GHz
    Laser/LO (linewidth)                 <100 kHz
    Coherent receiver (bandwidth)        22 GHz
    Oscilloscope (bandwidth)             59 GHz
    Symbol rate                          50 Gbaud
    The amplifier-less system’s transmission performance is primarily impacted by nonlinearities, bandwidth limitations of components at both the transmitter and receiver, and system noise, including additive noise from the receiver and phase noise from non-ideal laser sources. In the 80-km transmission system, additional challenges arise, such as amplified spontaneous emission (ASE) noise from the EDFAs, CD, and potential fiber nonlinearities from the longer fiber link. This necessitates the use of a CD compensation block. The DPD module’s role is to address signal distortions mainly originating from transceiver devices. Nevertheless, our DPD can improve signal quality even in the presence of fiber-induced impairments under 80-km transmission conditions, as demonstrated in Section 3.E.3.

    To train the DPD using our physics-guided learning method, we transmit signals in the actual amplifier-less channel to perform the forward pass, as indicated by the central blue area in Fig. 3. The backward pass, as a feedback link to the DPD, is designed to reflect only the major distortions from devices, thus greatly simplified compared to the actual channel. As shown in the white box of Fig. 3, the backward pass employs a series of physical models that represent a few key components, including RF drivers, the IQ modulator, the overall bandwidth limitation of the system represented as a low-pass filter (LPF), and a matched filter paired with the pulse shaping before DPD. These physical models are rough. For instance, the IQ modulator is modeled as ideal sinusoidal functions disregarding potential mismatches between the in-phase and quadrature arms. Moreover, all the required parameters are obtained from datasheets rather than measured from actual devices, and noises are excluded from the model. Despite this simplicity, we will demonstrate in the results sections that our learning method effectively guides the DPD to optimize towards the best performances. (See Appendix B for modeling methods of physical models.)
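    A sketch of such a simplified white-box forward model is given below, chaining an RF-driver saturation, an ideal sinusoidal IQ-modulator transfer, and a first-order low-pass filter. The gain, Vsat, and Vπ values are the datasheet numbers from Table 1; the tanh saturation shape and the filter coefficient are our illustrative assumptions, and, as in the paper's model, noise and IQ imbalance are ignored.

```python
import numpy as np

GAIN_DB, V_SAT = 17.0, 7.8       # RF driver gain and saturation voltage (Table 1)
V_PI = 3.5                       # modulator half-wave voltage (Table 1)

def rf_driver(v):
    """Soft-saturating amplifier: linear gain rolling off toward V_SAT."""
    g = 10 ** (GAIN_DB / 20)
    return V_SAT * np.tanh(g * v / V_SAT)

def iq_modulator(vi, vq):
    """Ideal sinusoidal IQ transfer, ignoring mismatch between the two arms."""
    return np.sin(np.pi * vi / (2 * V_PI)) + 1j * np.sin(np.pi * vq / (2 * V_PI))

def lowpass(x, alpha=0.3):
    """First-order IIR stand-in for the aggregate bandwidth limitation."""
    y = np.empty_like(x)
    acc = 0.0
    for i, v in enumerate(x):
        acc += alpha * (v - acc)
        y[i] = acc
    return y

# Forward pass of the white box for a toy drive waveform
vi = 0.05 * np.sin(2 * np.pi * 0.02 * np.arange(200))
vq = np.zeros_like(vi)
field = iq_modulator(rf_driver(vi), rf_driver(vq))    # modulated optical field
out_i = lowpass(field.real)                           # band-limited in-phase output
```

Because each stage is a smooth elementary function, the chain rule through this model is trivial to evaluate, which is what makes the backward pass cheap.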

    Note that, to explore the simplification limit of channel models, we also attempted to exclude all components and assume the backward pass to be an identity matrix. We observed that training could converge, albeit with poorer performance compared to the white-box model shown in Fig. 3 (resulting in a 34.6% increase in mean square error). This indicates that the angle between the gradient estimated with the identity model and the true gradient is large but still less than 90°. A similar example can be found in Ref. [55].

    The three prior learning methods introduced in Section 3.A for comparison are performed on the amplifier-less system with the same DPD NN structure. Methods 1 and 3 use the identical NN structure for both channel modeling and DPD. The NN is a feed-forward network with an input sliding window [42,60], comprising 3108 learnable real-valued parameters. Method 2 uses a complicated physics-based model to match the amplifier-less experimental setup. The model includes basic models of the transmitter and receiver (identical to the white-box model used in our method), system noises (additive white noise and phase noise), and Rx-DSP. Its physical parameters are also extracted from datasheets. Additionally, hyperparameters such as initial learning rates are optimized for each method, as detailed in Appendix D. (See Appendices A–D for the NN structure, channel models, and training details.)

    In the results sections, we will first analyze the training complexity of our method and compare it with existing approaches (see Section 3.C). We will then evaluate the performance of the DPD module with respect to the peak-to-peak voltage (Vpp) at the AWG output, a critical factor affecting the transmission quality of the amplifier-less short-reach system (see Section 3.D). Vpp influences the output swing of the RF drivers and the operation of the IQ modulator. While a higher Vpp can boost optical signal power, it also leads to larger nonlinear distortions from the RF drivers and modulator. We will investigate how effectively our method identifies the optimal Vpp value compared to other methods. Finally, we investigate the generalization ability of our method by showing how the DPD module trained using our approach can adapt to different transmission conditions, including fiber link losses, modulation formats, and the 80-km transmission scenario with optical amplification (see Section 3.E).

    C. Evaluation and Comparison of Training Complexity and Accuracy

    Here, we demonstrate the advantages of our method in terms of training complexity and accuracy. The amplifier-less system is trained using 32-QAM signals with a 600 mV Vpp at the AWG and 5 dB optical link loss. Figure 4(a) compares the training loss of our method with that of Method 1 and Method 3, where both channel modeling and DPD training processes use mean square error (MSE) as the loss function. We visualize convergence speed as a function of training iterations because it reflects both the required training time and the amount of training data: fewer training iterations imply lower training data requirements. Method 2 is not included in this comparison, as it does not need system measurements but produces unacceptably low DPD performance during deployment in the real system. The performance of Method 2 will be discussed in the next section.

    Figure 4.Training process comparisons between our method and prior Method 1 (hybrid-domain data-driven method) and Method 3 (digital-domain data-driven method, implemented by alternating training). (a) Training loss versus training iteration in experiments, under the conditions of 5 dB link loss and 600 mV Vpp. (b) Validation MSE versus training iteration in simulations. (b-ii) Zoom-in of (b-i) to compare the required iteration numbers for different methods when reaching the same MSE.

    As shown in Fig. 4(a), our method achieves the fastest convergence speed by using a simplified white-box model during training. This contrasts with Methods 1 and 3, which adopt alternating training for two NNs. Despite its simplicity, our method can effectively guide the optimization process in the correct direction, as evidenced by the continuously decreasing training loss. In contrast, Methods 1 and 3 involve alternately training two NNs—one for the channel model and the other for the DPD module—during each round [as illustrated by the different colored regions in Figs. 4(a-ii) and 4(a-iii)]. A high channel modeling loss is observed at the start of the second round (marked by the black dashed circle) in Fig. 4(a-ii), which indicates that the actual channel response changes significantly after DPD training and that retraining the channel model is essential. This alternating training process substantially increases both the training time and the amount of training data.

    It is important to note that training loss alone does not rigorously reflect the true performance of the system, as some methods do not account for real-system effects, such as noise, during training. To accurately evaluate how system performance evolves during training, a validation dataset must be tested on the system. The validation results, shown in Fig. 4(b), are periodically assessed by calculating the MSE between the input symbol sequence of the validation dataset and the output signal. As shown in Fig. 4(b-i), our method achieves the lowest MSE with the fewest training iterations. Remarkably, our method has already converged when the two prior methods have just completed their first round of training. The MSE is reduced by 23.3% and 69.2% compared to Method 1 and Method 3, respectively. Notably, Method 3 not only exhibits the slowest convergence speed but also suffers from overfitting, as indicated by the rising MSE marked by red dashed circles in Fig. 4(b-i). Method 1 avoids the overfitting issue by incorporating the transmission system into the training process; however, alternating training greatly increases the training time. Finally, as shown in Fig. 4(b-ii), our method reaches the same MSE in approximately 80 iterations, compared to around 440 and 720 iterations for Method 1 and Method 3, respectively. This corresponds to a reduction of more than 80% in the number of iterations and the required data, as well as a decrease of more than 75% in the number of system measurements. Additionally, the training time is estimated to be reduced from about 20 min to just 3 min. The calculations of data amount, number of system measurements, and training time are detailed in Appendix D.

    D. Performance in Nonlinearity Impairment Mitigation

After training, we deploy the DPD models trained by different methods to evaluate their performance in signal pre-equalization. In this section, we compare the methods based on their capability to mitigate nonlinear impairments. This ability is crucial because it allows higher Vpp values to drive the IQ modulator, resulting in a higher signal-to-noise ratio (SNR) for the amplifier-less system. While a higher Vpp increases the optical signal power, it also introduces more significant nonlinear distortions from the RF drivers and the modulator. Thus, effective nonlinear impairment mitigation is essential for improving the system SNR. In this experiment, we fix the optical link loss at 5 dB and train separate DPD modules for each Vpp value using 32-QAM signals. After training, we test each DPD module at the specific Vpp value set for the training. The fiber link without DPD serves as the baseline for comparison. The system performance is analyzed by calculating the SNR and bit error rate (BER) based on recovered symbols.

The results are presented in Fig. 5. Our method demonstrates the highest tolerance to nonlinearity and achieves the best SNR with a time-efficient training procedure. As shown in Fig. 5(a), the DPD trained using our method (blue line) consistently delivers the highest SNR across all tested Vpp values. Compared to the baseline (yellow line), it provides an SNR gain of 0.88 dB and improves the optimal Vpp value from 500 to 600 mV, indicating its capability to mitigate severe nonlinear distortions and support higher launch powers. We also observe that the performance gap between prior methods and ours widens as the Vpp increases. This is because higher Vpp levels introduce more severe nonlinear distortions, and the DPD obtained using our method demonstrates a greater ability to compensate for these distortions. The SNR using Method 1 (orange line) is 0.33 dB lower than that of our method. This inferior performance arises from inadequate training for channel modeling in Method 1, which introduces additional biases and increases the gap between the model and the actual system. Method 3 exhibits an even lower SNR, indicating that the NN used for channel modeling, which is fixed during digital-domain DPD training, is not accurate. Method 2 (purple line) shows the worst SNR performance, which is even lower than the baseline at some Vpp values. This suggests that its DPD is ineffective, revealing that completely detaching from measurements of the actual system can result in large modeling errors and significant performance loss during real-system deployment. As a result, our method is the only one that achieves BER values below the 14.8% overhead (OH) forward error correction (FEC) threshold of 0.0125 [50], as shown in Fig. 5(b), outperforming all other methods under comparison.


    Figure 5.Performance comparison in impairments mitigation. (a) Calculated SNR versus Vpp of DPDs trained through different methods. (b) Calculated BER versus Vpp, followed by (b-i) and (b-ii) showing the received constellations without and with DPD at their respective optimal Vpp values. (c) Comparison of transmitted signal spectra with and without DPD.

    In addition, the DPD obtained through our method can compensate for the bandwidth limitations of the physical system, which always exhibits low-pass characteristics and hinders high-speed transmission. Our DPD counteracts this impairment by boosting the high-frequency components before transmission. As shown in Fig. 5(c), the signal spectrum with our DPD shows peaks at the edge frequencies, in contrast to the flat spectrum of original signals without DPD.

    E. Generalization Capability

This section examines the generalization capability of our method, specifically its ability to adapt the trained DPD to link conditions that were not included in the training phase. The DPD is first trained in the amplifier-less system with 5 dB link loss and 600 mV Vpp and then tested under different link conditions. Specifically, the evaluations against link losses and modulation formats are conducted in the amplifier-less system, while the DPD’s performance in the 80-km transmission system is evaluated for varying launch powers.

    1. Adaptability to Optical Link Losses

    We first evaluate the performance of the trained DPD under varying link losses, where the signal experiences different SNRs after detection. Figure 6(a) shows the BER as a function of link losses. Across all evaluated link losses, our method (blue line) consistently achieves the lowest BER. When the link loss is small (less than 5 dB), the BERs of all methods stop decreasing, because residual distortions, such as nonlinearity, become the dominant limiting factors rather than noise. In this case, our method remains the only one that achieves BER values below the 14.8% OH FEC threshold. As the link loss increases, noise becomes the dominant factor. Under these conditions, our method continues to demonstrate lower BER values, while other methods show performance close to or even worse than the baseline, particularly at a 15 dB link loss. Our method achieves a 1.00 dB gain in power budget over the baseline for the 25% OH FEC threshold of 0.04 [50]. The power budget is calculated as the difference between the launched and received optical power, equivalent to the link loss.


    Figure 6.(a) BER versus optical link loss for DPD modules from different E2E learning methods. All DPDs were trained at a fixed 5 dB link loss. (b) Noise resilience investigation in comparison with ILA. (b-i) Calculated BER versus link loss of DPDs trained at fixed 3 dB, 5 dB, and 8 dB link loss values, respectively. (b-ii) Zoom-in of (b-i).

Notably, the BER value of Method 3 (green line) increases dramatically as the link loss rises, falling below the baseline’s performance starting from a 9 dB link loss. This decline below the baseline can be attributed to training biases in the data-driven channel model: since noise is not included in the DPD training process, the learned DPD overfits and fails to adapt to higher noise levels, degrading performance under higher-loss conditions. Incorporating the physical channel in the training can reduce such errors, as demonstrated by the results of our method and Method 1. However, Method 1 still performs worse than ours because training biases still occur during the NN training for channel modeling. (See Appendix A for the detailed analysis.)

The above results clearly illustrate the superior performance of our method over prior E2E learning methods. To thoroughly validate the adaptability and optimality of our method in DPD applications, we also conduct a comparison with a traditional DPD training method, the indirect learning approach (ILA). ILA is a practical DPD optimization method owing to its relatively low complexity [56,57]. Unlike E2E learning methods, it circumvents channel modeling by training the DPD module at the Rx side before deploying it to the Tx side. However, ILA may suffer from noise bias and fail to yield the optimal DPD module [58,59]. Here we investigate the noise resilience of the training processes and demonstrate that our method can outperform ILA.

We compare the two methods by training DPD modules at three fixed link losses—3 dB, 5 dB, and 8 dB—and then evaluate system performance under varying link losses. The Vpp is set to 600 mV during both training and testing. The resulting BERs are shown in Fig. 6(b). Our method demonstrates strong resilience to noise variance—the three DPDs, despite being trained under different link losses, exhibit consistent BER performance. The relative differences in BER are less than 6.5%. In contrast, ILA proves more sensitive to noise. The relative differences in BER can be as large as 18.8%. Specifically, the DPD trained at high loss (e.g., 8 dB) cannot adapt to low-loss link conditions. The BER performance using DPD trained with ILA worsens as the link loss during training increases. The most significant BER degradation, from 0.0133 to 0.0158, occurs at a 4 dB link loss, as shown in Fig. 6(b-ii). These results are comparable to those reported by other groups [55,59], suggesting that excessive noise due to high link losses hinders the DPD from effectively learning the inverse channel function. As a result, ILA cannot achieve a BER value lower than the 14.8% OH FEC threshold. When comparing DPD modules from our method and ILA, both trained at 8 dB link loss (dark blue and dark red lines), our method achieves a 28.5% reduction in BER at 4 dB link loss and provides a 0.56 dB gain in power budget over ILA for the 25% OH FEC threshold.

    2. Adaptability to Modulation Formats

    Next, we evaluate the generalization ability of our method, specifically its ability to apply the trained DPD to different modulation formats that were not included in the training stage. The DPD is initially trained with 32-QAM, and then we test its performance on fiber links using 16-QAM and 64-QAM without retraining the DPD. The results in Fig. 7 show the signals’ SNR at different Vpp levels. Compared to the baseline, our method achieves SNR gains of 0.79 dB and 0.98 dB for 16-QAM and 64-QAM, respectively, both at a Vpp of 600 mV—the same value as for 32-QAM. This indicates that the optimal Vpp obtained from the training with 32-QAM can still be applied to 16-QAM and 64-QAM, which demonstrates the generalization ability of our method. Notably, both our method and the baseline achieve higher SNR in 16-QAM than in 64-QAM. This is because the higher-order QAM is more susceptible to noise as well as channel distortions, leading to lower measured SNR values [64]. Consequently, DPD for nonlinear compensation is more effective for 64-QAM, resulting in a larger SNR gain over the baseline compared to 16-QAM. As a result, the SNR gap between the two modulation formats narrows after applying our DPD, with the reduction equal to the difference in SNR gains (0.19 dB). Similar results can be found in Ref. [64].


    Figure 7.Applying the DPD module trained for 32-QAM to other formats without retraining. SNR versus Vpp for 16-QAM and 64-QAM. The insets (i)–(iv) show the received constellations at the optimal Vpp.

    3. Adaptability to the 80-km Transmission Scenario

    Finally, we evaluate the adaptability of our DPD trained in the amplifier-less system to the 80-km transmission system without retraining. Performance is tested under varying launch powers. Launch power is crucial for this longer transmission scenario, as higher power increases the optical signal-to-noise ratio (OSNR, the power ratio between optical signal and ASE noise) but also leads to larger fiber nonlinearity that distorts signals. We demonstrate that our DPD consistently improves the transmission performance across a wide range of launch powers.

Figure 8 shows the received signal’s SNR over different launch powers, under 32-QAM and 600 mV Vpp. We compare our method with the three prior learning methods introduced in Section 3.A. DPD modules trained in the amplifier-less system using different methods are directly tested on the 80-km transmission system without retraining. The results demonstrate that our method outperforms all other methods in SNR across all launch powers. At the optimal launch power of around 6 dBm, our approach not only attains the highest SNR value but also exhibits the largest SNR gain compared to other methods. These results indicate that our method adapts best to different transmission scenarios without the need for retraining. It can maintain optimal communication performance even in the presence of ASE noise and fiber nonlinearity.


    Figure 8.Performance comparison in the 80-km transmission system. DPD modules trained in the amplifier-less system by different learning methods are tested without retraining.

    4. DISCUSSION AND CONCLUSION

    In this paper, we introduce a novel end-to-end (E2E) learning framework called physics-guided learning. This approach offers two significant improvements. First, by executing the forward pass through the actual physical channel, our method ensures that the output includes all real information, including implicit impairments and features such as noise, residual effects post-DSP, and non-differentiable operations such as quantization. This integration enhances the overall accuracy and effectiveness of E2E learning. Second, grounding the process in real data reduces the need for a precise digital model for gradient estimation, simplifying the training process and offering greater generalization ability.

    Our demonstrations for DPD highlight the advantages of our approach through a comprehensive comparison with existing methods. Our method achieves the fastest training speed by using a simplified white-box model, avoiding the need for alternating training of two complex NNs. This results in more than 80% fewer training iterations compared to previous data-driven methods. Additionally, our approach provides the highest SNR improvement of 0.88 dB for 32-QAM signals in an amplifier-less system. Moreover, our method exhibits enhanced robustness and strong generalization capabilities. It remains resilient to system noise, and the DPD module trained with 32-QAM can be effectively applied to other modulation formats without retraining, achieving SNR gains of 0.79 dB for 16-QAM and 0.98 dB for 64-QAM. When directly applied to an 80-km transmission system, our DPD also achieves impressive SNR gains over prior learning methods. Further improvements could introduce learnable physical parameters into channel models. In this way, channel models can be dynamically tuned to adapt to significant changes in actual systems.

For the practical implementation of our learning framework, training can be conducted on pilot symbol (bit) sequences shared by the Tx and Rx sides for error calculation. The main consideration is how to send gradient information back to the Tx side. Nevertheless, owing to the fast training speed of our method, exchanging gradient information is only required during the brief training period. Therefore, any reliable feedback link can be used without requiring a high data rate. In fact, similar feedback links have been employed in various systems, such as quantum key distribution [65], model-free E2E learning methods [41], and autonomous optical networks [66,67]. Thus, constructing a temporary feedback link is feasible.

    From a broader perspective, our framework offers a novel approach for optimization in communication systems and other physical systems. For instance, practical E2E learning in communications often involves DSP modules that cannot be easily replaced by a single NN. These modules are typically non-differentiable. Our method facilitates the joint optimization of neural networks even in the presence of such non-differentiable operations. To extend our method to other communication systems, their channel models for the backward pass need to be designed accordingly.

    In conclusion, we have proposed a general strategy for optimizing communication systems that is applicable to various systems. This approach paves the way for developing more flexible and intelligent optical networks and holds promise for future integrated sensing and communications, where designing and optimizing increasingly complex system configurations will be crucial.

    APPENDIX A: FORMULATION OF PHYSICS-GUIDED E2E LEARNING

    Here we present the general formulation of the E2E learning process and details of our physics-guided learning framework.

    E2E Learning and the Non-Differentiable Actual Channel

Figure 9(a) shows the schematic diagram of E2E learning for a communication system, where $f_{en}$ and $f_{de}$ represent the neural network (NN) based encoder and decoder with learnable parameters $\theta_{en}$ and $\theta_{de}$, respectively. With channel input $x$ (a vector of signal waveform or symbols), we formulate the function of the actual channel $f_{\text{Channel}}$ with decoupled terms as
$$y = f_{\text{Channel}}(x) = C(x) + n(x) + n_r, \tag{A1}$$
where $y$ is the channel output (a vector of signal waveform or symbols), $C(x)$ describes the deterministic channel effects on the input $x$, $n(x)$ stands for signal-dependent noise such as noise-signal interaction, and $n_r$ is signal-independent random noise.


    Figure 9.(a) Schematic of E2E learning for a communication system. (b) Proposed physics-guided learning: gradient estimation with a physics-based (white-box) model. (c) Gradient estimation with a data-driven (black-box) model.

After signal transmission in the forward pass, we compare the entire system’s input $s$ and output $\hat{s}$ to compute the error and define the loss function, for example, in the form of the Euclidean norm (i.e., MSE):
$$L = \|s - \hat{s}\|^2. \tag{A2}$$

Based on the loss function, the fundamental training algorithm, backpropagation (BP), is used to update the parameters of the autoencoder. To do that, the gradients of the loss function with respect to (w.r.t.) the parameters, including $\partial L / \partial \theta_{de}$ and $\partial L / \partial \theta_{en}$, are computed through the chain rule along the backward pass. In standard training procedures, the backward pass is the same as the forward pass. We first calculate the gradients related to the decoder:
$$\frac{\partial L}{\partial \theta_{de}} = \frac{\partial \hat{s}}{\partial \theta_{de}} \frac{\partial L}{\partial \hat{s}} = \left[\frac{\partial f_{de}}{\partial \theta_{de}}(y, \theta_{de})\right]^T \frac{\partial L}{\partial \hat{s}}, \tag{A3}$$
$$\frac{\partial L}{\partial y} = \frac{\partial \hat{s}}{\partial y} \frac{\partial L}{\partial \hat{s}} = \left[\frac{\partial f_{de}}{\partial y}(y, \theta_{de})\right]^T \frac{\partial L}{\partial \hat{s}}, \tag{A4}$$
where $\frac{\partial f_{de}}{\partial \theta_{de}}(y, \theta_{de})$ and $\frac{\partial f_{de}}{\partial y}(y, \theta_{de})$ denote the gradients of the decoder function $f_{de}$ w.r.t. $\theta_{de}$ and $y$, respectively, $y$ is the input to the decoder, and $\partial L / \partial \hat{s}$ is the error vector directly derived from Eq. (A2). Since $f_{de}$ is a differentiable NN function, $\frac{\partial f_{de}}{\partial \theta_{de}}(y, \theta_{de})$ can be obtained directly through automatic differentiation (autodiff) and takes a form involving $y$. Analogously, $\frac{\partial f_{de}}{\partial y}(y, \theta_{de})$ is obtained by autodiff in a form built from $\theta_{de}$. Both $y$ and $\theta_{de}$ are known to the decoder at the receiving end. Thus, the computations of Eqs. (A3) and (A4) are viable.

Next, we calculate the gradient w.r.t. the parameters of the encoder, which is given by
$$\frac{\partial L}{\partial \theta_{en}} = \frac{\partial x}{\partial \theta_{en}} \frac{\partial L}{\partial x} = \left[\frac{\partial f_{en}}{\partial \theta_{en}}(s, \theta_{en})\right]^T \frac{\partial y}{\partial x} \frac{\partial L}{\partial y}. \tag{A5}$$

Similar to the decoder, $\frac{\partial f_{en}}{\partial \theta_{en}}(s, \theta_{en})$ can be computed via autodiff. But the middle term $\partial y / \partial x$, which describes the relationship between the actual channel’s input $x$ and output $y$, blocks the BP process. According to Eq. (A1), it should be given by
$$\frac{\partial y}{\partial x} = \frac{\partial C(x)}{\partial x} + \frac{\partial n(x)}{\partial x}. \tag{A6}$$

By comparing Eqs. (A1) and (A6), it is clear that the random noise $n_r$ does not affect the gradient of $y$ w.r.t. $x$. Nevertheless, the function of the actual physical channel, including $C(x)$ and $n(x)$, can never be exactly known and is inherently non-differentiable, blocking the gradient computation of Eq. (A6).

Conventional methods strive to circumvent this problem by constructing a precise digital model and conducting learning in the digital domain. However, accurate modeling leads to high training cost and complexity, while still failing to avoid performance loss after deployment. In response, we propose our physics-guided learning framework.

    Physics-Guided E2E Learning

Our physics-guided learning framework leverages a simple white-box model for gradient estimation, as shown in Fig. 9(b). The gradient is estimated from the model output $\hat{y}_{\text{phy}}$, given by
$$\hat{y}_{\text{phy}} = \hat{C}(x), \tag{A7}$$
$$\frac{\partial \hat{y}_{\text{phy}}}{\partial x} = \frac{\partial \hat{C}(x)}{\partial x}. \tag{A8}$$

In comparison to Eq. (A1), Eq. (A7) involves only the approximation $\hat{C}(\cdot)$ of the deterministic channel effects, resulting in a clean backward pass without the derivative of noise, as shown in Eq. (A8). We construct $\hat{C}(\cdot)$ from well-developed physical models and prior knowledge of the system under investigation, eliminating intractable noise modeling that may instead incur bias. With $\partial \hat{y}_{\text{phy}} / \partial x$, Eq. (A5) is modified to
$$\frac{\partial L}{\partial \theta_{en}} \approx \left[\frac{\partial f_{en}}{\partial \theta_{en}}(s, \theta_{en})\right]^T \frac{\partial \hat{y}_{\text{phy}}}{\partial x} \frac{\partial L}{\partial y}. \tag{A9}$$

According to Eqs. (A3) and (A9), we derive the gradients for both the decoder and encoder. We can therefore update their parameters by gradient descent:
$$\theta_{de} \leftarrow \theta_{de} - \eta \frac{\partial L}{\partial \theta_{de}}, \tag{A10}$$
$$\theta_{en} \leftarrow \theta_{en} - \eta \frac{\partial L}{\partial \theta_{en}}, \tag{A11}$$
where $\eta$ is the learning rate. In this way, we complete the training loop with the BP algorithm. The loop repeats until the error no longer decreases.
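To make the mechanics of Eqs. (A7)-(A9) and the update rule concrete, the following is a minimal numerical sketch, not the paper's implementation: a toy scalar pre-distorter runs its forward pass through a noisy, quantized, non-differentiable "actual" channel, while the backward pass substitutes the derivative of a simple white-box model. The channel function, noise level, and learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def real_channel(x):
    """Non-differentiable 'actual' channel: a compressive nonlinearity C(x),
    quantization (zero derivative almost everywhere), and random noise n_r."""
    y = x - 0.2 * x**3
    y = np.round(y * 128) / 128
    return y + rng.normal(0.0, 0.01, np.shape(x))

def white_box_grad(x):
    """Derivative of the white-box model C^(x) = x - 0.2 x^3, used ONLY
    in the backward pass; it ignores quantization and noise."""
    return 1.0 - 0.6 * x**2

# Toy scalar 'encoder' (pre-distorter) x = w * s, trained so that the
# channel output matches s. Decoder is the identity, loss L = (y - s)^2.
w, eta = 1.0, 0.05
for _ in range(500):
    s = rng.uniform(-0.8, 0.8)
    x = w * s                   # forward pass: encoder
    y = real_channel(x)         # forward pass through the ACTUAL channel
    dL_dy = 2.0 * (y - s)
    # Backward pass: dy/dx is replaced by the white-box gradient (Eq. (A9))
    w -= eta * s * white_box_grad(x) * dL_dy

# After training, w settles above 1 to pre-compensate the compression.
grid = np.linspace(-0.8, 0.8, 101)
mse = np.mean((real_channel(w * grid) - grid) ** 2)
mse_no_dpd = np.mean((real_channel(grid) - grid) ** 2)
print(w, mse, mse_no_dpd)
```

Even though the backward-pass model ignores quantization and noise entirely, the stochastic updates still descend the true expected loss, which is the core claim of the framework.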

    Gradient Estimation with a Data-Driven Model

As a comparison, we also illustrate the gradient estimated by a data-driven channel model. We refer to a data-driven model as a black box purely based on NNs. This model is first trained to reduce the error between its output $\hat{y}_{\text{data}}$ and the real channel output $y$. Its parameters are updated as indicated by the yellow arrow in Fig. 9(c). For simplicity, the loss function of this channel-fitting process is also given by the Euclidean norm:
$$L_{\text{data}} = \|y - \hat{y}_{\text{data}}\|^2, \tag{A12}$$
where the training target $y$ is given by Eq. (A1), implying that the model is trained on noisy labels. Notably, the real gradient $\partial y / \partial x$ is free from the random noise $n_r$, as indicated by Eq. (A6). Therefore, the NN channel model can be designed to fit mainly the mean effects of the actual channel, using common NN structures such as the feed-forward neural network (FNN) and the long short-term memory network (LSTM). However, the noisy label $y$ perturbs the fitting process, hindering the model from capturing the exact effects and even introducing an extra noise bias when trained with insufficient data and time. In this case, the NN model’s estimate of the channel output, $\hat{y}_{\text{data}}$, can be expressed as
$$\hat{y}_{\text{data}} = \hat{C}_{\text{data}}(x) + \hat{n}_{\text{data}}(x) + \bar{n}_r^{\text{data}}, \tag{A13}$$
where $\hat{n}_{\text{data}}(x)$ denotes a noise bias related to the input $x$, and $\bar{n}_r^{\text{data}}$ is the estimated mean of the random noise. Both terms are learned from the training dataset and vary with the noise level during training.

After fitting the actual channel, the fixed NN channel model is then used to facilitate the E2E learning process, as shown in Fig. 9(c). Consequently, the estimated gradient is given by
$$\frac{\partial \hat{y}_{\text{data}}}{\partial x} = \frac{\partial \hat{C}_{\text{data}}(x)}{\partial x} + \frac{\partial \hat{n}_{\text{data}}(x)}{\partial x}. \tag{A14}$$

Compared with the gradient estimated by a physics-based model [Eq. (A8)], the gradient in Eq. (A14) incorporates an additional noise bias. This noise effect reflects only the distribution of the training data and remains static once the channel model is fixed, potentially enlarging the discrepancy from the true gradient given by Eq. (A6).

Notably, for digital-domain learning with data-driven models (prior Method 3), the same model output $\hat{y}_{\text{data}}$ serves as the target for its DPD training process. In that case, both noise-related terms in Eq. (A13) would mislead the optimization process and may cause overfitting.

Indeed, noisy labels are reported to be more harmful than noisy inputs for NN training, and training methods to cope with them have been studied for years [68–70]. To obtain better training results, data-driven model-assisted learning methods must therefore employ complex training techniques and consume considerable time.

    APPENDIX B: SIMULATION SYSTEM AND MODELING METHODS

    Details of the simulation setup for the amplifier-less system and key physical models are presented here.

    The simulation setup is shown in Fig. 10. The processing before the simulated transmission channel includes pulse shaping and DPD. The NN structure of the DPD module will be detailed in the next section. The transmission channel consists of three parts, namely the Tx impairments, optical channel, and Rx process. Specifically, we mainly focus on signal distortions caused by Tx impairments, including DAC quantization, nonlinearities in RF drivers and modulator, and the bandwidth limitation represented as a low-pass filter. Except for the DAC, their models are identical to those comprising the backward white box in experiments (shown in the white box of Fig. 3). Here the DAC is non-differentiable, since it is directly modeled as ideal 8-bit quantization with zero derivative. The optical channel is a noisy link, and the Rx process outputs the final recovered symbols.
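To illustrate why the DAC model blocks backpropagation, the following sketch, an assumption-level model rather than the paper's code, implements ideal 8-bit quantization over a full scale of ±1 (the full-scale choice is illustrative) and shows its zero derivative between step boundaries:

```python
import numpy as np

def dac_8bit(x, full_scale=1.0):
    """Ideal 8-bit DAC model: 256 uniform levels over [-full_scale, full_scale].
    The output is piecewise constant, so its derivative is zero almost
    everywhere, which is why autodiff cannot propagate gradients through it."""
    step = 2 * full_scale / (2**8 - 1)
    return np.clip(np.round(x / step) * step, -full_scale, full_scale)

x = np.linspace(-1.0, 1.0, 5)
y = dac_8bit(x)

# Numerical derivative between quantization steps is exactly zero.
eps = 1e-6
num_grad = (dac_8bit(0.3 + eps) - dac_8bit(0.3)) / eps
print(y, num_grad)
```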


    Figure 10.Simulation setup of the amplifier-less coherent system. The inset shows the NN structure of the DPD module. Tx, transmitter; Rx, receiver.

    The entire simulation system shown in Fig. 10, apart from the DAC, also serves as the physics-based channel model for prior Method 2. The major difference between prior Method 2 and our physics-guided learning method is that its learning objective is computed based on the model output instead of the real one, and the backpropagation is executed in the same digital model as the forward pass.

    Table 2 lists the physical parameters of the simulation system, which are matched with those in experiments. Details of the main physical models are as follows.

Table 2. Parameters for the Simulation System

    DAC resolution: 8 bits
    RF driver (gain, Vsat): 17 dB, 7.8 V
    IQ modulator Vπ: 3.5 V
    Tx LPF bandwidth: 22 GHz
    Normalized reference power: 0 dBm
    AWGN power: −20 dBm
    Laser linewidth: 100 kHz
    Rx LPF bandwidth: 22 GHz
    Samples per symbol: 2 sps

    RF Driver Model

The RF driver amplifies the output signal of the DAC and then drives the modulator to accomplish electro-optical conversion. We assume it behaves as a memoryless system that only amplifies the electrical amplitude of the signal. Additionally, we assume that its transfer functions for the two branches (in-phase and quadrature parts) of the IQ modulator are identical, without any imbalance or phase distortion. Hence, it is modeled using the Rapp model [71]:
$$V_{\text{out}} = \frac{V_{\text{in}} \cdot G}{\left[1 + \left(\frac{V_{\text{in}} \cdot G}{V_{\text{sat}}}\right)^4\right]^{1/4}},$$
where $V_{\text{in}}$ is the input Vpp of the driver (and also the output Vpp of the DAC), $V_{\text{out}}$ and $V_{\text{sat}}$ are the output and saturation Vpp of the RF driver, and $G$ denotes the nominal gain of the driver.
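As an illustrative sketch of the Rapp model with the Table 2 values, assuming the 17 dB figure is a voltage gain (20·log10) and the smoothness factor p = 2, which gives the exponent 4 used above:

```python
import numpy as np

def rapp_driver(v_in, gain_db=17.0, v_sat=7.8, p=2):
    """Rapp model of the RF driver: linear amplification for small inputs,
    smooth saturation toward v_sat for large inputs."""
    g = 10 ** (gain_db / 20)  # voltage gain (assumes dB is 20*log10)
    return v_in * g / (1 + (v_in * g / v_sat) ** (2 * p)) ** (1 / (2 * p))

v = np.linspace(0.0, 2.0, 100)
out = rapp_driver(v)
print(out[0], out[-1])
```

For millivolt-level inputs the response is essentially linear with gain $G$, while the output asymptotically approaches $V_{\text{sat}}$ as the drive increases, which is the compression the DPD must pre-compensate.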

    IQ Modulator Model

The IQ modulator is usually composed of a pair of parallel Mach–Zehnder modulators (MZMs), each configured in a push–pull arrangement. High-order modulation schemes in coherent optical transmitters rely on these dual MZMs biased at the null point, with the transfer function of each branch modeled as a sinusoidal response [72]:
$$E(t) = E_0\left[\sin\left(\frac{\pi V_I(t)}{2V_\pi}\right) + j \cdot \sin\left(\frac{\pi V_Q(t)}{2V_\pi}\right)\right],$$
where $E_0$ is the amplitude of the optical field, $V_\pi$ is the voltage difference required to apply a $\pi$ radian change to the sinusoidal transfer function, and $V_I(t)$ and $V_Q(t)$ are the voltages from the RF drivers applied to the in-phase and quadrature branches, respectively. We assume an ideal IQ modulator with symmetric and identical MZMs, hence without amplitude imbalance or relative delay between the two branches.
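A direct sketch of this sinusoidal transfer function with the Table 2 value of Vπ (the drive voltages below are illustrative):

```python
import numpy as np

def iq_modulator(v_i, v_q, v_pi=3.5, e0=1.0):
    """Null-biased IQ modulator: each MZM branch maps its drive voltage
    through a sinusoidal response with half-wave voltage v_pi."""
    return e0 * (np.sin(np.pi * v_i / (2 * v_pi))
                 + 1j * np.sin(np.pi * v_q / (2 * v_pi)))

# Small drive voltages stay in the quasi-linear region of the sine;
# driving at v_pi reaches the peak of the transfer function.
e_small = iq_modulator(0.35, 0.0)
e_large = iq_modulator(3.5, 0.0)
print(e_small, e_large)
```

The gap between the sine response and its linear approximation is one of the nonlinear distortions that grows with Vpp, motivating the DPD experiments in Section 3.D.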

    In the experimental setup, the minimum bandwidth limitation of the transmitter is determined by the modulator. To represent the combined effect of temporal distortions and frequency response of components, a first-order Gaussian filter is applied as the low-pass filter of the transmitter. Therefore, the overall transfer function can be understood as a Wiener–Hammerstein (WH) structure [59]. Notably, the same low-pass filter is applied at the receiver.

    Optical Amplifier-less Channel

As shown in Fig. 10, the output signal $x$ of the DPD is distorted by the transmitter into $\tilde{x}$ and then transmitted over an optical channel consisting of optical link loss and noise. The resulting signal is further impaired as
$$z = \alpha \cdot \tilde{x} \cdot e^{i\phi_{PN}} + n_r,$$
where $\alpha$ denotes the loss coefficient applied to the magnitude of the optical field in linear units, $\phi_{PN}$ is the phase noise modeled as a Wiener process determined by the laser linewidth, and $n_r$ is the random noise modeled as zero-mean additive white Gaussian noise (AWGN). To simulate the impact of driving voltage and fiber link loss, the power of $n_r$ is fixed and the transmitted signal $\tilde{x}$ is normalized to a reference power level. This power level is determined by a non-DPD test signal sequence with a driving voltage equal to the $V_\pi$ of the modulator. Detailed simulation parameters can be found in Table 2.
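A compact sketch of this channel model might look as follows; the sample rate and noise standard deviation are illustrative placeholders, since the AWGN power in Table 2 is defined relative to the system's reference power rather than as a per-sample standard deviation:

```python
import numpy as np

rng = np.random.default_rng(1)

def optical_channel(x, loss_db=5.0, linewidth=100e3, sample_rate=64e9,
                    noise_std=0.01):
    """Amplifier-less optical channel: field loss, laser phase noise as a
    Wiener process (increment variance 2*pi*linewidth/sample_rate), and
    complex additive white Gaussian noise."""
    alpha = 10 ** (-loss_db / 20)               # field loss in linear units
    var = 2 * np.pi * linewidth / sample_rate   # phase increment variance
    phi = np.cumsum(rng.normal(0.0, np.sqrt(var), len(x)))
    n_r = (rng.normal(0.0, noise_std, len(x))
           + 1j * rng.normal(0.0, noise_std, len(x)))
    return alpha * x * np.exp(1j * phi) + n_r

x = np.ones(1000, dtype=complex)   # placeholder transmitted waveform
z = optical_channel(x)
print(np.mean(np.abs(z)))
```

Note that the phase noise rotates the field without changing its magnitude, so the average received amplitude is set by the loss term alone (up to the AWGN).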

    APPENDIX C: LEARNING WITH AN NN-BASED POST-EQUALIZER

    Here we demonstrate that our proposed approach can further improve the performance by including an extra NN-based post-equalizer. Specifically, an NN with the same structure as the DPD is introduced after phase recovery for nonlinear equalization and jointly learned with the Tx DPD module in simulations using our approach. As shown in Fig. 11, our method continues to reduce the MSE after adding the post-equalizer, and the performance is better than optimizing the DPD alone. This result shows the effectiveness and general applicability of our approach. However, we do not include this NN-based post-equalizer in our experiments to avoid increased DSP complexity.


    Figure 11.Training process comparison between training DPD alone and joint learning with an NN-based post-equalizer.

    APPENDIX D: DPD NEURAL NETWORK AND TRAINING DETAILS

We refer to Refs. [42,60] to design the NN structure for DPD applications. It takes the basic architecture of the feed-forward neural network, with a sliding-time-window input layer to account for the memory effect between symbols. As shown in the inset of Fig. 10, the input layer is fed with $2m+1$ symbols, where the central one ($u_t$) corresponds to the current input and the remaining $2m$ symbols are the inputs of the previous and future $m$ time steps, respectively. The final layer yields one symbol at each time step, which is added directly to the central input by a shortcut connection to produce the final output $x_t$. The outputs from all time steps are concatenated to form a complete sequence. Note that, before DPD, signals have been up-sampled to 2 samples per symbol (sps) in the pulse-shaping module. Thus, there are $2(2m+1)$ neurons in the input layer and two neurons in the output layer. Here we set the time step $m=7$.

    Previous DPD works used two real-valued NNs to process the real and imaginary parts of complex signals separately [42,60]. Instead, we implement our DPD as a complex-valued NN through an online toolbox [73] and utilize complex tanh as the activation function for the input and hidden layers. The toolbox is supported by the machine learning library PyTorch, facilitating the use of autodiff functions for backpropagation.
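The shapes involved in this complex-valued, sliding-window structure can be sketched as follows; the single hidden layer, its width, and the weight initialization are arbitrary illustrations rather than the configuration of Refs. [42,60,73]:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 7
n_in, n_hidden, n_out = 2 * (2 * m + 1), 16, 2   # hidden width is illustrative

# Complex-valued weights (small random initialization)
w1 = (rng.normal(0, 0.05, (n_hidden, n_in))
      + 1j * rng.normal(0, 0.05, (n_hidden, n_in)))
w2 = (rng.normal(0, 0.05, (n_out, n_hidden))
      + 1j * rng.normal(0, 0.05, (n_out, n_hidden)))

def dpd_forward(window):
    """One DPD step: a sliding window of 2(2m+1) complex samples in, two
    pre-distorted samples out, with a shortcut adding the central (current)
    samples so the network only has to learn the correction term."""
    h = np.tanh(w1 @ window)                       # complex tanh activation
    centre = window[n_in // 2 - 1: n_in // 2 + 1]  # the 2 current-symbol samples
    return w2 @ h + centre                         # shortcut connection

u = rng.normal(0, 1, n_in) + 1j * rng.normal(0, 1, n_in)
x = dpd_forward(u)
print(x.shape)
```

NumPy's `tanh` operates directly on complex arrays, which is also why a complex-valued framework can reuse standard activations for this DPD.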

The two prior E2E learning methods (Methods 1 and 3) use an identical NN structure for both channel modeling and DPD, following Refs. [18,34,60]. This structure may not be optimal for the system under investigation. Optimizing the NN hyperparameters would require a systematic design strategy, demanding additional time and tuning effort; this constitutes part of their training complexity.

All NNs are trained using supervised learning, where the training data consist of input–target pairs. For DPD training, the transmitted symbol s serves as both input and target. For data-driven channel model training, the pre-distorted signal x is the input, while the corresponding system output symbol y is the target. Since x is derived from s, the training data amount for both processes is measured by the total number of transmitted symbols. In our experiments, a sequence of fixed-length random symbols is generated in each training iteration and used for training. Thus, the total training data amount is proportional to the number of iterations and can be expressed as

Data amount = Symbol length per iteration × (Iteration number for channel modeling + Iteration number for DPD training).
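As a concrete check of this accounting (the iteration counts below are illustrative, not values from our experiments):

```python
SYMBOLS_PER_ITER = 2 ** 14  # fixed sequence length per iteration

def data_amount(iters_channel, iters_dpd):
    """Total transmitted symbols consumed during training."""
    return SYMBOLS_PER_ITER * (iters_channel + iters_dpd)

# e.g., a method splitting 200 iterations between channel modeling and DPD
print(data_amount(100, 100))  # 3276800

# our method needs no channel-model iterations, so for the same DPD budget
print(data_amount(0, 100))    # 1638400
```

The comparison makes explicit why eliminating the channel-modeling iterations halves the data consumed for an equal DPD training budget.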

In experimental implementations, training is conducted by controlling the arbitrary waveform generator (AWG) and real-time oscilloscope simultaneously, requiring frequent data loading and acquisition from the actual system. The sequence length must be chosen carefully: a shorter sequence increases the data acquisition frequency and may fail to capture the complete channel effects, whereas a longer sequence increases the data loading time. Balancing these factors, we fix the length at 2^14 symbols per iteration.

Regarding system measurements during training, we define one measurement as a single acquisition of an overall system output sequence y. For our method and Method 1, training relies on real system outputs, so the number of measurements equals the number of iterations. For Method 3, real system outputs are used only for channel model training, so the number of measurements is half the number of iterations (its iterations are split equally between DPD training and channel modeling). Method 2 requires no system measurements; however, its performance is unacceptable (as demonstrated in Section 3.D).

For training processes requiring system measurements, the main time-consuming factor is the forward propagation stage, which includes data loading to the AWG, signal transmission through the system, data acquisition from the oscilloscope, and Rx-DSP. On our platform, one training iteration with a sequence of 2^14 symbols takes about 2.8 s. In contrast, the digital-domain DPD training process takes around 0.52 s per iteration. All NN models are trained in PyTorch, with 32 GB of 2400 MHz RAM and an Intel Core i7-12700H 2.3 GHz CPU.

During testing, the same sequence of 2^16 symbols is used to evaluate all methods. For cases requiring periodic validation of training progress, a sequence of 2^15 symbols is used as the validation set.

In experiments, we find that the choice of optimizer and initial learning rate significantly influences the training of data-driven channel models. With an SGD optimizer, a large initial learning rate hinders convergence, while a small one tends to settle in local minima with poor modeling performance. In contrast, Adam provides stable training results across a wide range of initial learning rates (0.001–0.01). Consequently, we adopt Adam for both channel modeling and DPD training. We choose the optimal initial learning rate for each training process by a standard grid search with a step of 0.001 over the range 0.001–0.01, ensuring the hyperparameters used in each method are optimal. The initial learning rates for each method are listed in Table 3.
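The grid-search selection can be sketched as follows. The validation objective here is a toy quadratic standing in for "train the NN with this learning rate and return the validation loss"; it is not our actual training loop.

```python
import numpy as np

def grid_search_lr(train_and_validate, lo=0.001, hi=0.01, step=0.001):
    """Return the learning rate with the lowest validation loss."""
    lrs = np.round(np.arange(lo, hi + step / 2, step), 3)  # 0.001 ... 0.010
    losses = {lr: train_and_validate(lr) for lr in lrs}
    return min(losses, key=losses.get)

# Toy validation objective with its minimum at 0.005, mimicking the
# selection that produced the DPD learning rate for our method (Table 3).
toy_loss = lambda lr: (lr - 0.005) ** 2
best = grid_search_lr(toy_loss)
print(best)  # 0.005
```

Each candidate rate requires one full training run, so the 0.001 step over 0.001–0.01 costs ten runs per training process.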

Table 3. Initial Learning Rates

Method        Channel Model    DPD Module
Our method    N.A.             0.005
Method 1      0.003            0.004
Method 2      N.A.             0.002
Method 3      0.003            0.003

    References

    [1] V. Mnih, K. Kavukcuoglu, D. Silver. Human-level control through deep reinforcement learning. Nature, 518, 529-533(2015).

    [2] I. Sutskever, O. Vinyals, Q. V. Le. Sequence to sequence learning with neural networks. arXiv(2014).

    [3] N. Carion, F. Massa, G. Synnaeve. End-to-end object detection with transformers. European Conference on Computer Vision, 213-229(2020).

    [4] M. Bojarski, D. Del Testa, D. Dworakowski. End to end learning for self-driving cars. arXiv(2016).

[5] S. Levine, C. Finn, T. Darrell. End-to-end training of deep visuomotor policies. J. Mach. Learn. Res., 17, 1334-1373(2016).

    [6] G. Barbastathis, A. Ozcan, G. Situ. On the use of deep learning for computational imaging. Optica, 6, 921-943(2019).

    [7] X. Guo, T. D. Barrett, Z. M. Wang. Backpropagation through nonlinear units for the all-optical training of neural networks. Photonics Res., 9, B71-B80(2021).

    [8] T. O’Shea, J. Hoydis. An introduction to deep learning for the physical layer. IEEE Trans. Cognit. Commun. Netw., 3, 563-575(2017).

    [9] N. Chi, Y. Zhou, Y. Wei. Visible light communication in 6G: advances, challenges, and prospects. IEEE Veh. Technol. Mag., 15, 93-102(2020).

    [10] M. Z. Chowdhury, M. K. Hasan, M. Shahjalal. Optical wireless hybrid networks: trends, opportunities, challenges, and research directions. IEEE Commun. Surv. Tuts., 22, 930-966(2020).

    [11] W. Shi, Y. Tian, A. Gervais. Scaling capacity of fiber-optic transmission systems via silicon photonics. Nanophotonics, 9, 4629-4663(2020).

    [12] B. J. Puttnam, G. Rademacher, R. S. Luís. Space-division multiplexing for optical fiber communications. Optica, 8, 1186-1203(2021).

[13] M. Srinivasan, J. Song, A. Grabowski. End-to-end learning for VCSEL-based optical interconnects: state-of-the-art, challenges, and opportunities. J. Lightwave Technol., 41, 3261-3277(2023).

    [14] Z. Li, Q. Xie, Y. Zhang. Four-wave mixing based spectral Talbot amplifier for programmable purification of optical frequency combs. APL Photonics, 9, 036101(2024).

    [15] E. Agrell, M. Karlsson, F. Poletti. Roadmap on optical communications. J. Opt., 26, 093001(2024).

    [16] B. Karanov, M. Chagnon, F. Thouin. End-to-end deep learning of optical fiber communications. J. Lightwave Technol., 36, 4843-4855(2018).

    [17] B. Karanov, M. Chagnon, V. Aref. Concept and experimental demonstration of optical IM/DD end-to-end system optimization using a generative model. 2020 Optical Fiber Communications Conference and Exhibition (OFC), 1-3(2020).

    [18] Z. Niu, H. Yang, H. Zhao. End-to-end deep learning for long-haul fiber transmission using differentiable surrogate channel. J. Lightwave Technol., 40, 2807-2822(2022).

    [19] S. Li, C. Häger, N. Garcia. Achievable information rates for nonlinear fiber communication via end-to-end autoencoder learning. 2018 European Conference on Optical Communication (ECOC), 1-3(2018).

[20] T. Uhlemann, S. Cammerer, A. Span. Deep-learning autoencoder for coherent and nonlinear optical communication. Photonic Networks; 21st ITG-Symposium, 1-8(2020).

[21] S. Gaiarin, F. Da Ros, R. T. Jones. End-to-end optimization of coherent optical communications over the split-step Fourier method guided by the nonlinear Fourier transform theory. J. Lightwave Technol., 39, 418-428(2021).

    [22] Z. Zhai, H. Jiang, M. Fu. An interpretable mapping from a communication system to a neural network for optimal transceiver-joint equalization. J. Lightwave Technol., 39, 5449-5458(2021).

    [23] J. Song, C. Häger, J. Schröder. End-to-end autoencoder for superchannel transceivers with hardware impairment. 2021 Optical Fiber Communications Conference and Exhibition (OFC), 1-3(2021).

    [24] Z. He, J. Song, C. Häger. Experimental demonstration of learned pulse shaping filter for superchannels. 2022 Optical Fiber Communications Conference and Exhibition (OFC), 1-3(2022).

    [25] L. Minelli, F. Forghieri, A. Nespola. A multi-rate approach for nonlinear pre-distortion using end-to-end deep learning in IM-DD systems. J. Lightwave Technol., 41, 420-431(2023).

    [26] H. Lee, S. H. Lee, T. Q. S. Quek. Deep learning framework for wireless systems: applications to optical wireless communications. IEEE Commun. Mag., 57, 35-41(2019).

    [27] O. Jovanovic, M. P. Yankov, F. Da Ros. End-to-end learning of a constellation shape robust to channel condition uncertainties. J. Lightwave Technol., 40, 3316-3324(2022).

    [28] A. Rode, B. Geiger, L. Schmalen. Geometric constellation shaping for phase-noise channels using a differentiable blind phase search. 2022 Optical Fiber Communications Conference and Exhibition (OFC), 1-3(2022).

    [29] B. M. Oliveira, M. S. Neves, F. P. Guiomar. End-to-end deep learning of geometric shaping for unamplified coherent systems. Opt. Express, 30, 41459-41472(2022).

    [30] M. Schaedler, S. Calabrò, F. Pittalà. Neural network assisted geometric shaping for 800 Gbit/s and 1 Tbit/s optical transmission. 2020 Optical Fiber Communications Conference and Exhibition (OFC), 1-3(2020).

    [31] V. Aref, M. Chagnon. End-to-end learning of joint geometric and probabilistic constellation shaping. 2022 Optical Fiber Communications Conference and Exhibition (OFC), 1-3(2022).

    [32] V. Neskorniuk, A. Carnio, V. Bajaj. End-to-end deep learning of long-haul coherent optical fiber communications via regular perturbation model. 2021 European Conference on Optical Communication (ECOC), 1-4(2021).

[33] H. Ye, L. Liang, G. Y. Li. Deep learning-based end-to-end wireless communication systems with conditional GANs as unknown channels. IEEE Trans. Wirel. Commun., 19, 3133-3143(2020).

    [34] Y. Xu, L. Huang, W. Jiang. End-to-end learning for 100G-PON based on noise adaptation network. J. Lightwave Technol., 42, 2328-2337(2024).

    [35] Y. Xu, X. Guan, W. Jiang. Low-complexity end-to-end deep learning framework for 100G-PON. J. Opt. Commun. Netw., 16, 1093-1103(2024).

    [36] J. Shi, W. Niu, Z. Li. Optimal adaptive waveform design utilizing an end-to-end learning-based pre-equalization neural network in an UVLC system. J. Lightwave Technol., 41, 1626-1636(2023).

    [37] J. Shi, Z. Li, J. Jia. Waveform-to-waveform end-to-end learning framework in a seamless fiber-terahertz integrated communication system. J. Lightwave Technol., 41, 2381-2392(2023).

[38] S. Xing, Z. Li, C. Huang. End-to-end deep learning for a flexible coherent PON with user-specific constellation optimization. J. Opt. Commun. Netw., 16, 59-70(2023).

    [39] A. Sun, Z. Li, J. Jia. End-to-end deep-learning-based photonic-assisted multi-user fiber-mmwave integrated communication system. J. Lightwave Technol., 42, 80-94(2023).

    [40] H. Yang, Z. Niu, S. Xiao. Fast and accurate optical fiber channel modeling using generative adversarial network. J. Lightwave Technol., 39, 1322-1333(2021).

    [41] F. A. Aoudia, J. Hoydis. Model-free training of end-to-end communication systems. IEEE J. Sel. Areas Commun., 37, 2503-2516(2019).

    [42] J. Song, Z. He, C. Häger. Over-the-fiber digital predistortion using reinforcement learning. 2021 European Conference on Optical Communication (ECOC), 1-4(2021).

    [43] J. Song, C. Häger, J. Schröder. Model-based end-to-end learning for WDM systems with transceiver hardware impairments. IEEE J. Sel. Top. Quantum Electron., 28, 7700114(2022).

    [44] O. Jovanovic, M. P. Yankov, F. Da Ros. Gradient-free training of autoencoders for non-differentiable communication channels. J. Lightwave Technol., 39, 6381-6391(2021).

    [45] D. Bullock, B. Johnson, R. B. Wells. Hardware-in-the-loop simulation. Transp. Res. Emerg. Technol., 12, 73-89(2004).

    [46] Y. Peng, S. Choi, N. Padmanaban. Neural holography with camera-in-the-loop training. ACM Trans. Graph., 39, 185(2020).

[47] T. P. Lillicrap, D. Cownden, D. B. Tweed. Random synaptic feedback weights support error backpropagation for deep learning. Nat. Commun., 7, 13276(2016).

    [48] L. G. Wright, T. Onodera, M. M. Stein. Deep physical neural networks trained with backpropagation. Nature, 601, 549-555(2022).

    [49] K. Zhong, X. Zhou, J. Huo. Digital signal processing for short-reach optical communications: a review of current technologies and future trends. J. Lightwave Technol., 36, 377-400(2018).

    [50] F. Buchali, M. Chagnon, K. Schuh. Amplifier less 400 Gb/s coherent transmission at short reach. 2018 European Conference on Optical Communication (ECOC), 1-3(2018).

[51] G. Rizzelli Martella, A. Nespola, S. Straullu. Scaling laws for unamplified coherent transmission in next-generation short-reach and access networks. J. Lightwave Technol., 39, 5805-5814(2021).

    [52] D. Tauber, B. Smith, D. Lewis. Role of coherent systems in the next DCI generation. J. Lightwave Technol., 41, 1139-1151(2023).

[53] X. Zhou, R. Urata, H. Liu. Beyond 1 Tb/s intra-data center interconnect technology: IM-DD or coherent? J. Lightwave Technol., 38, 475-484(2019).

    [54] S. Bernal, M. Dumont, E. Berikaa. 12.1 terabit/second data center interconnects using O-band coherent transmission with QD-MLL frequency combs. Nat. Commun., 15, 7741(2024).

    [55] H. Jiang, M. Fu, Y. Zhu. Digital pre-distortion using a Gauss-Newton-based direct learning architecture for coherent optical transmitters. Opt. Lett., 48, 1706-1709(2023).

    [56] C. Eun, E. J. Powers. A new volterra predistorter based on the indirect learning architecture. IEEE Trans. Signal Process., 45, 223-227(1997).

    [57] P. W. Berenguer, M. Nolle, L. Molle. Nonlinear digital pre-distortion of transmitter components. J. Lightwave Technol., 34, 1739-1745(2016).

    [58] H. Paaso, A. Mammela. Comparison of direct learning and indirect learning predistortion architectures. IEEE International Symposium on Wireless Communication Systems, 309-313(2008).

    [59] G. Paryanti, H. Faig, L. Rokach. A direct learning approach for neural network based pre-distortion for coherent nonlinear optical transmitter. J. Lightwave Technol., 38, 3883-3896(2020).

    [60] V. Bajaj, F. Buchali, M. Chagnon. Deep neural network-based digital pre-distortion for high baudrate optical coherent transmission. J. Lightwave Technol., 40, 597-606(2022).

[61] T. Sasai, M. Nakamura, E. Yamazaki. Wiener-Hammerstein model and its learning for nonlinear digital pre-distortion of optical transmitters. Opt. Express, 28, 30952-30963(2020).

    [62] R. Emmerich, M. Sena, R. Elschner. Enabling S-C-I-band systems with standard C-band modulator and coherent receiver using coherent system identification and nonlinear predistortion. J. Lightwave Technol., 40, 1360-1368(2022).

    [63] X. Lu, M. Zhao, L. Qiao. Non-linear compensation of multi-CAP VLC system employing pre-distortion base on clustering of machine learning. 2018 Optical Fiber Communications Conference and Exposition (OFC), 1-3(2018).

    [64] R. Elschner, R. Emmerich, C. Schmidt-Langhorst. Improving achievable information rates of 64-GBd PDM-64QAM by nonlinear transmitter predistortion. Optical Fiber Communication Conference, M1C.2(2018).

    [65] V. Scarani, H. Bechmann-Pasquinucci, N. J. Cerf. The security of practical quantum key distribution. Rev. Mod. Phys., 81, 1301-1350(2009).

    [66] D. Rafique, L. Velasco. Machine learning for network automation: overview, architecture, and applications. J. Opt. Commun. Netw., 10, D126-D143(2018).

    [67] X. Liu, Y. Zhang, Y. Chen. Digital twin modeling and controlling of optical power evolution enabling autonomous-driving optical networks: a Bayesian approach. Adv. Photonics, 6, 026006(2024).

    [68] S. Reed, H. Lee, D. Anguelov. Training deep neural networks on noisy labels with bootstrapping. arXiv(2014).

    [69] B. Han, Q. Yao, X. Yu. Co-teaching: robust training of deep neural networks with extremely noisy labels. arXiv(2018).

    [70] H. Song, M. Kim, D. Park. Learning from noisy labels with deep neural networks: a survey. IEEE Trans. Neural Netw. Learn. Syst., 34, 8135-8153(2023).

    [71] C. Rapp. Effects of HPA-nonlinearity on a 4-DPSK/OFDM-signal for a digital sound broadcasting signal. ESA Spec. Publ., 332, 179-184(1991).

    [72] G. Li, P. Yu. Optical intensity modulators for digital and analog applications. J. Lightwave Technol., 21, 2010-2030(2003).

    [73] M. W. Matthès, Y. Bromberg, J. de Rosny. Learning and avoiding disorder in multimode fibers. Phys. Rev. X, 11, 021060(2021).
