- Photonics Research
- Vol. 11, Issue 11, 1802 (2023)
1. INTRODUCTION
Adaptive optics (AO), which compensates wavefront distortions, has wide applications in astronomical observation [1], laser communication [2], and biological imaging [3]. In a typical AO system, a wavefront sensor (WFS) measures the wavefront distorted by atmospheric turbulence, and a controller converts the distorted wavefront into the voltage of a deformable mirror (DM) for correction. According to the phase conjugate principle, the subsequent wavefront is compensated by the DM immediately in a closed-loop correction [4]. However, the AO system always has a time delay between measurement and compensation, due to the conversion and transmission of measured signals, as well as the response of the DM. Generally, this time lapse is equivalent to two to three captured frames of the WFS [5]. The resulting wavefront correction mismatch significantly increases correction errors, especially when the turbulence changes rapidly. In ever-changing turbulence scenarios, such as fast-moving orbital objects and high-speed laser communication, it is all the more critical to alleviate the time-delay problem for stable performance [6,7]. Hence, the predictive AO technique, which predicts future wavefronts from earlier ones, has been proposed to cover the delay time and enable synchronous correction.
Although the evolution of turbulence is random and unpredictable on a long-term scale, it is still continuous and predictable within a short time interval. Its statistical temporal characteristics are therefore particularly important in predictive AO. Since Jorgenson and Aitken demonstrated the predictability of the atmospheric wavefront [8], many prediction methods have been proposed to improve prediction accuracy. Current approaches can be categorized into two groups: linear and non-linear. Linear prediction (LP) algorithms, such as the linear minimum mean square error (LMMSE), were investigated first, based on statistical knowledge of the atmosphere [9,10]. By establishing linear equations between the previous and future wavefronts, the corresponding transfer matrix for distortion compensation can be calculated quickly [11]. However, the matrix is constant, so it cannot track dynamic atmospheric turbulence; it must be recalculated whenever atmospheric conditions change, which limits its use in real applications. Another LP approach, Fourier transform prediction [12], is based on Taylor's frozen flow hypothesis. It requires dynamically monitoring the windspeed along the propagation path, which increases the complexity of the AO system. Overall, LP algorithms struggle to capture the non-linear relationships of turbulence, keeping them away from optimal performance.
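The linear prediction idea above can be sketched as a least-squares fit of a constant transfer matrix mapping stacked past coefficient vectors to a future one. This is a minimal illustration, not the exact LMMSE formulation of Refs. [9,10]; the function names and the toy AR(1)-style data are assumptions for demonstration only.

```python
import numpy as np

def fit_linear_predictor(coeffs, n_prev=10, step=3):
    """Fit a constant transfer matrix mapping n_prev past coefficient
    vectors to the coefficient vector `step` frames ahead (least squares)."""
    T, M = coeffs.shape
    X, Y = [], []
    for t in range(T - n_prev - step + 1):
        X.append(coeffs[t:t + n_prev].ravel())    # stacked 10-frame history
        Y.append(coeffs[t + n_prev + step - 1])   # future third frame
    X, Y = np.asarray(X), np.asarray(Y)
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)     # shape (n_prev*M, M)
    return W

def predict(W, history):
    """Apply the fixed transfer matrix to a stacked history."""
    return history.ravel() @ W

# Toy data: 14 coefficients evolving with simple temporal correlation
rng = np.random.default_rng(0)
c = np.zeros((500, 14))
for t in range(1, 500):
    c[t] = 0.95 * c[t - 1] + 0.1 * rng.standard_normal(14)

W = fit_linear_predictor(c)
pred = predict(W, c[-13:-3])   # 10-frame history ending 3 frames before the end
```

Because W is constant, it encodes only the statistics of the data it was fit on, which is exactly the limitation noted above: when conditions change, the matrix must be recomputed.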
In contrast, non-linear prediction algorithms, which are typically machine-learning-based, extract non-linear features to improve prediction accuracy. They are mainly divided into two types: time-series prediction and spatiotemporal prediction. The former applies to multivariate time-series data, such as the wavefront slope [13], Zernike coefficients [14], and DM voltage [15]. Notably, the multi-layer perceptron (MLP) has been explored for turbulence prediction since 1992 [8]. However, the MLP focuses heavily on regression fitting and relatively ignores the continuity of temporal features. The more recent long short-term memory (LSTM) can partially make up for this weakness [16]. Disappointingly, the lack of spatial constraints causes the distribution of the prediction to mismatch the true wavefront. Comparatively, spatiotemporal prediction algorithms pay more attention to spatial features, and conventional convolutions are useful for dealing with structured discrete signals. The well-known convolutional neural network (CNN) has been applied to sequential wavefront phases for better corrections. Chen
To deal with the time-delay problem in AO, inspired by the widely used deep learning [21–23], we propose a highly robust spatiotemporal wavefront prediction algorithm with a mixed graph neural network (MGNN) for predictive AO. To better extract and utilize the spatiotemporal non-linear features of turbulence, a temporal feature catcher (TFC) and a spatial feature analyzer (SFA) are both designed. In the feature extraction process, conventional convolution is no longer applicable to the graph structure, so graph convolution is introduced instead. Innovatively, we delve into the Zernike model and discover the consistency between the covariance matrix of coefficients and the graph structure. The covariance matrix is a statistical invariant in the real time-varying atmosphere and reveals the relationship between the Zernike coefficients of real turbulence. Therefore, we exploit this consistency to promote graph constraints from the covariance matrix and weight learning of the transformation matrix internally. With the optimization of the graph structure, the MGNN can learn more spatial features from limited data than common networks and can generate output that closely resembles real turbulence. The prediction accuracy of the MGNN and its robustness to varying atmospheric conditions, including generalization from simulation to experiment, are discussed and verified. In experimental verification, the MGNN trained with simulated data achieves performance comparable to that of a network trained with real turbulence. As a result, our proposed algorithm achieves lower correction error and better robustness than conventional predictive AO algorithms, let alone conventional AO.
2. PRINCIPLE
A. Covariance Matrix of Zernike Polynomial Model
According to the Zernike polynomial, the wavefront phase measured by the WFS can be expressed as
Ignoring the scaling factor in Eq. (2), the covariance matrix is a statistical invariant of the Zernike coefficients of turbulence. Its off-diagonal elements are non-zero, and the complete representation is given in Appendix A. The correlation between coefficients is non-negligible, but it is undesirable here because it makes coefficient prediction more complex. Therefore, we introduce the Karhunen–Loève (K-L) functions [25], and the wavefront phase of one frame can also be expressed as
Specifically, the K-L covariance matrix is diagonal due to the independence of its elements, and it is related to the Zernike covariance matrix through the K-L transformation.
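The relationship between the two covariance matrices can be illustrated by eigendecomposition: diagonalizing the Zernike-coefficient covariance yields statistically uncorrelated K-L coefficients. The matrix values below are hypothetical placeholders, not Noll's analytical covariances.

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for the Zernike-coefficient covariance matrix (symmetric,
# positive definite, with non-zero off-diagonal elements); the true
# matrix follows Noll's analytical expressions in Appendix A.
A = rng.standard_normal((14, 14))
C_z = A @ A.T + 14 * np.eye(14)

# K-L transformation: the eigenvector basis diagonalizes C_z, so the
# transformed (K-L) coefficients are uncorrelated.
eigvals, U = np.linalg.eigh(C_z)
C_kl = U.T @ C_z @ U                       # diagonal up to round-off

off_diag = C_kl - np.diag(np.diag(C_kl))   # effectively zero
```

This is exactly why the K-L expansion is convenient for prediction: the coefficient correlations that complicate the Zernike representation are absorbed into the transformation matrix.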
B. Spatiotemporal Wavefront Prediction Model
In the predictive AO system, the WFS readily provides a sequence of coefficient vectors over time. These time-continuous vectors of historical wavefronts form the inputs of the prediction algorithm, and the prediction model can be expressed as
According to the prediction model, we establish the flowchart of spatiotemporal wavefront prediction in Fig. 1. In practice, we can only collect continuous turbulence for a while and separate it into input and label vectors at different moments for optimization. These separated vectors are paired one by one and assembled to compose the training dataset in Fig. 1(a). For predictive AO, the prediction timespan must cover the time delay of measurement and correction, plus the calculation delay of the algorithm. For an AO system with a frame rate of hundreds of hertz, the actual response time is often 2–3 times the sampling period of the camera. In other words, the time delay is two to three frames, so we set the prediction step to three frames in our algorithm for prediction and compensation. The number of previous frames is set to 10 as a balance, because more frames mean a larger data volume and longer processing time. According to Eq. (3), the covariance matrix, which is highly similar to the adjacency matrix of a graph, represents the connections between coefficients. We can therefore regard the coefficients as nodes and describe their connections with edges in a graph. In Fig. 1(b), the graph embedding is performed from the covariance matrix, applying physical constraints to the predicted coefficients in the MGNN. The graph embedding process is expressed as
Figure 1.Flowchart of spatiotemporal wavefront prediction. (a) Training dataset configuration. (b) Graph embedding from the covariance matrix of coefficients. (c) In the testing process, Zernike coefficients of previous 10 frames are inputs, and the predicted coefficients are obtained by the trained MGNN.
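The dataset pairing of Fig. 1(a) can be sketched as a sliding window over the coefficient sequence, assuming 10 input frames and the future third frame as the label, as described in the text; the function name and toy data are illustrative.

```python
import numpy as np

def make_pairs(coeffs, n_prev=10, step=3):
    """Pair each 10-frame history with the frame 3 steps ahead:
    frames t..t+9 form the input and frame t+12 is the label."""
    X, y = [], []
    for t in range(len(coeffs) - n_prev - step + 1):
        X.append(coeffs[t:t + n_prev])
        y.append(coeffs[t + n_prev + step - 1])
    return np.stack(X), np.stack(y)

# Toy sequence: 20 frames of 14 Zernike coefficients, where every
# coefficient of frame t equals t, making the pairing easy to check.
coeffs = np.arange(20.0)[:, None] * np.ones((1, 14))
X, y = make_pairs(coeffs)
# first pair: frames 0..9 as input, frame 12 as label
```

The same windowing is applied to every test set, so the only difference between training and testing is the turbulence statistics behind the coefficients.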
C. Mixed Graph Neural Network
The architecture of the MGNN is presented in Fig. 2. Input tensors are successively processed by the start convolutional layer, TFC, SFA, and end convolutional layer in Fig. 2(a). The start and end convolutional layers consist of several conventional convolutional operations with small kernels. The tensor dimensions are the batch size, channel, width, and height: the width depends on the order of polynomials used to describe a one-frame wavefront, and the height depends on the number of previous frames. In Fig. 2(b), the TFC applies two branches of inception units to extract high-level temporal features. Inspired by GoogLeNet [27], the inception unit combines four convolutional operations with different receptive fields to discover different temporal patterns. One branch is followed by a "tanh" activation function, while the other takes a "sigmoid" to control the information flow, like the gate function in the LSTM [18]. The inception unit is repeated five times in succession, although Fig. 2(b) presents only one. The two branches are fused by the Hadamard product. With its different receptive fields, the TFC can extract the characteristics of the coefficient tensors of previous frames as a whole and predict the future coefficients based on temporal information. However, the predicted features of the TFC may not be real or effective, due to its assumption of coefficient independence. The predicted features therefore need reasonable modulation, so we propose the SFA in Fig. 2(c). The SFA likewise applies two branches of graph units to optimize spatial features. With the adjacency matrix in Eq. (9), one branch of the graph unit layer can be expressed as
Figure 2.Network architecture of the MGNN. (a) Overall framework. (b) Details of temporal feature catcher. (c) Details of spatial feature analyzer. (
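A single branch of the graph unit can be sketched as normalized-adjacency propagation followed by a channel-mixing weight, in the spirit of a graph convolution. The adjacency values, normalization scheme, and activation here are assumptions for illustration, not the exact formulation of Eq. (9).

```python
import numpy as np

def graph_unit(X, A, W):
    """One branch of a graph unit: propagate node features along a
    symmetrically normalized adjacency derived from the covariance
    matrix, then mix channels with a learnable weight."""
    D = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    A_hat = D @ A @ D                  # symmetric normalization
    return np.tanh(A_hat @ X @ W)      # propagate, transform, activate

n_nodes, f_in, f_out = 14, 8, 8        # one node per Zernike coefficient
rng = np.random.default_rng(2)

# Sparse symmetric adjacency with self-loops, mimicking a thresholded
# covariance matrix of Zernike coefficients (hypothetical edges).
A = np.eye(n_nodes)
A[2, 4] = A[4, 2] = 1.0
A[3, 5] = A[5, 3] = 1.0

X = rng.standard_normal((n_nodes, f_in))      # per-coefficient features
W = rng.standard_normal((f_in, f_out)) * 0.1  # learnable channel mixing
H = graph_unit(X, A, W)
```

Because A is highly sparse, only statistically correlated coefficients exchange information, which is how the covariance matrix acts as a physical constraint on the spatial features.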
Leaving aside the details of the data flow, the architecture design of the MGNN is unique and innovative. From previous related works, a satisfactory prediction algorithm requires non-linear fitting ability and spatiotemporal feature extraction for accurate prediction, plus sufficient parameter capacity and information modeling ability for robust generalization. However, existing algorithms, including state-of-the-art deep-learning-based ones, focus more on accurate prediction with similar data, leading to poor generalization in varied environments. Limited training data cannot cover all variations of turbulence, so it is implausible to predict unfamiliar turbulence with data-driven learning alone. Indeed, prediction accuracy within limited data can be improved by special structures with enough training, but generalization requires the introduction of physical rules for further enhancement, which had not been considered before. Although many networks utilize spatiotemporal information in prediction, they essentially assume that all elements are independent, which differs from the distribution of the real atmosphere. In the MGNN, Zernike polynomials are introduced to reduce data dimensions and export the covariance matrix. The concept of a graph is introduced because of its high consistency with the covariance matrix. The derived adjacency matrix emphasizes the relevant coefficients in iterative optimization, while the others are unaffected because this matrix is highly sparse. Further, the weight also plays an important role in transforming these coefficients, just like the transformation matrix in Eq. (6). Together they add physical constraints to the spatial features, making the output statistically closer to real turbulence. For temporal features, the catcher has fewer parameters because the Zernike model reduces the dimensionality of the wavefront.
As a result, our proposed MGNN is applicable to the accurate prediction of various atmospheric environments, because it builds a general mapping relationship within limited data.
D. Data Preparation
Figure 3(a) shows the optical setup model of the AO system, and Fig. 3(b) represents differences between conventional and predictive AO systems in a closed-loop wavefront correction. In an AO system, the collimated laser goes through the atmospheric turbulence, so its wavefront is disturbed. The distorted wavefront is measured and demodulated by the WFS, and the corresponding control voltages of the DM, Zernike coefficients, and phase distribution of wavefront are obtained. The conventional AO directly sends the control voltages to the DM for correction with an inevitable time delay. The predictive AO utilizes these above measured signals for prediction with algorithms, and the predicted signals, representing the future wavefront, are sent to the DM for correction. The subsequent data acquisition and analysis both follow these two figures, including the simulation and experiment.
Figure 3.(a) Optical setup model of the AO system. TGP, turbulence generating pool; WFS, wavefront sensor; DM, deformable mirror. (b) Data flow of conventional AO (gray lines) and predictive AO (blue lines). (c) Simulated atmospheric turbulence measuring process.
To get the training dataset in Fig. 1(a), an object-oriented MATLAB adaptive optics model is developed to simulate the AO system under different conditions [28]. In simulation, a simple wavefront measurement system, shown in Fig. 3(c), is created, consisting of a point source at infinity, an atmospheric medium, and a WFS. The light from the point source goes through the atmospheric medium and is measured by the WFS, which has a microlens array, a resolution of pixels, and a measurement frame rate of 600 Hz. Three large random phase screens implementing von Kármán statistics are generated within the atmospheric medium and distributed at transmission altitudes of [0, 4, 10] km under Taylor's hypothesis. The Fried parameter of the phase screens follows a normal distribution with mean 15 cm and standard deviation 2 cm, and the windspeed is for the training dataset. The wind directions are sampled randomly from . The phase screens shift randomly with the given windspeed. In each frame measurement, the partial phase screens are integrated along the light propagation path and ultimately measured by the WFS. In the end, we obtain the Zernike coefficients of 8000 frames of wavefront for training. According to the prediction model, the first 10 frames are used as inputs, and the 13th frame is used as the label. The output of the MGNN is the Zernike coefficient vector of a one-frame wavefront phase, so its order can be adjusted according to actual needs. In turbulence-induced distortion, low-order aberrations dominate and can be effectively represented by the Zernike model. Therefore, coefficients of Zernike orders 2–15 are used to represent the wavefront here for verification; more orders are also feasible with similar performance. Coefficients of all test sets are normalized in advance by the same rule as the training set and are restored after prediction.
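The normalize-then-restore step can be sketched as follows, assuming a per-order z-score whose statistics are fixed from the training set only; the paper does not specify the exact rule, so the z-score choice is an assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
train = rng.standard_normal((8000, 14)) * 0.5 + 0.2   # stand-in training coefficients
test = rng.standard_normal((2000, 14)) * 0.5 + 0.2    # stand-in test coefficients

# Normalization rule fixed from the training set only, then reused
# unchanged for every test set, as stated in the text.
mu, sigma = train.mean(axis=0), train.std(axis=0)

def normalize(c):
    return (c - mu) / sigma

def restore(c_norm):
    """Invert the normalization after prediction."""
    return c_norm * sigma + mu

test_norm = normalize(test)
recovered = restore(test_norm)
```

Fixing the statistics from training data keeps the mapping consistent between training and deployment; recomputing them per test set would leak test statistics into the pipeline.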
To verify the prediction accuracy of the algorithms, we generate an additional 2000 frames of coefficients under the same conditions as the training set, as test set 1. To study robustness to varying atmospheric environments, we also generate another four test sets of 2000 frames each. The windspeed is changed in test set 2, and the Fried parameter follows normal distributions with means of 5, 10, and 20 cm (standard deviation 2 cm) in test sets 3, 4, and 5, respectively.
Furthermore, experimental data are also used to verify the prediction performance of algorithms trained on simulation. The experimental optical setup is basically consistent with Fig. 3(a). A 1.5 m long turbulence generating pool (TGP), heated at the bottom and cooled at the top, generates Rayleigh–Bénard convection randomly. The experimental turbulence is quite different from the simulated turbulence, which is based on Taylor's hypothesis and the von Kármán statistical model, so it can effectively demonstrate the robustness of the algorithms. A 532 nm laser is collimated, passes through the TGP, and is then captured by a WFS with a microlens array, a resolution of pixels, and a measurement frame rate of 420 Hz. Finally, we obtain the coefficients of 800 frames of wavefront with the Fried parameter and random windspeed as test set 6.
For comparison, we train the MGNN, LMMSE, and LSTM with the same training set and test them on the six test sets above. All algorithms are implemented with PyTorch 1.7 and Python 3.7.1 on a computer with an Intel Core i7-10700K CPU, 32 GB of RAM, and an NVIDIA GeForce GTX 1080Ti GPU. The L1 loss is used for the MGNN and LSTM, and the L2 loss for the LMMSE, all with the Adam optimizer and a dropout rate of 0.2. For the MGNN, the learning rate is , and the number of epochs is 100. For the LMMSE and LSTM, the learning rate and number of epochs are and 200, respectively. The inputs of all algorithms are the Zernike coefficients of the previous 10 consecutive frames, and the output is the coefficients of the future third frame. The training of the MGNN takes for 100 epochs, and the forward inference in testing takes for 2000 frames. On average, the time consumption per frame is approximately 0.862 ms, which is less than the sampling time of one frame of the WFS. In contrast, the trained LMMSE and LSTM take 0.385 ms and 0.183 ms for one-frame prediction, respectively. The parameter quantities and floating-point operations (FLOPs) of the LMMSE, LSTM, and MGNN are 9153 and 3.08 kFLOPs, 36,221 and 1.48 kFLOPs, and 337,825 and 38.5 MFLOPs, respectively. The MGNN is clearly more complex and needs more computing resources, owing to its repetitive TFC and SFA structures. The larger parameter count also corresponds to the strong robustness of the MGNN, which is the key point we focus on later.
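The real-time feasibility check implied above, namely that per-frame inference must fit within one WFS sampling period, can be written out directly using the timings reported in the text.

```python
# Latency budget check: a prediction is usable in real time only if the
# per-frame inference time fits within the WFS sampling period.
frame_rate_hz = 600                      # simulated WFS frame rate from the text
sampling_period_ms = 1000 / frame_rate_hz  # ~1.667 ms per frame

inference_ms = {"MGNN": 0.862, "LMMSE": 0.385, "LSTM": 0.183}

for name, t in inference_ms.items():
    assert t < sampling_period_ms, f"{name} misses the frame budget"
    print(f"{name}: {t:.3f} ms of {sampling_period_ms:.3f} ms budget")
```

All three algorithms fit within the 600 Hz budget, so the MGNN's extra FLOPs cost latency but not real-time feasibility on the stated hardware.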
3. ROBUSTNESS VERIFICATION AND DISCUSSION
After adequate optimization, our proposed MGNN is compared with the LMMSE and LSTM on the six test sets above. In addition, conventional AO, with its time delay in wavefront correction, is included for comparison. The prediction accuracy is quantified by the root mean square (RMS) error of the wavefronts, which can be expressed as
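Assuming the RMS here is the root mean square of the residual wavefront with piston removed, which is a common convention rather than something confirmed by the equation above, a minimal sketch:

```python
import numpy as np

def rms_error(true_phase, corrected_phase):
    """RMS of the residual wavefront (true minus correction); the
    piston (mean) term is removed, as it does not affect image quality."""
    residual = true_phase - corrected_phase
    residual = residual - residual.mean()   # remove piston
    return float(np.sqrt(np.mean(residual ** 2)))

# Toy wavefront on a 32x32 grid: a pure tilt-like phase
true = np.outer(np.linspace(-1, 1, 32), np.linspace(-1, 1, 32))
perfect = rms_error(true, true)   # an ideal correction leaves zero residual
```

Lower RMS means a smaller residual between the true future wavefront and the applied correction, which is the quantity compared across algorithms in the figures that follow.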
A. Robustness to Turbulence under the Same Conditions
Figure 4.Comparison of one-frame prediction accuracy by three algorithms. (a), (b) Predicted Zernike coefficients and corresponding wavefronts of future third frame in test set 1 as two samples. MGNN, our proposed method; LSTM, one non-linear algorithm; LMMSE, one linear algorithm; AE, absolute error.
Figure 5.Comparison of overall prediction performance in different test sets. (a1) Histograms and normal curves of RMS values in test set 1.
B. Robustness to Turbulence with Changing Windspeed
In the natural atmosphere, the windspeed changes easily and without a specific range, so we must consider the robustness of the algorithms to unknown turbulence with changing windspeed. Test set 2 is generated from three phase screens with a higher windspeed, representing faster changes in turbulence. Following the calculation above, the histograms and RMS curves of consecutive frames are shown in Figs. 5(b1) and 5(b2). Compared with Fig. 5(a1), the RMS values of conventional AO increase significantly because of the rapidly changing turbulence: within the time delay of three frames, the turbulence undergoes faster and greater changes, resulting in greater correction errors. In comparison, predictive AO effectively reduces the correction errors, and our MGNN remains the most stable in Figs. 5(b1) and 5(b2). All prediction algorithms have similar mean RMS values, which are higher than those in Fig. 5(a1). The change of windspeed introduces faster and more complex changes in turbulence, significantly affecting prediction: the new underlying evolution no longer matches the temporal pattern learned in training, making prediction more difficult for all algorithms. Overall, the benefit of predictive AO still holds, and the stability advantage of the MGNN persists.
C. Robustness to Turbulence with Changing Fried Parameters
The Fried parameter represents the coherence length of atmospheric turbulence and is one of the key parameters quantifying the atmosphere. To verify the robustness to unknown turbulence with changing Fried parameters, we compare test sets 1, 3, 4, and 5. Here the Fried parameter is drawn from normal distributions with the given mean and standard deviation, where a smaller mean represents stronger turbulence. Figure 6 shows the histograms and corresponding normal curves calculated from the RMS values. The blue normal curves are leftmost and have the narrowest peak width in test sets 3, 4, and 5, which means the MGNN has the lowest prediction errors and the most stable performance. In contrast, the performance of the LSTM and LMMSE is unsatisfactory, because their RMS curves lie further right and are wider than those of conventional AO. As shown in Table 1, the MGNN has almost all the lowest mean and SD values of the RMS, compared with the other two universal algorithms. With the help of the MGNN, the mean and SD values of the RMS for conventional AO are reduced by up to 54.2% and 58.6%, respectively. The robustness of our MGNN stands out here thanks to the specially designed network, which helps it learn more general spatiotemporal patterns from limited simulation data.
Figure 6.Comparison of overall prediction performance using test sets with changing Fried parameters. Histograms and normal curves of RMS values are counted in test sets 3, 4, 1, 5, respectively. MGNN: blue; LSTM: red; LMMSE: orange; conventional AO: gray. Test sets 3, 4, and 5: Fried parameters change.
D. Robustness to Turbulence in Experiment
After verifying the robustness on simulated data, the MGNN is also expected to perform stably on experimental data. Under the constraint of supervised learning, considerable hardware and time costs are required to obtain enough experimental data for deployment. In addition, the turbulence in the experiment is uncontrollable, leading to uncertain richness of the training set. Therefore, the capability of generalizing from simulated data to experimental data is highly important for reducing cost and difficulty. In our experiment, the actual parameters differ from those in simulation, which makes the robustness test of the MGNN persuasive. Figures 7(a1) and 7(a2) show the histograms, the corresponding normal curves, and the RMS curves of consecutive frames, as above. Again, the performance of the LSTM and LMMSE is unacceptable, as they degrade the correction results of conventional AO. From Fig. 7(a2), when turbulence changes slowly and the correction error of conventional AO is relatively low, the prediction algorithm should maintain this state in predictive AO; when turbulence undergoes a sudden change, the prediction algorithm needs to reduce the correction error to restore the stable state. Our MGNN achieves this goal, while the other two algorithms do not. In fact, the LMMSE and LSTM are also trained with the simulated data and have fewer parameters than the MGNN. They recognize the turbulence changes in the training set but are unable to establish a general model of turbulence evolution. Therefore, they still make predictions based on familiar but incorrect relationships, resulting in large errors on the experimental data. As shown in Fig. 7(b), the Zernike coefficients and the absolute error of the MGNN are the best. From Table 1, the mean and SD values of the RMS in predictive AO with the MGNN are 75.8% and 58.3% of those in conventional AO for frozen turbulence.
To further demonstrate its robustness, we collect 8000 frames of experimental data from the TGP and retrain the proposed network on them, denoted the MGNN-E. In Fig. 7(c1), the RMS curves of the MGNN and MGNN-E from the 300th to the 600th frame of test set 6 are shown for comparison, and the complete statistical results are displayed as box plots in Fig. 7(c2). From these two figures, the correction performance of the MGNN is similar to that of the MGNN-E. The MGNN establishes the internal laws of turbulence evolution from the simulated data, so it has sufficient robustness to generalize from simulation to experiment.
Figure 7.Robustness to the experimental data. (a1) Histograms and normal curves of RMS values in test set 6. (a2) RMS curves of consecutive frames with or without prediction in (a1). MGNN: blue; LSTM: red; LMMSE: orange; conventional AO: gray. Black arrow: emphasis in comparison. (b) Predicted Zernike coefficients and corresponding wavefronts of future third frame in test set 6. AE, absolute error. (c1) RMS curves of consecutive frames with prediction by the MGNN and MGNN-E. MGNN-E (green): MGNN trained by the experimental data. (c2) Box plots of RMS in (c1).
Furthermore, we perform the closed-loop wavefront correction numerically with the experimental data, as shown in Fig. 8. The correction process is divided into three stages: without AO, with conventional AO, and with predictive AO. The corresponding wavefronts measured by the WFS are shown in Fig. 8(a), and the RMS curves of consecutive frames for the three stages are shown in Fig. 8(b). Intuitively, our MGNN still shows good robustness to the experimental data, although it is optimized with simulated data and has never seen this real turbulence before. Both the simulated and experimental data follow the statistically invariant covariance matrix. Therefore, the physical constraints from the Zernike model enable the MGNN to reveal and establish the generic spatiotemporal evolution patterns of turbulence during optimization. To verify the effect of prediction on wavefront correction, we load the wavefront phases of the three stages onto a spatial light modulator (SLM). The grid is imaged, and the focal spots are obtained with the phase modulation of the SLM, as shown in Fig. 8(c). To visualize the imaging changes, the phases are appropriately amplified in the same way. With predictive AO, the focal spot is the most concentrated and the grid is the clearest, similar to the ideal case without wavefront disturbance. Overall, the MGNN has good robustness to the experimental data. With its help, the wavefront correction error is further reduced, improving the light focusing and imaging quality.
Figure 8.Closed-loop correction wavefront error in experiment. (a) Three stages of the AO system and their corresponding corrected wavefronts. (b) RMS curves of three stages in closed-loop correction. Without AO: black line; conventional AO: gray line; predictive AO with MGNN: blue line. (c) Focal spot and grid imaging of three stages.
4. CONCLUSIONS
In this paper, we have demonstrated a mixed graph neural network for spatiotemporal wavefront prediction. The temporal feature catcher and spatial feature analyzer are specially designed to deal with temporal and spatial features, respectively. The Zernike model plays an important role in reducing data dimensions and exporting the covariance matrix. Our MGNN takes the invariant covariance matrix of Zernike coefficients as a physical constraint and introduces the graph neural network for spatial feature analysis. Compared with the LMMSE and LSTM, our MGNN shows the best stability and predictive correction on all test sets, including simulated turbulence under the training conditions, with changing windspeed, with changing Fried parameters, and real turbulence in experiment. Even when trained only with simulated data, it performs well with stable and reasonable prediction in practical AO systems and achieves performance similar to a network trained on real turbulence. Our proposed MGNN innovatively introduces the statistical rules of physical turbulence into the network design and exhibits remarkable generalization robustness. Its stable prediction ability is applicable to wavefront correction in AO for astronomical observation, laser communication, and microscopic imaging.
APPENDIX A
According to Noll's representation, the covariance between any two Zernike coefficients of one frame can be expressed as
According to Eq. (
Specifically, by ignoring the scaling factor, the covariance matrix is a constant matrix as follows:
References
[16] S. Hochreiter, J. Schmidhuber. Long short-term memory. Neural Comput., 9, 1735-1780(1997).
[18] X. Shi, Z. Chen, H. Wang, D. Y. Yeung, W. K. Wong, W. C. Woo. Convolutional LSTM network: a machine learning approach for precipitation nowcasting. Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS), 802-810(2015).
[26] Z. Wu, S. Pan, G. Long, J. Jiang, X. Chang, C. Zhang. Connecting the dots: multivariate time series forecasting with graph neural networks. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 753-763(2020).
[27] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, E. Dumitru, V. Vincent, A. Rabinovich. Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1-9(2015).