
- Chinese Optics Letters
- Vol. 23, Issue 4, 041101 (2025)
1. Introduction
It was reported that over 80 diagnostics have been installed at the 100-kJ-level ShenGuang III laser facility. Shock speeds and merger times can be measured with the VISAR diagnostic, which tracks the propagation speed of the leading shock continuously in time and registers shock mergers as velocity jumps when subsequent shocks overtake the preceding one[1,2]. Traditional VISAR diagnostic systems, however, cannot achieve high temporal resolution and two-dimensional (2D) information acquisition simultaneously. Yang et al.[3] developed compressed ultrafast photography for a 2D velocity interferometer system for any reflector (CUP-VISAR)[4] to surmount this limitation, using spatial-domain coding, data compression, and image inversion techniques to capture high-speed transient phenomena. This advancement facilitates the calculation of the inertial confinement fusion (ICF) velocity field, prediction of compression dynamics, and practical experimentation.

Figure 1 illustrates the configuration of the CUP-VISAR system. The probe light emitted by the probe source passes through the beam splitter BS1 and lens L1 and reaches the target surface, where it is reflected. Carrying Doppler frequency-shift information, the probe light retraces its original path and enters the interferometer via reflection from BS2, generating interference fringes. These fringes are then directed through the beam-splicing mirror BS7 into the CUP system, where they are encoded by the digital micro-mirror device (DMD). Subsequently, the encoded fringes are captured and superimposed by the streak camera, which simultaneously encodes, compresses, and records the 2D spatial information of the shock wave.

Observing the ICF 2D shock wave velocity field presents several challenges: (1) The high temporal and spatial resolutions required for transient processes increase aliasing in the streak camera data, reducing the signal sampling rate. (2) Enhancing the signal-to-noise ratio (SNR) requires consideration of light-source brightness, detector sensitivity, and optical-system resolution, necessitating a larger coding aperture relative to the detector pixels. Specifically, DMD projection employs a
Figure 1.Basic structure of the CUP-VISAR system for measuring shock wave velocity.
Creating a reliable linear observation matrix is essential for the CUP system to guarantee image inversion and efficient data compression. Yang et al.[5,6] proposed an optimization method using a genetic algorithm, capturing dynamic scene information through random coding and refining it to enhance image inversion quality. This method, however, incurs significant computational and temporal costs for large-scale data processing. Conversely, deep learning methods optimize coding matrices by learning intrinsic structures and features from extensive datasets. Iliadis et al.[7,8] proposed the “deep binary mask” architecture, where the encoder learns binary weights to create a coding matrix, enhancing the average peak signal-to-noise ratio (PSNR) by approximately 1 dB compared to the random mask. Marquez et al.[9] developed the deep high-dimensional adaptive network (D-HAN) method, generating a coding matrix for training data while reducing its 2D cross-correlation.
In this Letter, we deal with the obstacle of accurate feature extraction hindered by encoding aperture aliasing in the CUP-VISAR system when analyzing multiple sampled frames. A deep learning-based coding aperture design framework is introduced for a 2D shock wave velocity field diagnosis system. Employing a convolutional variational auto-encoder (CVAE) network, the framework captures velocity fringe features and integrates frequency domain features, thereby enhancing signal compression sampling efficiency, mitigating aliasing from a large coding aperture, and improving reconstruction accuracy. Various reconstruction methods were employed to assess the recovery effect. The significance of adjusting code transmittance to enhance reconstruction accuracy is highlighted, particularly in the presence of experimental noise.
2. Coding Aperture Design Based on the Convolutional Variational Auto-Encoder Network
The CVAE network maps fringe image inputs to a low-dimensional latent variable space and achieves feature fusion by mixing frequency domain features. Figure 2 illustrates the comprehensive framework of the model. Mean and variance are utilized to generate the binary mask of the desired dimension, which is then expanded to match the input dimension. During sampling, the coding matrix is multiplied by the original image data and overlaid to produce the sampled measurement, which serves as the input for the reconstruction method. To calculate the loss function, we need to find the average absolute error between the input data and reconstructed output, the binary cross-entropy loss between the input data and decoder output, and the Kullback–Leibler (KL) divergence loss between the prior distribution and the latent variable distribution. Model parameters are updated using backpropagation, resulting in the generation of a binary coding matrix that captures the properties of input data.
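The sampling step described above (mask-wise encoding of each frame followed by superposition into a single measurement) can be written as a minimal NumPy sketch; the function name and array dimensions are illustrative, and the temporal shear applied by the real streak camera is omitted:

```python
import numpy as np

def compressive_measurement(frames, masks):
    """Encode each frame with its binary mask and superimpose the
    results, mimicking the streak camera's compressed recording
    (the temporal shear of the real instrument is omitted here)."""
    assert frames.shape == masks.shape  # both (T, H, W)
    return np.sum(frames * masks, axis=0)

# Toy example: 25 frames of 64 x 64 fringe data and a random binary mask.
rng = np.random.default_rng(0)
frames = rng.random((25, 64, 64))
masks = (rng.random((25, 64, 64)) > 0.5).astype(float)
y = compressive_measurement(frames, masks)
print(y.shape)
```

The single 2D measurement `y` is what the reconstruction method receives as input.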
Figure 2.Framework for coding matrix design utilizing a CVAE network.
Figure 3.Network architecture diagram for encoding and decoding.
To ensure that the loss function is differentiable and tractable, the latent variables are assumed to be Gaussian. If the prior distribution is the standard normal distribution N(0, I) and the encoder outputs the approximate posterior N(μ, σ²), the loss function can be written as

L = DKL(N(μ, σ²) ‖ N(0, I)) + LBCE(x, x̂).(3)

Equation (3) comprises two components of the loss function: the first component is the KL divergence between the latent variable distribution and the prior distribution, and the second component is the binary cross-entropy loss between the decoder's output x̂ and the original image data x.
By utilizing the reparameterization technique, the issue of non-differentiability in random sampling can be addressed. This involves transforming the direct sampling of the latent variable z into sampling an auxiliary variable ε from the standard normal distribution and computing z = μ + σ⊙ε, so that gradients can propagate through μ and σ.
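The reparameterization step can be sketched as follows (a minimal NumPy illustration; the names and shapes are assumptions):

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I): the randomness is
    isolated in eps, so gradients can flow through mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
mu = np.zeros(8)
z = reparameterize(mu, np.full(8, -20.0), rng)  # tiny variance -> z ~= mu
```

With a very small variance the sample collapses onto the mean, which is a quick way to check the implementation.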
The encoder extracts the distribution features of the input fringe data, and these are fused with the frequency-domain features. The fusion features h then undergo a linear mapping resulting in the calculation of the mean and variance:

μ = Wμh + bμ, log σ² = Wσh + bσ.
By training the model, the distribution features of the original fringe data were obtained, determined by the mean and variance. Based on these distribution features, a basic binary mask was generated with dimensions matching the defined latent variable and then expanded to match the dimensions of the input data.
Utilizing backpropagation to calculate the gradient and update the model parameters minimizes the loss function. Introducing the average absolute error between the reconstruction result x_rec and the original image data x as a term in the loss function, Eq. (3) can be rewritten as

L = LMAE(x, x_rec) + DKL(N(μ, σ²) ‖ N(0, I)) + LBCE(x, x̂).
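The combined loss, with its mean-absolute-error, binary cross-entropy, and KL divergence terms, can be sketched as below; the symbols and the absence of weighting factors are assumptions of this sketch:

```python
import numpy as np

def cvae_loss(x, x_dec, x_rec, mu, log_var):
    """Total loss = MAE(x, x_rec) + BCE(x, x_dec) + KL(q || p),
    matching the three terms described in the text."""
    mae = np.mean(np.abs(x - x_rec))
    eps = 1e-7  # numerical guard for the logarithms
    bce = -np.mean(x * np.log(x_dec + eps)
                   + (1 - x) * np.log(1 - x_dec + eps))
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, I).
    kl = -0.5 * np.mean(1 + log_var - mu**2 - np.exp(log_var))
    return mae + bce + kl

# Sanity check: identical 0.5-valued images and a standard-normal
# posterior give MAE = 0, KL = 0, and BCE = ln 2.
x = np.full((4, 4), 0.5)
loss = cvae_loss(x, x, x, np.zeros(4), np.zeros(4))
```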
The training procedure is segmented into four steps:
- 1. Initialization of the optimizer and zeroing of the gradients.
- 2. Forward propagation, which involves encoding, feature fusion, and coding matrix generation using the mean and variance.
- 3. Backpropagation, which calculates the gradients with respect to the layer activations. The binary cross-entropy loss and KL divergence are computed, and the total loss is determined using the reconstruction results obtained with the coding matrix generated in Step 2.
- 4. A model parameter update, leveraging the optimizer to update the parameters based on the gradients.
The training algorithm comprises two stages: the first focuses on coding matrix generation, and the second performs reconstruction using this matrix. A subsequent training phase enhances the reconstruction quality by further adjusting the coding matrix.
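The four training steps above can be outlined in a heavily simplified, framework-free sketch. Every function here is a hypothetical stand-in for the real network components, and no actual gradient update is performed:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x):
    """Stand-in for the encoder + frequency-feature fusion -> (mu, log_var)."""
    h = x.mean(axis=(1, 2))
    return h, np.zeros_like(h)

def make_mask(mu, log_var, shape):
    """Sample a latent vector, binarize it, and tile it to the input size."""
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)
    base = (z >= np.median(z)).astype(float)
    return np.resize(base, shape)

def reconstruct(measurement, mask):
    """Placeholder for ADMM-TV / TVAL3 / E3DTV / ADMM-UNet."""
    return measurement[None] * mask

frames = rng.random((25, 32, 32))
for epoch in range(3):
    # Step 1: zero gradients (handled by the optimizer in a real framework).
    # Step 2: forward pass -- encode, fuse, and generate the coding matrix.
    mu, log_var = encode(frames)
    mask = make_mask(mu, log_var, frames.shape)
    y = (frames * mask).sum(axis=0)        # compressed measurement
    x_rec = reconstruct(y, mask)
    # Step 3: losses (MAE + BCE + KL in the real model; only MAE here).
    loss = np.mean(np.abs(frames - x_rec))
    # Step 4: parameter update via the optimizer (a no-op in this sketch).
```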
3. Simulation and Experimental Findings
3.1. Simulation verification and analysis
A coding matrix with a coding aperture of
Figure 4.Mask. (a) Random mask; (b) 300 epoch feature-free fusion mask; (c) 300 epoch feature fusion mask.
Figure 5.Velocity fringe reconstruction of shock wave diagnosis by four algorithms with different masks. (a1) Random mask; (a2) measurement with the random mask; (a3)–(a6) 25th frame reconstruction of the random mask; (b1) feature-free fusion mask; (b2) measurement with the feature-free fusion mask; (b3)–(b6) 25th frame reconstruction of the feature-free fusion mask; (c1) feature fusion mask; (c2) measurement with the feature fusion mask; (c3)–(c6) 25th frame reconstruction of the feature fusion mask.
The simulation of fringe image (25th frame) reconstruction utilized three distinct coding matrices and four different algorithms, as shown in Fig. 5. Visually, the images reconstructed by the four algorithms using the random coding matrix and the feature-free fusion coding matrix still exhibit small-scale blurring at the edges. Although the ADMM-TV and TVAL3 algorithms produce smoother edges, they lose fringe image details. In comparison, when the coding matrix generated after feature fusion is used for reconstruction, all four algorithms enhance the detailed representation of the fringe contours, producing a more complete reconstruction of the fringe image than the previous two coding matrices. The performance of the reconstructed 25th frame is analyzed in detail below. The PSNR values for the 25th frame fringes reconstructed by the ADMM-TV, TVAL3, E3DTV, and ADMM-UNet algorithms with the random coding matrix are 17.48, 18.57, 24.35, and 24.59 dB, respectively; the structural similarity (SSIM) values are 62.85%, 70.73%, 78.42%, and 87.04%, respectively. For the restored fringe (25th frame) with the feature-free fusion coding matrix, the PSNR values increased by 0.35, 0.76, 1.48, and 1.43 dB compared to the random coding matrix with the algorithms mentioned above, while the SSIM values improved by 10.88%, 5.87%, 10.10%, and 2.08%. The PSNR values of the 25th frame fringes recovered with the feature fusion coding matrix improved by 1.11, 1.99, 2.24, and 2.94 dB, with corresponding SSIM enhancements of 11.66%, 8.71%, 10.79%, and 2.81%.
The coding matrix generated by our network has enhanced the reconstruction performance of the algorithms, surpassing that of the random coding matrix.
3.2. Experimental verification and analysis
Accounting for ambient noise, the dynamic-response-range constraint of the streak camera, and the presence of speckle noise in the constructed CUP-VISAR system, we completed a static verification experiment for the designed encoding mask. The fringe pattern was directly projected onto the corresponding mask in the optical system. The fringes were encoded by the coding plate, passed through the 4f system, amplified by the lenses, and directed through mirrors to the streak camera, which recorded the compressed and encoded fringe data using a fully open slit (4 mm) without lens bias. Enhancing the transmittance of the coding matrix improves the collection and transmission of weak light signals, increases the contrast between the fringe data and background noise, and thereby enhances the SNR of the entire system. However, excessive transmittance may saturate the system and diminish measurement accuracy, so the transmittance must be adjusted to balance signal acquisition efficiency against overall system stability. Figure 6 outlines the specifications of the two groups of fabricated masks, with mask pixel lengths of 25.8 and 51.6 µm. The intensities of the light signal passing through the fabricated coding matrices were acquired and compared. The experimental results demonstrate that, for the two mask sizes, the average periodic voltage of the generated code increased by 12.56% and 49.03% compared to the random code; that is, the transmittance increased by 12.56% and 49.03%, respectively.
Figure 6.Experimental mask plate specification drawing. The coding-aperture ratio of the mask is 7:3. (a) Feature fusion mask; (b) random mask.
The effect of the transmittance of the coding matrix on the quality of reconstruction is further verified through experiments. Various coding aperture ratios (A:B, where A indicates transmittance aperture numbers and B indicates opacity aperture numbers) were employed to generate the coding matrix. In the experiment, the diagnosis fringe pattern has a sample data dimension of
Figure 7.Velocity fringe reconstruction performance in various coding aperture ratios. (a1), (b1) PSNR performance comparison for different algorithms; (a2), (b2) SSIM performance comparison for different algorithms.
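A coding matrix with a prescribed aperture ratio A:B, as used in the comparison above, can be generated as in this small sketch (the function and its arguments are illustrative):

```python
import numpy as np

def ratio_mask(shape, a, b, rng=None):
    """Binary coding matrix with a transmissive:opaque aperture ratio
    of a:b, e.g. (7, 3) gives roughly 70% transmittance."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = int(np.prod(shape))
    n_open = round(n * a / (a + b))
    flat = np.zeros(n)
    # Choose n_open distinct pixel positions to be transmissive.
    flat[rng.choice(n, size=n_open, replace=False)] = 1.0
    return flat.reshape(shape)

m = ratio_mask((64, 64), 7, 3)
print(round(float(m.mean()), 2))  # mean transmittance close to 0.7
```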
4. Conclusion
This Letter presents a deep learning framework for the coding aperture design of the CUP-VISAR system in ICF research. Utilizing a CVAE network and frequency-domain features, it generates a large-aperture coding matrix bearing fringe features, mitigating aliasing and improving detail extraction. Specifically, the ADMM-TV, TVAL3, E3DTV, and ADMM-UNet algorithms yield PSNR values of the restored fringes (25th frame) increased by 1.11, 1.99, 2.24, and 2.94 dB compared to random coding matrices; correspondingly, the SSIM values are augmented by 11.66%, 8.71%, 10.79%, and 2.81%. To account for noise in the experimental environment and the limited dynamic response range of the streak camera, a simulation-based 2D arbitrary-reflector velocity interferometer imaging experiment was conducted. The results confirm that the optimal average PSNR and SSIM peaks are attained at a coding aperture ratio of 7:3. Specifically, the ADMM-UNet algorithm with the generated coding matrix enhances the average PSNR peak by 2.56 dB and the average SSIM peak by 9.55% compared with the regular 5:5 ratio matrix. In the comparative analysis, the average PSNR peak of the coding matrix generated with this framework is 1.65 dB higher than that of the random coding matrix; similarly, the average SSIM peak is 8.60% higher, while the transmittance in the large-size mask experiment rises by 49.03%. The proposed feature fusion and CVAE-based deep coding aperture design framework for CUP-VISAR shock wave diagnosis with real data provides a theoretical foundation and practical utility for ICF research. In the ICF diagnostic process, the training set for the model should include as many features of the experimental fringe data as possible, and historical experimental datasets should be incorporated as extensively as possible prior to formal experiments.
The purpose is to generate a more optimized encoding mask by learning the latent statistical distribution characteristics of real fringe data, thereby improving the modulation efficiency of the beam. This process ensures that the encoding matrix exhibits high adaptability and robustness in specific ICF experiments.
References
[9] M. Marquez, Y. Lai, X. Liu et al. Deep-learning supervised snapshot compressive imaging enabled by an end-to-end adaptive neural network. IEEE J. Sel. Top. Signal Process., 16, 688 (2022).
[10] J. Ma, X. Y. Liu, Z. Shou et al. Deep tensor ADMM-Net for snapshot compressive imaging. IEEE/CVF International Conference on Computer Vision (ICCV) (2019).
