Phase unwrapping is an indispensable step in many optical imaging and metrology techniques. The rapid development of deep learning has brought new ideas to phase unwrapping. In the past four years, various phase dataset generation methods and deep-learning-involved spatial phase unwrapping methods have emerged quickly. However, these methods were proposed and analyzed individually, using different strategies, neural networks, and datasets, and were applied to different scenarios. It is thus necessary to compare these deep-learning-involved methods and the traditional methods in detail within the same context. We first divide the phase dataset generation methods into random matrix enlargement, Gaussian functions superposition, and Zernike polynomials superposition, and then divide the deep-learning-involved phase unwrapping methods into deep-learning-performed regression, deep-learning-performed wrap count, and deep-learning-assisted denoising. For the phase dataset generation methods, the richness of the datasets and the generalization capabilities of the trained networks are compared in detail. In addition, the deep-learning-involved methods are analyzed and compared with the traditional methods in ideal, noisy, discontinuous, and aliasing cases. Finally, we give suggestions on the best methods for different situations and propose potential development directions for dataset generation, neural network structure, generalization ability enhancement, and training strategy for the deep-learning-involved spatial phase unwrapping methods.

- Advanced Photonics Nexus
- Vol. 1, Issue 1, 014001 (2022)
1 Introduction
Estimation of absolute (true) phase is an important but challenging problem in many imaging or measurement techniques, such as optical interferometry (OI),1 magnetic resonance imaging (MRI),2 fringe projection profilometry (FPP),3,4 and interferometric synthetic aperture radar (InSAR)5,6 (see Fig. 1).
The importance lies in that the estimated absolute phase of the aforementioned applications is directly proportional to the desired physical quantities such as the distribution of thickness or refractive index for OI, the distribution of magnetic susceptibility or the velocity of blood flow for MRI, the three-dimensional (3D) distribution of the object surface for FPP, and the surface height of the topography or ground deformation for InSAR.
The challenge lies in that the initial phase of the aforementioned applications is limited in the range of $(-\pi, \pi]$, i.e., it is wrapped.
Figure 1. Phase unwrapping in OI, MRI, FPP, and InSAR.
The wrapped phase $\psi$ and the absolute phase $\varphi$ are related by $\varphi(x,y) = \psi(x,y) + 2\pi k(x,y)$, where $k(x,y)$ is an integer map of wrap counts.

Spatial phase unwrapping, i.e., recovering $\varphi$ (or, equivalently, $k$) from a single wrapped phase map $\psi$, is feasible under the Itoh condition, which requires the magnitude of the absolute phase difference between neighboring pixels to be less than $\pi$; the absolute phase can then be recovered by integrating the wrapped differences of $\psi$ along a path.
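When the Itoh condition holds (neighboring absolute-phase differences smaller than $\pi$ in magnitude), path integration of the wrapped differences recovers the absolute phase. The following is a minimal 1D numpy sketch of this idea; the ramp signal and helper names are illustrative, not from the paper:

```python
import numpy as np

def wrap(phase):
    """Wrap phase values into [-pi, pi)."""
    return (phase + np.pi) % (2 * np.pi) - np.pi

def unwrap_1d(psi):
    """Unwrap a 1D wrapped phase by integrating wrapped differences.

    Valid only under the Itoh condition: neighboring absolute-phase
    differences must be smaller than pi in magnitude.
    """
    dpsi = wrap(np.diff(psi))  # wrapped differences equal the true differences
    return psi[0] + np.concatenate(([0.0], np.cumsum(dpsi)))

# a smooth ramp satisfying the Itoh condition
phi = np.linspace(0, 20, 200) ** 1.2   # absolute phase
psi = wrap(phi)                        # wrapped phase
phi_hat = unwrap_1d(psi)
```

For a signal of this kind, `phi_hat` matches `phi` up to floating-point error; `numpy.unwrap` implements the same idea.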
Unfortunately, in practical applications, noise, discontinuity, and aliasing violate the Itoh condition and fail the phase unwrapping. Thus, in the traditional spatial phase unwrapping methods, the path-following method determines better integration paths by branch cuts,8 quality maps,9 etc., to avoid the influence of these invalid pixels, and the optimization-based method obtains the absolute phase by minimizing the difference between absolute phase gradients and wrapped phase gradients, in the least-squares case as follows: $\hat{\varphi} = \arg\min_{\varphi} \sum_{m,n} \left[\Delta^{x}_{m,n}\varphi - W\{\Delta^{x}_{m,n}\psi\}\right]^{2} + \left[\Delta^{y}_{m,n}\varphi - W\{\Delta^{y}_{m,n}\psi\}\right]^{2}$, where $\Delta^{x}$ and $\Delta^{y}$ denote the horizontal and vertical pixel differences and $W\{\cdot\}$ is the wrapping operator.
In fact, the purpose of the traditional spatial phase unwrapping methods is to avoid the negative impact of invalid pixels as much as possible, and they are mainly suitable for mild noise and a few discrete discontinuous or aliasing points. In extreme cases, such as severe noise or locally isolated discontinuous or aliasing regions, the traditional methods become ineffective.
Jin et al.20 first proposed the use of deep learning to solve ill-posed inverse problems in imaging. This idea of using a deep neural network to learn the mapping from input space to output space from paired datasets makes it possible to solve phase unwrapping in the aforementioned extreme cases. Thus, in the past four years, many deep-learning-involved methods have quickly emerged for spatial phase unwrapping; they remain effective in the cases of noise, discontinuity, and aliasing because they are not constrained by the Itoh condition.21
These deep-learning-involved methods use different strategies to achieve phase unwrapping by supervised optimization of neural networks with specific datasets: deep-learning-performed regression (dRG), deep-learning-performed wrap count (dWC), and deep-learning-assisted denoising (dDN), whose datasets are illustrated in Fig. 2.
Figure 2. Datasets of the deep-learning-involved phase unwrapping methods, for (a) dRG, (b) dWC, and (c) dDN.
In addition, the absolute phase dataset generation methods used in deep-learning-involved methods can be divided into the following four categories: random matrix enlargement (RME), Gaussian functions superposition (GFS), Zernike polynomials superposition (ZPS), and real data reprocessing (RDR).
As shown in Fig. 3, the overall process of these deep-learning-involved phase unwrapping methods can be summarized as follows: first, generate absolute phases and compute the corresponding network inputs to build the dataset; then, train the neural network on the paired data; finally, feed the wrapped phase to be unwrapped into the trained network to obtain the prediction.
Figure 3. Overall process of deep-learning-involved phase unwrapping methods.
Since these methods are proposed individually using different phase unwrapping strategies and dataset generation methods and applied to different scenarios, a comprehensive cross-comparison and a comparison with traditional methods are lacking, which obscures the true potential of deep-learning-involved phase unwrapping and dataset generation methods. In the present review, Sec. 2 offers a summary and classification of deep-learning-involved phase unwrapping methods; in Sec. 3, the dataset generation methods are summarized and classified, their performance and characteristics are compared, and the rules and tips of dataset generation are given; in Sec. 4, using the same dataset, the performance of the deep-learning-involved and traditional phase unwrapping methods in ideal, noisy, discontinuous, aliasing, and mixed cases is compared and summarized; in Sec. 5, the advantages and limitations of the existing deep-learning-involved phase unwrapping methods are summarized, and a deep learning phase unwrapping idea with joint supervision of dataset and physical-model is further proposed and demonstrated.
To give interested readers a quick start, we present a step-by-step guide to applying deep learning to phase unwrapping in the Supplemental Material, with dRG and RME as an example.
2 Phase Unwrapping Methods with Deep Learning
2.1 Deep-Learning-Performed Regression (dRG) Method
Phase unwrapping can be treated as a regression problem in which a neural network directly learns the mapping relationship between the wrapped phase and the absolute phase.22
Figure 4. Illustration of the dRG method.
Dardikman et al.22,47 introduced the dRG method for simulated steep cells directly through a residual-block-based convolutional neural network (CNN), verified the dRG method with the congruence operation on real cells, and compared it with the deep-learning-performed wrap count method.23 In 2019, we proposed a phase dataset generation method, tested the trained network on real samples, analyzed the network's generalization ability by middle-layer visualization, and verified the superiority of the dRG method in noisy and aliasing cases by comparing it with the traditional methods.24,48 He et al.25 and Ryu et al.26 tested the phase unwrapping performance of the bidirectional recurrent neural network (RNN) and 3D-ResNet with MRI data. Dardikman-Yoffe et al.27 open-sourced their real sample dataset and verified that the congruence operation could improve the accuracy and robustness of the dRG method in the case of a small number of wrap counts. Qin et al.28 used a larger-capacity Res-UNet to obtain higher accuracy and proposed two evaluation indices. Perera and De Silva29 and Park et al.30 tested the phase unwrapping performance of the long short-term memory (LSTM) network and generative adversarial network (GAN). Zhou et al.31 improved the robustness and efficiency of deep learning phase unwrapping by adding preprocessing and postprocessing. Xu et al.32 improved the accuracy and robustness of phase unwrapping in an end-to-end case by using a composite loss function and adding more skip connections to Res-UNet. Zhou et al.33 used the GAN in InSAR phase unwrapping.
2.2 Deep-Learning-Performed Wrap Count (dWC) Method
According to Eq. (1), if the wrap counts $k(x,y)$ are known, the absolute phase can be recovered directly as $\varphi = \psi + 2\pi k$; phase unwrapping can therefore be treated as a classification (semantic segmentation) problem in which the network predicts the wrap count of each pixel.
Figure 5. Illustration of the dWC method.
This idea was introduced by Liang et al.34 and Spoorthi et al.35 Spoorthi et al.35 proposed a phase dataset generation method and used the generated dataset to train a SegNet to predict the wrap count, which was postprocessed by clustering-based smoothness to alleviate the classification imbalance. Zhang et al.36 performed phase unwrapping with three networks applied sequentially for wrapped phase denoising, wrap count prediction, and postprocessing, respectively. Zhang et al.37,49 verified the performance of the DeepLab-V3+ network in the dWC method and proposed using refinement for postprocessing. Wu et al.38,50 enhanced the simulated phase dataset by adding noise extracted from real data and used the multiscale context and the full-resolution residual block (FRRes-UNet) to further optimize the UNet in Doppler optical coherence tomography. Also, Spoorthi et al. improved the wrap count prediction accuracy of the method in Ref. 35 by introducing prior knowledge of absolute phase values and gradients into the loss function.39 Zhao et al.40 used a residual autoencoder network (RAEN) to predict the wrap counts, used an image-analysis-based postprocessing method to alleviate the classification imbalance, and adopted an iterative-closest-point phase-data-stitching method to realize dynamic resolution. Zhu et al.41 used the dWC method with postprocessing for phase unwrapping in interferometric measurement of inertial confinement fusion targets. Vengala et al.42,43,51 used the Y-Net to reconstruct the wrap count and the pure wrapped phase at the same time. Zhang and Li44 added atrous spatial pyramid pooling, positional self-attention, and an edge-enhanced block to the neural network to achieve higher accuracy and stronger robustness.
2.3 Deep-Learning-Assisted Denoising (dDN) Method
As mentioned in Sec. 1, in the traditional methods, we can either use complicated algorithms that directly unwrap while circumventing the invalid pixels, or filter the invalid pixels first and then use the simplest algorithm for phase unwrapping.7,13,14 For the latter approach, a neural network can also be used for invalid-pixel filtering.45
Yan et al.45 proposed using a neural network to denoise the real and imaginary parts of CAF for FPP. They then used the line-scanning method to unwrap the noise-filtered wrapped phase. In their work, denoising was performed by extracting the noise components from the noisy real and imaginary parts. After verification, we found that using neural networks to directly extract the pure real and imaginary components gives the same performance. Thus, here we adopt the direct way, as shown in Fig. 6. In addition to denoising directly with the neural network, the network input (the noisy real and imaginary parts) can also be semantically segmented to distinguish the sample area from the background area; the segmentation result then serves as prior knowledge to make the neural network focus on the sample area.
Figure 6. Illustration of the dDN method.
2.4 Summary of the Deep-Learning-Involved Methods
For clarity, we summarize all the deep-learning-involved phase unwrapping methods mentioned above in Table 1, where “RME,” “GFS,” “ZPS,” and “RDR” indicate the dataset generation methods, which will be introduced in Sec. 3. In this table, in addition to the phase unwrapping methods introduced above (column 1), we also include the network structures (column 5), the training datasets (column 6), and the loss functions (column 7), which directly affect the final effect of phase unwrapping.
Method | Date | Author | Ref. | Network | Dataset | Loss function |
dRG | 2018 | Dardikman and Shaked | — | — | — | |
Dardikman et al. | ResNet | RDR | MSE | |||
2019 | Wang et al. | Res-UNet | RME | MSE | ||
He et al. | 3D-ResNet | — | — | |||
Ryu et al. | RNN | — | Total variation + error variation | |||
2020 | Dardikman-Yoffe et al. | Res-UNet | RDR | MSE | ||
Qin et al. | Res-UNet | RME | MAE | |||
2021 | Perera and De Silva | LSTM | GFS | Total variation + error variation | ||
Park et al. | GAN | RDR | MAE + adversarial loss | |||
Zhou et al. | UNet | RDR | MAE + residues | |||
2022 | Xu et al. | MNet | RME | MAE and MS-SSIM | ||
Zhou et al. | GAN | RDR | MAE + adversarial loss | |||
dWC | 2018 | Liang et al. | — | — | — | |
Spoorthi et al. | SegNet | GFS | CE | |||
2019 | Zhang et al. | UNet | ZPS | CE | ||
Zhang et al. | DeepLab-V3+ | ZPS | CE | |||
2020 | Wu et al. | FRRes-UNet | GFS | CE | ||
Spoorthi et al. | Dense-UNet | GFS | MAE + residues + CE | |||
Zhao et al. | RAENet | ZPS | CE | |||
2021 | Zhu et al. | DeepLab-V3+ | ZPS | CE | ||
2022 | Vengala et al. | TriNet | GFS | MSE + CE | |
Zhang and Li | EESANet | GFS | Weighted CE | | |
dDN | 2020 | Yan et al. | ResNet | ZPS | MSE |
Table 1. Summary of deep-learning-involved phase unwrapping methods. “—” indicates “not available.”
The network structure itself is not within the scope of comparison in this paper, so we choose Res-UNet as a unified network structure, which has been widely used in different optical applications with excellent performance.51
It is well known that the training dataset affects network performance. In addition to using real samples to obtain absolute phases, simulation is another convenient and important option for dataset generation. Thus, we evaluate the effectiveness of the dataset generation methods as preparation in Sec. 3 and then compare the different phase unwrapping methods in cases such as noise, discontinuity, and aliasing in Sec. 4.
In all these comparisons, the following three indices are used for accuracy estimation:
2.5 Implementation of the Deep-Learning-Involved Methods
For dRG and dWC, the Res-UNet is selected from Ref. 72 (see Sec. S1 in the Supplemental Material), in which the inception module is introduced into the residual block.24,47,51,78 For dDN, the ResNet is obtained by removing the max-pooling and transposed-convolution layers from Res-UNet and changing the channel number of all middle layers to 64. The dWC method is treated as a classification/segmentation problem with eight categories, corresponding to wrap counts from 0 to 7.
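As a sketch of how the eight-category GT for dWC can be built, the wrap-count map follows directly from the relation $\varphi = \psi + 2\pi k$; the phase range below is an illustrative assumption chosen so that the labels fall in 0 to 7:

```python
import numpy as np

def wrap(phi):
    """Wrap a phase map into [-pi, pi)."""
    return (phi + np.pi) % (2 * np.pi) - np.pi

def wrap_count_labels(phi):
    """Per-pixel integer wrap count k such that phi = wrap(phi) + 2*pi*k."""
    return np.rint((phi - wrap(phi)) / (2 * np.pi)).astype(np.int64)

rng = np.random.default_rng(0)
phi = rng.uniform(0, 15 * np.pi, (64, 64))  # nonnegative absolute phase (assumed range)
k = wrap_count_labels(phi)                  # classification labels in {0, ..., 7}
```

The wrapped phase `wrap(phi)` is the network input and `k` is the segmentation GT.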
We use adaptive moment estimation (Adam) based optimization to train all the networks. The batch size is 16 and the initial learning rate is 0.01 (dropped by 85% per epoch while the learning rate remains above a preset floor).
All the networks are implemented in PyTorch 1.0 with Python 3.6.1 and run on a PC with a Core i7-8700K CPU, 16 GB of RAM, and an NVIDIA GeForce GTX 1080Ti GPU.
3 Phase Dataset Generation Methods
Due to the fundamental importance of datasets in deep learning, we introduce the absolute phase generation methods (Sec. 3.1) to prepare the datasets for deep-learning-involved phase unwrapping (Sec. 3.2) and evaluate their richness (Sec. 3.3).
3.1 Absolute Phase Generation Methods
We review three absolute phase generation methods, all of which follow the same two steps: generate a random phase and then linearly scale it into a preset height range.
3.1.1 Random matrix enlargement (RME)
The RME method first generates small square matrices with different sizes (randomly set from
Figure 7. An example of the RME method.
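A minimal numpy sketch of the RME idea is given below. The small-matrix size range, output size, and height range are illustrative assumptions, since the paper's exact parameters are not reproduced here; the enlargement uses plain bilinear interpolation:

```python
import numpy as np

def bilinear_resize(m, out_shape):
    """Enlarge a small matrix to out_shape by bilinear interpolation."""
    h, w = m.shape
    H, W = out_shape
    ys = np.linspace(0.0, h - 1.0, H)
    xs = np.linspace(0.0, w - 1.0, W)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = m[np.ix_(y0, x0)] * (1 - wx) + m[np.ix_(y0, x1)] * wx
    bot = m[np.ix_(y1, x0)] * (1 - wx) + m[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

def rme_phase(rng, size=256, max_height=8 * 2 * np.pi):
    """RME: random small matrix -> enlargement -> linear height scaling."""
    n = int(rng.integers(2, 9))        # small-matrix size: assumed 2..8
    small = rng.random((n, n))
    phi = bilinear_resize(small, (size, size))
    phi -= phi.min()                   # shift minimum to zero
    return phi / (phi.max() + 1e-12) * rng.uniform(0, max_height)

rng = np.random.default_rng(1)
phi = rme_phase(rng)
```

The smooth interpolation guarantees continuity, while the random small matrix supplies the randomness.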
3.1.2 Gaussian functions superposition (GFS)
The GFS method generates a random number of Gaussian functions (from 1 to 20) with different means (randomly set from 2 to 127 for
Figure 8. An example of the GFS method.
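The GFS idea can be sketched as follows. The 1-to-20 Gaussian count follows the text; the mean and standard deviation ranges and the output size are illustrative assumptions:

```python
import numpy as np

def gfs_phase(rng, size=256, n_max=20, max_height=8 * 2 * np.pi):
    """GFS: superpose (add or subtract) randomly parameterized 2D Gaussians."""
    y, x = np.mgrid[0:size, 0:size].astype(float)
    phi = np.zeros((size, size))
    for _ in range(int(rng.integers(1, n_max + 1))):   # 1..20 Gaussians
        cy, cx = rng.uniform(2, size - 2, 2)           # means (range assumed)
        sy, sx = rng.uniform(size / 16, size / 4, 2)   # std devs (assumed)
        sign = rng.choice([-1.0, 1.0])                 # superpose by +/-
        phi += sign * np.exp(-((y - cy) ** 2 / (2 * sy ** 2)
                               + (x - cx) ** 2 / (2 * sx ** 2)))
    phi -= phi.min()
    return phi / (phi.max() + 1e-12) * rng.uniform(0, max_height)

rng = np.random.default_rng(2)
phi = gfs_phase(rng)
```

Continuity is guaranteed by the smooth Gaussian functions, and the random parameters supply the randomness.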
3.1.3 Zernike polynomials superposition (ZPS)
The ZPS method generates a random number of matrices by the Zernike polynomials with different coefficients (randomly set from 0 to 1) of the first 30 orders (the first coefficient is set to zero), and then superposes these matrices by addition or subtraction.36,37,40,41,45 The ZPS method is similar to the GFS method except that the Gaussian functions are replaced by Zernike polynomials, in which continuity is guaranteed by the Zernike polynomials themselves, while their random coefficients introduce the randomness.
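The ZPS idea can be sketched with the standard Zernike radial-polynomial formula. The 0-to-1 coefficient range, first-30-orders rule, and zeroed first (piston) coefficient follow the text; the grid size and the (n, m) enumeration order are illustrative assumptions:

```python
import numpy as np
from math import factorial

def zernike(n, m, rho, theta):
    """Zernike polynomial Z_n^m evaluated on polar coordinates (standard form)."""
    R = np.zeros_like(rho)
    for k in range((n - abs(m)) // 2 + 1):
        R += ((-1) ** k * factorial(n - k)
              / (factorial(k)
                 * factorial((n + abs(m)) // 2 - k)
                 * factorial((n - abs(m)) // 2 - k))) * rho ** (n - 2 * k)
    return R * (np.cos(m * theta) if m >= 0 else np.sin(-m * theta))

def zps_phase(rng, size=256, n_orders=30, max_height=8 * 2 * np.pi):
    """ZPS: randomly weighted sum of the first n_orders Zernike terms."""
    y, x = np.mgrid[-1:1:size * 1j, -1:1:size * 1j]
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    # enumerate (n, m) pairs: (0,0), (1,-1), (1,1), (2,-2), ... (order assumed)
    idx = [(n, m) for n in range(8) for m in range(-n, n + 1, 2)][:n_orders]
    phi = np.zeros((size, size))
    for j, (n, m) in enumerate(idx):
        c = 0.0 if j == 0 else rng.uniform(0, 1) * rng.choice([-1.0, 1.0])
        phi += c * zernike(n, m, rho, theta)   # add or subtract each term
    phi -= phi.min()
    return phi / (phi.max() + 1e-12) * rng.uniform(0, max_height)

rng = np.random.default_rng(3)
phi = zps_phase(rng)
```

The polynomials are evaluated over the whole square grid, so the corners (where rho > 1) also receive smooth values.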
3.1.4 Real data reprocessing (RDR)
The RDR method obtains the absolute phases of real samples and then reprocesses them. Typical ways include: (i) using successfully unwrapped results from a traditional phase unwrapping method23,27,30,31,33; (ii) using optical methods that directly provide the absolute phase, such as the dual-wavelength (or even multiwavelength) digital holography method79; and (iii) using phase-unwrapping-free methods such as the transport-of-intensity-equation (TIE) method.80
3.2 Dataset Generation
Now that the absolute phase can be generated, the corresponding wrapped phase is computed from it to serve as the network input, forming the paired {Input, GT} dataset.
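The pairing step can be sketched in a few lines; the phase range here is an illustrative assumption:

```python
import numpy as np

def wrap(phi):
    """Wrapping operator: maps an absolute phase into [-pi, pi)."""
    return (phi + np.pi) % (2 * np.pi) - np.pi

# one {Input, GT} pair: the GT is the absolute phase (minimum shifted to
# zero), and the input is its wrapped version
rng = np.random.default_rng(0)
phi = rng.uniform(0, 8 * 2 * np.pi, (128, 128))
phi -= phi.min()        # GT
psi = wrap(phi)         # network input
```

By construction, `phi - psi` is everywhere an integer multiple of 2*pi.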
Accordingly, we generate the respective datasets, denoted as D_RME, D_GFS, and D_ZPS, for the three absolute phase generation methods. The simulated datasets in this section do not contain any invalid pixels. As for the real dataset (the fourth dataset, D_RDR), after being obtained by digital holography and the least-squares (LS) phase unwrapping method,11 the absolute phases of real samples are shifted so that the minimum value equals zero as the network GT. The wrapped phases are then calculated by Eqs. (7)–(9) as the network input. The real samples include candle flames, pits of different arrangements, grooves of different shapes, and tables of different shapes. The dataset sizes are as follows: D_RME, D_GFS, and D_ZPS each contain 20,000 pairs for training and 2000 pairs for testing; D_RDR contains 421 pairs for testing. Given that a higher
Datasets | Size | Proportion of | Proportion of | Proportion of |
Training part of D_RME | 20,000 | 50% | 20% | 30% |
Testing part of D_RME | 2000 | 2/3 | 1/6 | 1/6 |
Training part of D_GFS | 20,000 | 50% | 20% | 30% |
Testing part of D_GFS | 2000 | 2/3 | 1/6 | 1/6 |
Training part of D_ZPS | 20,000 | 50% | 20% | 30% |
Testing part of D_ZPS | 2000 | 2/3 | 1/6 | 1/6 |
D_RDR for testing | 421 | — | — | — |
Table 2. Summary of datasets. “—” indicates “not available.”
3.3 Dataset Selection
3.3.1 Initial richness indication of different datasets
Shannon entropy,81,82 as a measure of the uncertainty of a random variable, can quantitatively characterize the amount of information contained in a dataset, which affects the generalization ability of the trained neural network. We compute the Shannon entropy of the absolute phases from D_RME, D_GFS, and D_ZPS, as shown in Fig. 9. The mean entropies, from high to low, are RME (4.41), GFS (4.24), and ZPS (4.03). This result gives an initial indication of the richer information in D_RME and D_GFS.
Figure 9. Entropy histogram of absolute phases from the D_RME, D_GFS, and D_ZPS.
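The Shannon entropy of a phase map can be estimated from its gray-level histogram; the bin count below is an illustrative assumption:

```python
import numpy as np

def shannon_entropy(img, bins=256):
    """Shannon entropy (in bits) of an image's gray-level histogram."""
    hist, _ = np.histogram(img, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                        # 0 * log(0) is taken as 0
    return float(-np.sum(p * np.log2(p)))

flat = np.zeros((64, 64))                         # uniform image: no information
rich = np.random.default_rng(0).random((64, 64))  # spread histogram: high entropy
```

A constant image yields zero entropy, while a uniformly random one approaches the maximum log2(bins).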
3.3.2 Statistical gradient distribution of different datasets
The gradient (difference) distribution of the absolute phases can also reflect the richness of a dataset. Thus, we first calculate the sum of the absolute values of the horizontal and vertical differences for each absolute phase in the dataset and then obtain the statistical average gradient distribution (SAGD) of the dataset. Figure 10 shows the SAGD maps of the different datasets, from which we can see that the SAGD values at the four corners of D_RME and D_GFS are low (red arrows), that D_ZPS has high SAGD values only at the four corners (red circles), and that the SAGD value of D_RDR is higher in the lower-middle region (red circle).
Figure 10. SAGD maps of different datasets. Red arrows and circles indicate low and high SAGD values, respectively.
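The SAGD computation described above can be sketched as follows (the edge-padding choice is an implementation assumption so that the map keeps the image shape):

```python
import numpy as np

def sagd(phases):
    """Statistical average gradient distribution over a stack of phase maps.

    Per map: |horizontal difference| + |vertical difference| (edge-padded),
    then the pixel-wise mean over the stack.
    """
    acc = np.zeros_like(phases[0], dtype=float)
    for phi in phases:
        gx = np.abs(np.diff(phi, axis=1, append=phi[:, -1:]))
        gy = np.abs(np.diff(phi, axis=0, append=phi[-1:, :]))
        acc += gx + gy
    return acc / len(phases)

rng = np.random.default_rng(0)
maps = [rng.random((32, 32)) for _ in range(10)]  # stand-in for a dataset
g = sagd(maps)
```

Low-SAGD regions of a training set mark areas where the network sees little gradient variety, which is the link to the error maps discussed next.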
3.3.3 Comparison with network’s performance trained by different datasets
We train the same Res-UNet on D_RME, D_GFS, and D_ZPS, yielding three trained networks named RME-Net, GFS-Net, and ZPS-Net, respectively. These three networks are then tested on all the testing datasets, and the results are shown in Table 3. We observe that each network trained with a single type of dataset (RME-Net, GFS-Net, or ZPS-Net) performs best on test data from the same dataset and slightly worse on data from the other datasets, including RDR. This is the so-called generalization problem. In terms of generalization capability, RME-Net is similar to GFS-Net and better than ZPS-Net, which is consistent with the Shannon entropy result. Moreover, the congruence operation can significantly improve the accuracy of the networks (see Sec. S3 in the Supplemental Material).
D_RME | D_GFS | D_ZPS | D_RDR | ||
RME-Net | 0.0910 | 0.0982 | 0.1336 | 0.1103 | |
GFS-Net | 0.2263 | 0.0985 | 0.1133 | 0.1184 | |
ZPS-Net | 2.5148 | 0.4221 | 0.0821 | 0.8245 | |
RME-Net | 0.0507 | 0.1037 | 0.2320 | 0.1003 | |
GFS-Net | 0.4571 | 0.0234 | 0.1077 | 0.1557 | |
ZPS-Net | 2.8249 | 0.6252 | 0.0220 | 1.1405 | |
PFS | RME-Net | 0.0010 | 0.0085 | 0.1270 | 0.0594 |
GFS-Net | 0.1485 | 0.0020 | 0.0560 | 0.0333 | |
ZPS-Net | 0.6525 | 0.4075 | 0.0010 | 0.4679 |
Table 3.
To further analyze the error distribution of the neural networks and explore its relationship with the SAGD of the dataset, we calculate the mean error map of each network on each testing dataset. As shown in Fig. 11, RME-Net generalizes well to all testing datasets, GFS-Net generalizes poorly to D_RME, and ZPS-Net generalizes poorly to all non-ZPS testing datasets, consistent with the results in Table 3. Furthermore, since the SAGD values at the four corners of D_RME and D_GFS are low while those of D_ZPS are high, the mean errors of RME-Net and GFS-Net on D_ZPS are higher at the four corners, as indicated by the small red circles. Similarly, since D_ZPS has low SAGD values in the regions where the SAGD of D_RDR is high, the mean error of ZPS-Net on D_RDR is larger in the corresponding regions, as indicated by the large red ellipses.
Figure 11. Mean error maps for each network. Red circles indicate high mean error values.
Therefore, the best-performing RME method is selected and further improved to obtain a dataset with a uniform SAGD. Specifically, based on the RME method, we enlarge the small square matrices into a size of
Figure 12. (a) SAGD maps for D_RME and D_RME1; (b) mean error maps for RME-Net and RME1-Net. Red arrows indicate low SAGD values. Red circles indicate high mean error values, and orange circles indicate the compared regions.
For further visualization, in Fig. 13 we show the results of RME1-Net on the four testing datasets with the maximum, median, and minimum
Figure 13. Partial display of results from RME1-Net. “Max,” “Med,” and “Min” represent specific results with maximal, median, and minimal
Based on the above evaluation, we select RME as the dataset generation method for the following tests of the different phase unwrapping methods. Furthermore, the congruence operation is always applied due to its high effectiveness. To help the reader further understand deep-learning-involved phase unwrapping, we present a step-by-step example in Sec. S5 of the Supplemental Material, covering dataset generation, network construction, training, and testing.
4 Comparison of Deep-Learning-Involved Phase Unwrapping Methods
4.1 Dataset Preparation
We generate the absolute phase by the RME method. Then Eqs. (7) and (10) are used to get the ideal dataset
To generate the noisy dataset, we add the noise of different degrees to
To generate the discontinuity-containing dataset, we place a rectangular area with a random size and phase value of
To generate the aliasing-containing dataset, we simply increase
To generate the mixed dataset (containing all three of noise, discontinuity, and aliasing), first, we simply increase
For each noise-containing case, the Gaussian noise with a random standard deviation
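The Gaussian noise addition described above can be simulated by perturbing the real and imaginary parts of the complex field and re-extracting the wrapped phase; the test signal and the sigma value are illustrative assumptions:

```python
import numpy as np

def noisy_wrapped(phi, sigma, rng):
    """Add Gaussian noise to the real and imaginary parts of exp(i*phi)
    and re-extract the (now noisy) wrapped phase."""
    re = np.cos(phi) + rng.normal(0.0, sigma, phi.shape)
    im = np.sin(phi) + rng.normal(0.0, sigma, phi.shape)
    return np.arctan2(im, re)

rng = np.random.default_rng(0)
phi = np.outer(np.ones(256), np.linspace(0, 6 * np.pi, 256))  # a phase ramp
psi_noisy = noisy_wrapped(phi, sigma=0.2, rng=rng)
```

Larger sigma lowers the SNR of the wrapped phase, producing the different noise levels used in the comparison.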
For clarity, Table 4 summarizes the different cases (column 1), the datasets (column 2), the networks (column 3), and the used loss functions (column 4). Due to the increase in
Cases | Datasets | Networks | Loss functions |
Ideal case (Sec. | dRG-I | MAE | |
dWC-I | CE + MAE | ||
Noisy case (Sec. | dRG-N | MAE | |
dWC-N | CE+MAE | ||
{ | dDN-N | MAE | |
Discontinuous case (Sec. | dRG-D | MAE | |
dWC-D | CE + MAE | ||
Aliasing case (Sec. | dRG-A | MAE | |
dWC-A | CE + MAE | ||
Mixed case (Sec. | dRG-M | MAE | |
dWC-M | CE + MAE |
Table 4. Summary of networks and corresponding datasets. The form of the dataset is {Input, GT}. The last letter of the network name is the case (“
4.2 Comparison in the Ideal Case
After training, we test dRG-I and dWC-I. The accuracy evaluation of the networks is shown in Table 5 and Fig. 14. Both dRG with congruence and dWC provide high success rates, with PFS as low as 0.0015 and 0.0025, respectively. It should be noted that all three traditional methods (line-scanning, congruence of LS, and quality-guided (QG)10) obtain completely correct results, i.e., their PFS is 0. Therefore, on one hand, the deep-learning-involved methods do provide satisfactory results after only some training, which prompts us to further examine and compare them in other, more challenging cases; on the other hand, the presence of failed results is unsatisfying, even though these are the best results we have been able to achieve so far. Ensuring that a neural network can correctly unwrap a wrapped phase of any shape remains a focus of future research.
dRG-I | dRG-I-C | dWC-I | |
0.0989 | 0.0005 | 0.0008 | |
0.0515 | 0.0157 | 0.0251 | |
PFS | 0.0015 | 0.0015 | 0.0025 |
PIP | 0.0044 | 0.0044 | 0.0054 |
Table 5.
Figure 14. Results for (a) dRG-I and (b) dWC-I in the ideal case. “Max,” “Med,” and “Min” represent specific results with maximal, median, and minimal
To test the accuracy and adaptation of the neural network to samples with different heights, a testing dataset with
Figure 15.
4.3 Comparison in the Noisy Case
After training, 2000 pairs of samples in the testing datasets are used to test the corresponding networks, whose accuracy evaluation is shown in Table 6 and Fig. 16. We have the following observations:
dRG-N (GT) | dRG-N-C (GT1) | dWC-N (GT1) | dDN-N (GT) | dDN-N-C (GT1) | |
0.1367 | 0.0285 | 0.0435 | 0.0883 | 0.0229 | |
0.1154 | 0.1148 | 0.1197 | 0.2915 | 0.3056 | |
PFS | 0.2525 | 0.2525 | 0.2840 | 0.1976 | 0.1976 |
PIP | 0.0013 | 0.0013 | 0.0014 | 0.0108 | 0.0088 |
Table 6.
Figure 16. Results for (a) dRG-N, (b) dWC-N, and (c) dDN-N in the noisy case. “GT” represents the pure GT (pure absolute phase), while “GT1” represents the noisy GT (noisy absolute phase). “Max,” “Med,” and “Min” represent specific results with maximal, median, and minimal
Figure 17. Results at different noise levels. Solid and dashed lines represent the deep-learning-involved and traditional methods, respectively.
To test the performance under different noise levels, we generate another noise-increasing dataset by adding Gaussian noise to five pure absolute phases in which the SNR of the wrapped phase gradually decreases from 20 to
Deep-learning-involved methods and WFT-QG can cope with noise levels where the wrapped-phase SNR is as low as 0 or even lower, but WFT-QG requires suitable manually tuned hyperparameters, which usually take several attempts to find. Other traditional methods can only maintain high accuracy in low-to-medium noise where the SNR of the wrapped phase is
4.4 Comparison in the Discontinuous Case
After training, we use the testing datasets to test the corresponding networks and traditional methods, whose accuracy evaluation is shown in Table 7 and Fig. 18. All the indices in Table 7 are calculated from pixels outside the square area. We have the following observations:
dRG-I | dRG-D | dRG-D-C | dWC-I | dWC-D | Line-scanning | LS | QG | |
2.0230 | 0.1230 | 0.0261 | 1.2209 | 0.0219 | 3.8054 | 1.3655 | 2.4204 | |
1.7817 | 0.1636 | 0.1827 | 1.3777 | 0.1543 | 3.7172 | 1.0408 | 2.5014 | |
PFS | 0.8120 | 0.0770 | 0.0770 | 0.7385 | 0.0785 | 0.9405 | 0.7120 | 0.8565 |
PIP | 0.2407 | 0.0112 | 0.0112 | 0.1128 | 0.0077 | 0.4400 | 0.1073 | 0.2789 |
Table 7.
Figure 18. Results for (a) dRG-I, (b) dWC-I, (c) dRG-D, (d) dWC-D, (e) line-scanning, (f) LS, and (g) QG methods in the discontinuous case. “Max,” “Med,” and “Min” represent specific results with maximal, median, and minimal
4.5 Comparison in the Aliasing Case
After training, we test the dRG-A, dWC-A, line-scanning, LS, and QG methods, whose accuracy evaluation is shown in Table 8 and Fig. 19. We have the following observations:
dRG-A | dRG-A-C | dWC-A | Line-scanning | LS | QG | |
0.1958 | 0.0078 | 0.0107 | 40.5128 | 6.7199 | 39.8846 | |
0.1390 | 0.1503 | 0.1612 | 21.0695 | 3.1294 | 23.0389 | |
PFS | 0.0075 | 0.0075 | 0.0120 | 0.9820 | 0.9895 | 0.9895 |
PIP | 0.0765 | 0.0765 | 0.0467 | 0.9102 | 0.5705 | 0.8369 |
Table 8.
Figure 19. Results for (a) dRG-A, (b) dWC-A, (c) line-scanning, (d) LS, and (e) QG methods in the aliasing case. “Max,” “Med,” and “Min” represent specific results with maximal, median, and minimal
4.6 Comparison in the Mixed Case
After training, we test the dRG-M, dWC-M, line-scanning, LS and QG methods, whose accuracy evaluation is shown in Table 9 and Fig. 20. We have the following observations:
dRG-M | dRG-M-C | dWC-M | Line-scanning | LS | QG | |
0.2362 | 0.1266 | 0.2206 | 38.4389 | 10.8350 | 39.4653 | |
0.3101 | 0.3790 | 0.4618 | 21.0695 | 3.6269 | 18.1084 | |
PFS | 0.3740 | 0.3740 | 0.4810 | 1.0000 | 1.0000 | 1.0000 |
PIP | 0.0106 | 0.0106 | 0.0107 | 0.9569 | 0.7600 | 0.9107 |
Table 9.
4.7 Performance Summary
Overall, each pixel of the unwrapped phase obtained by dRG and LS has a small error, which can be eliminated by the congruence operation, while the error of the unwrapped phase obtained by dWC, dDN, line-scanning, and QG is either 0 or an integer multiple of $2\pi$.
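The congruence operation used throughout these comparisons snaps a regression output to the nearest value consistent with the wrapped phase; a minimal sketch (the error level is an illustrative assumption, and must stay below pi for exact recovery):

```python
import numpy as np

def wrap(phi):
    return (phi + np.pi) % (2 * np.pi) - np.pi

def congruence(phi_pred, psi):
    """Snap a regression output to the nearest value congruent with the
    wrapped phase: psi + 2*pi*round((phi_pred - psi) / (2*pi))."""
    return psi + 2 * np.pi * np.rint((phi_pred - psi) / (2 * np.pi))

rng = np.random.default_rng(0)
phi_true = rng.uniform(0, 10 * np.pi, (64, 64))
psi = wrap(phi_true)
phi_pred = phi_true + rng.normal(0, 0.3, phi_true.shape)  # small sub-pi error
phi_cong = congruence(phi_pred, psi)
```

Because the result is forced to differ from `psi` by exact multiples of 2*pi, any residual error of a congruence-corrected method is 0 or an integer multiple of 2*pi.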
Figure 20. Results for (a) dRG-M, (b) dWC-M, (c) line-scanning, (d) LS, and (e) QG methods in the mixed case. “Max,” “Med,” and “Min” represent specific results with maximal, median, and minimal
For the ideal case, all traditional methods (line-scanning, LS, and QG) can achieve perfect results, while deep-learning-involved methods (dRG, dWC, and dDN) cannot guarantee perfect completion of phase unwrapping of any shape due to their noninfinite generalization ability. The line-scanning method is most recommended due to its lowest computational cost.
With the introduction of invalid pixels (noise, discontinuities, and aliasing), the accuracy of the deep-learning-involved methods drops slightly, but their errors remain basically lower than those of the traditional methods.
For slight noise (
For the discontinuous, aliasing, and even mixed cases, dDN and all traditional methods become powerless because the Itoh condition is violated, while dRG and dWC can still achieve satisfactory results after targeted training. Therefore, dRG and dWC are the only recommended options.
To sum up, in Table 10, we present the performance of different methods in the different cases mentioned above.
Cases | dRG | dWC | dDN | Line-scanning | LS | QG | WFT-QG |
Ideal | ✓ | ✓ | ✓ | ✓✓ | ✓ | ✓ | — |
Slight noise | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Moderate noise | ✓✓ | ✓✓ | ✓ | ✗ | ✗ | ✗ | ✓✓ |
Severe noise | ✓✓ | ✓✓ | ✓ | ✗ | ✗ | ✗ | ✓✓ |
Discontinuity | ✓✓ | ✓✓ | — | ✗ | ✗ | ✗ | — |
Aliasing | ✓✓ | ✓✓ | — | ✗ | ✗ | ✗ | — |
Mixed | ✓✓ | ✓✓ | — | ✗ | ✗ | ✗ | — |
Table 10. Performance statistics in the ideal, noisy, discontinuous, and aliasing cases. “✓” represents “capable.” “✓✓” represents “best and recommended.” “✗” represents “incapable.” “—” indicates “not applicable.”
5 Conclusion and Outlook
We have reviewed and compared the phase dataset generation methods (RME, GFS, and ZPS) and the deep-learning-involved phase unwrapping methods (dRG, dWC, and dDN). When comparing the deep-learning-involved phase unwrapping methods, the traditional methods were also tested for reference. For dataset selection, the modified RME method is the most recommended. The traditional phase unwrapping methods are more reliable and efficient in the ideal, slightly noisy, and moderately noisy cases, except WFT-QG, which achieves satisfactory accuracy in all noisy cases. The deep-learning-involved methods are more suitable for situations in which most of the traditional methods are powerless, such as the severely noisy, discontinuous, and aliasing cases.
We aim to provide a relatively uniform and fair condition for the different dataset generation methods and phase unwrapping methods. In actual use, the parameter ranges of the three dataset generation methods (such as the initial matrix size in the RME method, the number of Gaussian functions in the GFS method, and the order of the Zernike coefficients in the ZPS method) can be further expanded to enrich the datasets. For the fairness of the comparisons and the efficiency of network training, we selected 20,000 training samples; more samples in practical applications will surely bring further improvement in network performance. In addition to the dataset generation methods mentioned above, the GAN is expected to generate more data from a small amount of reliable data, thereby further increasing the richness of the dataset.84 Once training is complete, the size of the neural network input is fixed to the size of the training data, whereas in practical applications the size of the data to be unwrapped generally differs from that of the network input. To adapt to the resolution dynamically, we propose to first divide the entire data into multiple overlapping subregions of fixed size, then unwrap each subregion with the neural network, and finally use a stitching algorithm to obtain the entire absolute phase.40,85,86
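The divide-unwrap-stitch strategy can be sketched as follows. This is a minimal NumPy illustration, not the cited stitching algorithms: `unwrap_tile` is a stand-in for the trained network (here a simple Itoh-based line scan, valid because the demo phase is smooth), and the strip size, overlap, and Gaussian test phase are illustrative assumptions. Each new strip is aligned to the already stitched result by the 2π multiple estimated from their shared overlap:

```python
import numpy as np

def wrap(phi):
    """Wrap phase into (-pi, pi]."""
    return np.angle(np.exp(1j * phi))

def unwrap_tile(w):
    """Stand-in for the trained network: unwrap the first row,
    then every column seeded by that row (Itoh-based line scan)."""
    out = w.copy()
    out[0, :] = np.unwrap(out[0, :])
    return np.unwrap(out, axis=0)

def unwrap_by_tiles(wrapped, tile_rows=40, overlap=10):
    """Unwrap overlapping horizontal strips independently, then align
    each strip to the stitched result by a 2*pi multiple estimated
    from the rows it shares with the previously placed strip."""
    H = wrapped.shape[0]
    starts = list(range(0, H - tile_rows + 1, tile_rows - overlap))
    if starts[-1] + tile_rows < H:
        starts.append(H - tile_rows)          # final strip flush with the edge
    result = np.zeros_like(wrapped)
    prev_end = 0
    for i, s in enumerate(starts):
        tile = unwrap_tile(wrapped[s:s + tile_rows])
        if i > 0:
            ov = prev_end - s                 # rows shared with stitched part
            k = np.round(np.mean(result[s:prev_end] - tile[:ov]) / (2 * np.pi))
            tile += 2 * np.pi * k             # align by a 2*pi multiple
        result[s:s + tile_rows] = tile
        prev_end = s + tile_rows
    return result

# demo: smooth synthetic phase that satisfies the Itoh condition
y, x = np.mgrid[-1:1:128j, -1:1:128j]
truth = 18.0 * np.exp(-((x - 0.3) ** 2 + (y + 0.2) ** 2) / 0.3)
est = unwrap_by_tiles(wrap(truth))
```

Because each independently unwrapped tile differs from the true phase only by a constant 2π multiple, the rounded mean over the overlap recovers that multiple exactly; with a neural network per tile, the same alignment step absorbs the per-tile ambiguity.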
The neural networks used in the above comparisons are all based on Res-UNet. Other types of network structures (such as the Bayesian network,87 dynamic convolution,88 and attention UNet89) may further improve the accuracy of phase unwrapping.
All the deep-learning-involved methods mentioned in this paper rely on dataset supervision to learn the mapping relationship from the input to the GT. The wrapped phase can be unwrapped in a single pass through the trained neural network, but the generalization ability to samples with different shapes is not infinite.
Different types of objects have different shape features in their phase distributions. For example, the phase distribution of a lens surface, a common object in interferometry, fluctuates slowly and is dominated by low-frequency information. The phase distribution of mountainous terrain, a common object in InSAR, is markedly tortuous and dominated by high-frequency information. The phase distribution of cells, a common object in holographic microscopy, usually contains both high- and low-frequency information. Therefore, we believe that a promising way to address the generalization problem is transfer learning.90 The procedure is as follows: first, pretrain a standard neural network with a simulated dataset; then, for a specific target object, fine-tune the network with a real dataset of that type of object; finally, unwrap the wrapped phase with the fine-tuned network in a single pass.
But what if there is no real dataset of the target object? Different from the dataset-supervision methods, and inspired by the deep image prior,91 Yang et al.92 used the Itoh condition to guide the convergence of the neural network as a physical-model supervision method, which requires no GT and has stronger generalization ability for samples with different shapes. However, the initialized network does not contain any mapping relationship from the wrapped phase to the absolute phase, which leads to a large number of iterations for each phase unwrapping. Analogous to transfer learning, we propose to combine dataset supervision with physical-model supervision, which can significantly reduce the number of iterations: pretrain the neural network with the simulation-generated dataset, and then far fewer iterations are required by the physical-model supervision method.
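The physical-model supervision can be stated concretely: the loss penalizes any mismatch between the gradients of the predicted absolute phase and the wrapped gradients of the wrapped input, which is exactly the Itoh condition and requires no GT. Below is a framework-free NumPy sketch of such a loss; in practice it would be written with an autograd framework so it can drive network training, and the function name and test phase are illustrative assumptions rather than the formulation of Ref. 92:

```python
import numpy as np

def wrap(phi):
    """Wrap phase into (-pi, pi]."""
    return np.angle(np.exp(1j * phi))

def itoh_loss(pred_abs, wrapped):
    """Physical-model-supervised loss: the gradients of the predicted
    absolute phase should equal the wrapped gradients of the wrapped
    input wherever the Itoh condition holds, so no GT phase is needed."""
    loss = 0.0
    for ax in (0, 1):
        d_pred = np.diff(pred_abs, axis=ax)
        d_obs = wrap(np.diff(wrapped, axis=ax))  # wrapped gradient of input
        loss += np.mean((d_pred - d_obs) ** 2)
    return loss

# demo: the true absolute phase gives (near-)zero loss,
# while a wrong candidate is penalized.
y, x = np.mgrid[-1:1:64j, -1:1:64j]
truth = 10.0 * np.exp(-(x ** 2 + y ** 2) / 0.5)
good = itoh_loss(truth, wrap(truth))
bad = itoh_loss(np.zeros_like(truth), wrap(truth))
```

Since the loss depends only on the wrapped input, it can supervise retraining on any new sample; pretraining with dataset supervision merely gives the minimization a good starting point.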
Here, we show the preliminary results of this idea. To relate the input to the neural network, we change the input of the neural network in Ref. 92 from a random vector to the wrapped phase. We use the method in Sec. 4.1 to generate a training dataset containing 50,000 pairs of data.
As shown in Fig. 21, the neural network is first pretrained with the dataset-supervised loss function and then retrained with the physical-model-supervised loss function.
Figure 21.Schematic diagram of pretraining and retraining.
From the retrain loss plots of the five networks shown in Fig. 22, the pretrained networks converge significantly faster than the initialized one without pretraining. Specifically, the initialized network without pretraining requires 500 epochs to converge, while the pretrained neural networks converge in far fewer epochs.
Figure 22.Loss plot of pretrained and initialized networks.
This idea of pretraining with dataset supervision and then retraining with physical-model supervision also has great potential in other fields, such as phase imaging,93 coherent diffractive imaging,94 and holographic reconstruction.95
Kaiqiang Wang received the bachelor’s and PhD degrees from the Northwestern Polytechnical University in 2016 and 2022, respectively. His research interests include computational imaging and deep learning.
Qian Kemao is currently an associate professor at the School of Computer Science and Engineering, Nanyang Technological University, Singapore. He received his bachelor’s, master’s, and PhD degrees from the University of Science and Technology of China in 1994, 1997, and 2000, respectively. He has published more than 150 scientific articles and a monograph. His research interests include optical metrology, image processing, computer vision, and computer animation.
Jianglei Di is currently a professor at the School of Information Engineering, Guangdong University of Technology, Guangzhou, China. He received his BS, MS, and PhD degrees from Northwestern Polytechnical University (NPU) in 2004, 2007, and 2012, respectively. His research interests include digital holography, optical information processing, optical precision measurement, and deep learning.
Jianlin Zhao is currently a professor at the School of Physical Science and Technology, Northwestern Polytechnical University (NPU), Xi’an, China. He received his BS and MS degrees from NPU in 1981 and 1987, respectively. He received his PhD in optics from Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an, in 1998. He is the director of Shaanxi Key Laboratory of Optical Information Technology, and MOE Key Laboratory of Material Physics and Chemistry under Extraordinary Conditions. His research interests include light field manipulation, imaging, information processing, and applications.
References
[3] X. Su, W. Chen. Fourier transform profilometry: a review. Opt. Lasers Eng., 35, 263-284(2001).
[4] C. Zuo et al. Deep learning in optical metrology: a review. Light Sci. Appl., 11, 39(2022).
[7] K. Itoh. Analysis of the phase unwrapping algorithm. Appl. Opt., 21, 2470(1982).
[12] Q. Kemao. Windowed Fourier transform for fringe pattern analysis. Appl. Opt., 43, 2695(2004).
[14] K. Qian. Windowed Fringe Pattern Analysis(2013).
[16] D. C. Ghiglia, M. D. Pritt. Two-Dimensional Phase Unwrapping: Theory, Algorithms, and Software(1998).
[22] G. Dardikman, N. T. Shaked. Phase unwrapping using residual neural networks, CW3B.5(2018).
[24] K. Wang et al. One-step robust deep learning phase unwrapping. Opt. Express, 27, 15100(2019).
[25] J. J. He et al. Deep spatiotemporal phase unwrapping of phase-contrast MRI data, 1962(2019).
[26] K. Ryu et al. Development of a deep learning method for phase unwrapping MR images, 4707(2019).
[29] M. V. Perera, A. De Silva. A joint convolutional and spatial quad-directional LSTM network for phase unwrapping, 4055-4059(2021).
[34] R. Liang et al. Phase unwrapping using segmentation(2018).
[46] D. E. Rumelhart, G. E. Hinton, R. J. Williams. Learning internal representations by error propagation. California Univ. San Diego La Jolla Inst for Cognitive Science, 399-421(1988).
[47] K. He et al. Deep residual learning for image recognition, 770-778(2016).
[48] O. Ronneberger, P. Fischer, T. Brox. U-Net: convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, 234-241(2015).
[49] L.-C. Chen et al. Encoder-decoder with atrous separable convolution for semantic image segmentation, 801-818(2018).
[54] N. Borhani et al. Learning to see through multimode fibers. Optica, 5, 960(2018).
[60] S. Feng et al. Fringe pattern analysis using deep learning. Adv. Photon., 1, 025001(2019).
[63] W. Yin et al. Temporal phase unwrapping using deep learning. Sci. Rep., 9, 20175(2019).
[81] C. E. Shannon. A mathematical theory of communication. Bell Syst. Tech. J., 27, 379-423(1948).
[83] K. Qian. Updated WFT for fringe analysis(2017).
[84] I. Goodfellow et al. Generative adversarial nets(2014).
[87] A. Kendall, Y. Gal. What uncertainties do we need in Bayesian deep learning for computer vision?. Advances in Neural Information Processing Systems(2017).
[88] Y. Chen et al. Dynamic convolution: attention over convolution kernels, 11030-11039(2020).
[89] O. Oktay et al. Attention U-net: learning where to look for the pancreas(2018).
[90] J. Yosinski et al. How transferable are features in deep neural networks?. Advances in Neural Information Processing Systems(2014).
[91] D. Ulyanov, A. Vedaldi, V. Lempitsky. Deep image prior, 9446-9454(2018).
[93] F. Wang et al. Phase imaging with an untrained neural network. Light Sci. Appl., 9, 77(2020).
