Harnessing the magic of light: spatial coherence instructed swin transformer for universal holographic imaging

Xin Tong; Renjun Xu; Pengfei Xu; Zishuai Zeng; Shuxi Liu; Daomu Zhao

doi:10.1117/1.AP.5.6.066003


Advanced Photonics
Vol. 5, Issue 6, 066003 (2023)


Xin Tong¹,², Renjun Xu², Pengfei Xu¹, Zishuai Zeng¹, Shuxi Liu¹, and Daomu Zhao¹,*

Author Affiliations

¹Zhejiang University, School of Physics, Zhejiang Province Key Laboratory of Quantum Technology and Device, Hangzhou, China

²Zhejiang University, Center for Data Science, Hangzhou, China



Xin Tong, Renjun Xu, Pengfei Xu, Zishuai Zeng, Shuxi Liu, Daomu Zhao. Harnessing the magic of light: spatial coherence instructed swin transformer for universal holographic imaging. Advanced Photonics, 2023, 5(6): 066003


Fig. 1. Principle and performance of TWC-Swin method. (a) LPR. SC modulation can adjust the SC by changing the distance $D$. Holographic modulation is used to load the phase hologram. The LPR generates two outputs, one for calculating SC and the other for network input. HWP, half-wave plate; PBS, polarized beam splitter; L, lens; RD, rotating diffuser; SLM, spatial light modulator; F, filter; $D$, distance between L1 and RD. (b) The detailed flow of the TWC-Swin method. The swin adapter can select the optimal model from the model space by obtaining SC. The color picture represents a case in progress. (c) Swin-model space and architecture of the swin model. The architecture of $M_{1}$–$M_{11}$ is the same; only the weights are different. The weights are obtained by network training at different distances. (d) The correspondence between SC and swin-model space. See Table S1 in the Supplementary Material for detailed data. (e) Inputs and outputs of the swin model with different SCs. (f) SSIM and PCC of swin-model outputs at different SCs. (g) Training and test data acquisition process. The training data did not contain any turbulence. (h) SSIM and PCC of swin-model outputs at different turbulent scenes.


Fig. 2. Qualitative analysis of our method’s performance at the different SCs. Input, raw image captured by CMOS1. Output, image processed by the network. (a)–(k) Different SCs: (a) $D = f_{1}$, SC is 0.494; (b) $D = 1.1 f_{1}$, SC is 0.475; (c) $D = 1.2 f_{1}$, SC is 0.442; (d) $D = 1.3 f_{1}$, SC is 0.419; (e) $D = 1.4 f_{1}$, SC is 0.393; (f) $D = 1.5 f_{1}$, SC is 0.368; (g) $D = 1.6 f_{1}$, SC is 0.337; (h) $D = 1.7 f_{1}$, SC is 0.311; (i) $D = 1.8 f_{1}$, SC is 0.285; (j) $D = 1.9 f_{1}$, SC is 0.25; and (k) $D = 2 f_{1}$, SC is 0.245. $D$ is the distance between L1 and RD in the LPR and $f_{1}$ is the focal length of L1. Our method can achieve improved image quality under low SC (Video 1, MP4, 1.5 MB [URL: https://doi.org/10.1117/1.AP.5.6.066003.s1]).
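
The model selection performed by the swin adapter in Fig. 1(b) can be illustrated with a minimal sketch. It assumes that $M_{1}$–$M_{11}$ correspond, in order, to the eleven distances listed in Fig. 2, and that the adapter picks the model whose training SC is closest to the measured SC. Both are assumptions for illustration: the actual correspondence is given in Table S1 of the Supplementary Material, and the names `SC_LEVELS` and `select_model` are hypothetical.

```python
import numpy as np

# SC values reported in Fig. 2 for D = f1, 1.1 f1, ..., 2 f1.
# Assumption: these map in order to the models M1..M11.
SC_LEVELS = np.array([0.494, 0.475, 0.442, 0.419, 0.393, 0.368,
                      0.337, 0.311, 0.285, 0.25, 0.245])


def select_model(measured_sc: float) -> str:
    """Return the label of the model whose training SC is nearest to measured_sc.

    Nearest-neighbour matching is an illustrative assumption, not the
    published selection rule (see Table S1 in the Supplementary Material).
    """
    idx = int(np.argmin(np.abs(SC_LEVELS - measured_sc)))
    return f"M{idx + 1}"


if __name__ == "__main__":
    # Example: an SC of 0.482 lies between the first two training levels.
    print(select_model(0.482))
```

Under these assumptions, a measured SC outside the trained range simply falls back to the nearest end of the model space ($M_{1}$ or $M_{11}$).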


Fig. 3. Average results of the evaluation indices for each test data set. The coherence is 0.368. Results of other coherences are provided in Fig. S2 in the Supplementary Material. All evaluation indices demonstrate that our method possesses strong image restoration ability under low SC.


Fig. 4. Qualitative analysis of our method’s performance across varying intensities of (a) oceanic and (b) atmospheric turbulence. The network trained with coherence as physical prior information can effectively overcome the impact of turbulence on imaging and improve image quality. (O1)–(O5) denote oceanic turbulence phases and (A1)–(A5) denote atmospheric turbulence phases. (O1) $\chi_{t} = 10^{-9}\ \mathrm{K^{2}/s}$, coherence is 0.491. (O2) $\chi_{t} = 10^{-7}\ \mathrm{K^{2}/s}$, coherence is 0.482. (O3) $\chi_{t} = 2 \times 10^{-7}\ \mathrm{K^{2}/s}$, coherence is 0.447. (O4) $\chi_{t} = 4 \times 10^{-7}\ \mathrm{K^{2}/s}$, coherence is 0.404. (O5) $\chi_{t} = 10^{-6}\ \mathrm{K^{2}/s}$, coherence is 0.373. (A1) $C_{n}^{2} = 10^{-14}\ \mathrm{m}^{3-\alpha}$, coherence is 0.507. (A2) $C_{n}^{2} = 1.5 \times 10^{-13}\ \mathrm{m}^{3-\alpha}$, coherence is 0.459. (A3) $C_{n}^{2} = 2.5 \times 10^{-13}\ \mathrm{m}^{3-\alpha}$, coherence is 0.43. (A4) $C_{n}^{2} = 3.5 \times 10^{-13}\ \mathrm{m}^{3-\alpha}$, coherence is 0.403. (A5) $C_{n}^{2} = 5 \times 10^{-13}\ \mathrm{m}^{3-\alpha}$, coherence is 0.378. Other parameter settings of the turbulent power spectrum function can be found in Table S2 in the Supplementary Material (Video 2, MP4, 36.4 MB [URL: https://doi.org/10.1117/1.AP.5.6.066003.s2]).


Fig. 5. Visualization of the performance of different methods. The SSIM is shown in the bottom left corner. Our method presents the best performance, which is shown by smoother images with lower noise. (a) Sample selected from the WED data set and magnified insets of the red bounding region. (b) Sample selected from the Flickr data set and magnified insets of the red bounding region. The pure swin model can be obtained by removing the postprocessing block of the swin model (Video 3, MP4, 0.6 MB [URL: https://doi.org/10.1117/1.AP.5.6.066003.s3]).


Fig. 6. Performance comparison of different methods on various data sets at an SC of 0.494. Our model outperforms other methods across various data sets and indices.


Fig. 7. (a), (b) Performance comparison between different methods at various turbulent scenes. (A1) $C_{n}^{2} = 10^{-14}\ \mathrm{m}^{3-\alpha}$, coherence is 0.506. (A2) $C_{n}^{2} = 1.5 \times 10^{-13}\ \mathrm{m}^{3-\alpha}$, coherence is 0.459. (O1) $\chi_{t} = 10^{-9}\ \mathrm{K^{2}/s}$, coherence is 0.491. (O2) $\chi_{t} = 10^{-7}\ \mathrm{K^{2}/s}$, coherence is 0.482. Note that all methods are trained with coherence as physical prior information and improve image quality under turbulence conditions. This demonstrates that incorporating appropriate physical prior information can help the network cope with multiscene tasks.



Condition	SSIM (BSD)	SSIM (CelebA)	SSIM (Flickr)	SSIM (WED)	SSIM (DIV)	PCC (BSD)	PCC (CelebA)	PCC (Flickr)	PCC (WED)	PCC (DIV)
Input, $D = f_{1}$ (SC = 0.494)	0.5893	0.5943	0.4296	0.6155	0.4625	0.9368	0.9575	0.9210	0.9146	0.8753
Output, $D = f_{1}$	0.8984	0.8908	0.8523	0.9019	0.8940	0.9807	0.9893	0.9848	0.9930	0.9819
Input, $D = 1.3 f_{1}$ (SC = 0.419)	0.5775	0.5415	0.3917	0.6245	0.4184	0.8953	0.9303	0.8588	0.9149	0.8043
Output, $D = 1.3 f_{1}$	0.9189	0.8842	0.8676	0.8997	0.8918	0.9843	0.9928	0.9880	0.9928	0.9827
Input, $D = 1.5 f_{1}$ (SC = 0.368)	0.6178	0.5394	0.2777	0.5677	0.3892	0.8957	0.9211	0.8396	0.8961	0.8144
Output, $D = 1.5 f_{1}$	0.8906	0.8513	0.8171	0.8541	0.8622	0.9691	0.9881	0.9783	0.9869	0.9680
Input, $D = 1.7 f_{1}$ (SC = 0.311)	0.6040	0.5017	0.3183	0.5510	0.4136	0.8303	0.9035	0.8511	0.8568	0.7979
Output, $D = 1.7 f_{1}$	0.8624	0.7791	0.7483	0.8013	0.8038	0.9644	0.9787	0.9702	0.9759	0.9583
Input, $D = 2 f_{1}$ (SC = 0.245)	0.4881	0.4469	0.3073	0.5271	0.3643	0.8072	0.8817	0.7557	0.8326	0.7196
Output, $D = 2 f_{1}$	0.8146	0.7540	0.6962	0.7722	0.7572	0.9431	0.9713	0.9505	0.9631	0.9341
Ground truth	1	1	1	1	1	1	1	1	1	1

Table 1. Quantitative analysis of evaluation indices (SSIM and PCC) at different SCs and test samples^a. f1 is the focal length of L1. SC means spatial coherence of the light source.
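
For readers unfamiliar with the two indices, the following is a minimal sketch of how PCC and a simplified SSIM can be computed between a restored image and its ground truth. The PCC follows the standard Pearson definition. The SSIM shown here is a single-window (global) variant with the conventional constants $K_{1} = 0.01$ and $K_{2} = 0.03$; standard SSIM averages the same expression over local windows, so this is an assumption-laden simplification and not necessarily the evaluation code behind the table. The function names are hypothetical.

```python
import numpy as np


def pcc(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation coefficient between two images of equal shape."""
    xc = x.astype(np.float64).ravel()
    yc = y.astype(np.float64).ravel()
    xc = xc - xc.mean()
    yc = yc - yc.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 1.0) -> float:
    """Single-window SSIM over the whole image (simplified, not windowed)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * data_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilises the contrast/structure term
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.random((64, 64))  # synthetic stand-in for a ground-truth image
    noisy = np.clip(truth + 0.1 * rng.standard_normal(truth.shape), 0.0, 1.0)
    print("PCC:", pcc(truth, noisy))
    print("global SSIM:", global_ssim(truth, noisy))
```

Both indices equal 1 when the two images are identical, which is why the "Ground truth" row contains only ones.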



Oceanic turbulence	SSIM (BSD)	SSIM (CelebA)	SSIM (Flickr)	SSIM (WED)	SSIM (DIV)	PCC (BSD)	PCC (CelebA)	PCC (Flickr)	PCC (WED)	PCC (DIV)
Input (O1)	0.5331	0.6773	0.6810	0.6016	0.7018	0.8978	0.9404	0.8876	0.9096	0.8718
Output (O1)	0.8088	0.7916	0.8368	0.8077	0.8172	0.9303	0.9707	0.9334	0.9560	0.9044
Input (O2)	0.5098	0.6566	0.6690	0.5716	0.5371	0.8855	0.9329	0.8786	0.8970	0.8494
Output (O2)	0.7823	0.7609	0.8015	0.7819	0.8005	0.9211	0.9611	0.9209	0.9448	0.8901
Input (O3)	0.4950	0.6538	0.6575	0.5455	0.5281	0.8764	0.9313	0.8585	0.8916	0.8371
Output (O3)	0.7191	0.7169	0.8434	0.7378	0.7984	0.8896	0.9413	0.8871	0.9344	0.8793
Input (O4)	0.4796	0.6408	0.6474	0.5034	0.5074	0.8774	0.9245	0.8576	0.8664	0.8130
Output (O4)	0.7060	0.6932	0.7287	0.6718	0.7217	0.8847	0.9379	0.8835	0.8892	0.8213
Input (O5)	0.4519	0.6041	0.6202	0.4446	0.4945	0.8456	0.9075	0.8287	0.8281	0.7631
Output (O5)	0.6899	0.6721	0.7225	0.6286	0.6958	0.8909	0.9415	0.8888	0.8839	0.8152
Ground truth	1	1	1	1	1	1	1	1	1	1

Table 2. Quantitative analysis of evaluation indices (SSIM and PCC) at different oceanic turbulence intensities^a.
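
As a worked example of reading the table, the sketch below averages the SSIM values over the five data sets for the weakest (O1) and strongest (O5) oceanic turbulence cases and reports the mean input-to-output gain. The numbers are copied from the Table 2 rows; the dictionary layout and the helper name `mean_gain` are illustrative choices only.

```python
import numpy as np

# SSIM values from Table 2, ordered as BSD, CelebA, Flickr, WED, DIV.
SSIM = {
    "O1": {"input": [0.5331, 0.6773, 0.6810, 0.6016, 0.7018],
           "output": [0.8088, 0.7916, 0.8368, 0.8077, 0.8172]},
    "O5": {"input": [0.4519, 0.6041, 0.6202, 0.4446, 0.4945],
           "output": [0.6899, 0.6721, 0.7225, 0.6286, 0.6958]},
}


def mean_gain(case: str) -> float:
    """Mean output SSIM minus mean input SSIM across the five data sets."""
    rows = SSIM[case]
    return float(np.mean(rows["output"]) - np.mean(rows["input"]))


if __name__ == "__main__":
    for case in SSIM:
        print(f"{case}: mean SSIM gain = {mean_gain(case):.3f}")
```

The mean gain is roughly 0.17 for O1 and 0.16 for O5, so the improvement in SSIM stays of similar size as the turbulence strengthens, even though the absolute output values drop.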



Atmospheric turbulence	SSIM (BSD)	SSIM (CelebA)	SSIM (Flickr)	SSIM (WED)	SSIM (DIV)	PCC (BSD)	PCC (CelebA)	PCC (Flickr)	PCC (WED)	PCC (DIV)
Input (A1)	0.5738	0.6821	0.6988	0.6495	0.6338	0.9014	0.9404	0.8929	0.9160	0.9766
Output (A1)	0.7798	0.7741	0.8337	0.8161	0.8231	0.9361	0.9564	0.9215	0.9574	0.9116
Input (A2)	0.5311	0.6513	0.6727	0.5743	0.5701	0.8797	0.9264	0.8676	0.8896	0.8279
Output (A2)	0.7312	0.6938	0.7699	0.6960	0.7581	0.8920	0.9353	0.8924	0.9141	0.8643
Input (A3)	0.5083	0.6383	0.6785	0.5348	0.5720	0.8688	0.9202	0.8493	0.8747	0.8081
Output (A3)	0.6615	0.6797	0.7427	0.6362	0.7369	0.8843	0.9392	0.8708	0.8919	0.8418
Input (A4)	0.4965	0.6264	0.6635	0.5202	0.5575	0.8590	0.9161	0.8364	0.8673	0.8040
Output (A4)	0.6915	0.6751	0.7287	0.6336	0.7273	0.8789	0.9308	0.8705	0.8855	0.8331
Input (A5)	0.4959	0.6153	0.6595	0.4840	0.5407	0.8524	0.9080	0.8263	0.8493	0.7862
Output (A5)	0.6761	0.6893	0.7201	0.6127	0.6802	0.8719	0.9465	0.8875	0.8749	0.8255
Ground truth	1	1	1	1	1	1	1	1	1	1