• Photonics Research
  • Vol. 10, Issue 8, 1848 (2022)
Lishun Wang1,2, Zongliang Wu3, Yong Zhong1,2,4,*, and Xin Yuan3,5,*
Author Affiliations
  • 1Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041, China
  • 2University of Chinese Academy of Sciences, Beijing 100049, China
  • 3Research Center for Industries of the Future and School of Engineering, Westlake University, Hangzhou 310030, China
  • 4e-mail: zhongyong@casit.com.cn
  • 5e-mail: xyuan@westlake.edu.cn
    DOI: 10.1364/PRJ.458231
    Lishun Wang, Zongliang Wu, Yong Zhong, Xin Yuan, "Snapshot spectral compressive imaging reconstruction using convolution and contextual Transformer," Photonics Res. 10, 1848 (2022)
    Fig. 1. Reconstruction of the real Legoman data captured by the snapshot SCI system in Ref. [20]. We show reconstructions of 12 spectral channels and compare our proposed method with the latest self-supervised method (PnP-DIP-HSI [23]) and a method based on maximum a posteriori (MAP) estimation (the DGSMP algorithm [24]). As the purple and green areas in the plot show, our method reconstructs a clearer image, whereas PnP-DIP-HSI produces some artifacts and DGSMP loses some details.
    Fig. 2. Schematic diagrams of the CASSI system.
    Fig. 3. Architecture of the proposed GAP-CCoT. (a) GAP-net with N stages; G(·) represents the operation of Eq. (6), D(·) represents a denoiser, and $v^{(0)} = H^{\top} g$. (b) CCoT-net, the proposed denoising network plugged into the GAP algorithm. (c) Convolution branch and Transformer branch; their outputs are combined by concatenation. (d) Convolution block with channel attention; c is the number of output channels of the convolution. (e) Contextual Transformer block. (f) PixelShuffle algorithm for fast upsampling.
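The stage structure in Fig. 3(a) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: Eq. (6) is not reproduced on this page, so G(·) is assumed here to be the standard GAP Euclidean projection onto the measurement constraint; the forward model is a simplified mask-and-sum operator that omits the CASSI dispersion shift; and a 3×3 mean filter stands in for the learned CCoT denoiser D(·). All function names are hypothetical.

```python
import numpy as np


def forward(x, mask):
    """H (simplified): modulate each spectral channel by its mask, then sum."""
    return np.sum(mask * x, axis=0)


def adjoint(g, mask):
    """H^T: copy the 2D measurement into every channel, weighted by the mask."""
    return mask * g[None, :, :]


def gap_projection(v, g, mask, eps=1e-8):
    """Assumed form of G(.): Euclidean projection of v onto {x : Hx = g}.

    For this H, H H^T is diagonal with entries sum_c mask_c^2, so its
    inverse reduces to an element-wise division.
    """
    hht = np.sum(mask ** 2, axis=0)
    residual = (g - forward(v, mask)) / (hht + eps)
    return v + adjoint(residual, mask)


def mean_filter_denoiser(x):
    """Placeholder for D(.): a 3x3 spatial mean filter applied per channel."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / 9.0


def gap_stages(g, mask, num_stages=9, denoiser=mean_filter_denoiser):
    """Run num_stages rounds of projection G(.) followed by denoising D(.)."""
    v = adjoint(g, mask)  # v^(0) = H^T g
    for _ in range(num_stages):
        x = gap_projection(v, g, mask)
        v = denoiser(x)
    return v


# Toy data (hypothetical sizes): 4 spectral channels of 16x16 pixels.
rng = np.random.default_rng(0)
truth = rng.random((4, 16, 16))
mask = (rng.random((4, 16, 16)) > 0.5).astype(float)
g = forward(truth, mask)

x1 = gap_projection(adjoint(g, mask), g, mask)  # first projection step
recon = gap_stages(g, mask, num_stages=3)
```

After a projection step the estimate reproduces the measurement, so `forward(x1, mask)` matches `g` up to the small `eps` regularizer. In an unrolled network each stage would typically carry its own trained denoiser; the single placeholder function here is only meant to show the data flow.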
    Fig. 4. Reconstruction results of GAP-CCoT and other spectral reconstruction algorithms (λ-net, HSSP, TSA-net, GAP-net, DGSMP, PnP-DIP-HSI) on scene 3 and scene 9. Zoom in for a better view.
    Fig. 5. Architecture of the proposed Stacked CCoT. The input of the network is $H^{\top} g$, and CCoT-net is the same as in Fig. 3(b).
    Fig. 6. Effect of stage number on SCI reconstruction quality.
    Fig. 7. Reconstruction results of GAP-CCoT and other spectral reconstruction algorithms (λ-net, TSA-net, GAP-net, DGSMP, PnP-DIP-HSI) in two real scenes (scene 1 and scene 2).
    Fig. 8. Reconstructed frames of our method and other algorithms (GAP-TV, DeSCI, PnP-FFDNet, U-net, BIRNAT, RevSCI) on six benchmark datasets.
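    | Algorithms | Metric | Scene 1 | Scene 2 | Scene 3 | Scene 4 | Scene 5 | Scene 6 | Scene 7 | Scene 8 | Scene 9 | Scene 10 | Average |
    |---|---|---|---|---|---|---|---|---|---|---|---|---|
    | TwIST [6] | PSNR (dB) | 24.81 | 19.99 | 21.14 | 30.30 | 21.68 | 22.16 | 17.71 | 22.39 | 21.43 | 22.87 | 22.44±3.32 |
    | | SSIM | 0.730 | 0.632 | 0.764 | 0.874 | 0.688 | 0.660 | 0.694 | 0.682 | 0.729 | 0.595 | 0.704±0.077 |
    | GAP-TV [7] | PSNR (dB) | 25.13 | 20.67 | 23.19 | 35.13 | 22.31 | 22.90 | 17.98 | 23.00 | 23.36 | 23.70 | 23.73±4.45 |
    | | SSIM | 0.724 | 0.630 | 0.757 | 0.870 | 0.674 | 0.635 | 0.670 | 0.624 | 0.717 | 0.551 | 0.685±0.088 |
    | DeSCI [8] | PSNR (dB) | 27.15 | 22.26 | 26.56 | 39.00 | 24.80 | 23.55 | 20.03 | 20.29 | 23.98 | 25.94 | 25.35±5.38 |
    | | SSIM | 0.794 | 0.694 | 0.877 | 0.965 | 0.778 | 0.753 | 0.772 | 0.740 | 0.818 | 0.666 | 0.785±0.087 |
    | HSSP [19] | PSNR (dB) | 31.48 | 31.09 | 28.96 | 34.56 | 28.53 | 30.83 | 28.71 | 30.09 | 30.43 | 28.78 | 30.35±3.79 |
    | | SSIM | 0.858 | 0.842 | 0.832 | 0.902 | 0.808 | 0.877 | 0.824 | 0.881 | 0.868 | 0.842 | 0.852±0.049 |
    | λ-net [9] | PSNR (dB) | 30.82 | 26.30 | 29.42 | 36.27 | 27.84 | 30.69 | 24.20 | 28.86 | 29.32 | 27.66 | 29.14±3.20 |
    | | SSIM | 0.880 | 0.846 | 0.916 | 0.962 | 0.866 | 0.886 | 0.875 | 0.880 | 0.902 | 0.843 | 0.886±0.035 |
    | TSA-net [71] | PSNR (dB) | 31.26 | 26.88 | 30.03 | 39.90 | 28.89 | 31.30 | 25.16 | 29.69 | 30.03 | 28.32 | 30.15±3.92 |
    | | SSIM | 0.887 | 0.855 | 0.921 | 0.964 | 0.878 | 0.895 | 0.887 | 0.887 | 0.903 | 0.848 | 0.893±0.033 |
    | PnP-DIP-HSI [23] | PSNR (dB) | 32.70 | 27.27 | 31.32 | 40.79 | 29.81 | 30.41 | 28.18 | 29.45 | 34.55 | 28.52 | 31.30±3.98 |
    | | SSIM | 0.898 | 0.832 | 0.920 | 0.970 | 0.903 | 0.890 | 0.913 | 0.885 | 0.932 | 0.863 | 0.901±0.038 |
    | GAP-net [20] | PSNR (dB) | 33.03 | 29.52 | 33.04 | 41.59 | 30.95 | 32.88 | 27.60 | 30.17 | 32.74 | 29.73 | 32.13±3.81 |
    | | SSIM | 0.921 | 0.903 | 0.940 | 0.972 | 0.924 | 0.927 | 0.921 | 0.904 | 0.927 | 0.901 | 0.924±0.021 |
    | DGSMP [24] | PSNR (dB) | 33.26 | 32.09 | 33.06 | 40.54 | 28.86 | 33.08 | 30.74 | 31.55 | 31.66 | 31.44 | 32.63±3.07 |
    | | SSIM | 0.915 | 0.898 | 0.925 | 0.964 | 0.882 | 0.937 | 0.886 | 0.923 | 0.911 | 0.925 | 0.917±0.024 |
    | SSI-ResU-Net (v1) [10] | PSNR (dB) | 34.06 | 30.85 | 33.14 | 40.79 | 31.57 | 34.99 | 27.93 | 33.24 | 33.58 | 31.55 | 33.17±3.34 |
    | | SSIM | 0.926 | 0.902 | 0.924 | 0.970 | 0.939 | 0.955 | 0.861 | 0.949 | 0.931 | 0.934 | 0.929±0.030 |
    | Ours | PSNR (dB) | 35.17 | 35.90 | 36.91 | 42.25 | 32.61 | 34.95 | 33.46 | 33.13 | 35.75 | 32.43 | 35.26±2.89 |
    | | SSIM | 0.938 | 0.948 | 0.958 | 0.977 | 0.948 | 0.957 | 0.923 | 0.952 | 0.954 | 0.941 | 0.950±0.014 |
    Table 1. Average PSNR in dB (upper entry in each cell) and SSIM (lower entry in each cell) of Different Algorithms on 10 Synthetic Datasets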
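As a reading aid for the "Average" column of Table 1, the sketch below recomputes the summary for the "Ours" PSNR row from the ten per-scene values. It assumes the ± term is the sample standard deviation across scenes, which reproduces the printed entry for this row; the page does not state the convention, so treat that as an inference. The `psnr` helper uses the common definition 10·log10(peak²/MSE) with an assumed peak of 1.0 and is illustrative only, not the authors' evaluation code.

```python
import math
import statistics


def psnr(reference, estimate, peak=1.0):
    """PSNR in dB for two equal-length flat sequences (peak value assumed).

    Undefined when the two sequences are identical (zero MSE).
    """
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    return 10.0 * math.log10(peak ** 2 / mse)


# Per-scene PSNR values of the "Ours" row in Table 1.
ours_psnr = [35.17, 35.90, 36.91, 42.25, 32.61, 34.95, 33.46, 33.13, 35.75, 32.43]

mean_psnr = statistics.mean(ours_psnr)
std_psnr = statistics.stdev(ours_psnr)  # sample standard deviation (n - 1)

print(f"{mean_psnr:.2f} ± {std_psnr:.2f}")  # matches the 35.26±2.89 table entry
```

Using the population standard deviation instead (`statistics.pstdev`) gives about 2.74 for this row, which does not match the table, hence the assumption above.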
    | Algorithm | Params (10^6) | FLOPs (10^9) | PSNR (dB) | SSIM |
    |---|---|---|---|---|
    | λ-net [9] | 66.16 | 514.33 | 29.25 | 0.886 |
    | TSA-net [71] | 44.25 | 135.03 | 30.15 | 0.893 |
    | GAP-net [20] | 2.89 | 54.16 | 32.13 | 0.924 |
    | DGSMP [24] | 3.76 | 647.28 | 32.63 | 0.917 |
    | SSI-ResU-Net (v1) [10] | 1.25 | 81.98 | 33.17 | 0.929 |
    | GAP-CCoT-S3 | 2.68 | 31.84 | 33.89 | 0.934 |
    | GAP-CCoT-S9 | 8.04 | 95.52 | 35.26 | 0.950 |
    Table 2. Computational Complexity and Average Reconstruction Quality of Several SOTA Algorithms on 10 Synthetic Datasets
    | Mask | PSNR (dB) | SSIM |
    |---|---|---|
    | Mask used in training | 35.26±2.89 | 0.950±0.014 |
    | New mask 1 | 35.10±2.92 | 0.949±0.015 |
    | New mask 2 | 35.06±2.91 | 0.948±0.015 |
    | New mask 3 | 35.06±2.91 | 0.949±0.015 |
    | New mask 4 | 35.02±2.92 | 0.948±0.014 |
    | New mask 5 | 34.99±2.90 | 0.948±0.014 |
    Table 3. Average PSNR and SSIM Results on 10 Synthetic Data with Different Masks
    | Algorithms | PSNR (dB) | SSIM |
    |---|---|---|
    | Stacked CCoT w/o CoT | 32.86±3.01 | 0.924±0.021 |
    | GAP-CCoT w/o CoT | 34.13±2.95 | 0.933±0.019 |
    | Stacked CCoT | 34.27±2.94 | 0.936±0.018 |
    | GAP-CCoT | 35.26±2.89 | 0.950±0.014 |
    Table 4. Ablation Study: Average PSNR and SSIM Values of Different Algorithms on 10 Synthetic Data
    | Stage Number | Params (10^6) | FLOPs (10^9) | PSNR (dB) | SSIM |
    |---|---|---|---|---|
    | 3 | 2.68 | 31.84 | 33.89 | 0.934 |
    | 5 | 4.47 | 53.06 | 34.30 | 0.936 |
    | 7 | 6.25 | 74.29 | 34.86 | 0.940 |
    | 9 | 8.04 | 95.52 | 35.26 | 0.950 |
    | 12 | 10.72 | 127.35 | 35.43 | 0.951 |
    | 15 | 13.41 | 159.19 | 35.54 | 0.952 |
    Table 5. Computational Complexity and Average Reconstruction Quality of GAP-CCoT on 10 Synthetic Data with Different Stages
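The numbers in Table 5 can be turned into a quick cost-versus-quality check. The sketch below uses only the values printed in the table; the variable names are illustrative. It divides parameters and FLOPs by the stage number to see whether cost scales roughly linearly with the stages, and computes the PSNR gain per added stage between consecutive rows.

```python
# Values copied from Table 5 (params in 10^6, FLOPs in 10^9, PSNR in dB).
stages = [3, 5, 7, 9, 12, 15]
params_m = [2.68, 4.47, 6.25, 8.04, 10.72, 13.41]
flops_g = [31.84, 53.06, 74.29, 95.52, 127.35, 159.19]
psnr_db = [33.89, 34.30, 34.86, 35.26, 35.43, 35.54]

# Cost per stage: near-constant ratios indicate roughly linear scaling.
params_per_stage = [p / n for p, n in zip(params_m, stages)]
flops_per_stage = [f / n for f, n in zip(flops_g, stages)]

# PSNR gain per additional stage between consecutive rows of the table.
gain_per_stage = [
    (psnr_db[i + 1] - psnr_db[i]) / (stages[i + 1] - stages[i])
    for i in range(len(stages) - 1)
]

for n, p, f in zip(stages, params_per_stage, flops_per_stage):
    print(f"{n:2d} stages: {p:.3f}M params/stage, {f:.3f} GFLOPs/stage")
for (a, b), gain in zip(zip(stages, stages[1:]), gain_per_stage):
    print(f"{a:2d} -> {b:2d} stages: {gain:+.3f} dB per added stage")
```

On these figures each stage costs about 0.89 million parameters and about 10.6 GFLOPs, while the PSNR gain per added stage drops to well under 0.1 dB beyond nine stages.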
    | Loss Function | PSNR (dB) | SSIM |
    |---|---|---|
    | LAD | 35.48 | 0.952 |
    | MSE | 35.26 | 0.950 |
    Table 6. Average PSNR and SSIM Results on 10 Synthetic Data with Different Loss Functions
    | Algorithm | PSNR (dB) | SSIM | Running Time (s) |
    |---|---|---|---|
    | GAP-TV [7] | 26.73±4.33 | 0.858±0.082 | 4.201 (CPU) |
    | PnP-FFDNet [74] | 29.70±6.75 | 0.892±0.071 | 3.010 (GPU) |
    | DeSCI [8] | 32.65±7.07 | 0.935±0.047 | 6180 (CPU) |
    | BIRNAT [75] | 33.31±5.90 | 0.951±0.027 | 0.165 (GPU) |
    | U-net [76] | 29.45±4.75 | 0.882±0.057 | 0.031 (GPU) |
    | GAP-net-U-net-S12 [20] | 32.86±5.92 | 0.947±0.030 | 0.03 (GPU) |
    | MetaSCI [77] | 31.72±5.72 | 0.926±0.040 | 0.025 (GPU) |
    | RevSCI [78] | 33.92±6.02 | 0.956±0.025 | 0.190 (GPU) |
    | Ours | 33.53±5.90 | 0.954±0.026 | 0.064 (GPU) |
    Table 7. Extending Our Method for Video Compressive Sensing: Average PSNR, SSIM, and Running Time per Measurement of Different Algorithms on Six Benchmark Datasets
    | Algorithm | Params (10^6) | FLOPs (10^9) | PSNR (dB) | SSIM |
    |---|---|---|---|---|
    | BIRNAT [75] | 4.13 | 390.56 | 33.31 | 0.951 |
    | U-net [76] | 0.82 | 53.63 | 29.45 | 0.882 |
    | GAP-net-U-net-S12 [20] | 5.62 | 87.58 | 32.86 | 0.947 |
    | MetaSCI [77] | 2.89 | 54.16 | 31.72 | 0.926 |
    | RevSCI [78] | 5.66 | 766.95 | 33.92 | 0.956 |
    | Ours | 10.51 | 113.75 | 33.53 | 0.954 |
    Table 8. Computational Complexity and Average Reconstruction Quality of Several SOTA Algorithms on Six Grayscale Benchmark Datasets