• Photonics Research
  • Vol. 9, Issue 11, 2277 (2021)
Zhihong Zhang1,2,†, Chao Deng1,2,†, Yang Liu3, Xin Yuan4,6, Jinli Suo1,2,*, and Qionghai Dai1,2,5
Author Affiliations
  • 1Department of Automation, Tsinghua University, Beijing 100084, China
  • 2Institute for Brain and Cognitive Science, Tsinghua University, Beijing 100084, China
  • 3Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
  • 4Westlake University, Hangzhou 310024, China
  • 5Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
  • 6e-mail: xyuan@westlake.edu.cn
    DOI: 10.1364/PRJ.435256
    Zhihong Zhang, Chao Deng, Yang Liu, Xin Yuan, Jinli Suo, Qionghai Dai. Ten-mega-pixel snapshot compressive imaging with a hybrid coded aperture[J]. Photonics Research, 2021, 9(11): 2277
    Fig. 1. Our 10-mega-pixel video SCI system (a) and its schematic (b). Ten high-speed (200 fps), high-resolution (3200×3200 pixels) video frames (c) are reconstructed from a single snapshot measurement (d); (e) shows motion detail for the small region in the blue box of (d). Unlike existing solutions that use only an LCoS or only a mask (and thus have limited spatial resolution), our 10-mega-pixel spatio-temporal coding is generated jointly by an LCoS at the aperture plane and a static mask close to the image plane.
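The numbers in the caption imply a few derived quantities. The following is a minimal sketch of that arithmetic, assuming snapshots are captured back to back with no dead time (an assumption, not something the caption states); the variable names are illustrative.

```python
# Quantities stated in the Fig. 1 caption.
frames_per_snapshot = 10      # video frames recovered from one snapshot
video_fps = 200               # frame rate of the reconstructed video
width = height = 3200         # reconstructed frame size in pixels

# Derived quantities (assumes back-to-back snapshots, no dead time).
pixels_per_frame = width * height
snapshot_rate_hz = video_fps / frames_per_snapshot
pixel_throughput = pixels_per_frame * video_fps

print(pixels_per_frame)    # 10240000
print(snapshot_rate_hz)    # 20.0
print(pixel_throughput)    # 2048000000
```

Under that assumption the sensor would only need to run at about 20 snapshots per second to deliver 200 fps video at roughly 10 mega-pixels per frame.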
    Fig. 2. Pipeline of the proposed large-scale HCA-SCI system (left) and the PnP reconstruction algorithm (right). Left: During the encoded photography stage, a dynamic low-resolution mask at the aperture plane and a static high-resolution mask close to the sensor plane work together to generate a sequence of high-resolution codes that encode the large-scale video into a snapshot. Right: During decoding, the video is reconstructed under a PnP framework that incorporates a deep denoising prior and a TV prior into a convex optimization (GAP), leveraging the good convergence of GAP and the high efficiency of the deep network.
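The encoding stage on the left of Fig. 2 can be read as a per-pixel weighted sum: each of the Cr frames is multiplied element-wise by its own high-resolution code, and the coded frames are summed into one snapshot. The sketch below uses plain Python lists and toy sizes. The function name `encode_snapshot` and the random binary codes are illustrative assumptions, not the authors' implementation; in the real system the codes are produced optically by the LCoS and the static mask acting together.

```python
import random

def encode_snapshot(codes, frames):
    """Sum element-wise coded frames into one 2D snapshot.

    codes, frames: lists of Cr 2D arrays (lists of rows) with equal shapes.
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    snapshot = [[0.0] * cols for _ in range(rows)]
    for code, frame in zip(codes, frames):
        for r in range(rows):
            for c in range(cols):
                snapshot[r][c] += code[r][c] * frame[r][c]
    return snapshot

# Toy example: Cr = 3 frames of 4x4 pixels with random binary codes.
rng = random.Random(0)
cr, rows, cols = 3, 4, 4
frames = [[[rng.random() for _ in range(cols)] for _ in range(rows)]
          for _ in range(cr)]
codes = [[[rng.randint(0, 1) for _ in range(cols)] for _ in range(rows)]
         for _ in range(cr)]
y = encode_snapshot(codes, frames)
```

The snapshot `y` has the size of a single frame, which is why reconstruction needs priors (here the TV and deep denoising priors) to recover Cr frames from it.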
    Fig. 3. Illustration of multiplexed mask generation. For a given scene point, the images formed through different sub-apertures (marked in blue, yellow, and red) intersect the mask plane at different regions and are thus encoded with correspondingly shifted random masks before being summed at the sensor. Multiplexing raises the light flux for high-SNR recording at the cost of only slight performance degradation.
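The multiplexing idea in Fig. 3 can be sketched as summing shifted copies of one static mask, one copy per open sub-aperture. The code below is a toy model under stated assumptions: the shifts are given directly in whole pixels, the shift wraps around at the borders for simplicity, and the names `shift2d` and `multiplexed_code` are hypothetical. The actual shifts depend on the optical geometry, which the caption does not specify.

```python
def shift2d(mask, dy, dx):
    """Circularly shift a 2D list down by dy and right by dx (wrap-around is a simplification)."""
    rows, cols = len(mask), len(mask[0])
    return [[mask[(r - dy) % rows][(c - dx) % cols] for c in range(cols)]
            for r in range(rows)]

def multiplexed_code(static_mask, open_shifts):
    """Sum the shifted copies of the static mask seen through each open sub-aperture."""
    rows, cols = len(static_mask), len(static_mask[0])
    code = [[0] * cols for _ in range(rows)]
    for dy, dx in open_shifts:
        shifted = shift2d(static_mask, dy, dx)
        for r in range(rows):
            for c in range(cols):
                code[r][c] += shifted[r][c]
    return code

static_mask = [
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]
# One open sub-aperture versus three open sub-apertures with different shifts.
single = multiplexed_code(static_mask, [(0, 0)])
multi = multiplexed_code(static_mask, [(0, 0), (0, 1), (1, 0)])
```

In this toy case the total transmission of `multi` is three times that of `single`, which mirrors the light-flux gain the caption attributes to multiplexing.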
    Fig. 4. Multiplexing pattern schemes used in our experiments (taking Cr=6 as an example). Top row: multiplexing patterns for the simulation experiments. Each pattern contains 50% open sub-apertures, and each sub-aperture is a 512×512 binned macro pixel on the LCoS. Bottom row: multiplexing patterns for the real experiments. Each pattern contains an open circle with a radius of about 400 pixels, and the circles in adjacent patterns are rotated by 360/Cr degrees relative to each other.
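The bottom-row scheme can be sketched by placing the open circle at Cr positions spaced 360/Cr degrees apart around a common center. This is only an illustration: the distance from the rotation center to the circle center (`orbit_radius`) and the coordinate origin are hypothetical parameters that the caption does not give, and the function names are made up for this sketch.

```python
import math

def circle_centers(cr, orbit_radius, center=(0.0, 0.0)):
    """Centers of the open circle for each of the Cr patterns.

    Adjacent patterns are rotated by 360/Cr degrees about `center`.
    `orbit_radius` is a hypothetical parameter (not stated in the caption).
    """
    step = 2.0 * math.pi / cr
    cx, cy = center
    return [(cx + orbit_radius * math.cos(k * step),
             cy + orbit_radius * math.sin(k * step)) for k in range(cr)]

def in_open_circle(x, y, circle_center, radius=400.0):
    """True if LCoS pixel (x, y) lies inside the open circle of the given radius."""
    return math.hypot(x - circle_center[0], y - circle_center[1]) <= radius

# Cr = 6 gives a 60-degree rotation between adjacent patterns.
centers = circle_centers(cr=6, orbit_radius=500.0)
```

Each pattern is then the set of LCoS pixels for which `in_open_circle` returns true for that pattern's center.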
    Fig. 5. Reconstruction results and comparison with state-of-the-art algorithms on simulated data at different resolutions (left: 256×256, middle: 512×512, right: 1024×1024) and with different compression ratios (top: Cr=10, bottom: Cr=20). BIRNAT results are not available for 512×512 and 1024×1024 because model training runs out of memory at those resolutions. See Visualization 1, Visualization 2, Visualization 3, Visualization 4, Visualization 5, and Visualization 6 for the reconstructed videos.
    Fig. 6. Noise robustness comparison between multiplexed and non-multiplexed masks.
    Fig. 7. Reconstruction results of PnP–TV–FastDVDNet on real data captured by our HCA-SCI system (Cr=6, 10, 20, and 30). Note that the full frames are 3200×3200 pixels; we show small regions of about 400×400 pixels to demonstrate the high-speed motion.
    Fig. 8. Reconstruction comparison between GAP–TV, PnP–FFDNet, and PnP–TV–FastDVDNet on real data captured by our HCA-SCI system (Cr=6, 10, 20, and 30). Note that the full frames are 3200×3200 pixels; we show small regions of 512×512 pixels to demonstrate the high-speed motion. See Visualization 7 for the reconstructed videos.
    Scales    | Algorithms        | Football      | Hummingbird   | ReadySteadyGo | Jockey        | YachtRide     | Average
    ----------|-------------------|---------------|---------------|---------------|---------------|---------------|--------------
    256×256   | GAP–TV            | 27.82, 0.8280 | 29.24, 0.7918 | 23.73, 0.7499 | 31.63, 0.8712 | 26.65, 0.8056 | 27.81, 0.8093
              | PnP–FFDNet        | 27.06, 0.8264 | 25.52, 0.6912 | 21.68, 0.6859 | 31.14, 0.8493 | 23.69, 0.7035 | 25.82, 0.7513
              | PnP–TV–FastDVDNet | 31.31, 0.9123 | 31.19, 0.8264 | 26.18, 0.8276 | 31.36, 0.8817 | 28.90, 0.8841 | 29.79, 0.8664
              | BIRNAT            | 34.67, 0.9719 | 34.33, 0.9546 | 29.50, 0.9389 | 36.24, 0.9711 | 31.02, 0.9431 | 33.15, 0.9559
    512×512   | GAP–TV            | 29.19, 0.8854 | 28.32, 0.7887 | 25.94, 0.7918 | 31.30, 0.8718 | 26.59, 0.7939 | 28.27, 0.8263
              | PnP–FFDNet        | 28.57, 0.8952 | 28.02, 0.8363 | 24.32, 0.7457 | 29.81, 0.8248 | 23.45, 0.6793 | 26.83, 0.7963
              | PnP–TV–FastDVDNet | 30.92, 0.9333 | 32.24, 0.8834 | 27.04, 0.8246 | 32.11, 0.8839 | 27.87, 0.8487 | 30.04, 0.8748
    1024×1024 | GAP–TV            | 30.63, 0.9022 | 29.16, 0.8459 | 28.92, 0.8698 | 31.59, 0.8953 | 29.03, 0.8470 | 29.87, 0.8720
              | PnP–FFDNet        | 29.87, 0.9023 | 27.70, 0.7869 | 27.70, 0.8483 | 29.88, 0.8412 | 25.55, 0.7211 | 28.14, 0.8200
              | PnP–TV–FastDVDNet | 30.35, 0.9265 | 31.71, 0.8909 | 29.42, 0.8913 | 31.59, 0.9014 | 30.44, 0.8713 | 30.70, 0.8963
    Table 1. Average Results of PSNR in dB (left entry in each cell) and SSIM (right entry in each cell) by Different Algorithms (Cr=10)
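For readers unfamiliar with the left-hand metric, PSNR is defined as 10·log10(peak² / MSE). The following is a minimal sketch for a single pair of frames, assuming pixel values normalized to [0, 1] (so `peak=1.0`). The evaluation protocol behind the table (value range, how frames are averaged) is not stated here, so treat those choices as assumptions. SSIM is omitted because it needs a windowed implementation.

```python
import math

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D frames."""
    squared_errors = [(a - b) ** 2
                      for ref_row, est_row in zip(reference, estimate)
                      for a, b in zip(ref_row, est_row)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy 2x2 frames: two pixels are off by 0.1, so the MSE is 0.005.
reference = [[0.0, 0.5], [1.0, 0.25]]
estimate = [[0.1, 0.5], [0.9, 0.25]]
value = psnr(reference, estimate)  # about 23.01 dB
```

Because the scale is logarithmic, a gain of 3 dB corresponds to roughly halving the mean squared error.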
    Scales    | Algorithms        | Football      | Hummingbird   | ReadySteadyGo | Jockey        | YachtRide     | Average
    ----------|-------------------|---------------|---------------|---------------|---------------|---------------|--------------
    256×256   | GAP–TV            | 25.01, 0.7544 | 26.33, 0.6893 | 20.48, 0.6326 | 28.13, 0.8318 | 23.56, 0.7129 | 24.70, 0.7242
              | PnP–FFDNet        | 21.67, 0.6657 | 22.13, 0.5835 | 17.27, 0.5340 | 27.78, 0.7994 | 20.39, 0.6024 | 21.85, 0.6370
              | PnP–TV–FastDVDNet | 27.83, 0.8459 | 28.65, 0.7520 | 23.28, 0.7381 | 29.51, 0.8597 | 26.34, 0.8235 | 27.12, 0.8038
              | BIRNAT            | 27.91, 0.9021 | 28.58, 0.8800 | 23.79, 0.8279 | 31.35, 0.9467 | 26.14, 0.8585 | 27.55, 0.8830
    512×512   | GAP–TV            | 23.97, 0.8179 | 24.50, 0.6719 | 22.12, 0.6975 | 26.99, 0.8297 | 23.13, 0.6930 | 24.14, 0.7420
              | PnP–FFDNet        | 22.00, 0.7661 | 23.62, 0.7245 | 19.35, 0.6133 | 25.32, 0.7924 | 19.48, 0.5418 | 21.95, 0.6876
              | PnP–TV–FastDVDNet | 25.63, 0.8852 | 28.36, 0.7778 | 23.80, 0.7499 | 28.79, 0.8553 | 25.36, 0.7784 | 26.39, 0.8093
    1024×1024 | GAP–TV            | 24.82, 0.8353 | 25.53, 0.7296 | 24.98, 0.8128 | 26.63, 0.8388 | 25.80, 0.7759 | 25.55, 0.7985
              | PnP–FFDNet        | 23.55, 0.8098 | 23.02, 0.6039 | 22.48, 0.7702 | 24.48, 0.7968 | 21.67, 0.6414 | 23.04, 0.7244
              | PnP–TV–FastDVDNet | 26.26, 0.8729 | 28.68, 0.8076 | 26.31, 0.8399 | 29.18, 0.8773 | 28.07, 0.8194 | 27.70, 0.8434
    Table 2. Average Results of PSNR in dB (left entry in each cell) and SSIM (right entry in each cell) by Different Algorithms (Cr=20)
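The Average column can be recomputed from the five per-scene cells as a plain arithmetic mean, which also gives the gap between methods directly. The sketch below does this for two of the 256×256 rows at Cr=20, with values copied from the table. The helper name `row_average` is illustrative, and the assumption that the column is an unweighted mean over scenes is inferred from the numbers rather than stated in the source.

```python
# (PSNR in dB, SSIM) per scene for two 256x256 rows of Table 2 (Cr=20).
rows = {
    "GAP-TV": [(25.01, 0.7544), (26.33, 0.6893), (20.48, 0.6326),
               (28.13, 0.8318), (23.56, 0.7129)],
    "PnP-TV-FastDVDNet": [(27.83, 0.8459), (28.65, 0.7520), (23.28, 0.7381),
                          (29.51, 0.8597), (26.34, 0.8235)],
}

def row_average(cells):
    """Unweighted mean PSNR and SSIM over the scenes of one table row."""
    n = len(cells)
    return (sum(p for p, _ in cells) / n, sum(s for _, s in cells) / n)

averages = {name: row_average(cells) for name, cells in rows.items()}
psnr_gain = averages["PnP-TV-FastDVDNet"][0] - averages["GAP-TV"][0]

for name, (p, s) in averages.items():
    print(f"{name}: {p:.2f} dB, {s:.4f}")
print(f"PSNR gain over GAP-TV: {psnr_gain:.2f} dB")
```

For these two rows the recomputed means agree with the Average column to the printed precision.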
    Require: H, y.
    Initialize: v^(0), λ_0, ξ < 1, k = 1, K_1, K_Max.
    while not converged and k ≤ K_Max do
        Update x by Eq. (7).
        Update v:
        if k ≤ K_1 then
            v^(k) = D_TV(x^(k))
        else
            v = D_TV(x^(k))
            v^(k) = D_FastDVDNet(v)
    Table 3. PnP–TV–FastDVDNet for HCA-SCI
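The two-stage structure of the listing above can be sketched in a few lines of Python. Several things here are assumptions rather than the authors' method: Eq. (7) is not reproduced on this page, so the x-update uses the standard GAP projection x = v + Hᵀ(HHᵀ)⁻¹(y − Hv), where HHᵀ is diagonal for SCI; the initialization v⁽⁰⁾ = Hᵀy is a common choice, not taken from the source; the convergence test and the λ and ξ parameters are left out in favor of a fixed iteration count; and `temporal_smooth` is a hypothetical stand-in for both the TV and FastDVDNet denoisers. Frames are stored as flat pixel lists.

```python
def forward(masks, video):
    """SCI forward model: y[i] = sum_t masks[t][i] * video[t][i] (flattened pixels)."""
    return [sum(m[i] * x[i] for m, x in zip(masks, video))
            for i in range(len(video[0]))]

def gap_projection(v, y, masks):
    """Standard GAP step x = v + H^T (H H^T)^-1 (y - H v).

    For SCI, H H^T is diagonal with entries sum_t masks[t][i] ** 2.
    """
    hv = forward(masks, v)
    n = len(y)
    norm = [sum(m[i] ** 2 for m in masks) for i in range(n)]
    resid = [(y[i] - hv[i]) / norm[i] if norm[i] > 0 else 0.0 for i in range(n)]
    return [[vt[i] + m[i] * resid[i] for i in range(n)] for m, vt in zip(masks, v)]

def pnp_tv_deep(y, masks, denoise_tv, denoise_deep, k1, k_max):
    """Two-stage PnP loop: TV only while k <= k1, then TV followed by the deep denoiser."""
    v = [[m[i] * y[i] for i in range(len(y))] for m in masks]  # assumed init: H^T y
    for k in range(1, k_max + 1):
        x = gap_projection(v, y, masks)
        v = denoise_tv(x)
        if k > k1:
            v = denoise_deep(v)
    return v

def temporal_smooth(video):
    """Hypothetical stand-in denoiser: average each pixel with its temporal neighbors."""
    t_max, n = len(video), len(video[0])
    out = []
    for t in range(t_max):
        lo, hi = max(0, t - 1), min(t_max, t + 2)
        out.append([sum(video[s][i] for s in range(lo, hi)) / (hi - lo)
                    for i in range(n)])
    return out

# Toy problem: 3 frames of 4 pixels, binary masks, static scene.
masks = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 1]]
truth = [[0.2, 0.4, 0.6, 0.8] for _ in range(3)]
y = forward(masks, truth)
recon = pnp_tv_deep(y, masks, temporal_smooth, temporal_smooth, k1=2, k_max=5)
```

The projection step guarantees that its output reproduces the measurement wherever at least one mask is open, so all prior knowledge enters through the denoisers; swapping in real TV and FastDVDNet denoisers would only change the two callables passed to `pnp_tv_deep`.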