• Advanced Photonics Nexus
  • Vol. 4, Issue 3, 036005 (2025)
Yifei Zhang1, Yingxin Li1, Zonghao Liu1, Fei Wang3, Guohai Situ3, Mu Ku Chen4, Haoqiang Wang5, and Zihan Geng1,2,*
Author Affiliations
  • 1Tsinghua University, Tsinghua Shenzhen International Graduate School, Shenzhen, China
  • 2Pengcheng Laboratory, Shenzhen, China
  • 3Chinese Academy of Sciences, Shanghai Institute of Optics and Fine Mechanics, Shanghai, China
  • 4City University of Hong Kong, Department of Electrical Engineering, Hong Kong, China
  • 5Shenzhen University, College of Physics and Optoelectronic Engineering, Shenzhen, China
    DOI: 10.1117/1.APN.4.3.036005
    Yifei Zhang, Yingxin Li, Zonghao Liu, Fei Wang, Guohai Situ, Mu Ku Chen, Haoqiang Wang, Zihan Geng, "Physics and data-driven alternative optimization enabled ultra-low-sampling single-pixel imaging," Adv. Photon. Nexus 4, 036005 (2025)

    Abstract

    Single-pixel imaging (SPI) enables efficient sensing in challenging conditions. However, the requirement for numerous samplings constrains its practicality. We address the challenge of high-quality SPI reconstruction at ultra-low sampling rates. We develop an alternative optimization with physics and a data-driven diffusion network (APD-Net). It features alternative optimization driven by the learned task-agnostic natural image prior and the task-specific physics prior. During the training stage, APD-Net harnesses the power of diffusion models to capture the data-driven statistics of natural signals. In the inference stage, the physics prior is introduced as corrective guidance to ensure consistency between the physics imaging model and the natural image probability distribution. Through alternative optimization, APD-Net reconstructs data-efficient, high-fidelity images that are statistically and physically compliant. To accelerate reconstruction, the image is initialized with the inverse SPI physics model, reducing the number of inference steps from 100 to 30. Through both numerical simulations and real prototype experiments, APD-Net achieves high-quality, full-color reconstructions of complex natural images at a low sampling rate of 1%. In addition, APD-Net's tuning-free nature ensures robustness across various imaging setups and sampling rates. Our research offers a broadly applicable approach for various applications, including but not limited to medical imaging and industrial inspection.

    1 Introduction

    Single-pixel imaging (SPI) is an innovative approach in computational imaging, drawing significant interest due to its unique use of a single-pixel detector for image capture.1–6 As a straightforward and economical substitute for traditional imaging methods, SPI carries the potential to transform various domains, including medical imaging and industrial inspection, especially in low-light conditions and at extreme wavelengths.7–14 Despite its cost-effectiveness, hardware efficiency, and robustness against noise interference, SPI's practical applications have been largely confined to academic settings.6 Because a bucket detector lacks spatial resolution, SPI necessitates numerous samplings. For instance, reconstructing a 100×100-pixel image without low-sampling techniques typically demands 10,000 samplings. Although compressive sensing provides a solution by enabling image reconstruction from fewer samples than pixels, this approach introduces an ill-posed inverse problem, leading to significant information loss and diminished image quality.5,6,15–17 Reducing the number of samplings while maintaining imaging quality is the key challenge of SPI research.

    To solve the inverse SPI problem, various reconstruction algorithms have been developed. Model-based methods seek conformity between actual observations and those simulated from the imaging model, using convex optimization.5,18–22 Simple, handcrafted priors such as total variation, low rank, or sparsity in the gradient domain are introduced to further ensure the uniqueness of the prediction. However, these idealized priors often fall short of reflecting reality, leading to unsatisfactory reconstruction results. Conversely, learning-based methods offer an opportunity to utilize implicit yet more powerful image priors in a data-driven manner.23–31 They leverage the statistical properties of vast image datasets to establish a direct mapping between observed data and the underlying images. Millions of parameters define this mapping within a deep neural network (DNN), such as a convolutional neural network or the recently emerged transformer architecture. The network is refined through end-to-end training using paired datasets. Although previous research has shown the learned prior's superior capability in SPI reconstruction, its performance at ultra-low sampling rates is still far from satisfactory.

    In 2022, Wang et al.32 proposed physics-informed deep learning (PIDL), an innovative SPI reconstruction framework that combines model-based and learning-based strategies. PIDL fuses data and physics priors during the training phase of the DNN. The weights in the neural network are adjusted based on the SPI measurement and the physical imaging model, making the DNN physics-compatible. During the inference phase, only the trained DNN, as the data-driven prior, is utilized. This PIDL approach has proven successful, requiring fewer samples and yielding more reliable SPI reconstruction results. Following this strategy, many other works have used the physics model to finetune the DNN for improved performance.33–40 However, this paradigm still has room for improvement. The dual function of reversing the SPI modulation and restoring degraded images limits the efficiency of prior learning and reduces generality during inference. To our knowledge, few methods can restore high-quality images from less than 5% sampling, especially when the total number of measurements is fewer than 1000. Furthermore, the generality of these methods is limited. As the DNN weights are tuned to fit specific imaging setups, laborious retraining is required when sampling rates or modulation patterns change, making these methods less practical.

    In this paper, we solve the low-sampling and generality problems by decoupling the learned and physics priors for SPI reconstruction. Our motivation stems from the recently emerged diffusion model,41 which generates high-quality images through a gradual, iterative process. Previous research has shown the potential of diffusion models as data-driven image priors for enhancing SPI reconstruction. Mao et al.27 developed a diffusion network for ghost imaging that generates diverse images from a single observation, whereas Song et al.42 applied a diffusion model to Fourier SPI with iterative consistency constraints. However, current diffusion-based methods rely on end-to-end training tied to specific imaging setups, which demands extensive data preparation and limits practical performance.

    Different from previous approaches, we propose to employ a pretrained diffusion model as a general image prior and combine it with the specific physics model for high-quality SPI reconstruction. Conceptually illustrated in Fig. 1, a pretrained diffusion model acquires the common probabilistic distribution across extensive training datasets, serving as an implicit deep image prior for image reconstruction. During inference, it generates new images by step-by-step sampling from the learned image distribution. By introducing the physics constraint of SPI as corrective guidance throughout the inference, the uniqueness of the reconstruction can be significantly enhanced, and the target image can be reasonably reconstructed.

    Figure 1. Physics imaging model meets data-driven diffusion prior. (a) Diffusion models are novel generative models that produce images progressively in T steps, following the gradient of learned image distributions. Although they produce high-quality images, they are often stuck in local optima and deviate from the true image observed by SPI. (b) Our APD-Net uses the forward model as a measurement consistency constraint to iteratively guide the generation process. The alternative supervision from dual-priors in harmony can significantly improve the low-sampling performance of SPI reconstruction.

    In light of this, we introduce the alternating optimization with physics and data-driven diffusion network (APD-Net) framework and exemplify it in ultra-low-sampling SPI. APD-Net treats the interplay between physical constraints and learned diffusion priors as a co-optimization problem. As illustrated in Fig. 2, the physics imaging prior corresponds to the SPI imaging process defined by the modulation patterns for observation. As for the learned diffusion prior, numerous high-quality images are utilized for unsupervised prior learning of a diffusion network, capturing task-agnostic statistical models of natural images. During the inference stage, the trained diffusion model iteratively adjusts the image toward the learned statistics of natural images. Then, the physics prior is injected via an additional physics-guided projection step to enforce the data consistency of the prediction. These alternating dual-prior updates reconcile the discrepancy between the general priors of natural images and the distinct attributes of a specific observation. As a result, APD-Net achieves reconstructions that are consistent with statistics and uphold physical preciseness and interpretability.

    Figure 2. Overall framework of APD-Net. APD-Net integrates two distinct image priors for SPI reconstruction, namely, (a) physics prior and (b) data-driven diffusion prior. The physics imaging prior, defined by specific imaging setups, can be explicitly represented with matrix operations. The diffusion prior, on the other hand, is free from SPI setups and is implicitly learned from numerous high-quality images. During (c) the SPI reconstruction, the intermediate variable x0|t from the diffusion model is iteratively refined with a measurement consistency projection step to inject the physics imaging prior into the reconstruction process for ultra-low-sampling SPI reconstruction.

    Apart from the advancement in imaging quality, APD-Net brings pronounced improvements in its generality. Previous PIDL approaches typically embed physics constraints into the network training process by updating model parameters through backpropagation. APD-Net, however, separates image prior learning from physics-based constraints. During inference, the physics model is applied dynamically as a plug-in constraint, rather than as a component of network training. This means that regardless of the measurement conditions (e.g., different sampling rates or illumination conditions), the learned prior remains unchanged, and the reconstruction process adapts solely by incorporating the corresponding physics model.

    For improved color realism at extremely low sampling rates, we propose a unique expanded Bayer encoding to enhance the performance of APD-Net. Furthermore, the physics model is employed to create a rough estimate of the target image as a reasonable starting point for reconstruction, which relieves the computational load of diffusion reconstruction. The number of iterations is reduced to one-third of that of the original diffusion model without losing reconstruction quality. Extensive experiments on both simulated data and a real SPI prototype reveal that APD-Net outperforms both conventional and other learning-based methods by a large margin, both qualitatively and quantitatively, at an unprecedented sampling rate as low as 1%.

    2 Mathematical Model of SPI

    For a grayscale SPI system, the underlying original image $X \in \mathbb{R}^{n_x \times n_y}$ is modulated by predefined masks $\{M_i\}_{i=1}^{N} \subset \mathbb{R}^{n_x \times n_y}$. Then, by integrating the modulated pixel values across the space, the single-pixel detector captures a series of intensity values $\{y_i\}_{i=1}^{N} \subset \mathbb{R}$. The process for one measurement can be expressed as

    $$y_i = \sum_{n_x, n_y} \left( M_i \odot X \right) + z_i, \tag{1}$$

    where $\odot$ denotes the Hadamard (element-wise) multiplication and $z_i$ denotes the measurement noise. Alternatively, the sampling process can be represented by a vectorized formulation. We vectorize $y = [y_1, \ldots, y_N]^{\top} \in \mathbb{R}^{N}$, $z = [z_1, \ldots, z_N]^{\top} \in \mathbb{R}^{N}$, and $x = \mathrm{vec}(X) \in \mathbb{R}^{n_x n_y}$. Then, the sampling process with sensing matrix $A = [\mathrm{vec}(M_1); \ldots; \mathrm{vec}(M_N)] \in \mathbb{R}^{N \times n_x n_y}$ is defined as

    $$y = Ax + z. \tag{2}$$
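    To make the forward model concrete, the following minimal sketch simulates Eqs. (1) and (2) in Python with NumPy. The random binary masks, image size, and noise level are illustrative assumptions for the sketch only, not the modulation patterns used in this work.

        import numpy as np

        nx, ny, N = 32, 32, 103                       # ~10% sampling of a 32x32 image (assumed sizes)
        rng = np.random.default_rng(0)

        X = rng.random((nx, ny))                      # underlying image X
        masks = rng.integers(0, 2, size=(N, nx, ny))  # binary modulation masks M_i (illustrative)

        A = masks.reshape(N, nx * ny).astype(float)   # sensing matrix: row i is vec(M_i)
        x = X.reshape(-1)                             # x = vec(X)
        z = 0.01 * rng.standard_normal(N)             # measurement noise z
        y = A @ x + z                                 # Eq. (2): y = Ax + z

        # Per-measurement form of Eq. (1): y_i = sum(M_i * X) + z_i
        assert np.isclose((masks[0] * X).sum() + z[0], y[0])

    The vectorized and element-wise forms agree because each row of A is simply the flattened mask, so Ax reproduces the Hadamard-product-and-integrate measurement.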

    For color SPI imaging, a Bayer filter is incorporated into the modulation patterns to encode color information into grayscale SPI observations.43,44 In a Bayer filter, four neighboring pixels in red (R), green (G), and blue (B) form a full-color unit. The missing colors of each pixel can be restored using a demosaicing algorithm, which interpolates the missing color from nearby pixels. However, this approach often fails in low-sampling scenarios. When the sampling ratio is low, the reliable image resolution is limited, and the minimal modulation pattern is often larger than 1 pixel, which damages the pixel-wise information crucial for color restoration. To address this, we propose extending the Bayer coding pattern by spatially expanding the patterns with a scaling factor γ.

    For example, given an image $X \in \mathbb{R}^{n_x \times n_y}$, when $\gamma = 2$, the conventional and expanded Bayer patterns are given in Eq. (3):

    $$M_{\mathrm{Bayer}} = \begin{bmatrix} R & G & R & G \\ G & B & G & B \\ R & G & R & G \\ G & B & G & B \end{bmatrix}, \quad M_{\mathrm{eBayer}} = \begin{bmatrix} R & R & G & G \\ R & R & G & G \\ G & G & B & B \\ G & G & B & B \end{bmatrix}. \tag{3}$$

    $M_{\mathrm{eBayer}} \in \mathbb{R}^{n_x \times n_y}$ offers the spectral modulation of the scene. Through a dot-product operation with the full-color target scene, $M_{\mathrm{eBayer}}$ turns the RGB scene into a raw Bayer image. Physically, this modulation can be achieved by assigning spatially different colors to the active structured illumination in SPI. Compared with the conventional Bayer pattern, the expanded pattern focuses more on the larger-scale color distribution and is thus more robust to the corruption of pixel-wise resolution under ill-posed, extremely low sampling. The forward model for color SPI becomes Eqs. (4) and (5):

    $$y_i = \sum_{n_x, n_y} \left( M_i \odot M_{\mathrm{eBayer}} \odot X \right) + z_i, \tag{4}$$

    $$y = A_{\mathrm{SPI}} A_{\mathrm{eBayer}} x + z. \tag{5}$$
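    As a sketch of how the expanded pattern can be generated, the snippet below encodes the Bayer unit as per-pixel channel indices (0 = R, 1 = G, 2 = B) and expands it with a Kronecker product by the scaling factor γ; the encoding and the function name are illustrative choices, not the paper's exact implementation.

        import numpy as np

        def expanded_bayer(nx, ny, gamma):
            # Channel-index map (0=R, 1=G, 2=B) of the expanded Bayer pattern
            base = np.array([[0, 1],                  # R G
                             [1, 2]])                 # G B
            unit = np.kron(base, np.ones((gamma, gamma), dtype=int))  # expand each cell by gamma
            reps = (nx // unit.shape[0] + 1, ny // unit.shape[1] + 1)
            return np.tile(unit, reps)[:nx, :ny]

        pattern = expanded_bayer(8, 8, gamma=2)
        # Indicator masks select one color channel per pixel, acting as M_eBayer
        masks_rgb = np.stack([(pattern == c).astype(float) for c in range(3)])

        scene = np.random.rand(3, 8, 8)               # full-color target scene (toy data)
        raw_bayer = (masks_rgb * scene).sum(axis=0)   # element-wise spectral modulation

    For γ = 2, the upper-left 4×4 block of `pattern` reproduces the M_eBayer layout in Eq. (3).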

    3 Proposed APD-Net

    The task of low-sampling SPI is to reverse the forward measuring process, i.e., the reconstruction of the underlying image x from partial single-pixel observations y. Such a reverse process is determined by two factors. The first is the specific physics imaging model, which can be explicitly defined, designed, and calibrated. Different imaging setups, modulation patterns, or sampling rates establish different correspondences between y and x. Thus, the reconstruction of x from y should also be specific and controlled by the physics model. The second is the prior of general natural image distribution, which can be learned in a data-driven manner. Given the ill-posedness of the low-sampling SPI, there is no one-to-one mapping between y and x. Considering the distinct characteristics of the physics model and learned image distributions, APD-Net divides the role of these two priors and conquers them separately. A diffusion model is employed as a general image-centric prior to generate natural and realistic images, whereas the physics model guarantees the conformity of the predicted images with the SPI observations y. With an alternative co-optimization process, APD-Net effectively collaborates both learning-based diffusion and physics constraints for SPI reconstruction.

    3.1 Diffusion Models as General Image-Centric Priors

    The diffusion model is a type of generative model that has gained significant attention in the field of computer vision. In APD-Net, we employ a pretrained diffusion network as a task-agnostic, image-centric prior, representing the general distribution of natural images. Following DDIM,45 the forward diffusion process is modeled as adding noise to the image over a series of steps. Given the original data $x_0$, the noisy data $x_t$ at step $t$ can be obtained explicitly as

    $$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \tag{6}$$

    where $\bar{\alpha}_t$ is a coefficient controlling the amount of noise and $\epsilon \in \mathbb{R}^{n_x \times n_y}$ is additive noise sampled from a normal distribution $\mathcal{N}(0, I)$. Notably, $\bar{\alpha}_t$ is designed such that a larger $t$ corresponds to stronger noise and produces a more severely degraded image. By adding noise to the original image, the visual details and textures of the image are gradually erased. To reconstruct the images, a denoising process for inverse diffusion is applied to the noisy data to restore the visual details iteratively:

    $$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\, x_{0|t} + \sqrt{1 - \bar{\alpha}_{t-1}}\, \epsilon_\theta(x_t, t), \tag{7}$$

    $$x_{0|t} = \frac{1}{\sqrt{\bar{\alpha}_t}} \left[ x_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon_\theta(x_t, t) \right], \tag{8}$$

    where $\epsilon_\theta(x_t, t)$ is the noise predicted by a deep denoising network and $x_{0|t}$ can be interpreted as a clean image estimated at time step $t$. At each step, Eq. (7) first generates a noisy image $x_{t-1}$ by adding noise to $x_{0|t}$ with a decreasing noise intensity, controlled by $\bar{\alpha}_{t-1}$. Then, Eq. (8) denoises this image with the trained diffusion model. As $t$ decreases, $x_t$ gradually becomes clearer and sharper and eventually converges to $x_0$, the target output image of the diffusion model.
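    The following sketch implements one reverse step of Eqs. (7) and (8). The `denoiser` argument stands in for the pretrained network $\epsilon_\theta$; a real implementation would call a trained U-Net here, and the linear noise schedule is an assumption for illustration.

        import numpy as np

        def ddim_step(x_t, t, alpha_bar, denoiser):
            eps = denoiser(x_t, t)                    # predicted noise eps_theta(x_t, t)
            # Eq. (8): estimate the clean image at step t
            x0_t = (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
            # Eq. (7): re-noise toward step t-1 with reduced noise intensity
            x_prev = np.sqrt(alpha_bar[t - 1]) * x0_t + np.sqrt(1.0 - alpha_bar[t - 1]) * eps
            return x_prev, x0_t

        T = 100
        alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T + 1))  # assumed schedule
        dummy_denoiser = lambda x, t: np.zeros_like(x)                # placeholder network
        x_prev, x0_t = ddim_step(np.random.randn(32, 32), 50, alpha_bar, dummy_denoiser)

    In APD-Net, this plain step is interleaved with the physics-guided correction described in Sec. 3.2.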

    Diffusion models have drawn much interest because of their strong image-generation capability. As the probability distribution of natural images can be fully captured by a well-trained diffusion network, it is capable of generating new images that satisfy the natural image distribution. When a two-dimensional (2D) image is given, the iterative noising and denoising cycle in the diffusion process enhances its visual quality with progressive refinement, effectively adding details, reducing noise, and fixing distortions. This process is task-agnostic and independent of specific imaging systems. Taking advantage of this, APD-Net uses an off-the-shelf pretrained diffusion model as a data-driven learned image prior.

    3.2 Plugging in Physics Constraints as Task-Specific Guidance for SPI

    3.2.1 Data manifold projection as physics guidance

    SPI reconstruction involves selecting the most accurate solution from a multitude of potential predictions. With generative diffusion models, the solution space can be greatly narrowed by the implicit prior of natural image distributions. A plain diffusion model begins with random noise and gradually converges to a peak of the data distribution, producing an image as a maximum likelihood estimate. Due to the randomness of the initialization and the nonconvex character of image distributions, the prediction often falls into local minima and does not represent the specific image corresponding to the observation. Each observation is unique and cannot be represented by the overall statistics of natural images alone.

    To increase the uniqueness of the SPI reconstructions, APD-Net incorporates the physics imaging model as a second prior during the image reconstruction process. A model-based consistency correction step is proposed. This correction step is applied alternately with the diffusion steps, preventing the diffusion model from generating images that deviate too far from the SPI observations. This aids disambiguation and ensures more accurate SPI reconstruction. Specifically, we conduct a linear projection to adjust the intermediate estimate of the diffusion generation, expressed as

    $$\hat{x}_{0|t} = x_{0|t} - \beta A^{\dagger} \left( A x_{0|t} - y \right), \tag{9}$$

    where $\hat{x}_{0|t}$ is the corrected version of $x_{0|t}$. $A^{\dagger}$ is the pseudo-inverse of the forward imaging model, which can be obtained from the Moore–Penrose inverse of the forward operator. The correction term $A^{\dagger}(A x_{0|t} - y)$ is a vector pointing toward the data manifold determined by the observation $y$. $\beta$ is a hyper-parameter that balances the physics prior and the learned prior. Intuitively, $A x_{0|t} - y$ extracts information from the observation $y$ by calculating its one-dimensional error relative to the current prediction $x_{0|t}$. Then, $A^{\dagger}$ projects this error back into the 2D image space. By subtracting the error from $x_{0|t}$, a physics-compliant prediction $\hat{x}_{0|t}$ is obtained.
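    A minimal sketch of this projection step, assuming the sensing matrix A is available in memory so that its Moore–Penrose pseudo-inverse can be computed directly; β = 1 is an illustrative choice.

        import numpy as np

        def physics_projection(x0_t, A, A_pinv, y, beta=1.0):
            residual = A @ x0_t - y                   # 1D error in the measurement domain
            return x0_t - beta * (A_pinv @ residual)  # Eq. (9): project error back to image space

        rng = np.random.default_rng(1)
        A = rng.standard_normal((100, 1024))          # toy ~10% sampling of a 32x32 image
        A_pinv = np.linalg.pinv(A)                    # Moore-Penrose pseudo-inverse
        y = A @ rng.random(1024)                      # noiseless toy observation

        x_hat = physics_projection(rng.random(1024), A, A_pinv, y)
        assert np.allclose(A @ x_hat, y)              # with beta = 1, A x_hat reproduces y

    With β = 1 and a full-row-rank A, the corrected estimate satisfies the measurement exactly (since A A† = I); a smaller β trades exact data consistency for more trust in the learned prior.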

    With Eq. (9), we assure data consistency and progressively guide the image generation in a controlled manner, as visualized in Fig. 3. Because the physics guidance and the diffusion model are applied alternately in the SPI reconstruction process, APD-Net offers superior generalization capability. When changes in modulation patterns or sampling rates occur, only the physics-guidance step needs to be adjusted to reflect the new imaging model, whereas the diffusion model remains unchanged. This eliminates the need to train a new data-driven diffusion model for each specific setup.

    Figure 3. Visualization of the model-based physics-guidance process. (a) Target scene. (b) Intermediate image after the diffusion denoising step, which at this point may appear overly blurry or smoothed. (c) Intermediate image after applying the model-based consistency correction step, where the image is refined to reflect more accurate details, although noise and artifacts may remain. Such noises and artifacts can be effectively removed by the following diffusion-denoising steps. (d) Influence from the physics-guidance step, which extracts information from the SPI observation and injects it into the SPI reconstruction process.

    3.2.2 Physics-based initialization for reconstruction acceleration

    Currently, one of the challenges that hinders the application of diffusion models to image restoration is their need for numerous iterations during inference. Although previous works have greatly accelerated the sampling process, most of them start reconstruction from scratch,45 missing the opportunity to utilize the information in the observation itself. Previous research has shown that early iterations are responsible for the overall shape and color of the scene, whereas later iterations focus more on details. Motivated by the fact that the pseudo-inverse of the SPI observation already contains much structural and semantic information about the scene,35 we accelerate the diffusion reconstruction by skipping some of the early iterations and using the pseudo-inverse image as an initial estimate of the scene. The detailed process of APD-Net is shown in Algorithm 1.

    $x_N \sim \mathcal{N}\left(A^{\dagger} y,\ (1 - \bar{\alpha}_{N-1}) I\right)$        ⊳ Physics-based initialization
    while $t = N, \ldots, 1$ do
      $x_{0|t} = \frac{1}{\sqrt{\bar{\alpha}_t}} \left[ x_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon_\theta(x_t, t) \right]$    ⊳ Diffusion denoising as Eq. (8)
      $\hat{x}_{0|t} = x_{0|t} - \beta A^{\dagger} (A x_{0|t} - y)$      ⊳ Manifold projection as Eq. (9)
      $x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\, \hat{x}_{0|t} + \sqrt{1 - \bar{\alpha}_{t-1}}\, \epsilon_\theta(x_t, t)$  ⊳ Diffusion noise addition as Eq. (7)
    end while
    return $x_0$

    Table 1. APD-Net algorithm.
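    Putting the pieces together, the following is a compact, runnable sketch of Algorithm 1 under the same assumptions as the snippets above (a placeholder denoiser and an assumed noise schedule); the exact form of the noised pseudo-inverse initialization is one plausible reading of the initialization step.

        import numpy as np

        def apd_net_reconstruct(y, A, alpha_bar, denoiser, n_steps=30, beta=1.0):
            A_pinv = np.linalg.pinv(A)
            # Physics-based initialization: noised pseudo-inverse estimate (assumed form)
            x = (np.sqrt(alpha_bar[n_steps]) * (A_pinv @ y)
                 + np.sqrt(1.0 - alpha_bar[n_steps]) * np.random.randn(A.shape[1]))
            for t in range(n_steps, 0, -1):
                eps = denoiser(x, t)
                x0_t = (x - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])  # Eq. (8)
                x0_t = x0_t - beta * (A_pinv @ (A @ x0_t - y))                          # Eq. (9)
                x = (np.sqrt(alpha_bar[t - 1]) * x0_t
                     + np.sqrt(1.0 - alpha_bar[t - 1]) * eps)                           # Eq. (7)
            return x

        T = 100
        alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T + 1))
        rng = np.random.default_rng(2)
        A = rng.standard_normal((41, 4096))           # ~1% sampling of a 64x64 image
        y = A @ rng.random(4096)
        x_rec = apd_net_reconstruct(y, A, alpha_bar, lambda x, t: np.zeros_like(x))

    With the placeholder denoiser, the loop reduces to repeated physics projections; substituting a pretrained diffusion network recovers the full alternating behavior.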

    4 Numerical Simulations

    4.1 Analysis of Low-Sampling Capability

    We conduct comparative analyses against four representative methods: gradient descent (GD),18 total variation (TV),46 PIDL,35 and an enhanced version of PIDL featuring test-time model-driven fine-tuning. Different from the original implementation of PIDL, where the coding patterns are co-optimized with the network, we manually specify and fix the pattern for training. All methods are tested using Cake-cutting Hadamard masks47 combined with the proposed extended Bayer color filter. The DNNs in the learning-based methods are trained on the ImageNet dataset,48 one of the most widely used and influential datasets, which contains millions of images covering a broad range of objects, including animals, plants, everyday objects, and scenes. The Set5 and CelebA datasets are chosen as the testing sets; there is no intersection between the training and testing sets. For color restoration in GD and TV, Malvar interpolation49 is used as a postprocessing step, whereas the output channel of PIDL is set to three for simultaneous SPI reconstruction and demosaicing. For each sampling rate, five distinct PIDL models are independently retrained. In all experiments, APD-Net iterates 30 times to produce one reconstruction. The results, detailed in Table 1 and visualized in Fig. 4(a), demonstrate that APD-Net significantly outperforms the other methods. Notably, APD-Net is capable of faithfully reconstructing full-color 256×256 natural images even with a total number of measurements as low as 655 (1%). Furthermore, the generative prior learned from a vast collection of natural images ensures that the outputs are visually pleasing and free of obvious artifacts.

    Figure 4. Visual analysis of low-sampling reconstruction and adaptability. (a) Performance of SPI at different sampling rates. (b) Performance of SPI with different modulation patterns. We note that the performance of the previous physics-informed method heavily relies on the accuracy of the pseudo-inverse $A^{\dagger} y$. By contrast, our proposed APD-Net has better performance in all scenarios. Images are taken from the Set5 and CelebA datasets.

    4.2 Analysis of Generality

    We further evaluate APD-Net's adaptability to various modulation patterns, reflecting different application scenarios and tasks in SPI. We assess the reconstruction performance using four distinct modulation strategies: random, Cake-cutting Hadamard, randomly ordered Hadamard, and Walsh Hadamard, all at a 5% sampling rate.50 The PIDL method is first trained with the Cake-cutting Hadamard patterns and then finetuned on each specific set of test data. To better understand the advantage of continuous physics-prior injection in APD-Net, we also evaluate the performance of a simple pseudo-inverse estimation of SPI, which serves as a once-for-all physics-informed initialization for the PIDL method. The results, depicted in Fig. 4(b), show that TV, the model-based iterative method, reconstructs a blurry or defective image with every sampling pattern. Finetuned PIDL, the physics-enhanced deep learning method, performs better in some cases, but it can barely produce any meaningful content when the direct pseudo-inverse does not provide an informative visual clue. On the other hand, APD-Net consistently delivers robust performance across all tested strategies. This superior flexibility stems from two factors. First, APD-Net's training phase is independent of any specific task, which means that it does not develop a bias toward particular observation models and remains unbiased during inference. Second, APD-Net does not rely on the physics model merely for initial guidance; it integrates the model continuously throughout the image-generation process. This constant interaction ensures that the network adapts dynamically, enhancing its flexibility and overall performance in diverse SPI scenarios.

    4.3 Analysis of Accelerated Reconstruction

    APD-Net leverages physics-informed initialization to expedite the reconstruction process. The reason is illustrated in Figs. 5(a) and 5(b). The error bars in the figure indicate the standard deviation; each bar is calculated from 10 images. In the experiment, APD-Net employs the standard diffusion process, starting with pure Gaussian noise and iteratively enhancing the image under physics guidance. The peak signal-to-noise ratio (PSNR) increases steadily over 100 steps, but the initial reconstructions are coarse and noisy, only reaching the quality of the direct pseudo-inverse around the 30th step. Observing this, we initiate the process with a pseudo-inverse estimate instead of noise and eliminate the early iterations. The results, depicted in Figs. 5(c) and 5(d), show that this method reduces the necessary diffusion steps from 100 to 30 without sacrificing image quality. The proposed APD-Net achieves high-quality reconstructions with significantly reduced computational requirements. With the physics-informed initialization, APD-Net requires only 30 iterations and takes ∼5 s on an RTX-3090 GPU, whereas traditional methods such as TV regularization need 1000 iterations, taking ∼10 s on an i9-9900 CPU. Another widely used method, PIDL (finetuned), which incorporates physics through backpropagation and gradient updates, requires 2000 iterations, taking around 250 s on an RTX-3090 GPU. These results highlight APD-Net's superior efficiency: it requires fewer iterations and achieves over 50 times faster computation than the PIDL baseline while maintaining or even improving reconstruction quality. Future hardware and software optimizations, such as parallelization and network quantization, are expected to further reduce computation times, making real-time SPI reconstruction more feasible. We leave this to future work.

    Figure 5. Convergence analysis and reconstruction acceleration. (a) Intermediate predictions during the progressive 100-step APD-Net generation process. (b) Plot for PSNR of intermediate predictions. (c) Visual analysis of the final prediction of APD-Net with different skipped steps. (d) Plot for PSNR of predictions with different skipped steps. Images are taken from the CelebA dataset.

    4.4 Analysis of Noise Robustness

    In assessing the robustness of APD-Net against noise, a comparative study was undertaken alongside the TV and PIDL methods. The results, depicted in Fig. 6, consider a 5% sampling rate and varied noise levels, defined as $\gamma = \alpha / N$, where $\alpha$ indicates the amplitude of the Gaussian noise in the observation $y$ and $N$ is the total pixel count.51 Each error bar is calculated from five images in the test set. The findings indicate a clear performance hierarchy: TV, as a conventional method, struggles significantly across all noise intensities; PIDL, although markedly better due to its deep learning prior, is not impervious to high noise levels, exhibiting considerable artifacts. By contrast, APD-Net excels by effectively balancing the noisy physics constraint and the learned data constraint. Although noise influences the accuracy of the physics-guided update step, the subsequent data-driven update mitigates this issue, leveraging the diffusion model as a robust image prior. Therefore, APD-Net delivers superior clarity and detail under significantly noisy and undersampled conditions.
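    As a worked example of this noise-level definition (an illustrative calculation, with the γ value chosen arbitrarily):

        N_pixels = 256 * 256         # total pixel count N of a 256x256 image
        gamma = 0.05                 # illustrative noise level
        alpha = gamma * N_pixels     # Gaussian noise amplitude: alpha = gamma * N = 3276.8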

    Figure 6. Analysis of noise robustness. (a) Visual comparison with model-based TV and deep learning-based PIDL. (b) Quantitative performance at different noise levels. Images are taken from the Set5 dataset.

    Figure 7. Optical SPI setup for real-world validation. A projector projects extended Bayer-modulated structural light to the target object. A photodetector gathers the light signal reflected from the object and transforms it into an electric signal, which is read out by an oscilloscope.

    5 Experimental Demonstrations

    We demonstrate APD-Net using a real-world active SPI system, as schematically shown in Fig. 7. In the experiments, the object is illuminated by binary structured light with Hadamard modulation from a laser DLP projector (N1 Pro, JMGO, Shenzhen, China). The resolution of the modulation pattern is 256×256. To obtain color information, the pixel-wise color of the projected structured light is set to red, green, or blue, depending on its position in the extended Bayer filter matrix. The reflected light is gathered by a single-pixel detector (DET100A2, Thorlabs, Newton, New Jersey, United States), and the light intensity is read out by an oscilloscope (HDO204, RIGOL, Suzhou, China). The projector operates at a frame rate of 30 Hz, and it takes ∼20 s to capture an image at 1% sampling. The experiments are conducted on various target objects, ranging from simple logos and printed 2D images to real 3D dolls, to verify the generalization capability in different natural scenarios. For each scene, a reference image is also provided using a conventional CMOS camera.

    Figure 8. Visualization for real-world SPI reconstruction with ultra-low sampling and different modulation patterns. (a) Different sampling rates. (b) Different modulation patterns.

    Figure 8(a) visualizes the results for ultra-low-sampling SPI reconstruction. At sampling rates as low as 1%, the available information is very limited. The model-based TV method produces a rough and blurry estimate of the scene. The learning-based PIDL method enhances the reconstruction by generating more textures and details but suffers from severe blocking artifacts. By contrast, APD-Net incorporates the characteristics of natural signals and the matrix operations of SPI as prior information for image reconstruction, striking a good balance between image sharpness and overall quality. In the real-life experiments, the proposed APD-Net outperforms the learning-based PIDL by a larger margin than in the numerical simulations. This superiority verifies APD-Net's noise robustness and its ability to generalize to new environments. Unlike PIDL, which amplifies the measurement noise and generates visually unpleasant artifacts, APD-Net's diffusion denoising network effectively corrects the reconstruction errors from real-world disturbed inputs. In addition, the task-free learning process of the data-driven diffusion prior effectively captures the probabilistic distribution of natural scenes, allowing it to adapt well from simulation to real experiments.

    Figure 8(b) presents experimental results using different modulation patterns. We select two different orders of Hadamard matrix at a 4% sampling rate. The network for PIDL is trained on the paired observation-ground truth dataset using the Cake-cutting Hadamard pattern, whereas our APD-Net is trained only with high-quality images from ImageNet. Although the PIDL method successfully restores most of the details with the original Cake-cutting order, it struggles to generalize to the unseen Walsh order, even with an additional self-supervised fine-tuning step. The reconstruction becomes blurry with lower contrast. By contrast, the proposed APD-Net captures the task-agnostic image prior that is invariant to specific imaging models. As a result, it can maintain high performance with unseen imaging models.

    Figure 9 demonstrates the effectiveness of the reconstruction speed-up scheme. For the same object (a doll in front of a resolution chart, at 5% sampling), the reconstruction results are nearly identical, with a PSNR exceeding 45 dB. The difference map shows no structural difference between the two reconstructions, confirming the effectiveness of the acceleration method. We also analyze the imaging resolution of APD-Net. The results indicate that APD-Net outperforms both model-based and learning-based methods. At extremely low sampling rates and in the presence of measurement noise, the simple pseudo-inverse method suffers from aliasing and fails to distinguish close lines, whereas the learning-based PIDL method introduces additional distortions. APD-Net avoids the drawbacks of both methods. The data-driven learned prior aids in de-aliasing from limited observations, and the guidance of the forward imaging model further prevents the neural network from generating physically incompliant reconstructions. Therefore, APD-Net achieves higher resolution than previous methods.

    Figure 9. Visualization for the performance of accelerated reconstruction.

    Figure 10 demonstrates the effectiveness of the extended Bayer filter for low-sampling color reconstruction. The image is taken at just a 1% sampling rate, and the Bayer filter is spatially extended by a factor of 4. At such a low sampling rate, the pixel-wise structural details are susceptible to measurement noise and errors in image reconstruction, leading to inaccurate color restoration. The proposed extended Bayer filter, however, averages the amplitude fluctuations of nearby pixels and makes the reconstructed color closer to reality.

    Figure 10. Visualization for the effectiveness of the extended Bayer color modulation.

    The above experiments validate the real-world superiority of APD-Net in high-quality ultra-low-sampling reconstruction, generalization to new patterns, and accelerated reconstruction. Although these experiments are conducted in the visible band, APD-Net can be readily applied to other extreme wavelengths or low-light conditions, where the dataset may be very small or even unavailable. First, image characteristics such as object shape and texture are common priors that can be leveraged across different imaging modalities for image reconstruction. For instance, within the APD-Net framework, we can incorporate the data-driven RGB image prior, which captures shape information, and combine it with the specific physics model of multispectral imaging to restore color information. These two priors are alternately optimized to achieve high-quality multispectral image restoration. Second, it is possible to use the image prior collected from one source domain to boost performance in another domain.52 For example, a researcher may fully sample one or two scenes in a low-light scenario and use these images to finetune the generative diffusion model pretrained on the ImageNet dataset. The prior knowledge of the natural image distribution can thus be transferred to the new modality. In this way, APD-Net can be effectively applied in various SPI scenarios at minimal cost.

    Moreover, the proposed APD-Net could inspire advancements in other computational imaging problems, such as diffuser imaging,53 holographic imaging,54 and snapshot compressive sensing.55 Like SPI, these problems involve the matrix modulation of high-dimensional data, and their image reconstruction is an ill-posed inverse problem. The explicit forward imaging model serves as a physics prior, whereas a generative diffusion network can encode prior knowledge of their probabilistic distributions. By adopting APD-Net’s approach of alternative optimization between these two priors, high-quality predictions for specific imaging tasks can be achieved. Thus, APD-Net may have a broader impact beyond the scope of SPI.

    6 Conclusion

    In this paper, we introduce an ultra-low-sampling SPI reconstruction framework, named APD-Net. APD-Net leverages both general data-driven diffusion priors and the task-specific physics imaging model for enhanced SPI reconstruction. These two distinct priors are harmonized with a co-optimization strategy during diffusion model inference, significantly diminishing the reliance on extensive sampling. APD-Net boasts remarkable flexibility, enabling a single trained network to adapt to various modulation patterns and sampling rates seamlessly. A key advantage of APD-Net is its accelerated reconstruction speed, surpassing other physics-informed methods due to the elimination of fine-tuning on measurements. Enhanced by physics-based initialization, APD-Net requires merely a third of the iteration steps needed by the traditional diffusion counterpart. Through numerical simulations and real prototype experiments, APD-Net achieves high-quality, full-color reconstructions of complex natural images at a low sampling rate of 1%, with an improvement of over 2 dB in PSNR. In addition, APD-Net demonstrates superior computational efficiency, reducing the reconstruction time by a factor of 50 compared with the widely used PIDL method.

    We expect APD-Net to represent a significant advancement toward the practical application of SPI, especially in fields such as medical imaging and industrial inspection. In medical imaging, APD-Net can help image tissues with high precision while reducing patient exposure to prolonged imaging sessions. In industrial inspection, SPI equipped with APD-Net can be used to inspect materials and structures that are challenging to image with conventional cameras due to their size, shape, or material properties. The data-driven diffusion prior may allow for accurate reconstruction and identification of defects such as cracks, voids, or inclusions with high precision and efficiency. This can improve the reliability and safety of industrial components and structures.

    References

    [1] M. P. Edgar, G. M. Gibson, M. J. Padgett. Principles and prospects for single-pixel imaging. Nat. Photonics, 13, 13-20(2019). https://doi.org/10.1038/s41566-018-0300-7

    [2] Y. Wang et al. Mid-infrared single-pixel imaging at the single-photon level. Nat. Commun., 14, 1073(2023). https://doi.org/10.1038/s41467-023-36815-3

    [3] B. I. Erkmen, J. H. Shapiro. Ghost imaging: from quantum to classical to computational. Adv. Opt. Photonics, 2, 405-450(2010). https://doi.org/10.1364/AOP.2.000405

    [4] T. B. Pittman et al. Optical imaging by means of two-photon quantum entanglement. Phys. Rev. A, 52, R3429(1995). https://doi.org/10.1103/PhysRevA.52.R3429

    [5] M. F. Duarte et al. Single-pixel imaging via compressive sampling. IEEE Signal Process Mag., 25, 83-91(2008). https://doi.org/10.1109/MSP.2007.914730

    [6] G. M. Gibson, S. D. Johnson, M. J. Padgett. Single-pixel imaging 12 years on: a review. Opt. Express, 28, 28190-28208(2020). https://doi.org/10.1364/OE.403195

    [7] R. I. Stantchev et al. Real-time terahertz imaging with a single-pixel detector. Nat. Commun., 11, 2535(2020). https://doi.org/10.1038/s41467-020-16370-x

    [8] Y. Guo, B. Li, X. Yin. Dual-compressed photoacoustic single-pixel imaging. Natl. Sci. Rev., 10, nwac058(2023). https://doi.org/10.1093/nsr/nwac058

    [9] H. Liu, L. Bian, J. Zhang. Image-free single-pixel segmentation. Opt. Laser Technol., 157, 108600(2023). https://doi.org/10.1016/j.optlastec.2022.108600

    [10] A. Tsoy et al. Image-free single-pixel keypoint detection for privacy preserving human pose estimation. Opt. Lett., 49, 546-549(2024). https://doi.org/10.1364/OL.514213

    [11] H. Zhang et al. Ultra-efficient single-pixel tracking and imaging of moving objects based on geometric moment. Proc. SPIE, 12617, 126177N(2023). https://doi.org/10.1117/12.2666863

    [12] A.-X. Zhang et al. Tabletop X-ray ghost imaging with ultra-low radiation. Optica, 5, 374-377(2018). https://doi.org/10.1364/OPTICA.5.000374

    [13] L. Zanotto et al. Single-pixel terahertz imaging: a review. Opto-Electron. Adv., 3, 200012(2020). https://doi.org/10.29026/oea.2020.200012

    [14] J. Wang et al. Single-pixel p-graded-n junction spectrometers. Nat. Commun., 15, 1773(2024). https://doi.org/10.1038/s41467-024-46066-5

    [15] Z. Qiu et al. Comprehensive comparison of single-pixel imaging methods. Opt. Lasers Eng., 134, 106301(2020). https://doi.org/10.1016/j.optlaseng.2020.106301

    [16] E. J. Candes, J. K. Romberg, T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math., 59, 1207-1223(2006). https://doi.org/10.1002/cpa.20124

    [17] O. Katz, Y. Bromberg, Y. Silberberg. Compressive ghost imaging. Appl. Phys. Lett., 95, 131110(2009). https://doi.org/10.1063/1.3238296

    [18] L. Bian et al. Experimental comparison of single-pixel imaging algorithms. JOSA A, 35, 78-87(2018). https://doi.org/10.1364/JOSAA.35.000078

    [19] Z. Zhang, X. Ma, J. Zhong. Single-pixel imaging by means of Fourier spectrum acquisition. Nat. Commun., 6, 6225(2015). https://doi.org/10.1038/ncomms7225

    [20] F. Ferri et al. Differential ghost imaging. Phys. Rev. Lett., 104, 253603(2010). https://doi.org/10.1103/PhysRevLett.104.253603

    [21] Y. Zhang et al. Single-pixel imaging robust to arbitrary translational motion. Opt. Lett., 49, 6892-6895(2024). https://doi.org/10.1364/OL.531122

    [22] X. Zhu et al. Adaptive real-time single-pixel imaging. Opt. Lett., 49, 1065-1068(2024). https://doi.org/10.1364/OL.514934

    [23] H. K. Aggarwal, M. P. Mani, M. Jacob. MoDL: model-based deep learning architecture for inverse problems. IEEE Trans. Med. Imaging, 38, 394-405(2018). https://doi.org/10.1109/TMI.2018.2865356

    [24] C. F. Higham et al. Deep learning for real-time single-pixel video. Sci. Rep., 8, 2369(2018). https://doi.org/10.1038/s41598-018-20521-y

    [25] M. Lyu et al. Deep-learning-based ghost imaging. Sci. Rep., 7, 17865(2017). https://doi.org/10.1038/s41598-017-18171-7

    [26] Y. Tian, Y. Fu, J. Zhang. Local-enhanced transformer for single-pixel imaging. Opt. Lett., 48, 2635-2638(2023). https://doi.org/10.1364/OL.483877

    [27] S. Mao et al. High-quality and high-diversity conditionally generative ghost imaging based on denoising diffusion probabilistic model. Opt. Express, 31, 25104-25116(2023). https://doi.org/10.1364/OE.496706

    [28] Z. Huang et al. Imaging quality enhancement in photon-counting single-pixel imaging via an ADMM-based deep unfolding network in small animal fluorescence imaging. Opt. Express, 32, 27382-27398(2024). https://doi.org/10.1364/OE.529829

    [29] X. Li et al. Part-based image-loop network for single-pixel imaging. Opt. Laser Technol., 168, 109917(2024). https://doi.org/10.1016/j.optlastec.2023.109917

    [30] Z. Liu et al. Adaptive super-resolution networks for single-pixel imaging at ultra-low sampling rates. IEEE Access, 12, 78496-78504(2024). https://doi.org/10.1109/ACCESS.2024.3402693

    [31] J. Lim et al. Enhancing single-pixel imaging reconstruction using hybrid transformer network with adaptive feature refinement. Opt. Express, 32, 32370-32386(2024). https://doi.org/10.1364/OE.523276

    [32] F. Wang et al. Far-field super-resolution ghost imaging with a deep neural network constraint. Light Sci. Appl., 11, 1(2022). https://doi.org/10.1038/s41377-021-00680-w

    [33] S. Liu et al. Computational ghost imaging based on an untrained neural network. Opt. Lasers Eng., 147, 106744(2021). https://doi.org/10.1016/j.optlaseng.2021.106744

    [34] X. Chang et al. Self-supervised learning for single-pixel imaging via dual-domain constraints. Opt. Lett., 48, 1566-1569(2023). https://doi.org/10.1364/OL.483886

    [35] F. Wang et al. Single-pixel imaging using physics enhanced deep learning. Photonics Res., 10, 104-110(2022). https://doi.org/10.1364/PRJ.440123

    [36] X. Zhang et al. VGenNet: variable generative prior enhanced single pixel imaging. ACS Photonics, 10, 2363-2373(2023). https://doi.org/10.1021/acsphotonics.2c01537

    [37] D.-Y. Wang et al. Single-pixel infrared hyperspectral imaging via physics-guided generative adversarial networks. Photonics, 11, 174(2024). https://doi.org/10.3390/photonics11020174

    [38] A. Sholokhov et al. Single-pixel imaging of dynamic flows using neural ODE regularization, in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), 2530-2534 (2024). https://doi.org/10.1109/ICASSP48485.2024.10447584

    [39] J. Li et al. URNet: high-quality single-pixel imaging with untrained reconstruction network. Opt. Lasers Eng., 166, 107580(2023). https://doi.org/10.1016/j.optlaseng.2023.107580

    [40] W.-K. Yu, S.-F. Wang, K.-Q. Shang. Optical encryption using attention-inserted physics-driven single-pixel imaging. Sensors, 24, 1012(2024). https://doi.org/10.3390/s24031012

    [41] L. Yang et al. Diffusion models: a comprehensive survey of methods and applications. ACM Comput. Surv., 56, 1-39(2023). https://doi.org/10.1145/3626235

    [42] X. Song et al. High-resolution iterative reconstruction at extremely low sampling rate for Fourier single-pixel imaging via diffusion model. Opt. Express, 32, 3138-3156(2024). https://doi.org/10.1364/OE.510692

    [43] Z. Zhang et al. Simultaneous spatial, spectral, and 3D compressive imaging via efficient Fourier single-pixel measurements. Optica, 5, 315-319(2018). https://doi.org/10.1364/OPTICA.5.000315

    [44] G. Qu et al. A demosaicing method for compressive color single-pixel imaging based on a generative adversarial network. Opt. Lasers Eng., 155, 107053(2022). https://doi.org/10.1016/j.optlaseng.2022.107053

    [45] J. Song, C. Meng, S. Ermon. Denoising diffusion implicit models. arXiv:2010.02502 (2020).

    [46] C. Li. An Efficient Algorithm for Total Variation Regularization with Applications to the Single Pixel Camera and Compressive Sensing. Master's thesis, Rice University (2010).

    [47] W.-K. Yu. Super sub-Nyquist single-pixel imaging by means of Cake-cutting Hadamard basis sort. Sensors, 19, 4122 (2019). https://doi.org/10.3390/s19194122

    [48] P. Dhariwal, A. Nichol. Diffusion models beat GANs on image synthesis, in Adv. Neural Inf. Process. Syst. (NeurIPS), 8780-8794 (2021).

    [49] H. S. Malvar, L.-W. He, R. Cutler. High-quality linear interpolation for demosaicing of Bayer-patterned color images, in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), iii-485 (2004).

    [50] P. G. Vaz et al. Image quality of compressive single-pixel imaging using different Hadamard orderings. Opt. Express, 28, 11666-11681(2020). https://doi.org/10.1364/OE.387612

    [51] Y. Tian, Y. Fu, J. Zhang. Plug-and-play algorithms for single-pixel imaging. Opt. Lasers Eng., 154, 106970(2022). https://doi.org/10.1016/j.optlaseng.2022.106970

    [52] Z. Pan et al. DiffSCI: zero-shot snapshot compressive imaging via iterative spectral diffusion model, in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 25297-25306 (2024). https://doi.org/10.1109/CVPR52733.2024.02390

    [53] N. Antipa et al. DiffuserCam: lensless single-exposure 3D imaging. Optica, 5, 1-9 (2017). https://doi.org/10.1364/OPTICA.5.000001

    [54] D. Wu et al. Imaging biological tissue with high-throughput single-pixel compressive holography. Nat. Commun., 12, 4712(2021). https://doi.org/10.1038/s41467-021-24990-0

    [55] X. Yuan, D. J. Brady, A. K. Katsaggelos. Snapshot compressive imaging: theory, algorithms, and applications. IEEE Signal Process Mag., 38, 65-88(2021). https://doi.org/10.1109/MSP.2020.3023869
