Single-pixel imaging (SPI) enables efficient sensing in challenging conditions. However, the large number of measurements it requires constrains its practicality. We address the challenge of high-quality SPI reconstruction at ultra-low sampling rates. We develop an alternating optimization with physics and a data-driven diffusion network (APD-Net), which alternates between a learned, task-agnostic natural image prior and a task-specific physics prior. During the training stage, APD-Net harnesses diffusion models to capture the data-driven statistics of natural signals. In the inference stage, the physics prior is introduced as corrective guidance to ensure consistency between the physics imaging model and the natural image probability distribution. Through alternating optimization, APD-Net reconstructs, from few measurements, high-fidelity images that are both statistically and physically compliant. To accelerate reconstruction, APD-Net initializes the image with the inverse SPI physical model, which reduces the number of inference steps from 100 to 30. In both numerical simulations and real prototype experiments, APD-Net achieves high-quality, full-color reconstructions of complex natural images at a sampling rate as low as 1%. In addition, APD-Net’s tuning-free nature ensures robustness across various imaging setups and sampling rates. Our research offers a broadly applicable approach for applications including, but not limited to, medical imaging and industrial inspection.

- Advanced Photonics Nexus
- Vol. 4, Issue 3, 036005 (2025)
1 Introduction
Single-pixel imaging (SPI) is an innovative approach in computational imaging, drawing significant interest due to its unique utilization of a single-pixel detector for image capture.1
To solve the inverse SPI problem, various reconstruction algorithms have been developed. Model-based methods use convex optimization to seek agreement between the actual observations and those simulated with the imaging model.5,18
In 2022, Wang et al.32 proposed physics-informed deep learning (PIDL), an innovative SPI reconstruction framework that combines model-based and learning-based strategies. PIDL fuses data and physics priors during the training phase of the DNN. The weights in the neural network are adjusted based on the SPI measurement and the physical imaging model, making the DNN physics-compatible. During the inference phase, only the trained DNN, as the data-driven prior, is utilized. This PIDL approach has proven successful, requiring fewer samples and yielding more reliable SPI reconstruction results. Following this strategy, many other works have used the physics model to finetune the DNN for improved performance.33
In this paper, we address the low-sampling and generality problems by decoupling the learned and physics priors for SPI reconstruction. Our motivation stems from the recently emerged diffusion model,41 which generates high-quality images through a gradual, iterative process. Previous research has shown the potential of diffusion models as a data-driven image prior for SPI reconstruction. Mao et al.27 developed a diffusion network for ghost imaging that generates diverse images from a single observation, whereas Song et al.42 applied a diffusion model to Fourier SPI with iterative consistency constraints. However, current diffusion-based methods rely on end-to-end training tied to specific imaging setups, which demands extensive data preparation and limits practical performance.
Unlike previous approaches, we propose to employ a pretrained diffusion model as a general image prior and combine it with the specific physics model for high-quality SPI reconstruction. As conceptually illustrated in Fig. 1, a pretrained diffusion model learns the common probability distribution underlying an extensive training dataset and thereby serves as an implicit deep image prior for image reconstruction. During inference, it generates new images by sampling step by step from the learned image distribution. Introducing the SPI physics constraint as corrective guidance throughout inference makes the reconstruction far less ambiguous, so that the target image can be faithfully reconstructed.
Figure 1. Physics imaging model meets data-driven diffusion prior. (a) Diffusion models are novel generative models that produce images progressively in
In light of this, we introduce the alternating optimization with physics and data-driven diffusion network (APD-Net) framework and demonstrate it on ultra-low-sampling SPI. APD-Net treats the interplay between physical constraints and learned diffusion priors as a co-optimization problem. As illustrated in Fig. 2, the physics imaging prior corresponds to the SPI imaging process defined by the modulation patterns used for observation. For the learned diffusion prior, numerous high-quality images are used for unsupervised training of a diffusion network, which captures task-agnostic statistics of natural images. During inference, the trained diffusion model iteratively pulls the image estimate toward the learned distribution of natural images. The physics prior is then injected via an additional physics-guided projection step that enforces data consistency of the prediction. These alternating dual-prior updates reconcile the general prior for natural images with the distinct attributes of a specific observation. As a result, APD-Net produces reconstructions that are statistically plausible while remaining physically accurate and interpretable.
Figure 2. Overall framework of APD-Net. APD-Net integrates two distinct image priors for SPI reconstruction, namely, (a) physics prior and (b) data-driven diffusion prior. The physics imaging prior, defined by specific imaging setups, can be explicitly represented with matrix operations. The diffusion prior, on the other hand, is free from SPI setups and is implicitly learned from numerous high-quality images. During (c) the SPI reconstruction, the intermediate variable
Apart from the advancement in imaging quality, APD-Net brings pronounced improvements in its generality. Previous PIDL approaches typically embed physics constraints into the network training process by updating model parameters through backpropagation. APD-Net, however, separates image prior learning from physics-based constraints. During inference, the physics model is applied dynamically as a plug-in constraint, rather than as a component of network training. This means that regardless of the measurement conditions (e.g., different sampling rates or illumination conditions), the learned prior remains unchanged, and the reconstruction process adapts solely by incorporating the corresponding physics model.
To improve color fidelity at extremely low sampling rates, we propose an extended Bayer encoding that enhances the performance of APD-Net. Furthermore, the physics model is used to create a rough estimate of the target image as a reasonable starting point for reconstruction, which reduces the computational load of diffusion reconstruction. The number of iterations is reduced to one-third of that required by the original diffusion model without loss of reconstruction quality. Extensive experiments on both simulated data and a real SPI prototype show that APD-Net outperforms both conventional and other learning-based methods by a large margin, both visually and quantitatively, at sampling rates as low as 1%.
2 Mathematical Model of SPI
For a grayscale SPI system, the underlying original image
For color SPI, a Bayer filter is incorporated into the modulation patterns to encode color information into grayscale SPI observations.43,44 In a Bayer filter, a 2×2 block of neighboring pixels (one red (R), two green (G), and one blue (B)) forms a full-color unit. The missing colors at each pixel can be restored with a demosaicing algorithm, which interpolates them from nearby pixels. However, this approach often fails in low-sampling scenarios. When the sampling ratio is low, the reliable image resolution is limited, and the smallest feature of the modulation patterns is often larger than one pixel, which destroys the pixel-wise information crucial for color restoration. To address this, we propose extending the Bayer coding pattern by spatially expanding the patterns with a scaling factor
For example, given an image
3 Proposed APD-Net
The task of low-sampling SPI is to reverse the forward measuring process, i.e., the reconstruction of the underlying image
3.1 Diffusion Models as General Image-Centric Priors
The diffusion model is a type of generative model that has gained significant attention in the field of computer vision. In APD-Net, we employ a pretrained diffusion network as a task-agnostic image-centric prior, representing the general distribution of natural images. Following DDIM,45 the forward diffusion process is modeled as adding noise to the image over a series of steps. Given the original data
Diffusion models have drawn much interest because of their strong image-generation capability. Because a well-trained diffusion network closely approximates the probability distribution of natural images, it can generate new images that follow this distribution. When a two-dimensional (2D) image is given, the iterative noising and denoising cycle of the diffusion process improves its visual quality through progressive refinement, adding details, reducing noise, and correcting distortions. This process is task-agnostic and independent of any specific imaging system. Taking advantage of this, APD-Net uses an off-the-shelf pretrained diffusion model as a data-driven learned image prior.
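For reference, the DDIM forward noising and its deterministic reverse update can be written in a few lines. This is a minimal sketch, assuming the linear β schedule of DDPM (β from 1e-4 to 0.02 over 1000 steps) and DDIM with η = 0; `eps_pred` stands in for the output of the pretrained noise-prediction network, and the function names are hypothetical rather than taken from the paper.

```python
import numpy as np


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of (1 - beta_t) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, alpha_bar, noise):
    """Forward diffusion: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


def ddim_step(x_t, eps_pred, t, t_prev, alpha_bar):
    """Deterministic DDIM update (eta = 0) from step t to the less noisy step t_prev.

    eps_pred is the noise predicted at step t; t_prev = -1 denotes the final
    step, where alpha_bar is taken as 1. Returns (x_prev, x0_hat).
    """
    # Clean-image estimate implied by the predicted noise.
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps_pred) / np.sqrt(alpha_bar[t])
    a_prev = alpha_bar[t_prev] if t_prev >= 0 else 1.0
    # Re-noise x0_hat to step t_prev along the same predicted noise direction.
    x_prev = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps_pred
    return x_prev, x0_hat
```

With an oracle noise prediction, `ddim_step` recovers the clean image exactly; with a real network, the gap between `x0_hat` and the true scene is what the physics guidance of Sec. 3.2 is meant to correct.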
3.2 Plugging in Physics Constraints as Task-Specific Guidance for SPI
3.2.1 Data manifold projection as physics guidance
SPI reconstruction involves selecting the most accurate solution from a multitude of candidate predictions. With generative diffusion models, the solution space can be greatly narrowed by the implicit prior of natural image distributions. A plain diffusion model starts from random noise and gradually converges toward a high-probability region of the data distribution, producing an image that is plausible under that distribution. Because of the random initialization and the nonconvex nature of image distributions, the prediction often falls into a local optimum and does not represent the specific image corresponding to the observation. Each observation is unique and cannot be explained by the overall statistics of natural images alone.
To make SPI reconstructions specific to the observed scene, APD-Net incorporates the physics imaging model as a second prior during reconstruction. We propose a model-based consistency correction step that is applied alternately with the diffusion steps, preventing the diffusion model from generating images that deviate too far from the SPI observations. This helps disambiguation and ensures more accurate SPI reconstruction. Specifically, we apply a linear projection to adjust the intermediate estimate of diffusion generation, expressed as
With Eq. (9), we enforce data consistency and progressively guide image generation in a controlled manner, as visualized in Fig. 3. Because the physics guidance and the diffusion model are applied alternately during SPI reconstruction, APD-Net offers superior generalization capability. When the modulation patterns or sampling rates change, only the physics-guidance step needs to be updated to reflect the new imaging model, whereas the diffusion model remains unchanged. This eliminates the need to train a new data-driven diffusion model for each specific setup.
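Eq. (9) itself is not reproduced in this version of the text, so the sketch below assumes a common form of linear data-consistency projection, the range-null-space update x ← x + A⁺(y − Ax) used, for example, in DDNM; the paper's Eq. (9) may differ in detail. Here `A` is the SPI measurement matrix with one modulation pattern per row, `y` holds the bucket values, and `consistency_projection` is a hypothetical name.

```python
import numpy as np


def consistency_projection(x, A, y, A_pinv=None):
    """Replace the part of x that A can observe with the part implied by y.

    x: current estimate (N,); A: (M, N) measurement matrix; y: (M,) measurements.
    The null-space component of x (what the measurements cannot see) is kept,
    so the diffusion prior still decides those details.
    """
    if A_pinv is None:
        A_pinv = np.linalg.pinv(A)
    return x + A_pinv @ (y - A @ x)
```

When A has full row rank, the projected estimate reproduces the measurements exactly. Changing the modulation patterns or sampling rate only changes `A` (and `A_pinv`), which mirrors the plug-in behavior described above. For M rows taken from an N×N Hadamard matrix with ±1 entries, A Aᵀ = N I, so `A.T / N` can be passed as `A_pinv` instead of computing a pseudo-inverse.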
Figure 3. Visualization of the model-based physics-guidance process. (a) Target scene. (b) Intermediate image after the diffusion denoising step, which at this point may appear overly blurry or smoothed. (c) Intermediate image after applying the model-based consistency correction step, where the image is refined to reflect more accurate details, although noise and artifacts may remain. Such noise and artifacts are effectively removed by the subsequent diffusion-denoising steps. (d) Influence of the physics-guidance step, which extracts information from the SPI observation and injects it into the SPI reconstruction process.
3.2.2 Physics-based initialization for reconstruction acceleration
Currently, one of the challenges that hinders the application of diffusion models to image restoration is their need for numerous iterations during inference. Although previous works have greatly accelerated the sampling process, most of them start reconstruction from scratch,45 missing the opportunity to use the information in the observation itself. Previous research has shown that early iterations determine the overall shape and color of the scene, whereas later iterations focus more on details. Motivated by the fact that the pseudo-inverse of the SPI observation already contains much structural and semantic information about the scene,35 we accelerate diffusion reconstruction by skipping some of the early iterations and using the pseudo-inverse image as the initial estimate of the scene. The detailed process of APD-Net is shown in Algorithm 1.
Algorithm 1. APD-Net algorithm.
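The body of Algorithm 1 is not reproduced here, so the loop below sketches how the pieces described in Secs. 3.1 and 3.2 could fit together: pseudo-inverse initialization, noising to the first retained timestep, then alternating DDIM denoising and data-consistency projection. It is a minimal sketch under stated assumptions (deterministic DDIM updates, a projection of the form x ← x + A⁺(y − Ax), and a caller-supplied noise predictor `eps_model`), not the authors' implementation; `apd_style_reconstruct` is a hypothetical name.

```python
import numpy as np


def apd_style_reconstruct(y, A, eps_model, alpha_bar, timesteps, rng):
    """Alternate a diffusion prior with a physics prior (hypothetical sketch).

    y: (M,) measurements; A: (M, N) measurement matrix.
    eps_model(x_t, t): pretrained noise predictor (placeholder callable).
    alpha_bar: cumulative noise schedule; timesteps: decreasing steps to visit,
    e.g. only the last 30 entries of a 100-step schedule when early steps are skipped.
    """
    A_pinv = np.linalg.pinv(A)
    x0_hat = A_pinv @ y  # physics-based initialization (pseudo-inverse)
    t_first = timesteps[0]
    # Noise the initial estimate to the first retained step instead of starting from pure noise.
    x_t = (np.sqrt(alpha_bar[t_first]) * x0_hat
           + np.sqrt(1.0 - alpha_bar[t_first]) * rng.standard_normal(x0_hat.shape))
    for i, t in enumerate(timesteps):
        eps = eps_model(x_t, t)
        # Diffusion prior: predict the clean image from the current noisy state.
        x0_hat = (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
        # Physics prior: enforce consistency with the SPI measurements.
        x0_hat = x0_hat + A_pinv @ (y - A @ x0_hat)
        # Deterministic DDIM move to the next retained step (alpha_bar = 1 after the last one).
        t_next = timesteps[i + 1] if i + 1 < len(timesteps) else None
        a_next = alpha_bar[t_next] if t_next is not None else 1.0
        x_t = np.sqrt(a_next) * x0_hat + np.sqrt(1.0 - a_next) * eps
    return x0_hat
```

With a 100-step schedule such as `np.linspace(999, 0, 100).round().astype(int)`, passing only its last 30 entries as `timesteps` mimics skipping the early iterations. The count of 30 follows the text; the schedule itself is an assumption. Because the projection is the last operation applied to `x0_hat`, the returned image always reproduces the measurements when `A` has full row rank.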
4 Numerical Simulations
4.1 Analysis of Low-Sampling Capability
We conduct comparative analyses against four representative methods: gradient descent (GD),18 total variation (TV),46 PIDL,35 and an enhanced version of PIDL featuring test-time model-driven fine-tuning. Unlike the original implementation of PIDL, where the coding patterns are co-optimized with the network, we manually specify and fix the patterns for training. All methods are tested using Cake-cutting Hadamard masks47 combined with the proposed extended Bayer color filter. The DNNs in the learning-based methods are trained on the ImageNet dataset,48 one of the most widely used and influential datasets, which contains millions of images covering a broad range of objects, including animals, plants, everyday objects, and scenes. The Set5 and CelebA datasets are chosen as the test sets; there is no overlap between the training and test sets. For color restoration in GD and TV, Malvar interpolation49 is used as a postprocessing step, whereas the number of output channels of PIDL is set to three for simultaneous SPI reconstruction and demosaicing. For each sampling rate, five distinct PIDL models were independently retrained. In all experiments, APD-Net runs 30 iterations to produce one reconstruction. The results, detailed in Table 1 and visualized in Fig. 4(a), demonstrate that APD-Net significantly outperforms the other methods. Notably, APD-Net is capable of faithfully reconstructing full-color
Figure 4. Visual analysis of low-sampling reconstruction and adaptability. (a) Performance of SPI at different sampling rates. (b) Performance of SPI with different modulation patterns. We note that the performance of the previous physics-informed method heavily relies on the accuracy of the pseudo-inverse
4.2 Analysis of Generality
We further evaluate APD-Net’s adaptability to various modulation patterns, reflecting different application scenarios and tasks in SPI. We assess reconstruction performance using four modulation strategies: random, Cake-cutting Hadamard, randomly ordered Hadamard, and Walsh Hadamard, all at a 5% sampling rate.50 The PIDL network is first trained with Cake-cutting Hadamard patterns and then fine-tuned on each specific test sample. To better understand the advantage of continuous physics-prior injection in APD-Net, we also evaluate the simple pseudo-inverse estimate of SPI, which serves as a one-off physics-informed initialization for PIDL. The results, depicted in Fig. 4(b), show that TV, the model-based iterative method, produces a reconstruction with every sampling pattern, but the results are blurry or defective. Fine-tuned PIDL, the physics-enhanced deep learning method, performs better in some cases, but it barely produces any meaningful content when the direct pseudo-inverse provides no informative visual clue. By contrast, APD-Net consistently delivers robust performance across all tested strategies. This flexibility stems from two factors. First, APD-Net’s training phase is independent of any specific task, so it does not develop a bias toward particular observation models. Second, APD-Net does not rely on the physics model only for initial guidance; it integrates the model continuously throughout the image-generation process. This constant interaction lets the reconstruction adapt dynamically to each observation model, improving flexibility and overall performance across diverse SPI scenarios.
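The Hadamard-based strategies above can all be generated from one Sylvester-type Hadamard matrix by reordering its rows. The sketch below assumes that the Walsh order sorts the flattened rows by their number of sign changes (sequency) and that the Cake-cutting order sorts the 2D patterns by their number of connected equal-sign regions in ascending order, with ties kept in natural order; the orderings used in the paper may differ in such details, the helper names are hypothetical, and purely random (non-Hadamard) patterns are not covered. At a given sampling rate, only the first rows of the reordered matrix are used.

```python
from collections import deque

import numpy as np


def sylvester_hadamard(n):
    """Natural-order n x n Hadamard matrix with +/-1 entries (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H


def sign_changes(row):
    """Sequency of a 1D +/-1 sequence: how often the sign flips."""
    return int(np.count_nonzero(np.diff(row)))


def count_regions(pattern):
    """Number of 4-connected regions of equal value in a 2D pattern (BFS flood fill)."""
    h, w = pattern.shape
    seen = np.zeros((h, w), dtype=bool)
    regions = 0
    for i in range(h):
        for j in range(w):
            if seen[i, j]:
                continue
            regions += 1
            seen[i, j] = True
            queue = deque([(i, j)])
            while queue:
                a, b = queue.popleft()
                for u, v in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if 0 <= u < h and 0 <= v < w and not seen[u, v] and pattern[u, v] == pattern[a, b]:
                        seen[u, v] = True
                        queue.append((u, v))
    return regions


def hadamard_patterns(side, order="natural", rng=None):
    """Flattened side x side Hadamard patterns (one per row) in the requested order."""
    H = sylvester_hadamard(side * side)
    if order == "natural":
        idx = np.arange(H.shape[0])
    elif order == "walsh":
        idx = np.argsort([sign_changes(r) for r in H], kind="stable")
    elif order == "cake_cutting":
        idx = np.argsort([count_regions(r.reshape(side, side)) for r in H], kind="stable")
    elif order == "random":
        rng = rng if rng is not None else np.random.default_rng()
        idx = rng.permutation(H.shape[0])
    else:
        raise ValueError(f"unknown order: {order}")
    return H[idx]
```

For example, `hadamard_patterns(32, "cake_cutting")[:51]` keeps about 5% of the 1024 patterns for 32×32 images. The pure-Python flood fill is slow for large patterns; `scipy.ndimage.label`, applied separately to the +1 and −1 masks, is a faster way to count regions.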
4.3 Analysis of Accelerated Reconstruction
APD-Net leverages physics-informed initialization to expedite reconstruction. The rationale is illustrated in Figs. 5(a) and 5(b), where the error bars indicate the standard deviation computed over 10 images. In this experiment, APD-Net employs the standard diffusion process, starting from pure Gaussian noise and iteratively refining the image under physics guidance. The peak signal-to-noise ratio (PSNR) increases steadily over 100 steps, but the early reconstructions are coarse and noisy and only match the quality of the direct pseudo-inverse at around the 30th step. Based on this observation, we initialize the process with the pseudo-inverse estimate instead of noise and skip the early iterations. The results, depicted in Figs. 5(c) and 5(d), show that this substantially reduces the number of diffusion steps without sacrificing image quality. The proposed APD-Net thus achieves high-quality reconstructions with significantly reduced computational requirements. With the physics-informed initialization, APD-Net requires only 30 iterations and takes
Figure 5. Convergence analysis and reconstruction acceleration. (a) Intermediate predictions during the progressive 100-step APD-Net generation process. (b) PSNR of intermediate predictions. (c) Visual analysis of the final prediction of APD-Net with different numbers of skipped steps. (d) PSNR of predictions with different numbers of skipped steps. Images are taken from the CelebA dataset.
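PSNR, the metric plotted in Fig. 5 and used throughout the comparisons, is 10·log10(MAX²/MSE). The sketch below assumes images scaled to [0, 1], so that MAX = 1; the data range used by the authors is not stated in this section, and `psnr` is an illustrative helper name.

```python
import numpy as np


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    diff = np.asarray(reference, dtype=float) - np.asarray(estimate, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))
```

For example, a uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01 and therefore a PSNR of 20 dB; each 10-fold reduction in MSE adds 10 dB.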
4.4 Analysis of Noise Robustness
To assess the robustness of APD-Net against noise, we compare it with TV and PIDL. The results, depicted in Fig. 6, consider a 5% sampling rate and varied noise levels, defined as
Figure 6. Analysis of noise robustness. (a) Visual comparison with model-based TV and deep learning-based PIDL. (b) Quantitative performance at different noise levels. Images are taken from the Set5 dataset.
Figure 7. Optical SPI setup for real-world validation. A projector projects extended Bayer-modulated structured light onto the target object. A photodetector collects the light reflected from the object and converts it into an electric signal, which is read out by an oscilloscope.
5 Experimental Demonstrations
We demonstrate APD-Net using a real-world active SPI system, as schematically shown in Fig. 7. In the experiments, the object is illuminated by binary structured light with Hadamard modulation from a laser DLP projector (N1 Pro, JMGO, Shenzhen, China). The resolution of the modulation pattern is
Figure 8. Visualization of real-world SPI reconstruction with ultra-low sampling and different modulation patterns. (a) Different sampling rates. (b) Different modulation patterns.
Figure 8(a) visualizes the results of ultra-low-sampling SPI reconstruction. At sampling rates as low as 1%, the available information is very limited. The model-based TV method produces a rough and blurry estimate of the scene. The learning-based PIDL method enhances the reconstruction by generating more textures and details but suffers from severe blocking artifacts. By contrast, APD-Net uses both the statistics of natural signals and the matrix operations of SPI as prior information for image reconstruction, striking a good balance between image sharpness and overall quality. In real-world experiments, APD-Net outperforms the learning-based PIDL by a larger margin than in numerical simulations. This performance advantage supports APD-Net’s noise robustness and its ability to generalize to new environments. Unlike PIDL, which amplifies measurement noise and generates visually unpleasant artifacts, APD-Net’s diffusion denoising network effectively corrects reconstruction errors caused by real-world disturbances in the input. In addition, the task-agnostic learning of the data-driven diffusion prior effectively captures the probability distribution of natural scenes, allowing it to transfer well from simulation to real experiments.
Figure 8(b) presents experimental results with different modulation patterns. We select two different orderings of the Hadamard matrix at a 4% sampling rate. The PIDL network is trained on a paired dataset of observations and ground truths using the Cake-cutting Hadamard pattern, whereas APD-Net is trained only on high-quality images from ImageNet. Although PIDL restores most of the details with the original Cake-cutting order, it struggles to generalize to the unseen Walsh order, even with an additional self-supervised fine-tuning step; its reconstruction becomes blurry with lower contrast. By contrast, APD-Net captures a task-agnostic image prior that is invariant to specific imaging models and therefore maintains high performance with unseen imaging models.
Figure 9 demonstrates the effectiveness of the reconstruction speed-up scheme. For the same object (a doll in front of a resolution chart, at 5% sampling), the accelerated and full-length reconstructions are nearly identical, with a PSNR between them exceeding 45 dB. The difference map shows no structural difference between the two reconstructions, confirming the effectiveness of the acceleration method. We also analyze the imaging resolution of APD-Net. The results indicate that APD-Net outperforms both model-based and learning-based methods. At extremely low sampling rates and in the presence of measurement noise, the simple pseudo-inverse method suffers from aliasing and fails to resolve closely spaced lines, whereas the learning-based PIDL method introduces additional distortions. APD-Net avoids the drawbacks of both. The data-driven learned prior aids de-aliasing from limited observations, and the guidance of the forward imaging model prevents the neural network from generating physically inconsistent reconstructions. Therefore, APD-Net achieves higher resolution than previous methods.
Figure 9. Visualization of the performance of accelerated reconstruction.
Figure 10 demonstrates the effectiveness of the extended Bayer filter for low-sampling color reconstruction. The image is taken at a sampling rate of just 1%, with the Bayer filter spatially extended by a factor of 4. At such a low sampling rate, pixel-wise structural details are susceptible to measurement noise and reconstruction errors, leading to inaccurate color restoration. The proposed extended Bayer filter, however, averages out the amplitude fluctuations of nearby pixels and brings the reconstructed colors closer to reality.
Figure 10. Visualization of the effectiveness of the extended Bayer color modulation.
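The extended Bayer encoding of Sec. 2 and Fig. 10 can be sketched as follows. This assumes an RGGB layout and assumes that the color mosaic is formed before the grayscale modulation patterns are applied, so each bucket value is the inner product of one pattern with the mosaic; the function names and the `scale` parameter are illustrative rather than taken from the paper.

```python
import numpy as np


def extended_bayer_mask(height, width, scale):
    """Return a (3, height, width) binary RGGB mask whose Bayer cells are
    enlarged to scale x scale pixel blocks (scale=1 gives the standard Bayer filter)."""
    # Index of the (enlarged) Bayer cell along each axis, reduced to 0 or 1.
    rows = (np.arange(height) // scale) % 2
    cols = (np.arange(width) // scale) % 2
    r, c = np.meshgrid(rows, cols, indexing="ij")
    mask = np.zeros((3, height, width))
    mask[0] = (r == 0) & (c == 0)  # red: top-left cell of each 2x2 unit
    mask[1] = r != c               # green: the two off-diagonal cells
    mask[2] = (r == 1) & (c == 1)  # blue: bottom-right cell
    return mask


def color_spi_measure(image_rgb, patterns, scale):
    """Simulate color SPI measurements (hypothetical forward model).

    image_rgb: array of shape (3, H, W); patterns: (M, H*W) modulation matrix.
    Each bucket value is the inner product of one pattern with the Bayer mosaic.
    """
    _, height, width = image_rgb.shape
    mosaic = (extended_bayer_mask(height, width, scale) * image_rgb).sum(axis=0)
    return patterns @ mosaic.ravel()
```

With `scale=4`, as in the experiment of Fig. 10, each color occupies a 4×4 block, so one full-color unit spans 8×8 pixels; this coarser mosaic is less sensitive to the pixel-level errors that dominate at low sampling rates.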
The above experiments validate APD-Net’s real-world advantages in high-quality ultra-low-sampling reconstruction, generalization to new patterns, and accelerated reconstruction. Although these experiments are conducted in the visible band, APD-Net can be readily extended to other, including extreme, wavelength ranges or to low-light conditions, where datasets may be very small or even unavailable. First, image characteristics such as object shape and texture are common priors that can be shared across imaging modalities. For instance, within the APD-Net framework, a data-driven RGB image prior can capture shape information and be combined with the specific physics model of multispectral imaging to restore color information. These two priors are alternately optimized to achieve high-quality multispectral image restoration. Second, an image prior learned in one source domain can be used to boost performance in another domain.52 For example, a researcher may fully sample one or two different scenes under low-light conditions and use these images to fine-tune a generative diffusion model pretrained on ImageNet, transferring the prior knowledge of natural image distributions to the new modality. In this way, APD-Net can be applied to various SPI scenarios at minimal cost.
Moreover, APD-Net could inspire advances in other computational imaging problems, such as diffuser imaging,53 holographic imaging,54 and snapshot compressive sensing.55 Like SPI, these problems involve matrix modulation of high-dimensional data, and their image reconstruction is an ill-posed inverse problem. The explicit forward imaging model serves as a physics prior, whereas a generative diffusion network can encode prior knowledge of the underlying image distribution. By adopting APD-Net’s alternating optimization between these two priors, high-quality predictions for specific imaging tasks can be achieved. Thus, APD-Net may have a broader impact beyond SPI.
6 Conclusion
In this paper, we introduce an ultra-low-sampling SPI reconstruction framework, named APD-Net. APD-Net leverages both general data-driven diffusion priors and the task-specific physics imaging model for enhanced SPI reconstruction. These two distinct priors are harmonized with a co-optimization strategy during diffusion model inference, significantly diminishing the reliance on extensive sampling. APD-Net boasts remarkable flexibility, enabling a single trained network to adapt to various modulation patterns and sampling rates seamlessly. A key advantage of APD-Net is its accelerated reconstruction speed, surpassing other physics-informed methods due to the elimination of fine-tuning on measurements. Enhanced by physics-based initialization, APD-Net requires merely a third of the iteration steps needed by the traditional diffusion counterpart. Through numerical simulations and real prototype experiments, APD-Net achieves high-quality, full-color reconstructions of complex natural images at a low sampling rate of 1%, with an improvement of over 2 dB in PSNR. In addition, APD-Net demonstrates superior computational efficiency, reducing the reconstruction time by a factor of 50 compared with the widely used PIDL method.
We expect APD-Net to represent a significant advance toward the practical application of SPI, especially in fields such as medical imaging and industrial inspection. In medical imaging, APD-Net could help image tissues with high precision while shortening imaging sessions and thus reducing patient exposure. In industrial inspection, SPI equipped with APD-Net could be used to inspect materials and structures that are challenging to image with conventional cameras because of their size, shape, or material properties. The data-driven diffusion prior may allow accurate and efficient reconstruction and identification of defects such as cracks, voids, or inclusions, improving the reliability and safety of industrial components and structures.
References
[45] J. Song, C. Meng, S. Ermon, “Denoising diffusion implicit models” (2020).
[46] C. Li, “An Efficient Algorithm for Total Variation Regularization with Applications to the Single Pixel Camera and Compressive Sensing” (2010).
[48] P. Dhariwal, A. Nichol, “Diffusion models beat GANs on image synthesis,” 8780-8794 (2021).
[49] H. S. Malvar, L.-W. He, R. Cutler, “High-quality linear interpolation for demosaicing of Bayer-patterned color images,” iii-485 (2004).