
- Advanced Imaging
- Vol. 2, Issue 3, 031001 (2025)
1. Introduction
Computational imaging is an emerging optical technology built on the joint design of optical capture and computational algorithms[1,2]. By incorporating computation into the imaging process, it significantly relaxes the constraints of optical system design and opens up new possibilities for expanding imaging capabilities. Recent studies show that computational imaging not only enhances optical imaging performance but also broadens the dimensions of information that can be perceived, including phase, spectrum, polarization, light field, and depth of field[2–7]. As a result, simple and compact imaging systems can achieve quality comparable to that of complex optical systems. With aberration-induced phase errors corrected by backend algorithms, various single-lens imaging strategies have emerged, including deep Fresnel lenses and diffractive optical elements (DOEs)[7–13]. However, because these designs encode information optically, single-lens computational imaging requires a matching image reconstruction algorithm to achieve high-quality restoration[7–13]. The reconstruction adds latency as well as computational and power demands, which hinders high-speed, low-power applications such as unmanned aerial vehicle (UAV) remote sensing and biomedical imaging. A low-latency, computationally and power-efficient reconstruction algorithm capable of running at the edge has therefore become essential for advancing computational imaging.
Image restoration in computational imaging is an inverse process of inferring target intensity from measured values. The commonly used restoration algorithms can be divided into forward solving algorithms, model-based iterative optimization algorithms, and deep learning algorithms[2]. Data-driven deep learning with neural networks has become one of the most widely used reconstruction approaches, owing to its ability to directly model the relationship between compressed measurements and optical target patterns[10–13]. Compared with traditional methods, it can precisely fit the discrepancies between the reconstruction model and the actual imaging process, and it achieves higher efficiency when paired with a graphics processing unit (GPU). Unfortunately, current research on reconstruction algorithms focuses primarily on improving reconstruction performance. The disregard for parameter counts and complexity, together with poor adaptability caused by a lack of customization for edge chips, has kept many excellent studies from advancing beyond the lab: under the power and computing limits of edge chips, latency and frame rate remain significant challenges in practical computational imaging applications[14–16].
Model compression of reconstruction neural networks is a crucial approach for accelerating computational imaging at the edge. Lightweight compression techniques, such as pruning, quantization, distillation, and neural architecture search (NAS), have demonstrated promising results in computer vision and remote sensing applications[17–24]. Among these, pruning and quantization, the earliest and most mature methods, have seen initial attempts at edge acceleration for non-computational-imaging restoration tasks[25,26]. To optimize the trade-off between efficiency and restoration accuracy, sensitivity-based pruning strategies have been widely adopted[27–31]. A layer-wise sensitivity analysis of U-Net reveals that many redundant filters exist in the innermost layers near the bottleneck, making them highly amenable to pruning. Based on this observation, this work achieved a
In this paper, we propose an edge-accelerated reconstruction strategy based on end-to-end sensitivity analysis for single-lens computational imaging systems, as shown in Fig. 1. The on-chip performance of the restoration algorithm, including recovery quality and inference speed, is used as the optimization objective to achieve enhanced in situ reconstruction inside the camera. To ensure compatibility between the selected chip and the restoration algorithm, which is jointly trained with the single-lens computational imaging system, we first evaluate the performance of the reconstruction network's operators on the edge; operators with poor hardware support are removed or replaced at this stage. Meanwhile, sensitivity analysis of the network for pruning and quantization is performed on the edge chip to provide guidance and constraints for the subsequent model compression. Building on this foundation, we apply higher pruning ratios and lower quantization precision to less sensitive blocks, and lower pruning ratios and higher quantization precision to more sensitive ones. The proposed compression strategy, guided by detailed hardware sensitivity, strikes a balance between reconstruction quality and computational efficiency. Finally, the optimized models are deployed on the target AI chip to evaluate the edge acceleration effect. The experimental results show that, compared with the traditional approach without hardware feature guidance, the proposed strategy achieves better performance in both reconstruction quality and speed, with reduced complexity and fewer multiply-accumulate operations (MACs). Our work demonstrates the effectiveness of edge sensitivity analysis for in situ image restoration in single-lens computational imaging and paves the way for lightweight, low-latency computational imaging applications.
Figure 1. The proposed edge acceleration framework for single-lens computational imaging.
In summary, the specific contributions of our work are as follows:
- 1) We propose an edge-accelerated reconstruction strategy for infrared single-lens computational imaging, effectively balancing the speed and performance of in situ image restoration.
- 2) We develop an end-to-end sensitivity analysis framework to model the nonlinear relationship between the parameters and MACs of neural networks and their actual operational efficiency on edge devices.
- 3) We introduce compatibility-based operator reconfiguration, sensitivity-aware pruning, and mixed quantization techniques to scientifically guide the model compression process, addressing the challenges of complex multifactor coupling during edge deployment.
- 4) We perform dataset and real-world image restoration experiments using a single-lens camera integrated with the RK3588 neural processing unit (NPU) chip, verifying the effectiveness and practical engineering potential of the proposed method.
2. Related Work
Our work relates to image restoration for computational imaging, edge acceleration, and neural network compression, primarily focusing on pruning and quantization.
2.1. Image restoration
Traditional optical restoration methods, such as rain and fog removal, primarily address the impact of environmental interference on image quality, including factors like atmospheric scattering[39–41]. In contrast, image restoration for computational imaging emphasizes the physical constraints of the imaging system and requires a combination of hardware and algorithms for reconstruction. Currently, most computational imaging reconstruction efforts focus on enhancing restoration performance and increasing the dimensionality of captured light field information. Reference [42] proposed a transformer-based U-Net model to achieve high-quality imaging with a single-lens system. Reference [10] assessed the achromatic effect of Restormer in real-world natural scenes. Excellent hyperspectral depth imaging performance was achieved through the joint optimization of DOEs and a neural network[43,44]. Reference [3] used deep learning to decode polarization and spectral information, mapping all high-dimensional light field data to a single imaging process. More recently, lightweight reconstruction methods have emerged that achieve video-rate reconstruction of hyperspectral imaging[45,46].
2.2. Edge acceleration
Edge acceleration refers to the optimization and acceleration of neural network models on edge devices, such as smartphones, cameras, and embedded systems. Since these devices typically have limited computational power and resources, offloading neural network inference tasks from the cloud to the edge can significantly reduce latency, improve real-time processing capabilities, and lower bandwidth requirements. Reference [47] proposed an intelligent coscheduling framework for efficient super-resolution, leveraging heterogeneous computational resources on an edge system on chip (SoC), including a central processing unit (CPU), a GPU, and an NPU. The combination of the Winograd algorithm and FPGA effectively lowers the computational cost of convolution operations while significantly boosting hardware inference speed[48,49]. References [50,51] implemented integrated storage and computing using compute-in-memory technology and neuro-inspired memristors for energy-efficient edge computing. Additionally, the model compression techniques discussed in the subsequent section were also used to reduce model size and computational complexity, facilitating better adaptation to the resource constraints of edge devices.
2.3. Model compression
Commonly used model compression techniques include pruning, quantization, knowledge distillation, and NAS. Our work primarily focuses on hardware-aware pruning and quantization.
2.3.1. Model pruning
Model pruning reduces the computational complexity and storage requirements of a neural network by eliminating redundant parameters, thereby improving operational efficiency without significantly compromising performance. References [52,53] used unstructured pruning to remove redundant elements from the weight matrix. However, on hardware without specialized support, the efficiency gains from fine-grained sparse matrix computation are limited. Structured pruning instead removes entire channels, convolution kernels, or layers[31,54,55]. This approach is better matched to the parallel computing capabilities of existing hardware (such as NPUs and CPUs) and facilitates deployment on edge devices. References [31,54,55] proposed various evaluation criteria to guide pruning, including the $\ell_1$ norm of filter weights[54], the geometric median of filters[31], and Taylor-expansion-based importance estimation[55].
2.3.2. Model quantization
Quantization reduces model size and computational complexity by lowering the precision of numerical representations within the model. It is also a crucial technique for deploying models on edge hardware platforms such as NPUs and FPGAs. References [56,57] used post-training quantization (PTQ) to quantize model parameters without requiring additional training or data. References [58,59] used quantization-aware training (QAT) to explicitly introduce quantization errors into the training objective, enabling the model to adapt to low-precision constraints and minimizing performance loss. Extremely low-bit quantization, such as binary networks, has also been explored[19,60], but challenges remain in edge deployment. Similar to pruning, mixed quantization techniques based on sensitivity analysis have emerged and shown preliminary applications[32,33].
2.3.3. Hardware-aware compression
In general, compression efficiency at the edge primarily targets power consumption, memory usage, and inference time, while also considering model performance[34–38]. The key to hardware-aware model compression lies in accurately modeling hardware characteristics. State-of-the-art research predominantly employs joint optimization strategies that integrate multiple compression techniques—such as combining pruning with mixed-precision quantization, merging fine-grained and coarse-grained pruning, and incorporating parameter sharing with structural simplification[34,36–38]. These methods have been shown to substantially enhance overall efficiency on edge devices. Concurrently, some studies focus on enabling learning-based automatic optimization and deployment without the need for retraining[34].
3. Method
In this section, we present the proposed edge-accelerated reconstruction method based on end-to-end sensitivity analysis for single-lens computational imaging systems. Before diving into the details, we briefly explain the basic idea of this work. Our study rests on the premise that, at the current level of technology, the parameters and MACs of a neural network are neither directly nor linearly correlated with its actual operational efficiency on edge chips. This is mainly due to the complex multifactor coupling during deployment, including the matching of operators to logic circuits and the coordination of instruction execution with data access. Therefore, in a single-lens computational imaging system, it is crucial to perform end-to-end optimization based on the model's actual performance at the edge. In this work, we first generate a degraded dataset using finely calibrated point spread functions (PSFs) designed for our single-lens infrared computational camera, shown in Fig. 2(a), as reported recently[61]. The baseline restoration network incorporating physical constraints, shown in Fig. 2(b), is then trained on the constructed dataset. More importantly, to enable video-rate in situ reconstruction of infrared images on the AI chip inside the camera, we perform end-to-end model compression guided by edge sensitivity. The details of the edge acceleration framework shown in Fig. 1 are as follows:
- 1) Operator reconfiguration: Evaluate the compatibility of all operators in the baseline network with the specified edge-AI chip, and use the results to guide operator fusion, deletion, or replacement.
- 2) Edge sensitivity analysis: Directly perform end-to-end sensitivity analysis for pruning and quantization on the specified edge-AI chip, deriving sensitivity rules for the selected network.
- 3) Sensitivity-aware pruning: Perform non-uniform pruning across layers or blocks under the guidance of the sensitivity rules.
- 4) Sensitivity-aware quantization: Perform mixed quantization of the pruned network under the guidance of the sensitivity rules.
- 5) Edge deployment and evaluation: Convert the compressed model and deploy it to the edge chip inside the camera for verification, thereby completing edge-accelerated reconstruction.
Figure 2. (a) The prototype of the single-lens infrared computational camera used in this work. (b) The architecture of the original network used in this work.
3.1. Operator reconfiguration
The foundation of this work is that the same network may exhibit varying performance on different edge-AI chips, primarily due to differences in how well its operators match the hardware architecture. The proposed compatibility optimization proceeds as follows. First, decompose the baseline network and classify its operators, and generate a time consumption report for network inference on the selected edge chip, including each operator's and layer's share of the total inference time. Next, considering the computational complexity, importance, and edge inference time of the operators, identify those with poor adaptability. Guided by the hardware characteristics, operator fusion, elimination, and replacement are then applied to resolve these bottlenecks. The result is an equivalent network optimized for compatibility with the selected edge chip.
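As an illustration, the screening step can be sketched as a simple filter over a per-operator timing report; the report field names (`name`, `type`, `time_ms`, `macs`) and the thresholds below are assumptions for illustration, not the vendor toolchain's actual schema.

```python
def find_bottleneck_ops(report, time_share_threshold=0.05):
    """Flag operators whose share of inference time is large relative to
    their computational weight, i.e., operators that map poorly to the NPU."""
    total_time = sum(op["time_ms"] for op in report)
    total_macs = max(sum(op["macs"] for op in report), 1)
    flagged = []
    for op in report:  # each entry: {"name", "type", "time_ms", "macs"}
        time_share = op["time_ms"] / total_time
        mac_share = op["macs"] / total_macs
        # e.g., MaxPooling: ~10% of the time budget for almost zero MACs.
        if time_share > time_share_threshold and mac_share < 0.5 * time_share:
            flagged.append((op["name"], op["type"]))
    return flagged
```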
3.2. Edge sensitivity analysis
The proposed edge sensitivity analysis operates on the compatibility-optimized network described above. The smallest unit for sensitivity analysis can be either a layer or a block, where a block refers to a combination of multiple adjacent layers. Consider a network partitioned into $N$ such units $\{b_1, b_2, \dots, b_N\}$, whose sensitivities are measured independently on the target chip.
Specifically, the edge sensitivity for pruning can be measured as follows: prune the $i$-th unit $b_i$ at a fixed ratio while leaving all other units unchanged, deploy the resulting model to the edge chip, and record the drop in on-chip reconstruction quality relative to the uncompressed baseline; a larger drop indicates a more sensitive unit. The edge sensitivity for quantization is measured analogously, by quantizing one unit to low precision while keeping the remaining units at high precision.
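A minimal sketch of this per-block sweep is shown below; `prune_block` and `eval_on_edge` are hypothetical stand-ins for the pruning routine and the on-chip evaluation loop, and the equal PSNR/SSIM weighting mirrors the choice made in Sec. 4.4.

```python
import copy

def pruning_sensitivity(model, blocks, prune_block, eval_on_edge,
                        ratios=(0.25, 0.50, 0.75)):
    """Measure each block's edge pruning sensitivity as the normalized,
    equally weighted PSNR/SSIM drop, averaged over several fixed ratios."""
    base_psnr, base_ssim = eval_on_edge(model)
    sens = {}
    for b in blocks:
        drops = []
        for r in ratios:
            pruned = prune_block(copy.deepcopy(model), b, r)
            psnr, ssim = eval_on_edge(pruned)
            drops.append(0.5 * (base_psnr - psnr) / base_psnr +
                         0.5 * (base_ssim - ssim) / base_ssim)
        sens[b] = sum(drops) / len(drops)  # average over the tested ratios
    return sens
```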
3.3. Sensitivity-aware pruning
We denote the measured pruning sensitivity of unit $b_i$ as $s_i$.
The intuitive explanation of sensitivity-aware pruning is that less sensitive layers can tolerate higher pruning ratios, while more sensitive layers can only tolerate lower ones. With the pruning sensitivities $\{s_i\}$ measured on the edge chip, each unit's pruning ratio is assigned inversely to its sensitivity, subject to the hardware constraints of the target chip, as sketched below.
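The following is a minimal allocation sketch assuming a linear mapping from normalized sensitivity to a ratio range; the bounds and the linear form are illustrative assumptions rather than the exact rule used here.

```python
def allocate_ratios(sens, low=0.5, high=0.875):
    """Map each unit's sensitivity to a pruning ratio in [low, high],
    inverted so that the most sensitive unit receives the lowest ratio.
    Resulting channel counts should still be rounded to NPU-friendly
    multiples (the hardware constraint discussed in Sec. 4.5)."""
    s_min, s_max = min(sens.values()), max(sens.values())
    span = max(s_max - s_min, 1e-12)  # guard against identical sensitivities
    return {b: high - (s - s_min) / span * (high - low)
            for b, s in sens.items()}
```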
Once the pruning ratios for all layers are determined, the specific filters to remove within each layer are selected according to an importance criterion.
A filter's importance can be estimated from the magnitude of its weights, and the filters with the smallest norms are removed first.
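As an illustration, magnitude-based ranking, one common criterion of this kind, can be sketched as follows; whether the $\ell_1$ or $\ell_2$ norm is used here is an assumption.

```python
import torch

def filters_to_prune(conv_weight: torch.Tensor, ratio: float) -> list:
    """conv_weight: (out_channels, in_channels, kH, kW). Returns the
    indices of the lowest-L1-norm filters to remove."""
    scores = conv_weight.abs().flatten(1).sum(dim=1)   # L1 norm per filter
    n_prune = int(round(ratio * conv_weight.shape[0]))
    return torch.argsort(scores)[:n_prune].tolist()    # smallest norms first
```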
3.4. Sensitivity-aware quantization
The method of PTQ with mixed precision is used before model deployment in our work. Given a floating-point weight $w$, its quantized approximation is $\hat{w} = s\,(\mathrm{clip}(\mathrm{round}(w/s) + z,\, q_{\min},\, q_{\max}) - z)$, where the scale $s$ and zero point $z$ are determined from calibration data, and $[q_{\min}, q_{\max}]$ is the integer range of the target bit-width.
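For concreteness, a symmetric per-tensor INT8 instance of this mapping (with $z = 0$; a simplification, since the toolchain's calibration scheme may be asymmetric) is sketched below.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    scale = max(np.abs(w).max() / 127.0, 1e-12)  # s from the calibration range
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale          # w_hat = s * q

w = np.array([0.31, -1.20, 0.05], dtype=np.float32)
q, s = quantize_int8(w)
print(dequantize(q, s))  # recovers w to within one quantization step s
```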
Similar to pruning, less sensitive layers can tolerate lower quantization bit-widths, while more sensitive layers require higher bit-widths. At the same time, the specific bit-width must be constrained by the hardware support of edge chips. If the chip is not a dedicated ASIC, this value is typically a power of 2.
3.5. Edge deployment and evaluation
The evaluation metrics on the edge cover both performance and efficiency. For performance, image reconstruction quality is assessed with the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM), tested directly on the chip. For efficiency, both latency and power consumption are considered; because our tests saturate the available computing power, both can be expressed equivalently through the frame rate. As discussed above, the parameters and MACs of neural networks are not used directly to guide model compression, but we still report and discuss them to support our argument. In addition, we test the modulation transfer function (MTF) after in situ reconstruction on the edge, using it to evaluate the optical performance of single-lens computational imaging.
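For reference, minimal implementations of the two quality metrics are sketched below; PSNR is written out, while SSIM delegates to scikit-image, which may differ in detail from the evaluation code actually used.

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(gt: np.ndarray, rec: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images normalized to data_range."""
    mse = np.mean((gt - rec) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

def ssim(gt: np.ndarray, rec: np.ndarray, data_range: float = 1.0) -> float:
    """Structural similarity index via scikit-image's reference implementation."""
    return structural_similarity(gt, rec, data_range=data_range)
```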
4. Experiments
4.1. Camera, dataset, and model
The single-lens camera used in this paper was previously designed by our team[61], as shown in Fig. 2(a), through a collaborative effort between optical design and restoration algorithms. The system has a spectral range of 8–12 µm, a focal length of 70 mm, and an F-number of 1.0. It integrates an uncooled infrared detector with a resolution of 640 pixel × 480 pixel. The goal of this work is to directly deploy the image restoration algorithm within the camera, which was initially run on a GPU. Thus, an RK3588 NPU chip is integrated into the camera to enable in situ reconstruction of infrared images.
The dataset used for edge acceleration was constructed by our team based on the characteristics of the single-lens camera. The finely calibrated PSF of the camera is used to simulate the degradation of clear images: blurry images are generated by convolution, with the clear images serving as the ground truth (GT). Blurry images with simulated detector noise are then fed into the network for training. The dataset covers multiple categories, including buildings, vehicles, humans, and optical target patterns. Its 10,068 images are split into a training set and a test set at a ratio of 9:1.
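A minimal sketch of this degradation pipeline is given below, with a Gaussian stand-in for the simulated detector noise; the noise model and its level are assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve

def degrade(gt: np.ndarray, psf: np.ndarray, noise_sigma: float = 0.01):
    """Simulate single-lens capture: blur the ground truth with the
    calibrated PSF, then add detector noise (images assumed in [0, 1])."""
    psf = psf / psf.sum()                       # normalize PSF energy
    blurry = fftconvolve(gt, psf, mode="same")  # optical blur
    blurry += np.random.normal(0.0, noise_sigma, blurry.shape)
    return np.clip(blurry, 0.0, 1.0)
```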
As shown in Fig. 2(b), a classical U-Net architecture is used for data reconstruction in single-lens computational imaging. The downsampling blocks consist of two convolution (Conv) layers with LeakyReLU activation followed by a MaxPooling layer. The upsampling blocks comprise a transposed convolution (ConvT) layer, followed by concatenation with the output of the corresponding downsampling block, and then two additional Conv layers. The input and output dimensions of the network both match the 640 pixel × 480 pixel detector resolution.
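A PyTorch sketch of the described blocks follows; the channel widths, kernel sizes, and LeakyReLU slope are placeholders (the exact pre-pruning widths are given in Supplement 1).

```python
import torch
import torch.nn as nn

def down_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.LeakyReLU(0.1),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.LeakyReLU(0.1),
        nn.MaxPool2d(2),  # replaced by a stride-2 Conv in Sec. 4.3
    )

class UpBlock(nn.Module):
    def __init__(self, c_in, c_skip, c_out):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose2d(c_in, c_out, 2, stride=2),
            nn.LeakyReLU(0.1),  # removed during operator reconfiguration (Sec. 4.3)
        )
        self.conv = nn.Sequential(
            nn.Conv2d(c_out + c_skip, c_out, 3, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.LeakyReLU(0.1),
        )

    def forward(self, x, skip):
        x = self.up(x)                             # ConvT upsampling
        return self.conv(torch.cat([x, skip], 1))  # skip concatenation
```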
4.2. Implementation details
We first introduce the settings for training, testing, and fine-tuning. To ensure a fair comparison between different methods, we train the networks using the same settings and number of iterations. Specifically, we use the Adam optimizer with
As for the implementation details for edge acceleration, the DepGraph mechanism from the Torch-Pruning toolkit[62] is used to facilitate efficient parameter grouping in this work, simplifying the pruning process. Specifically, a pruning ratio dictionary can be defined for each layer or block. The optimized model is deployed to the RK3588 chip for edge inference testing, with RKNN-Toolkit2 (v2.0.0) used for model quantization and conversion. A subset of images from the dataset is used for calibration during the quantization process. The RKNPU driver version is v0.9.6, and the inference process utilizes all three cores of the NPU.
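A condensed sketch of this toolchain is shown below, assuming `model` (the reconfigured U-Net) and a per-layer `ratio_dict` are already defined; the argument names follow recent Torch-Pruning and RKNN-Toolkit2 releases and may differ across versions.

```python
import torch
import torch_pruning as tp
from rknn.api import RKNN

example_inputs = torch.randn(1, 1, 480, 640)  # single-channel infrared frame

pruner = tp.pruner.MagnitudePruner(
    model, example_inputs,
    importance=tp.importance.MagnitudeImportance(p=1),  # L1 filter ranking
    pruning_ratio_dict=ratio_dict,  # per-block ratios as in Table 1
)
pruner.step()  # DepGraph groups coupled layers and prunes them together

torch.onnx.export(model, example_inputs, "unet_pruned.onnx")

rknn = RKNN()
rknn.config(target_platform="rk3588")
rknn.load_onnx(model="unet_pruned.onnx")
rknn.build(do_quantization=True, dataset="calib_images.txt")  # PTQ calibration list
rknn.export_rknn("unet_pruned.rknn")
```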
4.3. Operator reconfiguration results
We first pre-deployed the original U-Net onto the RK3588 chip for hardware compatibility testing. The pre-deployment results are shown in Fig. 3. Notably, Conv and LeakyReLU are fused by default, as this is a common optimization. We obtained a total inference time of 74 ms, i.e., a frame rate of 13.5 fps with 8-bit integer quantization (INT8), which does not meet real-time imaging requirements. Analyzing the per-operator time consumption, we found that Conv and ConvT operations account for the majority of the network's inference time. This is reasonable, since most computation is concentrated in these two operations. However, the MaxPooling operator used for downsampling accounts for about 10% of the inference time, which is not cost-effective. In addition, some standalone LeakyReLU operations remain because the RK3588 does not support hardware-level fusion of LeakyReLU with ConvT layers.
Figure 3. Edge inference time of each operator under the default optimization.
Thus, we further removed all MaxPooling layers and set the stride of the preceding Conv layers to 2, achieving the same downsampling. Considering both time consumption and importance, we also removed the LeakyReLU layers after each ConvT layer, with no significant decrease in performance. Detailed ablation experiments evaluating the impact on performance are provided in Supplement 1. After these optimizations, the network inference time was reduced by approximately 10 ms.
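A minimal sketch of this fold is given below, assuming each downsampling block is an `nn.Sequential` ending in a MaxPooling layer; since changing the stride alters the computation, fine-tuning is required afterward.

```python
import torch.nn as nn

def fold_maxpool_into_conv(block: nn.Sequential) -> nn.Sequential:
    """Drop a trailing MaxPooling layer and make the preceding convolution
    perform the downsampling via stride 2."""
    layers = list(block.children())
    assert isinstance(layers[-1], nn.MaxPool2d)
    last_conv = next(l for l in reversed(layers) if isinstance(l, nn.Conv2d))
    last_conv.stride = (2, 2)           # conv now downsamples by itself
    return nn.Sequential(*layers[:-1])  # MaxPooling removed
```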
4.4. Sensitivity results
We conducted edge sensitivity analysis for pruning on the network after operator reconfiguration, using blocks as the minimum unit. To obtain more comprehensive sensitivity results, pruning was performed on each block at three fixed rates: 25%, 50%, and 75%. Figure 4 presents the pruning sensitivity results in terms of PSNR and SSIM, comparing before and after quantization as well as before and after fine-tuning. Both PSNR and SSIM decreased after pruning under all testing conditions. The curves are quite similar, with the outer blocks showing greater sensitivity than the middle ones. The performance of INT8 is generally lower than that of FP32, an inevitable consequence of quantization. In addition, while the trend of the results after fine-tuning remains consistent with that before fine-tuning, the pattern becomes less pronounced. This is because the optimization goal of fine-tuning is to enhance performance, which compensates for the loss caused by pruning in certain layers but also partially masks the observed sensitivity pattern. Therefore, we ultimately used the on-chip INT8 test results before fine-tuning, with PSNR and SSIM weighted equally, to calculate the sensitivity metric. Fine-tuning is performed only for the final performance tests after sensitivity-aware pruning is complete.
Figure 4. (a)–(f) Performance degradation caused by pruning at different stages. The black lines indicate the performance of the unpruned model, while the colored lines represent the performance after pruning the corresponding proportion of each block individually. For example, the red marker on the horizontal axis at D1 denotes the performance after pruning 25% of the D1 block, based on the original model.
Quantization sensitivity is measured in a similar manner. However, since the RK3588 chip currently supports only 16-bit floating point (FP16) and INT8 precisions, the experiments that can be conducted are relatively limited. We perform INT8 quantization on each block and FP16 quantization on the remaining layers, then test the edge performance. The unpruned network quantized entirely to FP16 serves as the quantization baseline. Both the unpruned network and the network with 50% uniform pruning across all blocks were tested. The experimental results in Fig. 5 indicate that the quantization sensitivity is consistent between the unpruned and uniformly pruned models. The curves drop sharply only at the last block, indicating that it is more sensitive to quantization. Considering the combined effects of pruning and quantization, we calculated the edge sensitivity using the network with 50% uniform pruning, with PSNR and SSIM weighted equally.
Figure 5. (a), (b) Performance degradation caused by quantization for different blocks. The black lines indicate the performance of the unpruned model with FP16 quantization, while the colored lines represent the performance of both the unpruned model and the uniformly 50%-pruned model under INT8 quantization, applied to each block individually. For example, the green marker on the horizontal axis at D1 denotes the performance after applying INT8 quantization to the D1 block based on the original model.
The sensitivity of each block, computed from the equally weighted PSNR and SSIM degradation measured on the chip, is presented in Fig. 6.
Figure 6. (a) The edge pruning sensitivity results. (b) The edge quantization sensitivity results.
4.5. Compression results
In the model compression evaluation experiment, the pruning ratio and quantization bit-width settings for each block are listed in Tables 1 and 2. For pruning, considering the edge acceleration ratio, pruning sensitivity, and hardware constraints together, we tested two combinations of pruning ratios (Sensitive-B and Sensitive-C), which differ only in the middle block. For comparison, we tested the baseline network and the 50% uniform pruning network used in our previous work[61] under identical conditions. In addition, we conducted ablation experiments with pruning ratios that ignore hardware compatibility, covering both uniform pruning and sensitivity-aware pruning (Sensitive-A). The specific channel counts corresponding to the different pruning ratios are detailed in Supplement 1. For quantization, we tested three schemes: FP16, INT8, and mixed precision. All results are presented in Table 3.
| Pruning method | D1 | D2 | D3 | D4 | C1 | U4 | U3 | U2 | U1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Unprune | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Uniform-50%[61] | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Uniform-60% | 0.6 | 0.6 | 0.6 | 0.6 | 0.6 | 0.6 | 0.6 | 0.6 | 0.6 |
| Sensitive-A | 0.125 | 0.25 | 0.5 | 0.75 | 0.875 | 0.75 | 0.5 | 0.25 | 0.125 |
| Sensitive-B | 0.5 | 0.5 | 0.5 | 0.75 | 0.75 | 0.75 | 0.5 | 0.5 | 0.5 |
| Sensitive-C | 0.5 | 0.5 | 0.5 | 0.75 | 0.875 | 0.75 | 0.5 | 0.5 | 0.5 |

Table 1. Pruning Ratio Settings for Each Block in Different Methods.
| Quantization method | D1 | D2 | D3 | D4 | C1 | U4 | U3 | U2 | U1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FP16 quantization | FP16 | FP16 | FP16 | FP16 | FP16 | FP16 | FP16 | FP16 | FP16 |
| INT8 quantization | INT8 | INT8 | INT8 | INT8 | INT8 | INT8 | INT8 | INT8 | INT8 |
| Mixed quantization | INT8 | INT8 | INT8 | INT8 | INT8 | INT8 | INT8 | INT8 | FP16 |

Table 2. Quantization Settings for Each Block in Different Methods.
| Model | Params | MACs | PSNR (FP16) | SSIM (FP16) | FPS (FP16) | PSNR (INT8) | SSIM (INT8) | FPS (INT8) | PSNR (mixed) | SSIM (mixed) | FPS (mixed) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Unprune | 8.63M | 68.12G | 36.80 | 0.9528 | 8.02 | 34.61 | 0.9245 | 17.07 | 34.99 | 0.9387 | 12.28 |
| Uniform-50%[61] | 3.92M | 28.44G | 36.55 | 0.9518 | 18.37 | 34.76 | 0.9148 | 29.70 | 35.60 | 0.9414 | 23.68 |
| Uniform-60% | 3.06M | 21.33G | 36.43 | 0.9512 | 7.80 | 34.14 | 0.8885 | 10.85 | 35.45 | 0.9373 | 9.21 |
| Sensitive-A | 1.80M | 33.96G | | | 6.13 | | | 10.71 | 35.87 | 0.9444 | 6.72 |
| Sensitive-B | 2.15M | 23.67G | 36.67 | 0.9523 | 21.52 | 34.78 | 0.9158 | 32.97 | | | 25.22 |
| Sensitive-C | | | 36.70 | 0.9522 | | 34.42 | 0.9094 | | 35.87 | 0.9429 | |

Table 3. Results of Model Compression Evaluation Experiments.
The following conclusions can be drawn from the experimental data in Table 3: 1) Pruning significantly improves the speed of edge inference; compared with uniform pruning, sensitivity-aware pruning not only boosts speed but also yields higher edge performance. 2) Compared with a fixed quantization bit-width, sensitivity-aware mixed quantization strikes the desired balance between speed and edge performance. 3) Hardware constraints must be strictly respected: pruning ratios that violate them result in poor inference speed for both uniform and sensitivity-aware pruning. This is the key factor behind the inconsistency between edge efficiency and network complexity, such as the parameters and MACs in Table 3.
Considering both inference time and edge performance, we ultimately selected Sensitive-B as the final edge-accelerated reconstruction solution for our single-lens computational camera. This is because it achieved optimal PSNR and SSIM while maintaining frame rates greater than 25 Hz.
4.6. Reconstruction ablation experiments
We used the unpruned model with FP16 quantization, the 50% uniform pruning model with INT8 quantization, and the sensitivity-aware pruning model with mixed quantization to conduct on-chip image restoration tests on typical scenes from the dataset. The results are shown in Figs. 7 and 8, where sensitivity-aware compression achieves performance comparable to the unpruned network. Specifically, as shown in Fig. 7, sensitivity-aware compression reconstructs fine cloud textures better than uniform compression, closely matching the unpruned network. Figure 8 illustrates the enhancement on buildings and vegetation, demonstrating improved restoration and extraction of high-frequency details such as building textures and branches. In summary, the proposed method achieves superior performance in both visual perception and quantitative metrics, while keeping the same network complexity and improving edge reconstruction speed.
Figure 7. Ablation results for reconstruction, focusing on cloud details in local areas. Sensitivity-aware pruning restores finer texture details within clouds than uniform pruning, closely matching the performance of the unpruned network.
Figure 8. Ablation results for reconstruction, focusing on vegetation details in local areas. Sensitivity-aware pruning restores finer texture details of branches than uniform pruning, closely matching the performance of the unpruned network.
4.7. Prototype experiments
We use the single-lens camera with integrated edge acceleration to capture optical target patterns for MTF testing. The test results at room temperature (25°C) are shown in Fig. 9. The MTFs across various fields at the Nyquist frequency (42 lp/mm) all exceed 0.5. The MTF test results demonstrate that the edge model with sensitivity-aware compression provides the single-lens camera with excellent high-frequency performance.
Figure 9. Experimental results of MTF testing. The MTFs across various fields at the Nyquist frequency (42 lp/mm) all exceed 0.5, showing excellent high-frequency performance.
We also conducted outdoor experiments at the Siping Road campus of Tongji University, Shanghai, China. The experimental results are shown in Figs. 10 and 11. The images of buildings, natural scenery, and people exhibit excellent clarity and detail. In particular, the capture of an extremely distant aircraft, shown in Fig. 11, demonstrates significant potential for small infrared target detection applications. To demonstrate the real-time performance of the camera at a frame rate of 25 Hz, we provide the corresponding live video in Supplement 2, including raw blurry videos and clear videos reconstructed in real time on the RK3588 chip.
Figure 10. Outdoor experimental assessment with real-time on-chip reconstruction.
Figure 11. Experimental results of small infrared target tracking.
5. Conclusion
In this work, we propose an edge-accelerated reconstruction strategy based on end-to-end sensitivity analysis for single-lens infrared computational cameras. The edge performance of the restoration algorithm, deployed on the RK3588 chip, is used to guide model compression. Specifically, we combined compatibility-based operator reconfiguration, sensitivity-aware pruning, and sensitivity-aware mixed quantization to balance the inference speed and reconstruction quality of the model. The proposed compression strategy, guided by detailed hardware sensitivity, achieves better performance in both reconstruction quality and speed, with reduced complexity and fewer MACs. The experimental results indicate that, compared with uniform pruning and quantization, sensitivity-aware compression significantly improves performance, particularly in enhancing high-frequency details and suppressing noise. The excellent field experiments demonstrate the practical potential of our method for high-speed, high-quality video reconstruction in computational imaging. Our edge-accelerated reconstruction method balances performance and efficiency through the joint optimization of hardware and software, paving the way for lightweight, low-latency computational imaging in fields such as UAV-based optical monitoring and in situ medical examination.
References
[1] A. Bhandari, A. Kadambi, and R. Raskar, Computational Imaging (MIT Press, 2022).
[7] S.-H. Baek et al., "Single-shot hyperspectral-depth imaging with learned diffractive optics," in Proc. IEEE/CVF Int. Conf. on Computer Vision (ICCV), 2651 (2021).
[25] X. Shi et al., "Memory-oriented structural pruning for efficient image restoration," in Proc. AAAI Conf. on Artificial Intelligence 37, 2245 (2023).
[26] J. Oh et al., "Attentive fine-grained structured sparsity for image restoration," in Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 17673 (2022).
[27] B.-K. Kim, S. Choi, and H. Park, "Cut inner layers: a structured pruning strategy for efficient U-Net GANs" (2022).
[31] Y. He et al., "Filter pruning via geometric median for deep convolutional neural networks acceleration," in Proc. CVPR (2019).
[32] Z. Dong et al., "HAWQ: Hessian aware quantization of neural networks with mixed-precision," in Proc. ICCV (2019).
[33] Z. Dong et al., "HAWQ-V2: Hessian aware trace-weighted quantization of neural networks," in Advances in Neural Information Processing Systems, 18518 (2020).
[37] J. Xiao et al., "HALOC: hardware-aware automatic low-rank compression for compact neural networks," in Proc. AAAI Conf. on Artificial Intelligence 37, 10464 (2023).
[38] A. Desai, K. Zhou, and A. Shrivastava, "Hardware-aware compression with random operation access specific tile (ROAST) hashing," in Proc. ICML, 7732 (2023).
[43] Q. Sun et al., "Learning rank-1 diffractive optics for single-shot high dynamic range imaging," in Proc. CVPR (2020).
[44] C. A. Metzler et al., "Deep optics for single-shot high-dynamic-range imaging," in Proc. CVPR (2020).
[48] L. Lu et al., "Evaluating fast algorithms for convolutional neural networks on FPGAs," in Proc. FCCM, 101 (2017).
[49] S. Kala et al., "UniWiG: unified Winograd-GEMM architecture for accelerating CNN on FPGAs," in Proc. Int. Conf. on VLSI Design, 209 (2019).
[52] S. Han, H. Mao, and W. J. Dally, "Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding," in Proc. ICLR (2016).
[53] Y. Tang et al., "Manifold regularized dynamic network pruning," in Proc. CVPR, 5018 (2021).
[54] H. Li et al., "Pruning filters for efficient ConvNets," in Proc. ICLR (2017).
[55] P. Molchanov et al., "Importance estimation for neural network pruning," in Proc. CVPR (2019).
[56] R. Zhao et al., "Improving neural network quantization without retraining using outlier channel splitting," in Proc. ICML, 7543 (2019).
[57] M. Nagel et al., "Data-free quantization through weight equalization and bias correction," in Proc. ICCV (2019).
[58] S. K. Esser et al., "Learned step size quantization," in Proc. ICLR (2020).
[59] R. Gong et al., "Differentiable soft quantization: bridging full-precision and low-bit neural networks," in Proc. ICCV (2019).
[60] Y. Cai et al., "Binarized spectral compressive imaging," in Advances in Neural Information Processing Systems, 38335 (2023).
[62] G. Fang et al., "DepGraph: towards any structural pruning," in Proc. CVPR, 16091 (2023).
