• Advanced Imaging
  • Vol. 2, Issue 3, 031001 (2025)
Xuquan Wang1,2,3,†, Tianyang Feng1,2,3, Yujie Xing1,2,3, Ziyu Zhao1,2,3, Xiong Dun1,2,3,*, Zhanshan Wang1,2,3,4, and Xinbin Cheng1,2,3,4,*
Author Affiliations
  • 1MOE Key Laboratory of Advanced Micro-Structured Materials, Shanghai, China
  • 2Institute of Precision Optical Engineering, School of Physics Science and Engineering, Tongji University, Shanghai, China
  • 3Shanghai Frontiers Science Center of Digital Optics, Shanghai, China
  • 4Shanghai Institute of Intelligent Science and Technology, Tongji University, Shanghai, China
    DOI: 10.3788/AI.2025.10003
    Xuquan Wang, Tianyang Feng, Yujie Xing, Ziyu Zhao, Xiong Dun, Zhanshan Wang, Xinbin Cheng, "Edge accelerated reconstruction using sensitivity analysis for single-lens computational imaging," Adv. Imaging 2, 031001 (2025)

    Abstract

    Computational imaging enables high-quality infrared imaging using simple and compact optical systems. However, the integration of specialized reconstruction algorithms introduces additional latency and increases computational and power demands, which impedes the performance of high-speed, low-power optical applications, such as unmanned aerial vehicle (UAV)-based remote sensing and biomedical imaging. Traditional model compression strategies focus primarily on optimizing network complexity and multiply-accumulate operations (MACs), but they overlook the unique constraints of computational imaging and the specific requirements of edge hardware, rendering them inefficient for computational camera implementation. In this work, we propose an edge-accelerated reconstruction strategy based on end-to-end sensitivity analysis for single-lens infrared computational cameras. Compatibility-based operator reconfiguration, sensitivity-aware pruning, and sensitivity-aware mixed quantization are employed on edge-artificial intelligence (AI) chips to balance inference speed and reconstruction quality. The experimental results show that, compared to the traditional approach without hardware feature guidance, the proposed strategy achieves better performance in both reconstruction quality and speed, with reduced complexity and fewer MACs. Our single-lens computational camera with edge-accelerated reconstruction demonstrates high-quality, video-level imaging capability in field experiments. This work is dedicated to addressing the practical challenge of real-time edge reconstruction, paving the way for lightweight, low-latency computational imaging applications.

    1. Introduction

    Computational imaging is a novel optical technology implemented through the joint design of optical capture and computational algorithms[1,2]. By incorporating computational techniques into the imaging process through interdisciplinary collaboration, it significantly relaxes the constraints of optical system design and opens up new possibilities for expanding imaging capabilities. Recent studies show that computational imaging not only significantly enhances optical imaging performance but also expands the dimensional scope of information perception, covering aspects such as phase, spectrum, polarization, light field, and depth of field[2–7]. Benefiting from this, imaging quality comparable to that of complex optical systems can be achieved with simple and compact imaging techniques. With phase differences corrected through backend algorithms, various single-lens imaging strategies have emerged, including deep Fresnel lenses and diffractive optical elements (DOEs)[7–13]. However, due to the introduction of specialized optical designs for information encoding, single-lens computational imaging requires corresponding image reconstruction algorithms to achieve high-quality image restoration[7–13]. This results in additional time delays, as well as increased computational and power demands, which hinder high-speed, low-power optical applications such as unmanned aerial vehicle (UAV) remote sensing and biomedical imaging. Therefore, a low-latency reconstruction algorithm with high computational and power efficiency, capable of running at the edge, has become essential for advancing computational imaging.

    The image restoration of computational imaging is an inverse process of inferring target intensity from measured values. The commonly used restoration algorithms can be divided into forward solving algorithms, model-based iterative optimization algorithms, and deep learning algorithms[2]. Currently, data-driven deep learning with neural networks has become one of the most widely used image reconstruction approaches, owing to its ability to directly model the relationship between compressed data and optical target patterns[10–13]. Compared with traditional methods, it can precisely fit the errors between the reconstruction model and the actual imaging process, and achieve higher efficiency when combined with a graphics processing unit (GPU). Unfortunately, current research hotspots in reconstruction algorithms primarily focus on improving reconstruction performance. The disregard for parameter count and complexity, along with the poor adaptability induced by a lack of customization for edge chips, has kept many excellent studies from advancing beyond the lab. This is because latency and frame rate in practical applications of computational imaging remain significant challenges due to the power and computing limitations of edge chips[14–16].

    Model compression of reconstruction neural networks is a crucial approach for accelerating computational imaging at the edge. Lightweight compression techniques, such as pruning, quantization, distillation, and neural architecture search (NAS), have demonstrated promising results in computer vision and remote sensing applications[17–24]. Among these, pruning and quantization, as the earliest and most mature methods, have seen initial attempts at edge acceleration for non-computational imaging restoration tasks[25,26]. To optimize the trade-off between efficiency and restoration accuracy, sensitivity-based pruning strategies have been widely adopted[27–31]. A layer-wise sensitivity analysis of U-Net revealed that many redundant filters exist in the innermost layers near the bottleneck, making them highly amenable to pruning; based on this observation, that work achieved a 10× reduction in complexity while maintaining baseline performance[27]. Furthermore, a long short-term memory (LSTM) strategy was used as an evaluation tool to identify the least important layers, generating pruning decisions for a given network[28]. The LSTM was updated using the policy gradient method, with both model performance and complexity as rewards, achieving a 70% reduction in floating-point operations (FLOPs) for the visual geometry group (VGG) network. Similar to pruning, researchers at the University of California, Berkeley developed a mixed-precision quantization method based on sensitivity analysis, assigning higher bit precision to more sensitive layers and lower bit precision to less sensitive ones[32,33]. Building on this, automatic bit-precision selection for different layers, guided by the Hessian matrix, was later developed and demonstrated superior performance across a wide range of models. However, the optimization objectives in these studies primarily focus on parameters and FLOPs, and sometimes multiply-accumulate operations (MACs), which may not be fully aligned with the inference time on edge artificial intelligence (AI) chips such as neural processing units (NPUs) and field-programmable gate arrays (FPGAs). This is mainly because deploying algorithms on edge chips is a complex, multifaceted issue, involving factors such as operator configuration, chip architecture, hardware design, and memory access bottlenecks[14–16]. Although pruning optimizations targeting power consumption have been developed[29], directly optimizing compression for edge inference speed and latency remains a significant challenge. More importantly, simply applying these methods to computational imaging often fails to yield optimal results due to the lack of integrated physical constraints and guidance during image capture[13]. The relationship among neural network parameters, MACs, and the actual operational efficiency of edge devices is a complex, multifactor, nonlinear coupling problem. Direct edge performance metrics of a model, such as latency, throughput, or energy consumption, do not always improve with the optimization of parameters, FLOPs, and MACs. Recently, research on hardware-aware compression for deep neural networks (DNNs) has gained traction. Verilog register transfer level (RTL) hardware models, such as Eyeriss-based DNN accelerators, are used to simulate hardware characteristics, enabling targeted improvements in edge-side compression efficiency, including power consumption, memory usage, and inference time, while also taking model performance into account[34–38].
Building on this foundation, joint optimization strategies that integrate multiple compression techniques have begun to emerge[34]. Although this approach allows for flexible exploration of hardware behaviors, it has not yet been translated into generic application-specific integrated circuit (ASIC) fabrication or deployed in real-world engineering applications. As a result, its applicability to embedded systems, such as computational imaging cameras, remains severely constrained.

    In this paper, we propose an edge-accelerated reconstruction strategy based on end-to-end sensitivity analysis for single-lens computational imaging systems, as shown in Fig. 1. The on-chip performance of the restoration algorithm, including recovery quality and inference speed, is used as the optimization objective to achieve enhanced in situ reconstruction within the camera. To ensure optimal compatibility between the selected chip and the restoration algorithm, which is trained jointly with the single-lens computational imaging system, a performance evaluation of the operators in the reconstruction network is first conducted on the edge. Operators with poor hardware support are removed or replaced at this stage. Meanwhile, sensitivity analysis of the network for pruning and quantization is performed on the edge chip to provide guidance and constraints for the subsequent model compression. Building on this foundation, we apply higher pruning ratios and lower quantization precision to less sensitive blocks, and lower pruning ratios and higher quantization precision to more sensitive ones. The proposed model compression strategy, guided by detailed hardware sensitivity, strikes a balance between reconstruction quality and computational efficiency. Finally, the optimized models are deployed on the target AI chip to evaluate the edge acceleration effect. The experimental results show that, compared to the traditional approach without hardware feature guidance, the proposed strategy achieves better performance in both reconstruction quality and speed, with reduced complexity and fewer MACs. Our work demonstrates the effectiveness of edge sensitivity analysis for in situ image restoration in single-lens computational imaging and paves the way for lightweight, low-latency computational imaging applications.


    Figure 1.The proposed edge acceleration framework for in situ reconstruction of computational imaging. Operator reconfiguration is first conducted for the selected AI chip. Then, the sensitivity of pruning and quantization is characterized for each layer or block. Next, sensitivity-aware pruning and quantization are sequentially performed following the guidance. Finally, the compressed model is deployed to the chip for acceleration.

    In summary, the specific contributions of our work are as follows:

    1) We propose an edge-accelerated reconstruction strategy for infrared single-lens computational imaging, effectively balancing the speed and performance of in situ image restoration.
    2) We develop an end-to-end sensitivity analysis framework to model the nonlinear relationships between the parameters and MACs of neural networks and their actual operational efficiency on edge devices.
    3) We introduce compatibility-based operator reconfiguration, sensitivity-aware pruning, and mixed quantization techniques to scientifically guide the model compression process, addressing the challenges of complex multifactor coupling during edge deployment.
    4) We perform dataset and real-world image restoration experiments using a single-lens camera integrated with the RK3588 NPU chip, verifying the effectiveness and practical engineering potential of the proposed method.

    2. Related Work

    Our work relates to image restoration for computational imaging, edge acceleration, and neural network compression, primarily focusing on pruning and quantization.

    2.1. Image restoration

    Traditional optical restoration methods, such as rain and fog removal, primarily address the impact of environmental interference on image quality, including factors like atmospheric scattering[39–41]. In contrast, image restoration for computational imaging emphasizes the physical constraints of the imaging system and requires a combination of hardware and algorithms for reconstruction. Currently, most computational imaging reconstruction efforts focus on enhancing restoration performance and increasing the dimensionality of light field information. Reference [42] proposed a transformer-based U-Net model to achieve high-quality imaging with a single-lens system. Reference [10] assessed the achromatic effect of Restormer in natural scenes of the real world. Excellent hyperspectral depth imaging performance was achieved through the joint optimization of DOEs and a neural network[43,44]. Reference [3] used deep learning methods to decode polarization and spectral information, mapping all high-dimensional light field data to a single imaging process. In recent research, lightweight reconstruction methods have emerged to achieve video-level reconstruction of hyperspectral imaging[45,46].

    2.2. Edge acceleration

    Edge acceleration refers to the optimization and acceleration of neural network models on edge devices, such as smartphones, cameras, and embedded systems. Since these devices typically have limited computational power and resources, offloading neural network inference tasks from the cloud to the edge can significantly reduce latency, improve real-time processing capabilities, and lower bandwidth requirements. Reference [47] proposed an intelligent coscheduling framework for efficient super-resolution, leveraging heterogeneous computational resources on an edge system on chip (SoC), including a central processing unit (CPU), a GPU, and an NPU. The combination of the Winograd algorithm and FPGA effectively lowers the computational cost of convolution operations while significantly boosting hardware inference speed[48,49]. References [50,51] implemented integrated storage and computing using compute-in-memory technology and neuro-inspired memristors for energy-efficient edge computing. Additionally, the model compression techniques discussed in the subsequent section were also used to reduce model size and computational complexity, facilitating better adaptation to the resource constraints of edge devices.

    2.3. Model compression

    Commonly used model compression techniques include pruning, quantization, knowledge distillation, and NAS. Our work primarily focuses on hardware-aware pruning and quantization.

    2.3.1. Model pruning

    Model pruning reduces the computational complexity and storage requirements of a neural network by eliminating redundant parameters, thereby improving the model’s operational efficiency without significantly compromising its performance. References [52,53] used unstructured pruning to remove redundant elements from the weight matrix. However, on hardware without specialized support, the efficiency improvements from fine-grained sparse matrix computation are limited. Structured pruning directly removes entire channels, convolution kernels, or layers[31,54,55]. This approach enhances compatibility with the parallel computing capabilities of existing hardware (such as NPUs and CPUs) and facilitates deployment on edge devices. References [31,54,55] proposed various evaluation criteria to guide pruning, including the ℓ1-norm and the geometric median. Recently, more direct pruning strategies based on sensitivity analysis have been used for performance optimization[27–29].

    2.3.2. Model quantization

    Quantization reduces model size and computational complexity by lowering the precision of numerical representations within the model. It is also a crucial technique for deploying models on edge hardware platforms, such as NPUs and FPGAs. References [56,57] used post-training static quantization (PTQ) to quantize model parameters without requiring additional training or data. References [58,59] used quantization-aware training (QAT) to explicitly introduce quantization errors into the optimization process during training, enabling the model to adapt to low-precision constraints and minimizing performance loss. Extremely low-bit quantization, such as binary networks, has been applied for exploration[19,60], but challenges remain in edge deployment. Similar to pruning, mixed quantization techniques based on sensitivity analysis have also emerged and shown preliminary applications[32,33].

    2.3.3. Hardware-aware compression

    In general, compression efficiency at the edge primarily targets power consumption, memory usage, and inference time, while also considering model performance[34–38]. The key to hardware-aware model compression lies in accurately modeling hardware characteristics. State-of-the-art research predominantly employs joint optimization strategies that integrate multiple compression techniques, such as combining pruning with mixed-precision quantization, merging fine-grained and coarse-grained pruning, and incorporating parameter sharing with structural simplification[34,36–38]. These methods have been shown to substantially enhance overall efficiency on edge devices. Concurrently, some studies focus on enabling learning-based automatic optimization and deployment without the need for retraining[34].

    3. Method

    In this section, we propose an edge-accelerated reconstruction method based on end-to-end sensitivity analysis for single-lens computational imaging systems. Before diving into the details, we briefly explain the basic idea of this work. The standpoint of our study is based on the fundamental premise that, at the current level of technology, the parameters and MACs of neural networks are not directly or linearly correlated with their actual operational efficiency on edge chips. This is mainly due to the complex multifactor coupling challenges during deployment, including matching operators with logic circuits and coordinating instruction execution with data access. Therefore, in a single-lens computational imaging system, it is crucial to perform end-to-end optimization based on the model’s actual performance at the edge. In this work, we first generate a degraded dataset using finely calibrated point spread functions (PSFs) designed for our single-lens infrared computational camera, shown in Fig. 2(a), as reported recently[61]. The baseline restoration network incorporating physical constraints, as shown in Fig. 2(b), is further trained on the constructed dataset. More importantly, to enable video-level in situ reconstruction of infrared images on the AI chip inside the camera, we performed end-to-end model compression guided by edge sensitivity. The details of the edge acceleration framework shown in Fig. 1 are described as follows:

    1) Operator reconfiguration: Evaluate the compatibility of all operators in the baseline network with the specified edge-AI chip, and use the results to optimize operator fusion, deletion, or replacement.
    2) Edge sensitivity analysis: Directly perform end-to-end edge sensitivity analysis for pruning and quantization on the specified edge-AI chip, deriving sensitivity rules for the selected networks.
    3) Sensitivity-aware pruning: Perform non-uniform pruning for different layers or blocks under the guidance of the sensitivity rules.
    4) Sensitivity-aware quantization: Perform mixed quantization for the pruned network under the guidance of the sensitivity rules.
    5) Edge deployment and evaluation: Convert the compressed model and deploy it to the edge chip inside the camera for verification, thereby completing edge-accelerated reconstruction.


    Figure 2.(a) The prototype of the single-lens infrared computational camera used in this work. (b) The architecture of the original network used in this work.

    3.1. Operator reconfiguration

    The foundation of this work is that the same network may exhibit varying performance on different edge-AI chips, primarily due to differences in the compatibility of the corresponding operators with the hardware architecture. The proposed compatibility optimization proceeds as follows. First, decompose the baseline network and classify its operators. Generate a time-consumption report for network inference on the selected edge chip, including the proportion of each operator and layer in the total inference time. Next, considering the computational complexity, importance, and edge inference time of the operators, identify those with poor adaptability. Taking the hardware characteristics into account, operator fusion, elimination, and replacement are then applied to address these bottlenecks. Thus, we obtain an equivalent network optimized for compatibility with the selected edge chip.

    3.2. Edge sensitivity analysis

    The proposed edge sensitivity analysis is based on the network optimized for compatibility, as described above. The smallest unit for sensitivity analysis can be either a layer or a block, where a block refers to a combination of multiple adjacent layers. Consider a network $N = \{L_1, L_2, \ldots, L_n\}$ consisting of $n$ layers, where $L_i$ denotes the $i$th layer of the network. The edge performance of the network is denoted as $P$, which is expressed by a combination of various indices of interest, with the assumption that a higher $P$ is preferable. After pruning or quantizing $L_i$, the performance of the network changes to $P_i$. We define $\Delta P_i = P - P_i$, and typically $\Delta P_i > 0$. After repeating this $n$ times, we obtain $\Delta P = \{\Delta P_1, \Delta P_2, \ldots, \Delta P_n\}$. $S_{P,i}$ and $S_{Q,i}$ are used to represent the edge sensitivities of the $i$th layer to pruning and quantization, respectively. Thus, $S_{P,i}$ and $S_{Q,i}$ are expected to be positively correlated with their respective $\Delta P_i$. To better characterize edge sensitivity, we propose the following equation for the sensitivity curve:

$$S_{*,i} = 1 - e^{-\alpha \frac{\Delta P_i}{P}}.$$

    Here, $\alpha$ is a scaling factor used to normalize $\Delta P_i / P$ to the range [0, 1], in order to approximate the pruning ratio as closely as possible.
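
    For reference, the sensitivity curve above can be evaluated with a short Python helper (NumPy assumed); it simply maps a measured edge-performance drop to the exponential sensitivity score.

```python
import numpy as np

def edge_sensitivity(delta_p, p_baseline, alpha):
    """Sensitivity curve S_{*,i} = 1 - exp(-alpha * dP_i / P).

    delta_p    : measured edge-performance drop(s) after pruning/quantizing layer i
    p_baseline : baseline edge performance P of the uncompressed network
    alpha      : scaling factor that normalizes dP_i / P to roughly [0, 1]
    """
    return 1.0 - np.exp(-alpha * np.asarray(delta_p, dtype=float) / p_baseline)
```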

    Specifically, the edge sensitivity for pruning can be measured as follows: pruning the $i$th layer of the network at a fixed rate yields the network $N_{P,i}$. The network is then fine-tuned and deployed on the target AI chip to evaluate its edge performance $P_i$. By repeating this process $n$ times, we obtain the pruning sensitivities $S_{P,1}, S_{P,2}, \ldots, S_{P,n}$ for all layers. Similarly, for quantization sensitivity, low-bit quantization is applied to the $i$th layer of the network to obtain the network $N_{Q,i}$. The remaining steps are the same as those used to assess pruning sensitivity, resulting in the quantization sensitivities $S_{Q,1}, S_{Q,2}, \ldots, S_{Q,n}$ for all layers. It is worth noting that selecting the pruning ratio and quantization bit-width at this stage can be challenging. This is primarily due to the complexity of neural network operations on hardware, which makes accurate formulation difficult. Therefore, it is recommended to conduct multiple trials and adjustments when first using a new AI chip.
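
    A hedged sketch of this per-layer sweep is shown below; `prune_layer`, `finetune`, and `measure_on_chip` are hypothetical callables standing in for the pruning, recovery-training, and RK3588 deployment/testing steps described in the text.

```python
import math

def sensitivity_sweep(model, num_layers, fixed_ratio, p_baseline, alpha,
                      prune_layer, finetune, measure_on_chip):
    """Estimate the pruning sensitivity S_P,i of every layer, one layer at a time."""
    sensitivities = []
    for i in range(num_layers):
        candidate = prune_layer(model, layer_index=i, ratio=fixed_ratio)  # prune only layer i
        candidate = finetune(candidate)                                   # brief recovery training
        p_i = measure_on_chip(candidate)                                  # edge performance P_i
        delta = p_baseline - p_i                                          # dP_i = P - P_i
        sensitivities.append(1.0 - math.exp(-alpha * delta / p_baseline))
    return sensitivities
```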

    3.3. Sensitivity-aware pruning

    We denote $M_i$ and $M_{i+1}$ as the numbers of input and output channels of the $i$th layer, respectively, and $F_{i,j}$ as the $j$th filter of the $i$th layer. Thus, $L_i = \{F_{i,j} \in \mathbb{R}^{M_i \times K \times K},\ 1 \le j \le M_{i+1}\}$, where $K$ is the size of the convolution kernel. The pruning ratio for the $i$th layer is defined as

$$PR_i = \frac{M_{i+1} - M'_{i+1}}{M_{i+1}}.$$

    Here, $M'_{i+1}$ represents the number of output channels remaining in the $i$th layer after pruning.

    The intuitive explanation for sensitivity-aware pruning is that less sensitive layers can tolerate higher pruning ratios, while more sensitive layers can only tolerate lower pruning ratios. With the pruning sensitivities $\{S_{P,1}, S_{P,2}, \ldots, S_{P,n}\}$ obtained from the previous step for each layer, the target pruning ratio of the $i$th layer is represented as

$$PR_i = \beta \cdot S_{P,i} \quad \mathrm{s.t.} \quad M'_{i+1} = 2^x,\ x \in \mathbb{N}^+,$$

    where $i = 1, 2, \ldots, n$, and $\beta$ represents the conversion factor from sensitivity to pruning ratio, which is largely determined by the edge performance $P_i$. The key point in selecting the pruning ratio is that it is not only related to $S_{P,i}$ but also constrained by $M'_{i+1}$. The reason is that, due to the limitations of circuit hardware characteristics, not all channel reductions are meaningful for edge hardware. The optimal channel number is typically a power of 2. Specifically, channel counts greater than $2^x$ and less than $2^{x+1}$ are likely to be aligned and computed in parallel using a logical resource of size $2^{x+1}$.
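
    As an illustration of the hardware constraint above, the sketch below converts a sensitivity value into a retained channel count snapped to a power of 2; `beta` is the conversion factor defined above, and the capping and rounding policy are assumptions for illustration rather than the exact rule used in this work.

```python
import math

def retained_channels(m_out, sensitivity, beta):
    """Turn a pruning sensitivity S_P,i into retained output channels M'_{i+1}.

    The target ratio PR_i = beta * S_P,i is applied first, then the retained
    channel count is snapped down to the nearest power of 2, reflecting the
    channel-alignment behavior assumed for the edge NPU.
    """
    ratio = min(max(beta * sensitivity, 0.0), 0.9375)    # cap so at least 1/16 of channels survive
    kept = max(1, round(m_out * (1.0 - ratio)))
    kept_pow2 = 2 ** int(math.floor(math.log2(kept)))    # M'_{i+1} = 2^x
    return kept_pow2, 1.0 - kept_pow2 / m_out             # channels kept, realized pruning ratio

# Example: a 256-channel layer with S_P,i = 0.6 and beta = 1.0 keeps 64 channels (realized ratio 0.75).
```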

    Once the pruning ratios for all layers are determined, the 2-norm is used as the importance criterion to decide which filters will be pruned, in ascending order. For a specific filter, its 2-norm is defined as

$$\|F_{i,j}\|_2 = \sqrt{\sum_{x=1}^{M_i} \sum_{y=1}^{K} \sum_{z=1}^{K} |w_{x,y,z}|^2}.$$

    A filter’s 2-norm reflects its overall contribution to the network’s output. Filters with smaller 2-norms are considered less active and less important, making them good candidates for pruning. In other words, filters with smaller 2-norms are directly removed according to the edge pruning ratio. Finally, considering the changes in complexity, the pruned network should be fine-tuned to achieve optimal performance.
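
    As a minimal PyTorch illustration of this criterion, the snippet below ranks the filters of one convolution layer by their 2-norm and returns the indices of those that would be pruned for a given ratio.

```python
import torch

def filters_to_prune(conv_weight: torch.Tensor, pruning_ratio: float):
    """conv_weight has shape [M_out, M_in, K, K]; returns the indices of the
    least important filters (smallest 2-norms) up to the given pruning ratio."""
    norms = conv_weight.flatten(1).norm(p=2, dim=1)        # one norm per output filter
    n_prune = int(round(pruning_ratio * conv_weight.shape[0]))
    return torch.argsort(norms)[:n_prune].tolist()          # smallest norms pruned first
```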

    3.4. Sensitivity-aware quantization

    The method of PTQ with mixed precision is used before model deployment in our work. Given a floating-point weight $w$, the process of quantizing it to a fixed-point weight $w_{\mathrm{int}}$ is as follows:

$$w_{\mathrm{int}} = \mathrm{clip}\left(\mathrm{round}(w/s) + z,\ -2^{b-1},\ 2^{b-1} - 1\right).$$

    Here, $\mathrm{round}(\cdot)$ refers to the rounding operation, $s$ is the quantization scale factor, $z$ is the quantization zero-point, $b$ is the quantization bit-width, and $\mathrm{clip}(\cdot)$ denotes the clipping operation. The MinMax strategy is used for quantization calibration to determine the quantization ranges of the activation values and weights of the model.
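
    The quantization step can be illustrated with the following minimal PyTorch sketch, assuming per-tensor MinMax calibration; the actual RKNN-Toolkit2 calibration may differ in detail.

```python
import torch

def quantize_minmax(w: torch.Tensor, bits: int = 8):
    """Quantize a floating-point tensor to signed `bits`-bit integers using
    w_int = clip(round(w / s) + z, -2^(b-1), 2^(b-1) - 1) with a MinMax range."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    s = torch.clamp((w.max() - w.min()) / (qmax - qmin), min=1e-8)  # scale from observed range
    z = qmin - torch.round(w.min() / s)                             # zero-point
    w_int = torch.clamp(torch.round(w / s) + z, qmin, qmax)
    return w_int, s, z

def dequantize(w_int, s, z):
    return (w_int - z) * s                                          # approximate reconstruction of w
```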

    Similar to pruning, less sensitive layers can tolerate lower quantization bit-widths, while more sensitive layers require higher bit-widths. At the same time, the specific bit-width must be constrained by the hardware support of edge chips. If the chip is not a dedicated ASIC, this value is typically a power of 2.

    3.5. Edge deployment and evaluation

    The evaluation metrics on the edge include both performance and efficiency. For performance evaluation, image reconstruction quality is assessed using the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM), with testing conducted on the chip. For efficiency evaluation, both latency and power consumption are considered. Because all available computing power is utilized in our experiments, latency and power consumption can both be equivalently expressed by the frame rate. As discussed above, the parameters and MACs of neural networks are not used directly to guide model compression; they are nevertheless measured and discussed in this article to support our argument. In addition, we test the modulation transfer function (MTF) after in situ reconstruction on the edge, using it to evaluate the optical performance of single-lens computational imaging.

    4. Experiments

    4.1. Camera, dataset, and model

    The single-lens camera used in this paper was previously designed by our team[61], as shown in Fig. 2(a), through a collaborative effort between optical design and restoration algorithms. The system has a spectral range of 8–12 µm, a focal length of 70 mm, and an F-number of 1.0. It integrates an uncooled infrared detector with a resolution of 640 pixel × 480 pixel. The goal of this work is to directly deploy the image restoration algorithm within the camera, which was initially run on a GPU. Thus, an RK3588 NPU chip is integrated into the camera to enable in situ reconstruction of infrared images.

    The dataset used for edge acceleration was constructed by our team based on the characteristics of the single-lens camera. The PSF of the single-lens camera, after detailed calibration, is used to simulate the degradation of clear images. Blurry images are generated through a convolution operation, with the clear images serving as the ground truth (GT). Blurry images with simulated detector noise can then be input into the network for training. The dataset includes multiple categories, such as buildings, vehicles, humans, and optical target patterns. The 10,068 images in the dataset are divided into a training set and a test set at a ratio of 9:1.
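
    A minimal sketch of this degradation pipeline is shown below, assuming single-channel tensors in [0, 1]; the function and its argument shapes are illustrative, with the noise variance taken from Sec. 4.2.

```python
import torch
import torch.nn.functional as F

def simulate_capture(clear: torch.Tensor, psf: torch.Tensor, sigma2: float = 0.006):
    """Generate a blurry, noisy measurement from a clear image.

    clear : ground-truth image, shape [1, 1, H, W], values in [0, 1]
    psf   : calibrated point spread function, shape [1, 1, k, k] (k odd)
    """
    psf = psf / psf.sum()                                          # energy-preserving PSF
    blurred = F.conv2d(clear, psf, padding=psf.shape[-1] // 2)     # optical degradation
    noisy = blurred + torch.randn_like(blurred) * (sigma2 ** 0.5)  # simulated detector noise
    return noisy.clamp(0.0, 1.0)
```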

    As shown in Fig. 2(b), a classical U-Net architecture is used for data reconstruction in single-lens computational imaging. The downsampling blocks consist of two convolution (Conv) layers with LeakyReLU activation followed by a MaxPooling layer. The upsampling blocks comprise a transposed convolution (ConvT) layer, followed by concatenation with the output from the downsampling blocks, and then two additional Conv layers. The input and output dimensions of the network are both 640×480, as determined by the resolution of the infrared detector.
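
    The block structure can be sketched in PyTorch as follows; the kernel sizes, LeakyReLU slope, and channel widths are illustrative assumptions rather than the exact values used in the paper.

```python
import torch
import torch.nn as nn

class DownBlock(nn.Module):
    """Two Conv + LeakyReLU layers followed by MaxPooling, as in Fig. 2(b)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.LeakyReLU(0.2, inplace=True),
        )
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        skip = self.body(x)              # feature map forwarded to the matching up block
        return self.pool(skip), skip

class UpBlock(nn.Module):
    """ConvT upsampling, concatenation with the skip feature, then two Conv layers."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.up = nn.ConvTranspose2d(c_in, c_out, 2, stride=2)
        self.body = nn.Sequential(
            nn.Conv2d(2 * c_out, c_out, 3, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.LeakyReLU(0.2, inplace=True),
        )

    def forward(self, x, skip):
        return self.body(torch.cat([self.up(x), skip], dim=1))
```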

    4.2. Implementation details

    We first introduce the settings for training, testing, and fine-tuning. To ensure a fair comparison between different methods, we train the networks using the same settings and number of iterations. Specifically, we use the Adam optimizer with β1 = 0.9 and β2 = 0.999. The initial learning rate is set to 1×10⁻⁴, and the batch size is 2. The learning rate remains constant throughout training, and the loss function combines an L2 loss with a VGG-based perceptual loss. Additionally, random detector noise is added to the dataset during training. The numbers of epochs for training and fine-tuning are 300 and 100, respectively. Gaussian noise with a mean (μ) of 0 and a variance (σ²) of 0.006 is added to the natural scene images in the test set. The basis for the noise addition is explained in detail in Supplement 1. PyTorch-OpCounter is used to calculate the model’s parameters and MACs. All experiments are implemented with PyTorch on an NVIDIA A40 GPU.
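
    A minimal sketch of this training configuration is given below; `vgg_features` and `perceptual_weight` are hypothetical placeholders for the perceptual-loss backbone and its weighting, which are not specified numerically in the text.

```python
import torch
import torch.nn.functional as F

def make_optimizer(model):
    # Adam with beta1 = 0.9, beta2 = 0.999 and a constant learning rate of 1e-4
    return torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))

def reconstruction_loss(pred, gt, vgg_features, perceptual_weight=0.1):
    """L2 loss combined with a VGG-based perceptual loss (weighting is an assumption)."""
    l2 = F.mse_loss(pred, gt)
    perceptual = F.mse_loss(vgg_features(pred), vgg_features(gt))
    return l2 + perceptual_weight * perceptual
```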

    As for the implementation details for edge acceleration, the DepGraph mechanism from the Torch-Pruning toolkit[62] is used to facilitate efficient parameter grouping in this work, simplifying the pruning process. Specifically, a pruning ratio dictionary can be defined for each layer or block. The optimized model is deployed to the RK3588 chip for edge inference testing, with RKNN-Toolkit2 (v2.0.0) used for model quantization and conversion. A subset of images from the dataset is used for calibration during the quantization process. The RKNPU driver version is v0.9.6, and the inference process utilizes all three cores of the NPU.
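
    A hedged sketch of this pruning step is given below, assuming the Torch-Pruning v1.x MagnitudePruner interface (argument names differ slightly between releases); `ratio_per_block` and `model.out_conv` are hypothetical placeholders for the per-block ratio dictionary of Table 1 and the network's output layer. The pruned model would then be exported (e.g., to ONNX) and converted with RKNN-Toolkit2 as described above.

```python
import torch
import torch_pruning as tp

def prune_with_depgraph(model, ratio_per_block, default_ratio=0.5):
    """Structured, sensitivity-aware pruning via Torch-Pruning's DepGraph mechanism."""
    example_inputs = torch.randn(1, 1, 480, 640)          # 640 x 480 detector; channel count assumed
    importance = tp.importance.MagnitudeImportance(p=2)   # 2-norm filter ranking
    pruner = tp.pruner.MagnitudePruner(
        model, example_inputs,
        importance=importance,
        ch_sparsity=default_ratio,
        ch_sparsity_dict=ratio_per_block,                 # per-block ratios from Table 1
        ignored_layers=[model.out_conv],                  # hypothetical: keep the output layer intact
    )
    pruner.step()                                         # apply the grouped pruning plan
    return model
```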

    4.3. Operator reconfiguration results

    We first pre-deployed the original U-Net network onto the RK3588 chip for hardware compatibility testing. The experimental results of the pre-deployment are shown in Fig. 3. Notably, Conv and LeakyReLU are fused by default, as this is a common optimization technique. We obtained a total inference time of 74 ms, corresponding to a frame rate of 13.5 fps with 8-bit integer quantization (INT8), which does not meet the real-time imaging requirement. We analyzed the time consumption of each operator in the network and found that the Conv and ConvT operations account for the majority of the network’s inference time. This is reasonable, since most computations are concentrated in these two operations. However, the MaxPooling operator used for downsampling accounts for about 10% of the inference time, which is not cost-effective. In addition, some independent LeakyReLU operations remain because the RK3588 does not support hardware-level fusion of LeakyReLU and ConvT layers.


    Figure 3.Edge inference time of operators optimized by default.

    Thus, we further removed all MaxPooling layers and set the stride of the preceding Conv layers to 2, effectively achieving downsampling. Considering both time consumption and importance, we also removed all LeakyReLU layers following ConvT layers, with no significant decrease in performance. Detailed ablation experiments evaluating the impact on performance are provided in Supplement 1. After these optimizations, the network inference time was reduced by approximately 10 ms.
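
    The reconfiguration amounts to the change sketched below: the MaxPooling layer is dropped and the preceding Conv layer takes stride 2 (channel widths remain illustrative assumptions, as in the earlier block sketch).

```python
import torch.nn as nn

class DownBlockReconfigured(nn.Module):
    """Downsampling block after operator reconfiguration: no MaxPooling,
    stride-2 on the second Conv instead."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(c_out, c_out, 3, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
        )

    def forward(self, x):
        return self.body(x)   # spatial resolution halved without a MaxPooling operator
```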

    4.4. Sensitivity results

    We conducted edge sensitivity analysis for pruning on the network after operator reconfiguration, using blocks as the minimum unit. To obtain more comprehensive sensitivity results, pruning was performed on each block using three fixed rates: 25%, 50%, and 75%. Figure 4 presents the pruning sensitivity results in terms of both PSNR and SSIM, comparing before and after quantization as well as before and after fine-tuning. PSNR and SSIM both decrease after pruning across all testing conditions. The curves are quite similar, with the outer blocks showing greater sensitivity than the middle ones. The performance of INT8 is generally lower than that of FP32, which is an inevitable consequence of quantization. In addition, we observe that while the trend of the results after fine-tuning remains consistent with that before retraining, the effect becomes less pronounced. This is because the optimization goal of fine-tuning is to enhance performance, which compensates for the loss caused by pruning in certain layers; however, this also diminishes the observed sensitivity pattern to some extent. Therefore, we ultimately used the on-chip INT8 test results before fine-tuning, with PSNR and SSIM weighted equally, to calculate the sensitivity metric. Fine-tuning is performed only for the final performance test after sensitivity-aware pruning is completed.


    Figure 4.(a)–(f) Performance degradation caused by pruning at different stages. The black lines indicate the performance of the unpruned model, while the colored lines represent the performance after pruning the corresponding proportion of each block individually. For example, the red marker on the horizontal axis of D1 denotes the performance after pruning 25% of the D1 block, based on the original model.

    Quantization sensitivity is measured in a similar manner. However, since the RK3588 chip currently supports only 16-bit floating-point (FP16) and INT8 precisions, the experiments that can be conducted are relatively limited. We apply INT8 quantization to each block and FP16 quantization to the remaining layers, and then test the edge performance. The unpruned network quantized entirely to FP16 serves as the quantization baseline. Both the unpruned network and the network with 50% uniform pruning applied to all blocks were tested. The experimental results in Fig. 5 indicate that the quantization sensitivities of the unpruned and uniformly pruned models are consistent. The curves drop sharply only in the last block, indicating that it is more sensitive to quantization. Considering the combined effects of pruning and quantization, we calculated the edge sensitivity using the network with 50% uniform pruning, with PSNR and SSIM weighted equally.


    Figure 5.(a), (b) Performance degradation caused by quantization for different blocks. The black lines indicate the performance of the unpruned model with FP16 quantization, while the colored lines represent the performance of both the unpruned model and the uniformly 50%-pruned model under INT8 quantization, applied to each block individually. For example, the green marker on the horizontal axis of D1 denotes the performance after applying INT8 quantization to the D1 block based on the original model.

    The sensitivities $S_{P,i}$ and $S_{Q,i}$ of each block are shown in Fig. 6. As for pruning, although the results obtained at different ratios were generally consistent, we ultimately referred to the results at the 50% ratio, as it was closest to the compression ratio we aimed for. For a specific edge chip, sensitivity performance can vary significantly across different networks, and even between layers within the same network. We observed that the value of $\alpha$ is approximately equal to the average of the reciprocals of $\Delta P_i / P$. Thus, for the RK3588 edge chip used in this paper, the scaling factor $\alpha$ was set to 8 for both $S_{P,i}$ and $S_{Q,i}$.


    Figure 6.(a) The edge pruning sensitivity results. (b) The edge quantization sensitivity results.

    4.5. Compression results

    In the model compression evaluation experiment, the pruning ratio and quantization bit-width settings for each block are shown in Tables 1 and 2. For pruning, considering the edge acceleration ratio, pruning sensitivity, and hardware constraints comprehensively, we tested two combinations of pruning ratios (Sensitive-B and Sensitive-C), which differ only in the middle block. As a comparison, we tested, under identical conditions, both the baseline network and the 50% uniform pruning network used in our previous work[61]. In addition, we conducted ablation experiments with pruning ratios that do not respect hardware compatibility, including both uniform pruning and sensitivity-aware pruning (Sensitive-A). The specific numbers of channels corresponding to the different pruning ratios are provided in Supplement 1. For quantization, we tested three methods, with bit-widths of FP16, INT8, and mixed precision. All the results are presented in Table 3.

    Pruning method  | D1    | D2   | D3  | D4   | C1    | U4   | U3  | U2   | U1
    Unprune         | 0     | 0    | 0   | 0    | 0     | 0    | 0   | 0    | 0
    Uniform-50%[61] | 0.5   | 0.5  | 0.5 | 0.5  | 0.5   | 0.5  | 0.5 | 0.5  | 0.5
    Uniform-60%     | 0.6   | 0.6  | 0.6 | 0.6  | 0.6   | 0.6  | 0.6 | 0.6  | 0.6
    Sensitive-A     | 0.125 | 0.25 | 0.5 | 0.75 | 0.875 | 0.75 | 0.5 | 0.25 | 0.125
    Sensitive-B     | 0.5   | 0.5  | 0.5 | 0.75 | 0.75  | 0.75 | 0.5 | 0.5  | 0.5
    Sensitive-C     | 0.5   | 0.5  | 0.5 | 0.75 | 0.875 | 0.75 | 0.5 | 0.5  | 0.5

    Table 1. Pruning Ratio Settings for Each Block in Different Methods.

    Quantization method | D1   | D2   | D3   | D4   | C1   | U4   | U3   | U2   | U1
    FP16 quantization   | FP16 | FP16 | FP16 | FP16 | FP16 | FP16 | FP16 | FP16 | FP16
    INT8 quantization   | INT8 | INT8 | INT8 | INT8 | INT8 | INT8 | INT8 | INT8 | INT8
    Mixed quantization  | INT8 | INT8 | INT8 | INT8 | INT8 | INT8 | INT8 | INT8 | FP16

    Table 2. Quantization Settings for Each Block in Different Methods.

    Model           | Params | MACs   | FP16 (PSNR / SSIM / FPS) | INT8 (PSNR / SSIM / FPS) | Mixed (PSNR / SSIM / FPS)
    Unprune         | 8.63M  | 68.12G | 36.80 / 0.9528 / 8.02    | 34.61 / 0.9245 / 17.07   | 34.99 / 0.9387 / 12.28
    Uniform-50%[61] | 3.92M  | 28.44G | 36.55 / 0.9518 / 18.37   | 34.76 / 0.9148 / 29.70   | 35.60 / 0.9414 / 23.68
    Uniform-60%     | 3.06M  | 21.33G | 36.43 / 0.9512 / 7.80    | 34.14 / 0.8885 / 10.85   | 35.45 / 0.9373 / 9.21
    Sensitive-A     | 1.80M  | 33.96G | 36.76 / 0.9528 / 6.13    | 34.87 / 0.9190 / 10.71   | 35.87 / 0.9444 / 6.72
    Sensitive-B     | 2.15M  | 23.67G | 36.67 / 0.9523 / 21.52   | 34.78 / 0.9158 / 32.97   | 35.92 / 0.9431 / 25.22
    Sensitive-C     | 1.71M  | 23.14G | 36.70 / 0.9522 / 21.62   | 34.42 / 0.9094 / 33.13   | 35.87 / 0.9429 / 26.03

    Table 3. Results of Model Compression Evaluation Experiments.

    The following conclusions can be drawn from the experimental data in Table 3: 1) Pruning significantly improves the speed of edge inference. Compared to uniform pruning, sensitivity-aware pruning not only boosts speed but also yields higher edge performance. 2) Compared to a fixed quantization bit-width, sensitivity-aware mixed quantization strikes the desired balance between speed and edge performance. 3) Hardware constraints must be strictly adhered to, as pruning ratios that do not comply with these constraints result in poor inference time for both uniform and sensitivity-aware pruning. This is the key factor that leads to the inconsistency between edge efficiency and network complexity, such as the parameters and MACs in Table 3.

    Considering both inference time and edge performance, we ultimately selected Sensitive-B as the final edge-accelerated reconstruction solution for our single-lens computational camera. This is because it achieved optimal PSNR and SSIM while maintaining frame rates greater than 25 Hz.

    4.6. Reconstruction ablation experiments

    We used the unpruned model with FP16 quantization, the 50% uniform pruning model with INT8 quantization, and the sensitivity-aware pruning model with mixed quantization to conduct on-chip image restoration tests on typical scenes from the dataset. The experimental results are shown in Figs. 7 and 8, where sensitivity-aware compression achieves performance comparable to the unpruned network. Specifically, as shown in Fig. 7, sensitivity-aware compression yields better reconstruction of fine cloud textures than uniform compression, closely matching the unpruned network. Figure 8 illustrates the enhancement achieved by the proposed method in building and vegetation scenes, demonstrating improved restoration and extraction of high-frequency details such as building textures and branches. In summary, the proposed method achieves superior performance in both visual perception and quantitative evaluation metrics, while maintaining the same network complexity and improving edge reconstruction speed.


    Figure 7.Ablation experimental results on reconstruction, focusing on cloud details in local areas. Sensitivity-aware pruning restores finer texture details within clouds compared to uniform pruning, closely matching the performance of the unpruned network.


    Figure 8.Ablation experimental results on reconstruction, focusing on vegetation details in local areas. Sensitivity-aware pruning restores finer texture details of branches compared to uniform pruning, closely matching the performance of the unpruned network.

    4.7. Prototype experiments

    We use the single-lens camera with integrated edge acceleration to capture optical target patterns for MTF testing. The test results at room temperature (25°C) are shown in Fig. 9. The MTFs across various fields at the Nyquist frequency (42 lp/mm) all exceed 0.5. The MTF test results demonstrate that the edge model with sensitivity-aware compression provides the single-lens camera with excellent high-frequency performance.


    Figure 9.Experimental results of MTF testing. The MTFs across various fields at the Nyquist frequency (42 lp/mm) all exceed 0.5, showing excellent high-frequency performance.

    We also conducted outdoor experiments at the Siping Road campus of Tongji University, Shanghai, China. The experimental results are shown in Figs. 10 and 11. The images of buildings, natural scenery, and people exhibit excellent clarity and detail. In particular, the capture of extremely distant aircraft, shown in Fig. 11, demonstrates significant potential for small infrared target detection applications. To demonstrate the real-time performance of the camera at a frame rate of 25 Hz, we provide the corresponding live video in Supplement 2, including the raw blurry videos and the clear videos reconstructed in real time on the RK3588 chip.


    Figure 10.Outdoor experimental assessment with real-time on-chip reconstruction.


    Figure 11.Experimental results of small infrared target tracking.

    5. Conclusion

    In this work, we propose an edge-accelerated reconstruction strategy based on end-to-end sensitivity analysis for single-lens infrared computational cameras. The edge performance of the restoration algorithm, deployed on the RK3588 chip, is used as guidance for model compression. Specifically, we employed compatibility-based operator reconfiguration, sensitivity-aware pruning, and sensitivity-aware mixed quantization to balance the inference speed and reconstruction quality of the model. The proposed model compression strategy, guided by detailed hardware sensitivity, achieves better performance in both reconstruction quality and speed, with reduced complexity and fewer MACs. The experimental results indicate that, compared to uniform pruning and quantization, sensitivity-aware compression significantly improves performance, particularly in enhancing high-frequency details and suppressing noise. The excellent field experimental results demonstrate the practical potential of our method for high-speed, high-quality video reconstruction in computational imaging. Our edge-accelerated reconstruction method achieves a balance between performance and efficiency through the joint optimization of hardware and software, paving the way for the application of lightweight, low-latency computational imaging in fields such as UAV-based optical monitoring and in situ medical examination.

    References

    [1] A. Bhandari, A. Kadambi, R. Raskar. Computational Imaging(2022).

    [2] X. Hu et al. Metasurface-based computational imaging: A review. Adv. Photonics, 6, 14002(2024). https://doi.org/10.1117/1.AP.6.1.014002

    [3] L. Bian et al. A broadband hyperspectral image sensor with high spatio-temporal resolution. Nature, 635, 73(2024). https://doi.org/10.1038/s41586-024-08109-1

    [4] Y. Fan et al. Dispersion-assisted high-dimensional photodetector. Nature, 630, 77(2024). https://doi.org/10.1038/s41586-024-07398-w

    [5] W. Zhang et al. Handheld snapshot multi-spectral camera at tens-of-megapixel resolution. Nat. Commun., 14, 5043(2023). https://doi.org/10.1038/s41467-023-40739-3

    [6] L. Huang et al. Spectral imaging with deep learning. Light Sci. Appl., 11, 61(2022). https://doi.org/10.1038/s41377-022-00743-6

    [7] S.-H. Baek et al. Single-shot hyperspectral-depth imaging with learned diffractive optics, 2651(2021).

    [8] J. Wu, L. Cao, G. Barbastathis. DNN-FZA camera: a deep learning approach toward broadband FZA lensless imaging. Opt. Lett., 46, 130(2021). https://doi.org/10.1364/OL.411228

    [9] Y. Peng et al. Computational imaging using lightweight diffractive-refractive optics. Opt. Express, 23, 31393(2015). https://doi.org/10.1364/OE.23.031393

    [10] Y. Peng et al. Learned large field-of-view imaging with thin-plate optics. ACM Trans. Graph., 38, 219(2019). https://doi.org/10.1145/3355089.3356526

    [11] Y. Liu et al. End-to-end computational optics with a singlet lens for large depth-of-field imaging. Opt. Express, 29, 28530(2021). https://doi.org/10.1364/OE.433067

    [12] B. Qi et al. All-day thin-lens computational imaging with scene-specific learning recovery. Appl. Opt., 61, 1097(2022). https://doi.org/10.1364/AO.448155

    [13] Y. Liu et al. Research advances in simple and compact optical imaging techniques. Acta Phys. Sin., 72, 084205(2023). https://doi.org/10.7498/aps.72.20230092

    [14] J. Chen, X. Ran. Deep learning with edge computing: a review. Proc. IEEE, 107, 1655(2019). https://doi.org/10.1109/JPROC.2019.2921977

    [15] D. Liu et al. Bringing AI to edge: from deep learning’s perspective. Neurocomputing, 485, 297(2022). https://doi.org/10.1016/j.neucom.2021.04.141

    [16] S. Deng et al. Edge intelligence: the confluence of edge computing and artificial intelligence. IEEE Internet Things J., 7, 7457(2020). https://doi.org/10.1109/JIOT.2020.2984887

    [17] F. Yu et al. EasiEdge: a novel global deep neural networks pruning method for efficient edge computing. IEEE Internet Things J., 8, 1259(2021). https://doi.org/10.1109/JIOT.2020.3034925

    [18] X. Guo et al. Network pruning for remote sensing images classification based on interpretable CNNS. IEEE Trans. Geosci. Remote Sens., 60, 5605615(2022). https://doi.org/10.1109/TGRS.2021.3077062

    [19] B. Zhao et al. 4-bit CNN quantization method with compact LUT-based multiplier implementation on FPGA. IEEE Trans. Instrum. Meas., 72, 2008110(2023). https://doi.org/10.1109/TIM.2023.3324357

    [20] N. Tonellotto et al. Neural network quantization in federated learning at the edge. Inform. Sci., 575, 417(2021). https://doi.org/10.1016/j.ins.2021.06.039

    [21] C.-C. Tsai, J.-I. Guo. IVS-caffe—hardware-oriented neural network model development. IEEE Trans. Neural Networks Learn. Syst., 33, 5978(2022). https://doi.org/10.1109/TNNLS.2021.3072145

    [22] J. Gou et al. Knowledge distillation: a survey. Int. J. Comput. Vis., 129, 1789(2021). https://doi.org/10.1007/s11263-021-01453-z

    [23] Z. Chen et al. MNGNAS: Distilling adaptive combination of multiple searched networks for one-shot neural architecture search. IEEE Trans. Pattern Anal. Mach. Intell., 45, 13408(2023). https://doi.org/10.1109/TPAMI.2023.3289667

    [24] Y. Xue, X. Han, Z. Wang. Self-adaptive weight based on dual-attention for differentiable neural architecture search. IEEE Trans. Ind. Inf., 20, 6394(2024). https://doi.org/10.1109/TII.2023.3348843

    [25] X. Shi et al. Memory-oriented structural pruning for efficient image restoration, 37, 2245(2023).

    [26] J. Oh et al. Attentive fine-grained structured sparsity for image restoration, 17673(2022).

    [27] B.-K. Kim, S. Choi, H. Park. Cut inner layers: a structured pruning strategy for efficient U-Net gans(2022).

    [28] G. Ding et al. Where to prune: using LSTM to guide data-dependent soft pruning. IEEE Trans. Image Process., 30, 293(2021). https://doi.org/10.1109/TIP.2020.3035028

    [29] A. Jayasimhan, P. Pabitha. ResPrune: an energy-efficient restorative filter pruning method using stochastic optimization for accelerating CNN. Pattern Recognit., 155, 110671(2024). https://doi.org/10.1016/j.patcog.2024.110671

    [30] X. Liu et al. Compressing cnns using multilevel filter pruning for the edge nodes of multimedia internet of things. IEEE Internet Things J., 8, 11041(2021). https://doi.org/10.1109/JIOT.2021.3052016

    [31] Y. He et al. Filter pruning via geometric median for deep convolutional neural networks acceleration(2019).

    [32] Z. Dong et al. HAWQ: Hessian aware quantization of neural networks with mixed-precision(2019).

    [33] Z. Dong, H. Larochelle et al. HAWQ-V2: Hessian aware trace-weighted quantization of neural networks. Advances in Neural Information Processing Systems, 18518(2020).

    [34] K. Balaskas et al. Hardware-aware DNN compression via diverse pruning and mixed-precision quantization. IEEE Trans. Emerging Top. Comput., 12, 1079(2024). https://doi.org/10.1109/TETC.2023.3346944

    [35] Y.-H. Chen et al. Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE J. Solid-State Circuits, 52, 127(2017). https://doi.org/10.1109/JSSC.2016.2616357

    [36] S. Huai et al. On hardware-aware design and optimization of edge intelligence. IEEE Des. Test, 40, 149(2023). https://doi.org/10.1109/MDAT.2023.3307558

    [37] J. Xiao et al. HALOC: hardware-aware automatic low-rank compression for compact neural networks, 37, 10464(2023).

    [38] A. Desai, A. Krause, K. Zhou, A. Shrivastava et al. Hardware-aware compression with random operation access specific tile (ROAST) hashing, 7732(2023).

    [39] J. Su, B. Xu, H. Yin. A survey of deep learning approaches to image restoration. Neurocomputing, 487, 46(2022). https://doi.org/10.1016/j.neucom.2022.02.046

    [40] A. Wali et al. Recent progress in digital image restoration techniques: a review. Digit. Signal Process., 141, 104187(2023). https://doi.org/10.1016/j.dsp.2023.104187

    [41] L. Zhai et al. A comprehensive review of deep learning-based real-world image restoration. IEEE Access, 11, 21049(2023). https://doi.org/10.1109/ACCESS.2023.3250616

    [42] S. Wei et al. Computational imaging-based single-lens imaging systems and performance evaluation. Opt. Express, 32, 26107(2024). https://doi.org/10.1364/OE.527950

    [43] Q. Sun et al. Learning rank-1 diffractive optics for single-shot high dynamic range imaging(2020).

    [44] C. A. Metzler et al. Deep optics for single-shot high-dynamic-range imaging(2020).

    [45] D. Li et al. SpectraTrack: megapixel, hundred-fps, and thousand-channel hyperspectral imaging. Nat. Commun., 15, 9459(2024). https://doi.org/10.1038/s41467-024-53747-8

    [46] M. Yako et al. Video-rate hyperspectral camera based on a CMOS-compatible random array of Fabry–Pérot filters. Nat. Photonics, 17, 218(2023). https://doi.org/10.1038/s41566-022-01141-5

    [47] Q. Wang et al. An intelligent co-scheduling framework for efficient super-resolution on edge platforms with heterogeneous processors. IEEE Internet Things J., 11, 17651(2024). https://doi.org/10.1109/JIOT.2024.3357898

    [48] L. Lu et al. Evaluating fast algorithms for convolutional neural networks on fpgas, 101(2017).

    [49] S. Kala et al. UniWiG: unified winograd-GEMM architecture for accelerating CNN on fpgas, 209(2019).

    [50] W. Zhang et al. Edge learning using a fully integrated neuro-inspired memristor chip. Science, 381, 1205(2023). https://doi.org/10.1126/science.ade3483

    [51] T.-H. Wen et al. Fusion of memristor and digital compute-in-memory processing for energy-efficient edge computing. Science, 384, 325(2024). https://doi.org/10.1126/science.adf5538

    [52] S. Han, H. Mao, W. J. Dally. Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding(2016).

    [53] Y. Tang et al. Manifold regularized dynamic network pruning, 5018(2021).

    [54] H. Li et al. Pruning filters for efficient ConvNets(2017).

    [55] P. Molchanov et al. Importance estimation for neural network pruning(2019).

    [56] K. Chaudhuri, R. Zhao, R. Salakhutdinov et al. Improving neural network quantization without retraining using outlier channel splitting, 7543(2019).

    [57] M. Nagel et al. Data-free quantization through weight equalization and bias correction(2019).

    [58] S. K. Esser et al. Learned step size quantization(2020).

    [59] R. Gong et al. Differentiable soft quantization: bridging full-precision and low-bit neural networks(2019).

    [60] Y. Cai, A. Oh et al. Binarized spectral compressive imaging, 38335(2023).

    [61] Y. Xing et al. Real-time high-quality single-lens computational imaging via enhancing lens modulation transfer function consistency. Opt. Express, 33, 5179(2025). https://doi.org/10.1364/OE.552050

    [62] G. Fang et al. DepGraph: Towards any structural pruning, 16091(2023).
