
- Chinese Optics Letters
- Vol. 22, Issue 2, 022701 (2024)
Abstract
1. Introduction
3D imaging finds wide applications in systems such as autonomous driving target recognition, unmanned aerial vehicle (UAV) automatic navigation, and reconnaissance surveillance. Three main methods are commonly used for 3D imaging: structured light[1–3], binocular stereo imaging[4–7], and time of flight (ToF)[8–10]. ToF imaging has gained significant attention due to its advantages, including medium- and long-distance measurement capability, high precision, and strong anti-interference ability. Direct time-of-flight (D-ToF) methods estimate depth information by measuring the flight time of light from the scene to the sensor. Currently, three mainstream methods are used to capture lateral spatial information in ToF imaging: the single-photon avalanche diode (SPAD) array sensor[11–14], single-pixel imaging combined with specialized structured lighting[15–20], and single-pixel imaging combined with scanning techniques[21–27]. Due to the high cost of SPAD array sensors, the low-cost and highly sensitive single-pixel imaging method has gained more attention. In recent years, single-pixel imaging has produced notable research results. For example, one method utilizes low-cost color LED arrays for structured illumination, enabling color imaging with a single photodiode[28]. Moreover, by adding a small number of photodiodes at different positions, the method can be extended to 3D imaging. However, because this approach combines multiple single-pixel detectors, it does not achieve true single-pixel 3D imaging. Ghost imaging employs a computer-controlled spatial light modulator (SLM) to generate speckle-pattern illumination on the object, eliminating the need for beam splitters and array detectors; imaging is accomplished by synchronously measuring the light intensity on a bucket detector[15–17,29]. Alternatively, a pulsed laser can uniformly illuminate a digital micromirror device (DMD) that projects structured illumination onto the scene, and the backscattered light is collected by a photodiode; the measured light intensities are then used in a 3D reconstruction algorithm to recover both depth and reflectivity images[18–20]. However, specialized structured lighting is typically not suitable for long-distance 3D imaging. In contrast, combining a single-pixel SPAD with a scanning structure enables long-distance 3D imaging with high spatial resolution[21–24]. By implementing time filtering to suppress noise, single-pixel imaging can achieve an average imaging sensitivity of 0.4 signal photons per pixel, enabling long-range 3D imaging up to 200 km[25–27].
To acquire lateral spatial information of targets, the aforementioned single-pixel imaging methods rely on array sensors, specialized structured lighting, or single-pixel scanning techniques. However, the high costs and complex structures associated with these approaches hinder the progress of single-pixel imaging. Consequently, alternative methods that involve neither scanning nor special lighting structures have been proposed in recent years[30–32]. These methods, known as 3D imaging from temporal data, capture the temporal (ToF) information of the entire scene using a single-pixel single-photon detector (SPD) and a time-to-digital converter (TDC), and then reconstruct the 3D images using an artificial neural network (ANN)[33–36]. However, these methods inherently suffer from symmetry blur. This arises because a scene with central symmetry, captured by a single-pixel detector placed at its center, yields identical measurements; for instance, symmetric positions on the left and right sides of the single-pixel sensor produce the same measurement value. One approach[30] to addressing this problem is to introduce a background that reveals the relative position of the subject. Another strategy[32] involves leveraging multipath time signals to gather more scene-related data. However, both of these methods rely heavily on specific properties of the scene and may fail to produce accurate images if the background is a plain wall or lacks distinguishing features.
This paper proposes a fusion-data-based method for 3D imaging. Our approach involves placing a single-pixel SPD and a millimeter-wave radar at different locations, forming a specific angle with respect to the target. The SPD records the arrival times of return photons from the entire scene as a temporal histogram, while the millimeter-wave radar captures the one-dimensional (1D) range profile of the scene. The data from the SPD and radar are directly fused and input into an ANN for 3D scene reconstruction. Millimeter-wave radar operates in the millimeter-wave frequency band (approximately 30–300 GHz), offers high resolution and accuracy, and is widely used in applications such as autonomous vehicles, drones, and aviation radar[37–39]. The fusion of an SPD with millimeter-wave radar is adopted in this paper because millimeter-wave radar offers several advantages that the SPD does not possess. For instance, it adapts well to adverse weather conditions and can operate normally in rain, smog, and heavy snow. Additionally, millimeter-wave radar is less susceptible to interference from other light sources, such as sunlight or car headlights. More importantly, millimeter-wave radar is generally cheaper and more highly integrated; an entire millimeter-wave radar system can typically be integrated onto a small circuit board. The added cost of an SPD + millimeter-wave radar system, compared to one using only a single-pixel SPD, is negligible. Therefore, we believe that the SPD + millimeter-wave radar architecture can fully exploit the strengths of both sensors and overcome the weaknesses of using either sensor alone, achieving a synergistic effect in which the whole is greater than the sum of its parts. By integrating the single-pixel SPD and the millimeter-wave radar, our method achieves higher accuracy in 3D scene reconstruction and effectively addresses the symmetry blur issue without the need for a background. Both simulation and experimental results demonstrate the successful elimination of symmetry blur and a significant improvement in the quality of the reconstructed images.
The remainder of this paper is structured as follows. Section 2 presents a theoretical analysis of the imaging method, addresses the associated challenges, and introduces the ANN-based imaging algorithm. In Section 3, numerical simulations are conducted to compare the performance of the two imaging methods under various imaging conditions, demonstrating the effectiveness of the proposed approach. Section 4 describes the imaging experiments performed using an optical system operating at 1550 nm and a 60 GHz millimeter-wave radar, assessing the feasibility of the proposed method. Section 5 discusses the potential capabilities of our method in challenging environments. Finally, Section 6 concludes the paper.
2. System Framework and Imaging Principle
The proposed fusion-data-based 3D imaging method is illustrated in Fig. 1, comprising two main processes: data acquisition [Fig. 1(a)] and 3D image recovery [Fig. 1(b)]. During data acquisition, a millimeter-wave radar is positioned near a single-pixel SPD at a specific angle relative to the target under measurement. This arrangement mimics binocular imaging, as the millimeter-wave radar and the SPD are separated by a certain distance. The temporal histogram from the SPD and the millimeter-wave data are fused into a single fusion histogram by concatenating them. Concurrently, a high-precision depth camera captures the 3D image of the target solely for training purposes and does not participate in the image recovery process. By varying the target’s orientation and position during data acquisition, a substantial amount of real measured training data is obtained. Each data pair for 3D image recovery consists of one fusion histogram and one depth map. In the 3D image recovery process, the acquired fusion histogram serves as the input and the depth map as the output for training an ANN. With sufficient training data and iterations, the ANN effectively learns the mapping between the input fusion histogram and the depth map. Once trained, the network can directly reconstruct new, previously unseen targets from the fused histogram data acquired by the millimeter-wave radar and the SPD.
Figure 1. 3D imaging with fusion data. (a) Data acquisition process; (b) 3D image recovery process. In the 3D image recovery process, the ANN training is performed only once, and then the MLP (multi-layer perceptron) algorithm can directly reconstruct the 3D image from the temporal histogram. The radar represents the millimeter-wave radar. The human moves in an empty room.
To describe this method mathematically, it is straightforward to construct a forward model in which all points in the scene located at the same distance from the detector contribute to the same time bin of the measured temporal histogram.
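As an illustration of this forward model, the following minimal sketch simulates a temporal histogram from a depth map by binning each pixel's round-trip time; the function name, bin width, and histogram length are illustrative assumptions, not the parameters used in the paper.

```python
import numpy as np

def simulate_spd_histogram(depth_map, reflectivity, bin_width_ps=2.0, n_bins=16384):
    """Toy forward model: every scene point at distance d contributes its
    reflectivity to the histogram bin indexed by the round-trip time 2d/c.
    Names, bin width, and histogram length are illustrative only."""
    c = 3.0e8  # speed of light, m/s
    tof_ps = 2.0 * depth_map / c * 1e12                   # round-trip time per pixel, ps
    bins = np.clip((tof_ps / bin_width_ps).astype(int), 0, n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), reflectivity.ravel())   # accumulate returns per bin
    return hist

# Example: a flat 64 x 64 scene at 3 m with uniform reflectivity
depth = np.full((64, 64), 3.0)
hist = simulate_spd_histogram(depth, np.ones_like(depth))
```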
Serving the same purpose as the SPD, the millimeter-wave radar also converts the collected echoes containing 3D scene information into a 1D range profile. Assume that the millimeter-wave radar transmits a linear frequency modulation (LFM) pulse, i.e., a chirp whose instantaneous frequency varies linearly over the pulse duration.
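Since the transmitted waveform is not reproduced above, the standard textbook form of an LFM pulse is given below for reference; it is not necessarily identical to the expression used in the paper.

```latex
% Standard complex LFM (chirp) pulse: T_p is the pulse width, f_0 the carrier
% frequency, and K = B / T_p the chirp rate for a sweep bandwidth B.
s(t) = \operatorname{rect}\!\left(\frac{t}{T_p}\right)
       \exp\!\left[\, j 2\pi \left( f_0 t + \tfrac{1}{2} K t^{2} \right) \right],
\qquad K = \frac{B}{T_p}
```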
To extract the range information of the targets, matched-filter (pulse compression) processing is applied to the received echo; the output is approximately a narrow peak located at the delay corresponding to each target's range.
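As a hedged illustration of this matched-filter step, the sketch below correlates the received echo with the transmitted chirp to obtain a 1D range profile; the sampling rate, pulse width, and bandwidth are assumed values, not the parameters of the radar used in the experiment.

```python
import numpy as np

def lfm_range_profile(echo, fs=1e9, T_p=1e-6, B=250e6):
    """Illustrative pulse compression: correlate the received echo with the
    transmitted LFM chirp (matched filter) to obtain a 1D range profile.
    fs, T_p, and B are assumed sampling rate, pulse width, and bandwidth."""
    t = np.arange(0, T_p, 1 / fs)
    chirp = np.exp(1j * np.pi * (B / T_p) * t**2)          # baseband LFM reference
    profile = np.abs(np.convolve(echo, np.conj(chirp[::-1]), mode="full"))
    return profile  # peaks appear at delays proportional to target range

# Example: a single point target delayed by 200 samples
fs, T_p, B = 1e9, 1e-6, 250e6
t = np.arange(0, T_p, 1 / fs)
tx = np.exp(1j * np.pi * (B / T_p) * t**2)
echo = np.concatenate([np.zeros(200, dtype=complex), 0.5 * tx])
profile = lfm_range_profile(echo, fs, T_p, B)
```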
The goal is to find a mapping from the fused 1D temporal data to the 3D depth map of the scene; this mapping is approximated by an ANN trained on the measured data pairs described above.
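The sketch below shows one possible realization of such a mapping: a multi-layer perceptron (MLP) that takes the concatenated SPD histogram and radar range profile as input and outputs a depth map. The layer sizes, histogram lengths, output resolution, and training details are assumptions for illustration and are not the paper's exact network.

```python
import torch
import torch.nn as nn

class FusionMLP(nn.Module):
    """Illustrative MLP mapping a fused 1D histogram (SPD histogram
    concatenated with the radar range profile) to a depth map."""
    def __init__(self, spd_bins=2048, radar_bins=256, out_h=64, out_w=64):
        super().__init__()
        self.out_shape = (out_h, out_w)
        self.net = nn.Sequential(
            nn.Linear(spd_bins + radar_bins, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, out_h * out_w),
        )

    def forward(self, spd_hist, radar_profile):
        fused = torch.cat([spd_hist, radar_profile], dim=-1)  # direct concatenation
        depth = self.net(fused)
        return depth.view(-1, *self.out_shape)

# Training-step sketch: minimize pixel-wise error against depth-camera ground truth
model = FusionMLP()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()
spd = torch.rand(8, 2048)        # batch of SPD histograms (placeholder data)
radar = torch.rand(8, 256)       # batch of radar range profiles (placeholder data)
gt_depth = torch.rand(8, 64, 64) # depth-camera ground truth (placeholder data)
loss = loss_fn(model(spd, radar), gt_depth)
loss.backward()
optimizer.step()
```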
3. Numerical Results
In this section, we conduct simulations of our proposed fusion-data-based 3D imaging model and compare it with the method that utilizes only a single-pixel SPD. The specific simulation scenario is depicted in Fig. 1. We position various virtual human models in space and assume that data collection is performed by the millimeter-wave radar located on the left side of the single-pixel SPD at a distance of 0.5 m. Meanwhile, the depth camera is positioned in the center. Simulations are conducted for two cases: with and without background. In the background-free scenario, we exclude the background and solely retain the data corresponding to the human model for simulation. Conversely, in the scenario with background, we situate the human model within a virtual indoor scene for simulation (further details can be found in the Supplementary Material).
To assess the potential performance under ideal conditions, which represent the best capabilities of current equipment, we conducted simulations with a system time resolution [impulse response function (IRF)] set to 2 ps (further details regarding the analysis of imaging resolution can be found in the Supplementary Material). When only the temporal histogram from a single-pixel SPD is used for training, the image reconstructed by the ANN exhibits the symmetry blur illustrated in Fig. 2(a).
Figure 2. Single-pixel SPD imaging simulation results. (a) Symmetry blur in single-pixel 3D imaging due to the lack of background information; (b) 3D imaging result without symmetry blur when a background is present.
Consequently, if a single-pixel SPD alone is employed to detect a target in the absence of background, the multi-layer perceptron (MLP) model fails to differentiate between the left and right sides of the detector, resulting in an image with superimposed left and right targets. Currently, the prevalent solution for SPDs involves incorporating a background behind the subject, as depicted in Fig. 2(b). By including asymmetric targets within the background, symmetry blur is eliminated, and the background imparts relative positional information during the neural network training of the scene (Turpin et al.[30]). An alternative approach entails utilizing the multipath effect of radio frequency or acoustic waves within a confined space to enhance the amount of information and eliminate symmetry blur (Turpin et al.[32]).
In real-world scenarios, certain locations such as flat ground, long corridors, and open rooms may lack sufficiently complex backgrounds and multipath signals to provide relative positional information in the data. This gives rise to the inevitable problem of symmetry blur when conducting target detection in such environments. When the scene is limited and unchangeable, addressing the challenge of symmetry blur requires improvements in the detection-end sensor. In this regard, we propose a cost-effective solution that integrates a millimeter-wave radar sensor and introduces a fusion-data-based 3D imaging method.
As depicted in Figs. 3(a)–3(c), the algorithm that fuses millimeter-wave data successfully eliminates symmetry blur and achieves clear reconstruction of the human body. Conversely, Figs. 3(d)–3(f) demonstrate that the single-pixel single-photon method produces symmetry blur in the absence of background, rendering it impossible to discern the specific location of the target.
Figure 3. Simulation and reconstruction results of the fusion method. (a)–(c) Images recovered using a fused data-based 3D image reconstruction algorithm; (d)–(f) images recovered using only single-photon data. The first column shows fused temporal histograms generated by simulation [(a)–(c)] or histograms with only single-photon data [(d)–(f)], the second column shows the ground-truth depth maps generated by simulation, and the third column shows the reconstructed images of the time histogram recovered by the MLP algorithm.
4. Experimental Results
Numerical simulations were initially conducted to validate the feasibility of our approach, followed by imaging tests on individuals and objects within the experimental environment. The experimental system used in the tests is shown in Fig. 4. A supercontinuum laser generated an optical pulse with a width of 6 ps and a wavelength of 1550 nm.
Figure 4. Schematic diagram of the experimental system and experimental scene. (a) Schematic of the layout of the 1550 nm fusion-data-based single-photon single-pixel 3D imaging system, which comprises a supercontinuum laser source, an SNSPD (superconducting nanowire single-photon detector), a TDC module, an optical lens system, a depth camera, a millimeter-wave radar, and a laptop; (b) experimental scene with a person; (c) schematic diagram of the optical lens system.
In the imaging experiment, we conducted tests in an open room to simulate simple background imaging conditions, as depicted in Fig. 4(b). The room had dimensions of
Figure 5. Experiment results. (a)–(d) Imaging results of different people and objects; in each subfigure, the histogram, the ground-truth depth map, and the retrieved images are shown from left to right. The reconstructed images with fusion data, only single-photon data, and only millimeter-wave data are shown from top to bottom.
We evaluate the quality of the reconstructed images by computing the structural similarity index measure (SSIM) between the reconstructed images and the ground-truth images[40]. SSIM is a perceptual image quality metric based on human visual characteristics; it quantifies the degree of distortion in an image in a manner consistent with human perception. SSIM is calculated by comparing the luminance, contrast, and structure of two images.
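For reference, the standard SSIM formulation from [40] is reproduced below, since the paper's equations are not shown above. Here, μx and μy are local means, σx and σy local standard deviations, σxy the cross-covariance of images x and y, and C1, C2, C3 are small stabilizing constants (commonly C3 = C2/2).

```latex
% Standard SSIM components and combined index (Wang et al. [40]).
l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \quad
c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \quad
s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}

\mathrm{SSIM}(x,y) = l(x,y)\, c(x,y)\, s(x,y)
= \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}
```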
In Figs. 5(a)–5(d), the first column presents the temporal histograms used for image reconstruction. The second column displays the ground-truth depth maps captured by a depth camera for comparison. The third column showcases the reconstructed images based on the histograms from the first column. Each subfigure depicts the reconstructed images with fusion data, only single-photon data, and only millimeter-wave data, arranged from top to bottom. Among the 400 testing data sets, our proposed fusion method exhibits an average SSIM of 0.6576, surpassing the SSIM of the single-photon method (0.6389) and the radar method (0.5266).
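As a usage note, the average SSIM over a test set can be computed with an off-the-shelf implementation such as scikit-image; the array names, sizes, and data ranges below are placeholders, not the paper's evaluation code.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def average_ssim(reconstructed, ground_truth):
    """Mean SSIM over a batch of depth maps of shape (N, H, W).
    data_range is taken from each ground-truth depth map."""
    scores = []
    for rec, gt in zip(reconstructed, ground_truth):
        scores.append(ssim(rec, gt, data_range=gt.max() - gt.min()))
    return float(np.mean(scores))

# Placeholder arrays standing in for 400 test reconstructions and ground truths
recon = np.random.rand(400, 64, 64)
gt = np.random.rand(400, 64, 64)
print(average_ssim(recon, gt))
```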
In Figs. 5(a) and 5(b), the fusion method effectively reconstructs clear images of a person. However, the single-photon-only method lacks background information, leading to an imprecise determination of the person’s lateral position and resulting in symmetry blur around the person. The radar results reveal that the larger IRF causes the loss of certain details in the shapes, such as incomplete recovery of arms or legs.
In Fig. 5(c), imaging experiments were performed with a person holding the letter “C” (with dimensions of
In Fig. 5(d), another imaging experiment was conducted using the letter “T” (with dimensions of
The obtained results demonstrate that the fusion-data-driven 3D imaging method effectively addresses the issue of symmetry blur in single-pixel single-photon 3D imaging. By utilizing a fusion-based approach, the mapping relationship between 3D imaging and ToF histograms is established more effectively, leading to enhanced system performance and robustness.
5. Discussion
We incorporated millimeter-wave radar as an auxiliary sensor in our proposed method and experiments to eliminate symmetry blur and enhance imaging quality. Millimeter-wave radar was selected for its affordability, all-weather operability, and the simplicity of its data. These advantages enabled us to achieve considerable improvements in image quality at a relatively low cost.
In the experiment, following the principles of binocular vision, it was necessary to maintain a certain separation distance, referred to as the baseline, between the single-pixel SPD and the millimeter-wave radar. The data obtained from both sensors are 1D temporal data, eliminating the need for sensor calibration. The baseline should not be too short, as this would result in minimal variations in the collected data when objects move. Owing to the limited detection fields of view of the two sensors, an excessively large baseline is also impractical: while a larger baseline enhances imaging accuracy, it introduces blind spots. The imaging performance associated with different baseline distances can be evaluated in future studies.
For data fusion, we adopted a straightforward approach by directly concatenating the 1D single-photon temporal histogram with the millimeter-wave radar data. This simplified the data processing procedure, avoided increasing the complexity of the ANN, and yielded satisfactory imaging results. More sophisticated data fusion methods could further improve imaging quality, which is a promising direction for future work. The combined use of the all-weather operation of millimeter-wave radar and the low-light detection capability of SPDs gives our proposed method excellent imaging potential, even in extreme conditions such as foggy and rainy weather.
In addition to millimeter-wave radar, we employed a 1550 nm laser as the emission source for the optical system. Compared to previous work that used 550 nm, the 1550 nm laser offered improved eye safety and enabled higher power output. In current autonomous driving vehicles, lidar and millimeter-wave radar are widely employed independently for detection purposes. However, our proposed method integrates the raw data from both sensors, resulting in a robust imaging system. Consequently, our method holds significant potential for applications in unmanned autonomous navigation platforms and autonomous driving.
During the ANN training, the ground truth collected by the depth camera determines the highest resolution of the system. The IRF of the single-photon system determines the image reconstruction performance of the algorithm. When the IRF is smaller, the reconstructed image is closer to the ground truth.
6. Conclusion
Instead of utilizing structured light illumination or laser scanning, single-pixel 3D imaging leverages data-driven image-retrieval algorithms to convert the 1D temporal histogram obtained from the scene into a 3D depth map. However, the inherent limitations of single-pixel imaging may give rise to issues such as symmetry blur in data-driven image-retrieval algorithms. In this paper, we present a novel single-pixel 3D imaging method that integrates temporal data from an SPD and a millimeter-wave radar. Specifically, our approach involves capturing temporal histograms of objects using both a single-pixel SPD and a millimeter-wave radar. The acquired data are then fused and employed to train a neural network based on the MLP algorithm that is capable of reconstructing 3D images. To validate the performance of our proposed method, we conducted numerical simulations and experimental measurements. The results demonstrate the superiority of our approach over relying solely on a single-pixel SPD, as it effectively eliminates the impact of symmetry blur on image reconstruction. Our proposed method exhibits excellent imaging performance and robustness compared to existing approaches. The combination of an SPD and a millimeter-wave radar enables the development of a novel 3D image reconstruction system, which holds tremendous potential for various applications, including unmanned autonomous navigation platforms, forward-looking imaging in vehicles, and indoor security monitoring.
References
[1] J. Geng. Structured-light 3D surface imaging: a tutorial. Adv. Opt. Photonics, 3, 128(2011).
[5] S. T. Barnard, M. A. Fischler. Computational stereo. ACM Comput. Surv., 14, 553(1982).
[8] M. Hansard, S. Lee, O. Choi et al. Time-of-Flight Cameras: Principles, Methods and Applications(2012).
[9] Y. Cui, S. Schuon, D. Chan et al. 3D shape scanning with a time-of-flight camera. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1173(2010).
[12] T. Al Abbas, N. Dutton, O. Almer et al. Backside illuminated SPAD image sensor with 7.83 µm pitch in 3D-stacked CMOS technology. IEEE International Electron Devices Meeting (IEDM), 1(2016).
[15] J. H. Shapiro. Computational ghost imaging. Phys. Rev. A, 78, 061802(2008).
[25] Z.-P. Li, J.-T. Ye, X. Huang et al. Single-photon imaging over 200 km. Optica, 8, 344(2021).
[29] J. H. Shapiro, R. W. Boyd. The physics of ghost imaging. Quantum Inf. Process., 11, 949(2012).
[33] J. J. Hopfield. Artificial neural networks. IEEE Circuits Devices Mag., 4, 3(1988).
[34] B. Yegnanarayana. Artificial Neural Networks(2009).
[35] A. Krogh. What are artificial neural networks? Nat. Biotechnol., 26, 195(2008).
[38] J. Guan, S. Madani, S. Jog et al. Through fog high-resolution imaging using millimeter wave radar. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11464(2020).
[39] P. Zhao, C. X. Lu, J. Wang et al. mID: Tracking and identifying people with millimeter wave radar. 15th International Conference on Distributed Computing in Sensor Systems (DCOSS), 33(2019).
