Incomplete Laser 3D Point Cloud Classification and Completion Approach Based on Object Symmetry under Occlusion Conditions
Yong Tong, Fangyong Xu, Ning Yang, and Hui Chen
There are many symmetric objects in the real world. Three-dimensional (3D) laser scanning equipment can be used to obtain 3D point clouds of symmetric objects; however, occlusion and limitations of the equipment itself easily produce defects in the acquired point clouds. To resolve this problem, this paper proposes an incomplete laser 3D point cloud classification and completion approach based on object symmetry. According to the symmetric key points in a single two-dimensional image, a mapping relationship to the symmetry plane was established, and the incomplete point cloud was classified by the position of the symmetry plane within it. For a point cloud missing exactly half of its data, symmetry-plane detection was performed directly, followed by completion. For a point cloud missing less than half of its data, the input was fused with its mirror point cloud and duplicated data points were removed to achieve completion. For a point cloud missing more than half of its data, a direct hole-repair method was used to complete the missing information after fusion with the mirror point cloud. For extremely defective point clouds, where too much information is missing, the input point cloud was fused with its mirror point cloud and then with a completed similar point cloud; finally, redundant points were removed to complete the missing information. Experimental results on actually acquired incomplete 3D point clouds and on point clouds from a public database show that the proposed approach can complete different types of incomplete point clouds, producing results very similar to those of the corresponding complete point clouds. Thus, the effectiveness and feasibility of the proposed classification and completion approach are verified.
  • Oct. 25, 2023
  • Laser & Optoelectronics Progress
  • Vol. 60, Issue 20, 2010004 (2023)
  • DOI:10.3788/LOP222968
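The mirror-and-fuse step at the core of this approach can be sketched as follows. This is a minimal numpy illustration under assumed inputs (a known symmetry plane given by a unit normal and a point on it, and a simple rounding-based duplicate filter standing in for a proper nearest-neighbor query), not the authors' implementation:

```python
import numpy as np

def mirror_complete(points, plane_normal, plane_point, merge_tol=1e-3):
    """Reflect a point cloud across a symmetry plane and fuse the mirrored
    copy with the original, dropping near-duplicate points."""
    n = np.asarray(plane_normal, dtype=float)
    n = n / np.linalg.norm(n)
    # Signed distance of each point to the symmetry plane.
    d = (points - plane_point) @ n
    # Householder-style reflection: p' = p - 2 * d * n.
    mirrored = points - 2.0 * d[:, None] * n
    fused = np.vstack([points, mirrored])
    # Remove duplicates by rounding to a voxel grid (a cheap stand-in
    # for the k-d tree duplicate removal a real pipeline would use).
    keys = np.round(fused / merge_tol).astype(np.int64)
    _, keep = np.unique(keys, axis=0, return_index=True)
    return fused[np.sort(keep)]
```

Points lying on the symmetry plane map onto themselves and are merged away, so a half-missing cloud fused with its mirror yields one clean, complete copy.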
Research on Embroidery Image Restoration Based on Improved Deep Convolutional Generative Adversarial Network
Yixuan Liu, Guangying Ge, Zhenling Qi, Zhenxuan Li, and Fulin Sun
Presently, image inpainting in the inheritance and protection of Chinese traditional embroidery often depends on human labor, consuming considerable manpower and material resources. With the rapid development of deep learning, generative adversarial networks can instead be applied to repair damaged embroidery relics. An embroidery image restoration method based on an improved deep convolutional generative adversarial network (DCGAN) is proposed to solve the above problems. In the generator, dilated convolution is introduced to expand the receptive field, and a convolutional attention mechanism module enhances the guiding role of significant features along both the channel and spatial dimensions. In the discriminator, the number of fully connected layers is increased to improve the network's ability to solve nonlinear problems. For the loss function, the mean square error loss and adversarial loss are combined to realize embroidery image inpainting through the game process of network training. The experimental results show that the dilated convolution and convolutional attention mechanism module improve the network performance and repair effect, with the structural similarity of the repaired images reaching 0.955. This method yields a more natural embroidery image-restoration effect and can provide experts with texture, color, and other information as a reference to assist subsequent repair.
  • Oct. 25, 2023
  • Laser & Optoelectronics Progress
  • Vol. 60, Issue 20, 2010005 (2023)
  • DOI:10.3788/LOP223060
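The receptive-field expansion that dilated convolution provides can be illustrated with a single-channel sketch: a 3×3 kernel with dilation 2 covers a 5×5 neighborhood with no extra parameters. This is a toy numpy version for illustration only, not the paper's network code:

```python
import numpy as np

def dilated_conv2d(image, kernel, dilation=2):
    """Single-channel 'valid' 2D correlation with a dilated kernel.
    Inserting (dilation - 1) gaps between kernel taps enlarges the
    receptive field without adding parameters."""
    kh, kw = kernel.shape
    # Effective kernel extent after dilation.
    eh = kh + (kh - 1) * (dilation - 1)
    ew = kw + (kw - 1) * (dilation - 1)
    H, W = image.shape
    out = np.zeros((H - eh + 1, W - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Strided slicing picks out the dilated taps.
            patch = image[i:i + eh:dilation, j:j + ew:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out
```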
Hyperspectral Imaging-Based Quality Classification for Kiwifruit by Incorporating Three-Dimensional Convolution Neural Network and Haar Wavelet Filter
Ke Jin, Zhiqiang Guo, Yunliu Zeng, and Gang Ding
To address challenges in the non-destructive inspection and classification of kiwifruit hardness quality, we propose a classification model that incorporates hyperspectral imaging technology and a convolutional neural network. This network combines the spatial feature information extracted by the Haar wavelet with the joint spatial-spectral information extracted by the three-dimensional (3D) convolution kernel. The decomposed data are concatenated along the channel dimension to ensure that all features can be utilized by the model, which improves the network's feature-learning ability. Experiments on the acquired hyperspectral image-based, self-made kiwifruit hardness quality dataset (named Kiwi_seed) demonstrate that the Haar wavelet transform module can significantly improve the feature extraction ability of the network. Ablation experiments reveal that incorporating the Haar wavelet transform module increases classification accuracy by 7.4%, reaching an optimum of 97.3%, which outperforms classical image classification networks. The proposed classification model can be effectively used for the non-destructive inspection and classification of kiwifruit quality.
  • Oct. 25, 2023
  • Laser & Optoelectronics Progress
  • Vol. 60, Issue 20, 2010003 (2023)
  • DOI:10.3788/LOP223142
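The spatial-feature branch rests on the standard one-level 2D Haar decomposition, which splits an image into an approximation band and three detail bands. A minimal numpy sketch (averaging convention; the paper's exact normalization may differ):

```python
import numpy as np

def haar2d(x):
    """One-level 2D Haar decomposition into approximation (LL) and
    detail (LH, HL, HH) sub-bands; height and width must be even."""
    # Row transform: pairwise averages (low-pass) and differences (high-pass).
    lo = (x[:, 0::2] + x[:, 1::2]) / 2.0
    hi = (x[:, 0::2] - x[:, 1::2]) / 2.0
    # Column transform applied to each half.
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return ll, lh, hl, hh
```

Stacking the four sub-bands along the channel axis is one natural way to realize the channel concatenation the abstract describes.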
Infrared and Visible Image Fusion with Convolutional Neural Network and Transformer
Yang Yang, Zhennan Ren, and Beichen Li
An innovative image fusion model combining a convolutional neural network (CNN) and Transformer is proposed to address the CNN's inability to model global semantic relevance within the source images and the insufficient use of image context information in the field of infrared and visible image fusion. First, to compensate for the CNN's shortcomings in establishing long-range dependencies, a combined CNN and Transformer encoder was proposed to improve the extraction of correlations between multiple local regions and the model's ability to extract local detailed information from images. Second, a fusion strategy based on the maximum modal disparity was proposed to better adaptively represent information from various regions of the source images during fusion, enhancing the contrast of the fused image. Finally, the fusion model was experimentally validated on the public TNO dataset against multiple comparison methods. The experimental results demonstrate that the proposed model has significant advantages over existing fusion approaches in both subjective visual effects and objective evaluation metrics. Additionally, ablation tests examined the effectiveness of the proposed combined encoder and fusion strategy separately, further supporting the design concept for infrared and visible image fusion tasks.
  • Aug. 25, 2023
  • Laser & Optoelectronics Progress
  • Vol. 60, Issue 16, 1610013 (2023)
  • DOI:10.3788/LOP222265
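One plausible reading of a "maximum modal disparity" fusion rule can be sketched per pixel: where the two modalities disagree strongly, keep the stronger response to preserve salient targets; elsewhere, average to retain background. This is a hypothetical simplification for intuition (the threshold `tau` and the rule itself are assumptions, not the paper's strategy):

```python
import numpy as np

def max_disparity_fuse(ir, vis, tau=0.2):
    """Toy per-pixel fusion rule on intensities normalized to [0, 1]:
    high-disparity pixels take the brighter source, the rest are averaged."""
    disparity = np.abs(ir - vis)
    take_max = disparity > tau
    # Salient regions keep the dominant modality; background is blended.
    return np.where(take_max, np.maximum(ir, vis), 0.5 * (ir + vis))
```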
Infrared and Visible Image Fusion Method Based on Saliency Target Extraction and Poisson Reconstruction
Wenqing Liu, Renhua Wang, Xiaowen Liu, and Xin Yang
An infrared and visible image fusion method based on saliency target extraction and Poisson reconstruction is proposed to address the problems of incomplete saliency targets, blurred edges, and low contrast when fusing infrared and visible images in low-illumination environments. First, exploiting the difference in saliency intensity between infrared image pixels, the salient target was extracted by combining saliency detection, threshold segmentation, and Gamma correction, separating the target from the background in the infrared image. Second, the visual saliency features and gradient saliency of the source images were considered, and the fused image was reconstructed by solving the Poisson equation in the gradient domain. Finally, the mean and standard deviation of the infrared image were used to optimize the fused image, improving the quality of the results in low illumination. Experimental results show that the proposed method is superior to the comparison methods in both subjective and objective evaluations; it better highlights infrared target information, retains rich background information, and achieves remarkable visual effects.
  • Aug. 25, 2023
  • Laser & Optoelectronics Progress
  • Vol. 60, Issue 16, 1610012 (2023)
  • DOI:10.3788/LOP222293
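The target/background separation step combines gamma correction with threshold segmentation; a minimal numpy sketch of that idea follows. The specific gamma value and threshold ratio are assumptions for illustration, not the paper's tuned parameters:

```python
import numpy as np

def extract_salient_target(ir, gamma=0.5, thresh_ratio=0.7):
    """Normalize an infrared image, apply gamma correction to stretch
    intensities, then threshold to obtain a binary saliency mask."""
    x = ir.astype(float)
    x = (x - x.min()) / (np.ptp(x) + 1e-12)  # normalize to [0, 1]
    x = x ** gamma                           # gamma correction
    mask = x >= thresh_ratio * x.max()       # threshold segmentation
    return mask
```

A real pipeline would feed this mask into the gradient-domain (Poisson) reconstruction stage; here it is shown in isolation.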
Hyperspectral Image Classification Based on Superpixel Segmentation and Convolutional Neural Network
Rujun Chen, Yunwei Pu, Fengzhen Wu, Yuceng Liu, and Qi Li
A hyperspectral image classification method based on superpixel segmentation and a convolutional neural network (CNN) is proposed to address the low utilization of spatial-spectral features and the low classification efficiency of CNNs in hyperspectral image classification. First, the first 12 components were extracted using principal component analysis (PCA), the first three principal components were then filtered, and the three filtered bands were subjected to superpixel segmentation. Sample points were then mapped into the superpixels, allowing superpixels rather than individual pixels to serve as the basic classification unit. Finally, the CNN was used for classification. Experiments on two public datasets, WHU-Hi-LongKou and WHU-Hi-HongHu, show improved accuracy from combining spatial-spectral features compared with using spectral information alone, with classification accuracies of 99.45% and 97.60%, respectively.
  • Aug. 25, 2023
  • Laser & Optoelectronics Progress
  • Vol. 60, Issue 16, 1610010 (2023)
  • DOI:10.3788/LOP222551
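The PCA preprocessing step, reducing a hyperspectral cube to its leading component bands before superpixel segmentation, can be sketched with an SVD. This is a generic illustration of the dimensionality reduction, not the authors' code:

```python
import numpy as np

def pca_reduce(cube, n_components=12):
    """Reduce a hyperspectral cube of shape (H, W, B) to its first
    n_components principal-component bands."""
    H, W, B = cube.shape
    X = cube.reshape(-1, B).astype(float)
    X -= X.mean(axis=0)                      # center each band
    # SVD of the centered data: rows of Vt are the principal axes,
    # ordered by decreasing explained variance.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ Vt[:n_components].T
    return scores.reshape(H, W, n_components)
```

The first three output bands would then be filtered and handed to a superpixel algorithm such as SLIC.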
Cross-Modal Geo-Localization Method Based on GCI-CycleGAN Style Translation
Qingge Li, Xiaogang Yang, Ruitao Lu, Siyu Wang, Jiwei Fan, and Hai Xia
  • Jul. 25, 2023
  • Infrared and Laser Engineering
  • Vol. 52, Issue 7, 20220875 (2023)
  • DOI:10.3788/IRLA20220875
Research Progress in Fundamental Architecture of Deep Learning-Based Single Object Tracking Method
Tingfa Xu, Ying Wang, Guokai Shi, Tianhao Li, and Jianan Li
Significance
Single object tracking (SOT) is one of the fundamental problems in computer vision and has received extensive attention from scholars and industry professionals worldwide owing to its important applications in intelligent video surveillance, human-computer interaction, autonomous driving, military target analysis, and other fields. For a given video sequence, an SOT method must predict the real-time, accurate location and size of the target in subsequent frames based on the initial state of the target (usually represented by a bounding box) in the first frame. Unlike object detection, the tracking target is not restricted to any specific category, and tracking scenes are complex and diverse, involving challenges such as changes in target scale, occlusion, motion blur, and target disappearance. Tracking targets in real time, accurately, and robustly is therefore an extremely challenging task.
The mainstream object tracking methods can be divided into three categories: discriminative correlation filter (DCF)-based methods, Siamese network-based methods, and Transformer-based methods. The accuracy and robustness of DCF methods fall far below actual requirements, and with the advancement of deep learning hardware, the former advantage of DCF methods, namely real-time operation on mobile devices, no longer exists. In contrast, deep learning techniques have developed rapidly in recent years with the continuous improvement of computer performance and dataset capacity. Deep learning theory, deep backbone networks, attention mechanisms, and self-supervised learning techniques have played a powerful role in the development of object tracking methods. Deep learning-based SOT methods can make full use of large-scale datasets for end-to-end offline training to achieve real-time, accurate, and robust tracking.
We therefore provide an overview of deep learning-based object tracking methods. Some reviews of tracking methods already exist, but they do not cover Transformer-based tracking. Building on existing work, we introduce the latest achievements in the field and, in contrast to previous surveys, innovatively divide tracking methods into two categories by architecture type: the Siamese network-based two-stream tracking method and the Transformer-based one-stream tracking method. We provide a comprehensive and detailed analysis of these two basic architectures, focusing on their principles, components, limitations, and development directions. In addition, because datasets are the cornerstone of method training and evaluation, we summarize the current mainstream deep learning-based SOT datasets, elaborate on the evaluation protocols and metrics used on these datasets, and summarize the performance of various methods. Finally, we analyze the future development trend of video target tracking methods from a macro perspective to provide a reference for researchers.
Progress
Deep learning-based target tracking methods can be divided into two categories according to architecture type: the Siamese network-based two-stream tracking method and the Transformer-based one-stream tracking method. The essential difference between the two architectures is that the two-stream method uses a Siamese backbone network for feature extraction and a separate module for feature fusion, while the one-stream method uses a single backbone network for both feature extraction and fusion.
The Siamese network-based two-stream tracking method formulates tracking as a similarity-matching problem between the target template and the search region and consists of three basic modules: feature extraction, feature fusion, and tracking head. A weight-shared two-stream backbone network extracts features from the target template and the search region; the two feature maps are fused for information interaction and fed to the tracking head, which outputs the target position. Across subsequent improvements, the feature extraction module has evolved from shallow to deep, the feature fusion module from coarse to fine, and the tracking head from complex to simple, while performance in complex backgrounds has gradually improved.
The Transformer-based one-stream tracking method first splits and flattens the target template and search frame into sequences of patches. These patch features are combined with learnable position embeddings and fed into a Transformer backbone network, which performs feature extraction and feature fusion simultaneously; fusion continues throughout the backbone, which outputs target-specified search features. Compared with two-stream networks, one-stream networks are structurally simple and require no task-specific prior knowledge; such task-independent networks facilitate the construction of general-purpose neural architectures for multiple tasks. Pre-training further improves the performance of the one-stream method, and experimental results demonstrate that pre-trained models based on masked image modeling benefit it.
Conclusions and Prospects
The one-stream tracking method, with its simple structure and powerful learning and modeling capability, is the trend of future target tracking research. Meanwhile, collaborative multi-task tracking, multi-modal tracking, scenario-specific target tracking, and unsupervised target tracking have strong applications and demands.
  • Aug. 10, 2023
  • Acta Optica Sinica
  • Vol. 43, Issue 15, 1510003 (2023)
  • DOI:10.3788/AOS230746
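The similarity matching at the heart of two-stream trackers is a cross-correlation: the template feature map is slid over the search-region feature map, and the response peak locates the target. A minimal numpy sketch of that operation (illustrative only; real trackers use deep multi-channel features and learned fusion):

```python
import numpy as np

def siamese_response(search_feat, template_feat):
    """Cross-correlate a template feature map (C, Ht, Wt) with a
    search-region feature map (C, Hs, Ws); the argmax of the
    resulting response map gives the target's displacement."""
    C, Hs, Ws = search_feat.shape
    _, Ht, Wt = template_feat.shape
    out = np.zeros((Hs - Ht + 1, Ws - Wt + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = search_feat[:, i:i + Ht, j:j + Wt]
            out[i, j] = np.sum(patch * template_feat)
    return out
```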
From Perception to Creation: Exploring Frontier of Image and Video Generation Methods
Liang Lin, and Binbin Yang
Significance
In recent years, advancements in computing software and hardware have led to artificial intelligence (AI) models achieving performance levels approaching or surpassing human capabilities in perceptive tasks. However, to develop mature AI systems that can comprehensively understand the world, models must be capable of generating visual concepts rather than simply recognizing them, because creation and customization require a thorough understanding of both the high-level semantics and the full details of each generated object.
From an applied perspective, AI models that possess both visual understanding and generation capabilities will significantly promote progress across diverse industries. For example, visual generative models can be applied to colorizing and restoring old black-and-white photos and films; enhancing and remastering old videos in high definition; synthesizing real-time virtual anchors, talking faces, and AI avatars; adding special effects to personalized video shooting on short-video platforms; stylizing users' portraits and input images; and compositing movie special effects and scene rendering. Therefore, research on the theories and methods of image and video generation models holds significant theoretical and industrial application value.
Progress
In this paper, we first provide a comprehensive overview of existing generative frameworks, including generative adversarial networks (GAN), variational autoencoders (VAE), flow models, and diffusion models, as summarized in Fig. 5. A GAN is trained in an adversarial manner, with a generator and a discriminator in mutual competition, to obtain an ideal generator. A VAE consists of an encoder and a decoder and is trained via variational inference so that the decoded distribution approximates the real distribution. The flow model uses a family of invertible mappings and simple priors to construct an invertible transformation between the real data distribution and the prior distribution; unlike GANs and VAEs, flow models are trained by maximum likelihood estimation. Recently, diffusion models have emerged as a class of powerful visual generative models with state-of-the-art synthesis results on visual data. The diffusion model decomposes image generation into a sequence of denoising steps starting from a Gaussian prior. Its training procedure is more stable because it avoids adversarial training, and it can be successfully deployed in large-scale pre-trained generation systems.
We then review recent state-of-the-art advances in image and video generation and discuss their merits and limitations; Fig. 6 gives an overview of image and video generation models and their classifications. Works on pre-trained text-to-image (T2I) generation models study how to pre-train a T2I foundation model on large-scale datasets. Among these foundation models, stable diffusion has become a widely used backbone for image/video customization and editing tasks owing to its impressive performance and scalability. Prompt-based image editing methods use a pre-trained T2I foundation model, e.g., stable diffusion, to edit a generated or natural image according to input text prompts. Owing to the difficulty of collecting large-scale, high-quality video datasets and the expensive computational cost, research on video generation still lags behind image generation. Building on the success of T2I diffusion models, some works, e.g., the video diffusion model, Imagen Video, VIDM, and PVDM, have trained video diffusion models from scratch on enormous video data to obtain a video generation foundation model similar to stable diffusion. Another line of work resorts to pre-trained image generators, e.g., stable diffusion, to provide a content prior for video generation and learns only the temporal dynamics from video, which significantly improves training efficiency.
Finally, we discuss the drawbacks of existing image and video generative modeling methods, such as misalignment between input prompts and generated images/videos, propose feasible strategies to improve these models, and outline promising future research directions. These contributions are crucial for advancing the field of visual generative modeling and realizing the full potential of AI systems in generating visual concepts.
Conclusions and Prospects
With the rapid evolution of diffusion models, artificial intelligence has undergone a significant transformation from perception to creation. AI can now generate perceptually realistic and harmonious data and even supports visual customization and editing based on input conditions. In light of this progress, we offer prospects for potential future forms of AI: with both perception and cognitive abilities, AI models can establish their own open world, enabling people to realize the concept of "what they think is what they get" without being constrained by real-life conditions. In such an open environment, the training of AI models would no longer be restricted by data collection, leading to a reformation of many existing machine learning paradigms; techniques like transfer learning (domain adaptation) and active learning may diminish in importance. AI might achieve self-interaction, self-learning, and self-improvement within the open world it creates, ultimately attaining higher levels of intelligence and profoundly transforming human lifestyles.
  • Aug. 10, 2023
  • Acta Optica Sinica
  • Vol. 43, Issue 15, 1510002 (2023)
  • DOI:10.3788/AOS230758
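The forward (noising) process that diffusion models invert has a simple closed form: a clean image is scaled down and Gaussian noise is mixed in according to a cumulative schedule. A minimal DDPM-style sketch (the linear beta schedule and tiny T are illustrative choices, not from this survey):

```python
import numpy as np

def forward_diffuse(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I).
    Returns the noised sample and the noise, which a denoiser would be
    trained to predict."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

# A simple linear beta schedule and its cumulative product.
T = 10
betas = np.linspace(1e-4, 0.2, T)
alpha_bar = np.cumprod(1.0 - betas)
```

Generation then runs this chain in reverse, denoising step by step from a pure Gaussian sample.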
Image Inpainting of Damaged Textiles Based on Improved Criminisi Algorithm
Qi Li, Long Li, Wei Wang, and Pengbo Nan
For the inpainting of images of textile cultural relics at damaged regions, an improved algorithm is proposed based on K-means color segmentation and the Criminisi algorithm. Owing to the characteristics of textile cultural relic images, the RGB images were converted to the Lab color model, and a K-means classifier was used to segment the a* and b* channel data by color, calibrating the edges of the patterns and narrowing the search area for matching blocks. The standard deviation of the L value was introduced to represent color dispersion, and the priority function and adaptive matching block were improved. The proposed algorithm and three algorithms reported in the literature were used to repair images of naturally damaged textile relics and artificially damaged textile images, and the restoration results were evaluated. The experimental results show that the images restored by the proposed algorithm have natural texture and reasonable structure, with better peak signal-to-noise ratio, structural similarity, feature similarity, and mean square error values.
  • Aug. 25, 2023
  • Laser & Optoelectronics Progress
  • Vol. 60, Issue 16, 1610011 (2023)
  • DOI:10.3788/LOP222378
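The color-segmentation step, clustering pixels by their (a*, b*) chroma values to delimit the matching-block search area, can be sketched with a tiny k-means. This is a generic illustration with a deterministic farthest-point initialization, not the authors' implementation:

```python
import numpy as np

def kmeans_ab(ab, k=2, iters=10):
    """Cluster an (H, W, 2) array of (a*, b*) chroma values into k color
    classes; returns an (H, W) label map."""
    pts = ab.reshape(-1, 2).astype(float)
    # Deterministic farthest-point initialization of the centers.
    centers = np.empty((k, 2))
    centers[0] = pts[0]
    for c in range(1, k):
        d = np.min(np.linalg.norm(pts[:, None] - centers[None, :c], axis=2), axis=1)
        centers[c] = pts[d.argmax()]
    for _ in range(iters):
        # Assign each pixel to its nearest chroma center, then update.
        d = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = pts[labels == c].mean(axis=0)
    return labels.reshape(ab.shape[:2])
```

In the full pipeline the resulting label map restricts the Criminisi patch search to same-colored regions, instead of scanning the whole image.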