• Spectroscopy and Spectral Analysis
  • Vol. 40, Issue 6, 1869 (2020)
ZHANG Lei1, DING Xiang-qian1, GONG Hui-li1, WU Li-jun2,*, BAI Xiao-li2, and LUO Lin2
Author Affiliations
  • 1[in Chinese]
  • 2[in Chinese]
    DOI: 10.3964/j.issn.1000-0593(2020)06-1869-07
    Cite this Article
    ZHANG Lei, DING Xiang-qian, GONG Hui-li, WU Li-jun, BAI Xiao-li, LUO Lin. Research on Near Infrared Spectral Feature Variable Selection Method Based on Improved Harmonic Search Algorithm[J]. Spectroscopy and Spectral Analysis, 2020, 40(6): 1869

    Abstract

    Near-infrared (NIR) spectroscopy is widely used for detection and analysis in many fields because it is simple, fast, efficient, low-cost, and environmentally friendly. However, NIR spectra suffer from high variable dimensionality, multicollinearity, redundant information, and high-frequency noise. Building a prediction model directly on the full spectrum not only increases modeling complexity but also degrades prediction performance and generalization. To address this, a spectral feature variable selection method based on an improved Harmony Search (HS) algorithm is proposed. HS is often used to solve feature variable optimization problems. To apply HS to spectral variable selection, the feature contribution of each spectral variable is first calculated from the PLS loading coefficients and used as the disturbance weight of the improved HS. During optimization, this feature contribution is introduced as an excitation factor, and the initial solution vectors are generated by combining random traversal with the excitation factor. When a new harmony vector is generated, the feature contribution is applied as a penalty factor, and a balance factor is added so that the HS parameters are adjusted dynamically with the number of iterations, adapting the search to spectral variables. These changes enhance the ergodicity of the search process and the diversity of the population. To verify the effectiveness of the algorithm, NIR PLS models of nicotine, total sugar, and total nitrogen are constructed from tobacco samples. After the original spectra are preprocessed, the method is used to optimize the spectral variables.
    The prediction performance of the model for each number of variables is calculated according to the cumulative frequency with which the variables are selected, and the final spectral variables are determined from how the Root Mean Square Error of Calibration (RMSEC) varies with the number of variables. The three PLS models are established with the training set and the test set, and are compared with models based on the full spectrum, Uninformative Variable Elimination (UVE), and Particle Swarm Optimization (PSO). The experimental results show that the coefficients of determination (R²) of the nicotine, total sugar, and total nitrogen models using the selected variables are 0.9211, 0.9257, and 0.9412, respectively, and the Root Mean Square Errors of Prediction (RMSEP) are 0.1023, 1.0346, and 0.0531. Compared with the other methods, the proposed method yields a lower RMSEP, an R² above 0.92 for all three models, and fewer selected spectral variables. This shows that the improved HS algorithm can effectively select characteristic spectral variables, reduce modeling complexity, and improve the prediction performance and generalization ability of the models.
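    The search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the update rules, so the linear parameter schedule, the flip probabilities, and the names `improved_harmony_search`, `selection_frequency`, `weights`, and `fitness` are all assumptions made for illustration. The contribution weights are taken as an input in [0, 1] (for example, normalized absolute PLS loadings computed elsewhere), and `fitness` stands in for a model error such as RMSEC.

```python
import random


def improved_harmony_search(weights, fitness, n_iter=200, hms=10,
                            hmcr_range=(0.7, 0.95), par_range=(0.1, 0.4),
                            seed=0):
    """Binary harmony search over 0/1 variable masks (illustrative sketch).

    weights: per-variable contribution in [0, 1], assumed precomputed.
    fitness: callable mapping a mask to a score to be minimized.
    """
    rng = random.Random(seed)
    n = len(weights)

    # Initial harmony memory: random draws biased by contribution
    # (plays the role of the "excitation factor" in the abstract).
    memory = [[1 if rng.random() < w else 0 for w in weights]
              for _ in range(hms)]
    scores = [fitness(m) for m in memory]

    for t in range(n_iter):
        frac = t / max(1, n_iter - 1)
        # Assumed linear schedule: rely more on memory and perturb less
        # as iterations proceed (stand-in for the "balance factor").
        hmcr = hmcr_range[0] + frac * (hmcr_range[1] - hmcr_range[0])
        par = par_range[1] - frac * (par_range[1] - par_range[0])

        new = []
        for j in range(n):
            if rng.random() < hmcr:
                bit = rng.choice(memory)[j]  # memory consideration
                if rng.random() < par:
                    # Pitch adjustment weighted by contribution: dropping a
                    # high-contribution variable is unlikely (assumed form
                    # of the "penalty factor").
                    flip_prob = (1 - weights[j]) if bit == 1 else weights[j]
                    if rng.random() < flip_prob:
                        bit = 1 - bit
            else:
                bit = 1 if rng.random() < weights[j] else 0  # weighted random
            new.append(bit)

        # Replace the worst harmony only if the new one is strictly better.
        s = fitness(new)
        worst = max(range(hms), key=scores.__getitem__)
        if s < scores[worst]:
            memory[worst], scores[worst] = new, s

    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


def selection_frequency(weights, fitness, runs=20, **kwargs):
    """Count how often each variable is selected across independent runs."""
    counts = [0] * len(weights)
    for seed in range(runs):
        mask, _ = improved_harmony_search(weights, fitness, seed=seed, **kwargs)
        counts = [c + b for c, b in zip(counts, mask)]
    return counts


# Toy usage: the "true" informative variables are 0, 1 and 4.
toy_weights = [0.9, 0.8, 0.1, 0.2, 0.85, 0.15]
toy_target = [1, 1, 0, 0, 1, 0]


def toy_fitness(mask):
    # Hamming distance to the target mask; a real fitness would fit a PLS
    # model on the selected columns and return its error.
    return sum(a != b for a, b in zip(mask, toy_target))


best_mask, best_score = improved_harmony_search(toy_weights, toy_fitness)
freq = selection_frequency(toy_weights, toy_fitness, runs=5, n_iter=50)
```

    In a real application, variables would be ranked by `freq` and added to the PLS model in that order, with the final subset chosen from the resulting error curve, as the abstract outlines.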