• Spectroscopy and Spectral Analysis
  • Vol. 41, Issue 4, 1097 (2021)
LI Si-hai1,* and LIU Dong-ling2
Author Affiliations
  • 1[in Chinese]
  • 2[in Chinese]
    DOI: 10.3964/j.issn.1000-0593(2021)04-1097-05
    LI Si-hai, LIU Dong-ling. Quantitative Analysis of Near Infrared Spectroscopy Based on Orthogonal Matching Pursuit Algorithm[J]. Spectroscopy and Spectral Analysis, 2021, 41(4): 1097

    Abstract

    Compressed sensing (CS) is a new technique for signal compression and sampling. Orthogonal Matching Pursuit (OMP), a greedy pursuit algorithm, is widely used for sparse signal reconstruction in compressed sensing. Because near-infrared (NIR) spectra are high-dimensional, typically come with small sample sets, and admit a sparse prior, a new NIR spectral variable selection method based on compressed sensing theory, Orthogonal Matching Pursuit Based Variable Selection (OMPBVS), is proposed to further improve the flexibility and reliability of NIR variable selection. By sparsely reconstructing the original spectral signal, OMPBVS compresses the regression coefficients of most variables to zero and thereby selects spectral variables indirectly. Specifically, the spectral matrix serves as the sensing matrix and the predicted property as the observation vector; at each iteration, the inner product between the current residual and every atom (spectral variable) is computed, and the atom with the largest inner product is selected. The signal is then projected onto the subspace spanned by all selected atoms, and the coefficients of all selected atoms are updated, so that the residual is orthogonal to every selected atom. Because this residual update is in essence Gram-Schmidt orthogonalization, the orthogonal projection reduces the number of iterations and ensures accurate signal reconstruction. OMPBVS can reduce the spectral dimension to the scale of the sample size, and its variable selection capability is comparable to that of LASSO. Unlike LASSO, however, OMPBVS optimizes its loss function by forward selection, which requires fewer iterations and allows precise control over the number of selected variables.
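To make the selection loop described above concrete, here is a minimal NumPy sketch of OMP used as a variable selector. It is not the authors' OMPBVS implementation: the function name `omp_select`, the mean-centering of `X` and `y`, the unit-norm scaling of columns, and the fixed `n_vars` stopping rule are all assumptions of this sketch. Columns of `X` (wavelengths) act as atoms and `y` is the observation vector; each pass picks the column with the largest absolute inner product with the residual, then refits all selected columns by least squares, which is what keeps the residual orthogonal to them. `n_vars` should not exceed the number of samples.

```python
import numpy as np


def omp_select(X, y, n_vars):
    """Greedy OMP variable selection (illustrative sketch).

    Returns (selected column indices, full-length coefficient vector, intercept).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Assumption (this sketch): center X and y so the intercept is fitted separately
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Assumption (this sketch): unit-norm atoms, so inner products compare
    # correlations rather than raw column scales
    norms = np.linalg.norm(Xc, axis=0)
    norms[norms == 0] = 1.0
    D = Xc / norms

    selected = []
    beta = np.zeros(0)
    residual = yc.copy()
    for _ in range(n_vars):
        # Inner product between the current residual and every atom
        scores = np.abs(D.T @ residual)
        if selected:
            scores[selected] = -np.inf  # an atom is never picked twice
        selected.append(int(np.argmax(scores)))
        # Orthogonal projection: refit ALL selected atoms by least squares,
        # which leaves the residual orthogonal to every selected atom
        A = D[:, selected]
        beta, *_ = np.linalg.lstsq(A, yc, rcond=None)
        residual = yc - A @ beta

    coef = np.zeros(X.shape[1])
    coef[selected] = beta / norms[selected]  # undo the column scaling
    intercept = y_mean - x_mean @ coef
    return selected, coef, intercept
```

Because the loop stops after exactly `n_vars` steps, the number of selected wavelengths is fixed directly, which mirrors the forward-selection property claimed for OMPBVS. For production use, scikit-learn's `OrthogonalMatchingPursuit` estimator (parameter `n_nonzero_coefs`) is an established alternative.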
Variable selection experiments were performed on a beer dataset and the Wheat kernels dataset to compare six variable selection methods: PLS, MCUVE, CARS, WMSCVS, LASSOLarsCV, and OMPBVS. The beer dataset contains 60 samples, divided by the Kennard-Stone (KS) method into a training set of 36 samples and a test set of 24 samples; the predicted property is original extract concentration. The Wheat kernels dataset contains 523 samples (415 training and 108 test samples); the predicted property is protein content. On the beer dataset, OMPBVS selected 2 variables, with an RMSEC of 0.2052 and an RMSEP of 0.1598. On the Wheat kernels dataset, it selected 9 variables, with an RMSEC of 0.4502 and an RMSEP of 0.4125. On both datasets, its variable selection ability and model performance were better than those of the other five methods, indicating that OMPBVS is an effective method for NIR spectral variable selection and quantitative analysis. OMPBVS generalizes well with small samples, reducing the number of selected variables and improving the robustness of variable selection. In addition, SNV and MSC spectral preprocessing can reduce the number of selected variables to some extent and improve the interpretability of the model.
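The Kennard-Stone split used for the beer dataset can be sketched as follows. This is the common textbook max-min procedure on Euclidean distances between spectra, assumed here rather than taken from the paper; `kennard_stone` and `n_train` are hypothetical names. It seeds the training set with the two most distant samples, then repeatedly adds the remaining sample whose nearest already-chosen neighbour is farthest away, so the training set spans the spectral space.

```python
import numpy as np


def kennard_stone(X, n_train):
    """Return (train_indices, test_indices) chosen by the Kennard-Stone rule."""
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    if not 2 <= n_train <= n_samples:
        raise ValueError("n_train must be between 2 and the number of samples")
    # Pairwise Euclidean distances via |a|^2 + |b|^2 - 2 a.b (avoids a 3-D array)
    sq = (X ** 2).sum(axis=1)
    dist = np.sqrt(np.clip(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0, None))
    # Seed the training set with the two most distant samples
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n_samples) if k not in selected]
    while len(selected) < n_train:
        # For each remaining sample, the distance to its nearest selected sample
        d_min = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the sample that is farthest from the current training set
        pick = remaining[int(np.argmax(d_min))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining
```

For a 60-sample set, `kennard_stone(X, 36)` would return 36 training indices and the 24 leftover samples as the test set, matching the split sizes reported above.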
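The SNV and MSC preprocessing mentioned above are commonly defined as below. The abstract does not state which variants were used, so these standard definitions are assumptions: SNV uses the sample standard deviation (`ddof=1`), and MSC uses the mean training spectrum as its reference unless one is supplied. Both remove additive and multiplicative scatter effects from each spectrum before variable selection.

```python
import numpy as np


def snv(X):
    """Standard Normal Variate: center and scale each spectrum (row) on its own."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)  # assumption: sample std
    return (X - mean) / std


def msc(X, reference=None):
    """Multiplicative Scatter Correction of each row against a reference spectrum.

    Assumption: the reference defaults to the mean spectrum of X (e.g. the
    training set); reuse the returned reference when correcting test spectra.
    """
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, spectrum in enumerate(X):
        # Fit spectrum ~ a + b * ref by least squares (polyfit returns [b, a])
        b, a = np.polyfit(ref, spectrum, deg=1)
        corrected[i] = (spectrum - a) / b
    return corrected, ref
```

Returning the MSC reference lets the same correction be applied to test spectra, so no test-set information leaks into preprocessing.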