• Spectroscopy and Spectral Analysis
  • Vol. 40, Issue 4, 1056 (2020)
WANG Yu-xi1, JIA Zhen-hong1,*, YANG Jie2, and Nikola K Kasabov3
Author Affiliations
  • 1[in Chinese]
  • 2[in Chinese]
  • 3Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, Auckland 1020, New Zealand
    DOI: 10.3964/j.issn.1000-0593(2020)04-1056-07
    WANG Yu-xi, JIA Zhen-hong, YANG Jie, Nikola K Kasabov. A Variable Selection Method of the Selectivity Ratio Competitive Model Population Analysis for Near Infrared Spectroscopy[J]. Spectroscopy and Spectral Analysis, 2020, 40(4): 1056

    Abstract

    Spectral analysis is an important application of chemometrics and is widely used in many fields. Spectral variable selection is a key step in spectral analysis, so it is important to study variable selection methods that objectively identify informative variables and eliminate irrelevant or interfering ones. This study proposes a new variable selection method, selectivity ratio competitive model population analysis (SRCMPA). The algorithm draws on the ideas of the selectivity ratio (SR), adaptive weighted sampling, and model population analysis, and combines variable ranking with an exponentially decreasing function. A key wavelength is defined as a wavelength with a high score in the regression model; here, the SR score under the PLS model is used as the index of each wavelength's importance. According to this importance, SRCMPA sequentially selects N wavelength subsets through Monte Carlo sampling, in an iterative and competitive manner. In each sampling run, a PLS model is built on a fixed ratio of the samples and the SR value of each variable is calculated. Using the ranking of the SR scores, with the normalized SR scores as weights, the key variables are selected in two steps: forced selection by the exponentially decreasing function, followed by competitive selection by adaptive weighted sampling. Finally, cross-validation (CV) is used to select the optimal subset, namely the one with the lowest root mean square error of cross-validation (RMSECV). The algorithm was tested on a wheat protein data set and a beer data set and compared with three efficient algorithms.
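The selection loop described above can be sketched roughly as follows. This is a minimal illustration under several assumptions that are not stated in the abstract: the exponentially decreasing function uses the form known from the CARS method (retaining all variables at the first iteration and two at the last), the sample ratio is 80%, N is 20, and an ordinary least-squares coefficient vector stands in for the PLS regression vector so that the example needs only NumPy. The function names `selectivity_ratio`, `edf_ratio`, and `select_step` are hypothetical, and the final RMSECV-based choice among the subsets is omitted.

```python
import numpy as np

def selectivity_ratio(X, b):
    """SR per variable via target projection onto the coefficient vector b."""
    Xc = X - X.mean(axis=0)
    w = b / np.linalg.norm(b)          # normalized regression vector
    t = Xc @ w                         # target-projected scores
    p = Xc.T @ t / (t @ t)             # target-projected loadings
    X_tp = np.outer(t, p)              # part of X explained by the target component
    E = Xc - X_tp                      # residual part
    explained = (X_tp ** 2).sum(axis=0)
    residual = (E ** 2).sum(axis=0)
    return explained / np.maximum(residual, 1e-12)

def edf_ratio(i, n_iter, n_vars):
    """Fraction of variables kept at iteration i (1-based); CARS-style form (assumed)."""
    a = (n_vars / 2) ** (1 / (n_iter - 1))
    k = np.log(n_vars / 2) / (n_iter - 1)
    return a * np.exp(-k * i)

def select_step(scores, active, i, n_iter, n_vars, rng):
    """Forced cut by the decreasing function, then adaptive weighted sampling."""
    n_keep = max(2, int(round(n_vars * edf_ratio(i, n_iter, n_vars))))
    n_keep = min(n_keep, len(active))
    order = np.argsort(scores)[::-1][:n_keep]      # highest SR scores first
    kept = active[order]
    weights = scores[order] / scores[order].sum()  # normalized SR as sampling weights
    drawn = rng.choice(kept, size=n_keep, replace=True, p=weights)
    return np.unique(drawn)                        # variables that survived the draw

# Synthetic data (illustrative only): variables 3 and 7 carry the signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 30))
y = 2.0 * X[:, 3] + X[:, 7] + 0.1 * rng.normal(size=60)

active = np.arange(X.shape[1])
n_iter = 20                                        # N, assumed
subsets = []
for i in range(1, n_iter + 1):
    rows = rng.choice(60, size=48, replace=False)  # fixed 80% sample ratio (assumed)
    Xs, ys = X[rows][:, active], y[rows]
    # Least-squares coefficients as a stand-in for the PLS regression vector.
    b, *_ = np.linalg.lstsq(Xs - Xs.mean(axis=0), ys - ys.mean(), rcond=None)
    sr = selectivity_ratio(Xs, b)
    active = select_step(sr, active, i, n_iter, X.shape[1], rng)
    subsets.append(active)
```

Each entry of `subsets` is a candidate wavelength subset; in the method as described, the subset with the lowest RMSECV would then be chosen.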
    The experimental results show that the algorithm finds the best combination of key wavelength variables for each data set, that the selected variables can be used to explain the chemical properties of interest, and that the resulting models give the best evaluation results. For the beer data set, compared with the full-spectrum PLS model, the number of variables was reduced from 567 to about 42; the RMSECV decreased from 0.622 to 0.115 and the RMSEP from 0.823 to 0.363, corresponding to improvements in prediction accuracy (relative reductions in RMSE) of 81.5% and 55.9%, respectively. Q2_CV and Q2_test increased from 0.940 and 0.852 to 0.994 and 0.995. For the wheat protein data set, compared with the full-spectrum PLS model, the number of variables was reduced from 175 to about 18; the RMSECV decreased from 0.607 to 0.292 and the RMSEP from 0.519 to 0.234, corresponding to improvements of 51.9% and 54.9%, respectively. Q2_CV and Q2_test increased from 0.748 and 0.774 to 0.931 and 0.839.
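The reported accuracy improvements appear to be relative reductions in RMSE, (before - after) / before. A short check using only the figures quoted in the abstract (the helper name `relative_reduction` is hypothetical):

```python
def relative_reduction(before, after):
    """Relative reduction in percent: (before - after) / before * 100."""
    return (before - after) / before * 100.0

# (full-spectrum PLS value, SRCMPA value), as quoted in the abstract
reported = {
    "beer RMSECV": (0.622, 0.115),
    "beer RMSEP": (0.823, 0.363),
    "wheat RMSECV": (0.607, 0.292),
    "wheat RMSEP": (0.519, 0.234),
}

reductions = {name: round(relative_reduction(b, a), 1)
              for name, (b, a) in reported.items()}

for name, pct in reductions.items():
    print(f"{name}: {pct}% lower than the full-spectrum model")
```

Rounded to one decimal place, these reproduce the 81.5%, 55.9%, 51.9%, and 54.9% stated above.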