• Spectroscopy and Spectral Analysis
  • Vol. 40, Issue 11, 3451 (2020)
Jie-hong CHENG1,* and Zheng-guang CHEN1
Author Affiliations
  • 1. College of Electrical and Information, Heilongjiang Bayi Agricultural University, Daqing 163319, China
    DOI: 10.3964/j.issn.1000-0593(2020)11-3451-06
    Jie-hong CHENG, Zheng-guang CHEN. Wavelength Selection of Near-Infrared Spectra Based on Improved SiPLS-Random Frog Algorithm[J]. Spectroscopy and Spectral Analysis, 2020, 40(11): 3451

    Abstract

    In near-infrared spectroscopy modeling and prediction, redundancy and collinearity in the data seriously degrade the prediction accuracy and robustness of the model. Feature wavelength selection is an effective way to improve the prediction accuracy of quantitative analysis. Random frog (RF) is a feature wavelength selection algorithm built on the idea that different variables have different probabilities of being selected, and in recent years it has shown good performance in this task. The method iteratively estimates the selection probability of each variable and takes the variables with high probability as the feature wavelengths. However, the initial variable set V0 of RF is chosen at random, so it may contain useless or interfering information and the validity of the initial information cannot be guaranteed; as a result, many iterations are needed and the running time is long. This paper proposes Si-RF, an improved feature wavelength selection algorithm based on RF. First, synergy interval partial least squares (SiPLS) is used to select variables from the full spectrum; the wavelengths obtained in this way are the most sensitive to changes in the target variable, and they are used as the initial variable subset of RF to address the long running time and low efficiency. Second, standard RF takes as feature wavelengths the variables whose selection probability exceeds a threshold, but there is no theoretical basis for setting this threshold, so the choice is easily influenced by human factors. Instead, in this paper the variables are sorted in descending order of selection probability and added one at a time to a multiple linear regression (MLR) model; the variable subset with the lowest RMSEV is taken as the feature wavelengths, which yields the wavelength subset with the highest prediction accuracy.
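The threshold-free selection step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the selection probabilities from RF are already available as an array `prob`, that RMSEV means the root mean square error on a held-out validation set, and that MLR is ordinary least squares with an intercept. The function and parameter names (`select_by_rmsev`, `max_vars`, and so on) are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between two 1-D arrays."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def fit_mlr(X, y):
    """Ordinary least-squares MLR with an intercept; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]


def select_by_rmsev(X_cal, y_cal, X_val, y_val, prob, max_vars=None):
    """Add variables in descending order of selection probability and
    keep the leading subset whose MLR model has the lowest RMSEV.

    Assumption: `prob` holds one selection probability per wavelength,
    e.g. as estimated by a random frog run.
    """
    order = np.argsort(-np.asarray(prob), kind="stable")
    if max_vars is None:
        # Assumption: cap the subset size so MLR stays well-determined.
        max_vars = min(len(order), len(y_cal) - 2)
    best_rmsev, best_subset = np.inf, order[:0]
    for k in range(1, max_vars + 1):
        subset = order[:k]
        coef = fit_mlr(X_cal[:, subset], y_cal)
        err = rmse(y_val, predict_mlr(coef, X_val[:, subset]))
        if err < best_rmsev:
            best_rmsev, best_subset = err, subset
    return best_subset, best_rmsev
```

Because every candidate subset is a prefix of the same ranking, only as many MLR fits are needed as there are ranked variables, and no probability threshold has to be chosen by hand.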
To address these two points, Si-RF was applied to soil near-infrared spectroscopy data sets. An MLR model was established after selecting the feature wavelengths, and its prediction accuracy was compared with that of the RF-MLR and Full-PLSR models. The results show that RF selected 10 wavelength points after 10 000 iterations, and the RMSEP of the resulting MLR model was 1.6276. The improved Si-RF needed only 1 000 iterations to select 17 wavelength points, and the RMSEP of the MLR model fell to 0.8184, a substantial improvement in both prediction accuracy and running efficiency. Compared with the full-spectrum model, Si-RF also greatly improved the prediction accuracy and reduced the complexity of the model. These results indicate that the improved Si-RF is an effective feature wavelength selection algorithm.
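To put the reported figures in proportion, the short sketch below computes the relative change in RMSEP and in iteration count from the numbers quoted in the abstract. The `rmsep` helper assumes the usual definition of RMSEP as the root mean square error on the prediction set; the variable names are illustrative only.

```python
import numpy as np


def rmsep(y_true, y_pred):
    """Root mean square error of prediction (usual definition, assumed here)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Figures quoted in the abstract.
rmsep_rf_mlr = 1.6276      # RF-MLR, 10 wavelengths, 10 000 iterations
rmsep_si_rf_mlr = 0.8184   # Si-RF-MLR, 17 wavelengths, 1 000 iterations

relative_reduction = (rmsep_rf_mlr - rmsep_si_rf_mlr) / rmsep_rf_mlr
iteration_ratio = 10_000 / 1_000

print(f"RMSEP reduction: {relative_reduction:.1%}")   # 49.7%
print(f"Iterations: {iteration_ratio:.0f}x fewer")    # 10x fewer
```

On these numbers, Si-RF roughly halves the prediction error while using one tenth of the iterations.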