• Spectroscopy and Spectral Analysis
  • Vol. 41, Issue 11, 3331 (2021)
Yan-kun LI1,*, Ru-nan DONG1, Jin ZHANG2, Ke-nan HUANG3, and Zhi-yi MAO4
Author Affiliations
  • 1. Department of Environmental Science and Engineering, North China Electric Power University, Hebei Key Lab of Power Plant Flue Gas Multi-Pollutants Control, Baoding 071003, China
  • 2. School of Food Science, Guizhou Medical University, Guiyang 550025, China
  • 3. The 82nd Army Group Hospital of the Chinese People’s Liberation Army, Baoding 071000, China
  • 4. Tianjin Building Material Science Research Academy, Tianjin 300110, China
    DOI: 10.3964/j.issn.1000-0593(2021)11-3331-08
    Yan-kun LI, Ru-nan DONG, Jin ZHANG, Ke-nan HUANG, Zhi-yi MAO. Variable Selection Methods in Spectral Data Analysis[J]. Spectroscopy and Spectral Analysis, 2021, 41(11): 3331
    Fig. 1. An overview of related works on variable/wavelength selection
    Fig. 2. The methods of WS and WIS
    Fig. 3. Comparison of variable selection methods in the NIR-protein model for corn data
    Fig. 4. Illustration of filter (F), wrapper (W), and embedded (E) methods
    | Method | First appearance [Ref.] | Characteristic (merits and drawbacks) |
    | --- | --- | --- |
    | UVE (uninformative variable elimination) | Massart, 1996 [6] | Intuitive and practical; effectively eliminates the influence of non-objective factors. Added random noise variables make the result unstable, and LOOCV makes computation inefficient. |
    | MC-UVE (Monte Carlo UVE) | Shao, 2008 [7] | Uses the Monte Carlo technique instead of LOOCV, adds no noise variables, high stability. Needs a user-defined threshold and tends to select more variables. |
    | iPLS (interval PLS) | Norgaard, 2000 [8] | Focuses on choosing better sub-intervals. Tests only a series of adjacent, non-overlapping intervals, so it can miss more informative ones. |
    | MWPLS (moving-window PLS) | Jiang, 2002 [9] | Considers all possible continuous intervals, but these may not be the optimal intervals. |
    | CARS-PLS (competitive adaptive reweighted sampling PLS) | Liang, 2009 [10] | Selects fewer variables and latent variables. The reliability of PLS model parameters based on the full spectrum needs strengthening; low stability. |
    | VIP (variable importance in projection) | Wold, 1993 [11] | Accumulates the importance of each variable as reflected by the loading weight of each component; usable when the number of independent variables exceeds the sample size. Requires probabilistic considerations regarding VIP. |
    | RT-PLS (randomization test PLS) | Fisher, 1935 [12] | Combines permutation and statistical testing, so the result is more reliable. For large datasets it is inefficient and time-consuming. |
    | IVS (interactive variable selection) | Lindgren & Wold, 1994 [13] | Dimension-wise instead of model-wise: variable selection is carried out interactively for each PLS component. Large elements in the weight vector can sometimes suppress smaller values. |
    | IPW (iterative predictor weighting) [15] | Forina, 1999 [14] | The importance measure is used both to rescale the original X-variables and to eliminate the least important ones. Time-consuming when there are many variables. |
    Table 1. PLS parameter-based variable selection methods
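    As a concrete illustration of one Table 1 method, the sketch below computes VIP scores from a fitted PLS model: each variable's importance is accumulated from the loading weights, weighted by the y-variance each latent component explains. This is a minimal reconstruction using scikit-learn's PLSRegression; the helper name vip_scores and the toy data are ours, not from the paper.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    def vip_scores(pls):
        """VIP score per predictor from a fitted PLSRegression model."""
        W = pls.x_weights_      # (p, A) loading weights
        T = pls.x_scores_      # (n, A) score vectors
        Q = pls.y_loadings_    # (1, A) y-loadings
        p, A = W.shape
        # y-variance explained by each latent component
        ssy = np.array([(Q[0, a] ** 2) * (T[:, a] @ T[:, a]) for a in range(A)])
        Wn = W / np.linalg.norm(W, axis=0)      # column-normalised weights
        # VIP_j = sqrt( p * sum_a ssy_a * w_norm_ja^2 / sum_a ssy_a )
        return np.sqrt(p * (Wn ** 2 @ ssy) / ssy.sum())

    # toy "spectra": 50 samples x 20 wavelengths, variables 3 and 7 informative
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 20))
    y = X[:, 3] + 0.5 * X[:, 7] + 0.1 * rng.normal(size=50)
    pls = PLSRegression(n_components=3).fit(X, y)
    vip = vip_scores(pls)
    ```

    A common selection rule (subject to the probabilistic caveat noted in the table) keeps variables with VIP > 1; by construction the VIP scores satisfy mean(VIP²) = 1, so values above 1 indicate above-average importance.
    
    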
    | Selection strategy | Representative method [Ref.] | First appearance [Ref.] | Characteristic (merits and drawbacks) |
    | --- | --- | --- | --- |
    | Intelligent optimizing algorithm (IOA)-based | GA (genetic algorithm) | Holland, 1975 [43] | Returns to the mathematical essence of variable-combination optimization and retains the advantages of variable combinations. Too many variable combinations to optimize; usually needs more preset parameters; sometimes falls into a local optimum. |
    | | SA (simulated annealing) | Metropolis, 1953 [44] | |
    | | PSO (particle swarm optimization) | Eberhart & Kennedy, 1995 [45] | |
    | | ACO (ant colony optimization) | Colorni, 1991 [46] | |
    | | GWO (grey wolf optimizer) | Mirjalili, 2014 [47] | |
    | Model population analysis (MPA)-based | BOSS (bootstrapping soft shrinkage) | Liang, 2016 [48] | The traditional strategy of rigidly eliminating variables according to a single index is replaced by a flexible strategy of changing weights, which preserves the effective variables more safely. The random algorithm helps preserve the combination effect among spectral variables, but it also makes the calculation more complicated. |
    | | VCPA (variable combination population analysis) | Liang, 2015 [49] | |
    | | VISSA (variable iterative space shrinkage approach) | Liang, 2014 [50] | |
    | | ICO (interval combination optimization) | Xiong & Min, 2016 [51] | |
    | | iRF (interval random frog) | Liang, 2013 [52] | |
    | Collinearity minimization-based | SPA (successive projections algorithm) [53, 54] | Araujo, 2001 [55] | Minimizes the influence of multicollinear variables on the model. In the optimization, each variable is used as a starting point, so the computational cost is too large to suit small samples. |
    | | SR (stepwise regression) [56] | | |
    | Category model-based | LDA (linear discriminant analysis) | Fisher, 1936 [57] | The correlation between variables and the model is preserved, and overall prediction accuracy is improved by combining different classification algorithms. Computational complexity is small, but the result is limited by the performance of the classification model. |
    | | ULDA (uncorrelated linear discriminant analysis) [58] | Jin, 2001 [59] | |
    | | RF (random forest) [60-62] | Breiman, 2001 [63] | |
    | | SVM (support vector machine) | Vapnik, 1995 [64] | |
    | Regularization-based | LASSO (least absolute shrinkage and selection operator) [65] | Tibshirani, 1996 [66] | Parameter estimation and variable selection are realized simultaneously and quickly; over-fitting can be avoided when the number of variables is large. A suitable penalty parameter must be chosen. |
    | | EN (elastic net) | Zou, 2003 [67] | |
    | | RR (ridge regression) | Hoerl & Kennard, 1998 [68] | |
    Table 2. Other common methods of spectral variable selection
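    To make the regularization row of Table 2 concrete, the sketch below uses LASSO for wavelength selection: the L1 penalty drives most regression coefficients exactly to zero, so the surviving non-zero coefficients are the selected variables, and the penalty parameter is chosen by cross-validation as the table recommends. This is a minimal sketch on synthetic data using scikit-learn's LassoCV, not the authors' implementation.

    ```python
    import numpy as np
    from sklearn.linear_model import LassoCV

    # toy "spectra": 60 samples x 100 wavelengths, only two informative channels
    rng = np.random.default_rng(1)
    X = rng.normal(size=(60, 100))
    y = 2.0 * X[:, 10] - 1.5 * X[:, 42] + 0.1 * rng.normal(size=60)

    # cross-validation picks the penalty; the L1 term zeroes out weak channels
    lasso = LassoCV(cv=5, random_state=0).fit(X, y)
    selected = np.flatnonzero(lasso.coef_)  # indices of selected wavelengths
    ```

    Estimation and selection happen in one fit, which is why the table lists LASSO as fast; the trade-off is that the result depends on the chosen penalty value, and correlated wavelengths may be dropped arbitrarily (the elastic net in the same row mitigates this).
    
    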