• Spectroscopy and Spectral Analysis
  • Vol. 43, Issue 7, 2238 (2023)
JIN Cheng-liang¹, WANG Yong-jun², HUANG He², and LIU Jun-min³
Author Affiliations
  • 1[in Chinese]
  • 2[in Chinese]
  • 3[in Chinese]
    DOI: 10.3964/j.issn.1000-0593(2023)07-2238-08
    JIN Cheng-liang, WANG Yong-jun, HUANG He, LIU Jun-min. Application of High-Dimensional Infrared Spectral Data Preprocessing in the Origin Identification of Traditional Chinese Medicinal Materials[J]. Spectroscopy and Spectral Analysis, 2023, 43(7): 2238

    Abstract

    To identify the geographical origin of Chinese medicinal materials more effectively from high-dimensional infrared spectral data, appropriate data preprocessing (DP) should be applied first, and more advanced algorithms should be considered only if necessary. Using a dataset of 658 samples with wavelengths from 551 to 3998 nm and a support vector machine (SVM) model, ten sample-based DP methods (namely no DP, max-min normalization, standardization, centralization, moving-average smoothing, Savitzky-Golay (SG) smoothing, multiplicative scatter correction (MSC), regularization, first-order derivative, and second-order derivative), five feature-based methods (no DP, centralization, max-min normalization, standardization, and regularization), and their combinations (50 in total) were evaluated in terms of prediction effectiveness and stability. The numerical results show that appropriate DP improves model accuracy. Among the ten sample-based methods, standard normal variate (SNV) and max-min normalization score highest, with a coefficient of determination R² of approximately 85%. Feature-based methods alone yield little improvement, and the sample-based-only and feature-based-only methods reach approximately the same average score of 64%. Combining SNV or normalization with a subsequent second-order derivative gives the highest prediction score, with R² of nearly 94%, whereas regularization combined with centralization performs worst. Suggestions for choosing DP methods are also given. The research is valuable for further analysis of medicinal efficacy and chemical composition, serves as a reference for infrared spectral data analysis, and provides guidance for modeling high-dimensional, small-sample data.
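    The preprocessing grid described above (sample-based methods applied to each spectrum, then feature-based methods applied to each wavelength column, 10 × 5 = 50 combinations) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function and dictionary names, window sizes, and the SG polynomial order are hypothetical choices; "centralization" is taken to mean mean-centering; and "regularization" is assumed to mean scaling to unit L2 norm. The synthetic random-walk matrix merely stands in for real spectra.

```python
import itertools

import numpy as np

EPS = 1e-12  # guards against division by zero on flat spectra or columns


def _smooth(X, kernel):
    """Row-wise sliding-window filter with edge padding, so each spectrum keeps its length."""
    half = len(kernel) // 2
    Xp = np.pad(X, ((0, 0), (half, half)), mode="edge")
    # Reversing the kernel makes np.convolve act as a correlation with `kernel`.
    return np.array([np.convolve(row, kernel[::-1], mode="valid") for row in Xp])


def moving_average(X, window=5):  # window size is a hypothetical choice
    return _smooth(X, np.ones(window) / window)


def savgol(X, window=11, polyorder=2):  # window/order are hypothetical choices
    # Savitzky-Golay smoothing: least-squares polynomial fit in each window,
    # evaluated at the window centre (row 0 of the pseudo-inverse).
    half = window // 2
    A = np.vander(np.arange(-half, half + 1), polyorder + 1, increasing=True)
    return _smooth(X, np.linalg.pinv(A)[0])


def msc(X):
    # Multiplicative scatter correction against the mean spectrum:
    # fit x_i ≈ a + b * ref, then return (x_i - a) / b.
    ref = X.mean(axis=0)
    out = np.empty_like(X)
    for i, row in enumerate(X):
        b, a = np.polyfit(ref, row, 1)
        out[i] = (row - a) / b
    return out


# Sample-based methods: each spectrum (row) is processed independently.
SAMPLE_METHODS = {
    "none": lambda X: X,
    "minmax": lambda X: (X - X.min(1, keepdims=True)) / (np.ptp(X, axis=1, keepdims=True) + EPS),
    "snv": lambda X: (X - X.mean(1, keepdims=True)) / (X.std(1, keepdims=True) + EPS),
    "center": lambda X: X - X.mean(1, keepdims=True),
    "moving_avg": moving_average,
    "savgol": savgol,
    "msc": msc,
    "unit_norm": lambda X: X / (np.linalg.norm(X, axis=1, keepdims=True) + EPS),  # assumed "regularization"
    "d1": lambda X: np.gradient(X, axis=1),
    "d2": lambda X: np.gradient(np.gradient(X, axis=1), axis=1),
}

# Feature-based methods: each wavelength (column) is processed across samples.
FEATURE_METHODS = {
    "none": lambda X: X,
    "center": lambda X: X - X.mean(0),
    "minmax": lambda X: (X - X.min(0)) / (np.ptp(X, axis=0) + EPS),
    "standardize": lambda X: (X - X.mean(0)) / (X.std(0) + EPS),
    "unit_norm": lambda X: X / (np.linalg.norm(X, axis=0) + EPS),  # assumed "regularization"
}


def all_combinations(X):
    """Apply every sample-based method, then every feature-based method."""
    return {
        (s, f): FEATURE_METHODS[f](SAMPLE_METHODS[s](X))
        for s, f in itertools.product(SAMPLE_METHODS, FEATURE_METHODS)
    }


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 200)).cumsum(axis=1)  # synthetic stand-in for spectra
combos = all_combinations(X)
print(len(combos))  # 50
```

    In a full experiment each of the 50 matrices would then be fed to an SVM (for example scikit-learn's `sklearn.svm.SVC`, scored with `sklearn.model_selection.cross_val_score`) and the scores compared. Because the feature-based steps use statistics pooled across samples, those statistics should in practice be computed on the training split only and reused on the test split to avoid information leakage; the sketch applies them to the whole matrix for brevity.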