• Spectroscopy and Spectral Analysis
  • Vol. 40, Issue 2, 403 (2020)
HE Wei-jian*, CHENG Liang-lun, and DENG Guang-shui
    DOI: 10.3964/j.issn.1000-0593(2020)02-0403-07
    HE Wei-jian, CHENG Liang-lun, DENG Guang-shui. Terahertz Spectral Interval Combination Feature Extraction Algorithm in the Case of Aliasing Absorption Peak[J]. Spectroscopy and Spectral Analysis, 2020, 40(2): 403

    Abstract

    Terahertz spectroscopy is an advanced method for material recognition. Because substances differ in molecular composition and structure, the terahertz absorption spectra of many substances show absorption peaks at characteristic frequencies, which can serve as important features for detecting the components of a mixture. Extracting the parameters of these absorption peaks effectively and accurately is the key to improving the recognition rate. A multi-peak fitting algorithm fits the spectral curve as the sum of several standard peak functions and can extract the frequency, height, and width of the absorption peaks simultaneously. However, such an algorithm relies on the result of a peak-finding algorithm to determine the approximate positions and number of absorption peaks before fitting. The peak-finding result does not necessarily lead to the optimal fit, and aliasing (overlapping) absorption peaks are difficult to identify accurately. To improve the recognition and positioning accuracy of absorption peaks in aliased spectra, this paper proposes dividing the pre-processed spectrum into several sub-intervals at the troughs of the sharpened and smoothed curve. The sub-intervals are then combined for multi-peak fitting, and a genetic algorithm is used to obtain the optimal sub-interval combination for fitting together with approximate values of the absorption peak frequencies. During fitting, the number of absorption peaks in each sub-interval is determined by a peak-number increment optimization method. To identify substances, a density clustering algorithm is used to obtain the absorption peaks common to multiple measurements of the same pure substance. With these peak data as the standard data, the proposed spectral matching algorithm based on absorption peak characteristics enables rapid identification of pure substances and of mixtures with different contents.
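The interval-division idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Lorentzian peak shape, the moving-average smoothing (standing in for the sharpening and smoothing step), the function names, and the synthetic two-peak spectrum are all assumptions made for illustration. The genetic-algorithm search over sub-interval combinations is not shown.

```python
def lorentzian(f, center, height, width):
    """One Lorentzian peak; width is the full width at half maximum (assumed shape)."""
    half = width / 2.0
    return height * half ** 2 / ((f - center) ** 2 + half ** 2)


def multi_peak(f, peaks):
    """Spectrum model: sum of peak functions, each given as (center, height, width)."""
    return sum(lorentzian(f, c, h, w) for c, h, w in peaks)


def moving_average(values, window=5):
    """Simple smoothing; the window is truncated at both ends of the list."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def trough_indices(values):
    """Indices of interior local minima of the curve."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i] < values[i - 1] and values[i] <= values[i + 1]
    ]


def split_at_troughs(freqs, absorb, window=5):
    """Divide the frequency axis into sub-intervals bounded by troughs."""
    smooth = moving_average(absorb, window)
    cuts = [0] + trough_indices(smooth) + [len(freqs) - 1]
    return [(freqs[a], freqs[b]) for a, b in zip(cuts, cuts[1:])]


# Synthetic spectrum (illustrative values only): two peaks at 1.0 and 1.8 THz.
freqs = [0.5 + 0.01 * i for i in range(201)]
absorb = [multi_peak(f, [(1.0, 1.0, 0.1), (1.8, 1.0, 0.1)]) for f in freqs]
intervals = split_at_troughs(freqs, absorb)
print(intervals)
```

On this synthetic input the single trough between the two peaks yields two sub-intervals, each of which could then be fitted separately or in combination.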
Measured spectral data of ten pure substances are fitted and clustered to obtain absorption peak parameters, which are largely consistent with those in the terahertz spectral database. The proposed recognition algorithm achieves a recognition rate of 100% on the test set of pure substances, which demonstrates the effectiveness of both the feature extraction algorithm and the material recognition algorithm. For mixture spectra with aliasing peaks, the second derivative method recognizes the masked absorption peak (1.280 THz) in the glucose-lactose mixture spectrum in only 70% of cases, with an average extracted frequency of 1.316 THz. The proposed algorithm raises the recognition rate to 95% with an average frequency of 1.281 THz; that is, it improves the resolution of the aliasing peak and locates it accurately. For six types of binary mixtures, which are formed from the ten pure substances and show different degrees of aliasing, the Top-2 and Top-3 accuracies are 90.8% and 98.3%, respectively. The extracted features can therefore be applied effectively to component detection in mixtures. Because the algorithm uses only pure-substance data as the standard data for component detection in mixtures, it is of great significance for mixture analysis in terahertz spectroscopy.
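The clustering step that yields the common absorption peaks of a pure substance might look roughly like the following Python sketch. It uses a simplified one-dimensional, gap-based grouping as a stand-in for the density clustering algorithm named in the abstract. The function name `common_peaks`, the parameters `eps` and `min_count`, and the peak frequencies are hypothetical values chosen for illustration, not data from the paper.

```python
def common_peaks(measurements, eps=0.02, min_count=3):
    """Group fitted peak frequencies (THz) from repeated scans of one substance.

    measurements: list of lists, one list of peak frequencies per scan.
    Frequencies closer than eps to their sorted neighbour join the same cluster;
    clusters with fewer than min_count members are discarded as spurious.
    Returns the mean frequency of each retained cluster.
    """
    freqs = sorted(f for scan in measurements for f in scan)
    clusters, current = [], []
    for f in freqs:
        # A gap larger than eps closes the current cluster.
        if current and f - current[-1] > eps:
            clusters.append(current)
            current = []
        current.append(f)
    if current:
        clusters.append(current)
    return [sum(c) / len(c) for c in clusters if len(c) >= min_count]


# Illustrative scans: two recurring peaks plus one isolated outlier at 2.20 THz.
scans = [
    [1.00, 1.50],
    [1.01, 1.49, 2.20],
    [0.99, 1.51],
    [1.00, 1.50],
]
standard_peaks = common_peaks(scans)
print(standard_peaks)
```

The outlier appears in only one scan and is dropped, leaving two cluster means near 1.00 and 1.50 THz that could serve as standard peak data for matching.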