• Spectroscopy and Spectral Analysis
  • Vol. 39, Issue 10, 3292 (2019)
ZHANG Xiao1、2 and LUO A-li1
Author Affiliations
  • 1[in Chinese]
  • 2[in Chinese]
  • show less
    DOI: 10.3964/j.issn.1000-0593(2019)10-3292-05 Cite this Article
    ZHANG Xiao, LUO A-li. XGBOOST Based Stellar Spectral Classification and Quantized Feature[J]. Spectroscopy and Spectral Analysis, 2019, 39(10): 3292 Copy Citation Text show less

    Abstract

    Star spectral classification is a foundational work of stellar research. The Morgan-Keenan (MK) classification system which was developed in 1970s is the most widely used classical classification system. However, MK based interactive decision classification system has some difficulties when dealing with massive quantity of astronomical spectral data. Nowadays the most widely used method of automatically classification is template match which neglects measuring the spectral line. As a result, one of the most popular topics is how to extract features from massive data objectively and precisely and to apply the features for making classification decisions. In this paper, we processed the spectral data of LAMOST DR4 stars to obtain the line index as input data and used the official released labels of the spectrum as outcome. The XGBoost algorithm was applied to automatically classify the stellar spectra and rank the features. In this way, the identified and potential line indices which are sensitive to classification were revealed. Firstly, we labeled and selected the spectral data of stars with B, A, F and M by LAMOST high signal-to-noise ratio (S/N>30) with the sample size amounting to around 41. 4 million. Then, the line indices of spectral data was calculated to reduce the dimension and to filter out the redundant information. Secondly, the processed star spectral data were randomly divided to a training set and a test set. By modifying the parameters, the required classification decision tree model was fitted by training set using XGBoost algorithm and the stability and availability of the model were validated by test set to avoid over-fitting. In the meantime, the classification features were extracted by the algorithm’s own function. Finally, the branch with the highest probabilities was selected as the final decision tree model. Through experiments, it is shown that the XGBoost model has a better performance in self-adaptability under fixed parameters with less affection in data sets and the overall accuracy rate as high as 88.5%. Moreover, the output classification decision tree is more consistent with identified features and the numerical characteristics of spectrum and its corresponding range are obtainable through the model. This would shed light on providing quantitative rules for evaluating classification decision trees with numerical spectral features.
    ZHANG Xiao, LUO A-li. XGBOOST Based Stellar Spectral Classification and Quantized Feature[J]. Spectroscopy and Spectral Analysis, 2019, 39(10): 3292
    Download Citation