• Spectroscopy and Spectral Analysis
  • Vol. 40, Issue 9, 2913 (2020)
JIANG Bin, ZHAO Zi-liang, WANG Shu-ting, WEI Ji-yu, and QU Mei-xia*
    DOI: 10.3964/j.issn.1000-0593(2020)09-2913-05
    JIANG Bin, ZHAO Zi-liang, WANG Shu-ting, WEI Ji-yu, QU Mei-xia. Decomposition and Classification of Stellar Spectra Based on t-SNE[J]. Spectroscopy and Spectral Analysis, 2020, 40(9): 2913

    Abstract

With advances in astronomy and in telescope observing capability, large sky survey telescopes have produced petabytes of stellar spectra. A stellar spectrum is a complex frequency-domain signal, usually composed of a continuum and absorption lines. Differences between spectra are mainly caused by the effective temperature, surface gravity, and elemental abundances of the stars. Automatic classification of stellar spectra is an important part of astronomical data processing and the basis for studying stellar evolution and measuring stellar parameters. The massive volume of stellar spectra requires efficient and accurate classification methods. Traditional manual classification is slow and insufficiently accurate, so it cannot meet the practical need to classify massive numbers of stellar spectra automatically. Machine learning algorithms have been widely used in spectral classification. A distinctive feature of stellar spectra is their high dimensionality. Dimensionality reduction not only extracts features but also reduces the amount of computation, which makes it the primary task in spectral classification. Traditional linear dimensionality reduction reduces the spectra according to variance alone, so different types of spectra overlap in the feature space, whereas manifold learning can produce clear classification boundaries that avoid overlap, which benefits subsequent classification. This paper studies the distribution of spectra in high-dimensional space and the principle by which manifold learning reduces the dimensionality of high-dimensional linear data. Two dimensionality reduction methods, t-SNE and principal component analysis (PCA), were compared, and an improved k-nearest neighbor algorithm based on the correlation distance of attribute values was used for spectral classification. The algorithm was implemented with Python and Scikit-learn. A total of 12 000 low signal-to-noise ratio stellar spectra from SDSS were tested, and high-precision automatic processing and classification of the spectral data were achieved. Experimental results show that the t-SNE method, which is based on manifold learning, can recover the low-dimensional manifold structure in high-dimensional spectral data: the low-dimensional manifold features in the high-dimensional space are found and the corresponding embedding mappings are solved. During dimensionality reduction, the differences between spectral samples of different categories are preserved to the greatest extent. Three-dimensional visualization of the experimental results shows that PCA causes the distributions of stellar spectra of different categories to overlap, while the t-SNE algorithm produces more distinct category boundaries. After feature extraction, the k-nearest neighbor algorithm based on attribute value correlation distance achieves satisfactory classification accuracy on the test data sets. The method used in this paper can also be applied to the automatic classification of massive spectra generated by other telescopes and to data mining of rare objects.
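
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (generated with `make_classification` as a stand-in for SDSS spectra), the sample size, perplexity, and `n_neighbors` values are arbitrary, and scikit-learn's built-in `metric="correlation"` is used in place of the paper's improved attribute-value correlation distance, whose details the abstract does not give. Because scikit-learn's `TSNE` cannot transform unseen samples, the embedding is computed on all samples before the train/test split, which is a simplification.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic high-dimensional data standing in for stellar spectra
# (300 "spectra", 50 flux bins, 3 spectral classes).
X, y = make_classification(
    n_samples=300,
    n_features=50,
    n_informative=10,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)
X = StandardScaler().fit_transform(X)

# Reduce to three dimensions with both methods, matching the 3-D
# visualization mentioned in the abstract.
embeddings = {
    "PCA": PCA(n_components=3, random_state=0).fit_transform(X),
    "t-SNE": TSNE(
        n_components=3, perplexity=30, init="pca", random_state=0
    ).fit_transform(X),
}

accuracies = {}
for name, Z in embeddings.items():
    # Simplification: the split happens after embedding, because t-SNE
    # has no transform() for new samples.
    Z_train, Z_test, y_train, y_test = train_test_split(
        Z, y, test_size=0.3, stratify=y, random_state=0
    )
    # Assumption: plain correlation distance, not the paper's improved variant.
    knn = KNeighborsClassifier(
        n_neighbors=5, metric="correlation", algorithm="brute"
    )
    knn.fit(Z_train, y_train)
    accuracies[name] = knn.score(Z_test, y_test)
    print(f"{name}: test accuracy = {accuracies[name]:.3f}")
```

The accuracies printed on synthetic data say nothing about real spectra; the sketch only shows how the reduction and classification steps fit together in Scikit-learn.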