Automatic Classification Method of Star Spectra Data Based on Convolutional Neural Network

SHI Chao-jun; QIU Bo; ZHOU Ya-tong; DUAN Fu-qing

doi:10.3964/j.issn.1000-0593(2019)04-1312-05

Abstract

Automatic classification of stellar spectra is the basis of stellar spectral analysis. Fast and accurate automatic identification and classification of stellar spectra speeds up the search for special celestial objects and is therefore of great significance to astronomical research. LAMOST (the Large Sky Area Multi-Object Fiber Spectroscopic Telescope), a large-scale scientific project in China, currently releases millions of spectra every year. As a result, fast and accurate automatic identification and classification of massive numbers of stellar spectra has become a hot topic in astronomical data analysis and processing. To address this problem, a new method based on a convolutional neural network (CNN) is proposed for classifying K- and F-type stellar spectra. Support Vector Machine (SVM) and Back Propagation (BP) neural network algorithms serve as baselines for comparison, and cross-validation is used to assess classifier performance. Compared with traditional methods, a CNN shares weights, which reduces the number of learnable parameters, and it extracts features from the training data automatically.

The experiments use the TensorFlow deep learning framework in a Python 3.5 environment. The K- and F-type spectra are taken from LAMOST DR3, provided by the National Astronomical Observatories, Chinese Academy of Sciences. Each spectrum is sampled evenly over the 3 500–7 500 Å range to build the data set, which is then normalized with min-max normalization.

The CNN consists of an input layer, convolutional layer C1, pooling layer S1, convolutional layer C2, pooling layer S2, convolutional layer C3, pooling layer S3, a fully connected layer and an output layer. The input layer receives the flux values of a K- or F-type spectrum at 3 700 wavelength points. C1 has 10 convolution kernels of size 1×3 with stride 1. S1 uses max pooling with a non-overlapping 1×2 window. It produces 10 feature maps, matching the number of C1 feature maps, and each is half the length of the corresponding C1 map. C2 has 20 kernels of size 1×2 with stride 1 and outputs 20 feature maps, which S2 pools into 20 feature maps. C3 has 30 kernels of size 1×3 with stride 1 and outputs 30 feature maps, which S3 pools into 30 feature maps. The fully connected layer has 50 neurons, each connected to every neuron in S3. The output layer has 2 neurons that give the classification result. The convolutional layers use the ReLU activation function and the output layer uses softmax.

For the baselines, the SVM is a C-SVC with a radial basis function kernel, and the BP network has three hidden layers with 20, 40 and 20 neurons. The data set is divided into training data and test data. Subsets containing 40%, 60%, 80% and 100% of the training data are used as training sets, and the test data serve as the test set. Each model is trained for 8 000 iterations and then evaluated on the test set. The comparative experiments train on 100% of the training data and evaluate on the same test set. Model performance is measured by precision, recall, F-score and accuracy. Detailed analysis of the results shows that the CNN classifies and screens K- and F-type spectra quickly and automatically. It also shows that the larger the training set, the better the model generalizes and the higher its classification accuracy.
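The preprocessing and the way feature-map lengths shrink through the C1 to S3 stack can be sketched in NumPy. This is a minimal sketch, not the authors' code. The abstract gives the 3 500 to 7 500 Å range, the 3 700 input points and min-max scaling. It does not give the interpolation scheme, the exact grid, whether scaling is per spectrum, or the convolution padding mode. The following are therefore assumptions: linear interpolation onto `np.linspace(3500, 7500, 3700)`, per-spectrum scaling, and "valid" (no) padding. `resample`, `min_max`, `LAYERS` and `feature_lengths` are hypothetical names.

```python
import numpy as np

# ASSUMPTION: a common grid of 3 700 evenly spaced points over 3 500-7 500 Å.
# The abstract states the range and point count, not the exact grid.
GRID = np.linspace(3500.0, 7500.0, 3700)


def resample(wavelength, flux, grid=GRID):
    """Linearly interpolate one spectrum onto the common grid.

    `wavelength` must be increasing. np.interp holds the first/last flux
    value constant for grid points outside the observed range.
    """
    return np.interp(grid, wavelength, flux)


def min_max(flux):
    """Scale one spectrum to [0, 1] (ASSUMPTION: per-spectrum scaling)."""
    lo, hi = flux.min(), flux.max()
    if hi == lo:  # flat spectrum: avoid division by zero
        return np.zeros_like(flux, dtype=float)
    return (flux - lo) / (hi - lo)


# (name, kind, number of feature maps, kernel or window width) from the abstract.
LAYERS = [
    ("C1", "conv", 10, 3),
    ("S1", "pool", 10, 2),
    ("C2", "conv", 20, 2),
    ("S2", "pool", 20, 2),
    ("C3", "conv", 30, 3),
    ("S3", "pool", 30, 2),
]


def feature_lengths(n_in=3700, padding="valid"):
    """Return (name, maps, length) after each layer.

    Convolutions use stride 1, and `padding` applies to them only.
    Pooling is non-overlapping (stride = window) and drops any odd
    leftover sample. ASSUMPTION: the padding mode is not in the abstract.
    """
    if padding not in ("valid", "same"):
        raise ValueError("padding must be 'valid' or 'same'")
    n, out = n_in, []
    for name, kind, maps, width in LAYERS:
        if kind == "conv":
            n = n - width + 1 if padding == "valid" else n
        else:
            n = n // width
        out.append((name, maps, n))
    return out


for name, maps, length in feature_lengths():
    print(f"{name}: {maps} maps x {length}")
```

Under these assumptions the lengths run 3 698, 1 849, 1 848, 924, 922 and 461. The flattened S3 output therefore has 30 × 461 = 13 830 values feeding the 50-neuron fully connected layer. With "same" padding the convolutions would preserve length and that count would differ.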
Comparative experiments show that the CNN significantly outperforms the SVM and BP baselines in classifying K- and F-type stellar spectra.
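Comparing the CNN with SVM and BP comes down to training each model on the same subsets and scoring its predictions on the same test set. The plain-Python sketch below is illustrative only and makes two assumptions. First, the 40%, 60% and 80% training sets are nested prefixes of one shuffled training portion, because the abstract does not say how they were drawn. Second, precision, recall and F-score are computed with K as the positive class. `nested_subsets` and `binary_metrics` are hypothetical names.

```python
import random


def nested_subsets(train_ids, fractions=(0.4, 0.6, 0.8, 1.0), seed=0):
    """Return {fraction: ids} as nested prefixes of one shuffled list.

    ASSUMPTION: nested subsets; the abstract does not specify the sampling.
    """
    ids = list(train_ids)
    random.Random(seed).shuffle(ids)
    return {f: ids[: round(f * len(ids))] for f in fractions}


def binary_metrics(y_true, y_pred, positive="K"):
    """Precision, recall, F-score and accuracy for a two-class K/F task.

    ASSUMPTION: K is treated as the positive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "accuracy": accuracy}


subsets = nested_subsets(range(10))
print({f: len(ids) for f, ids in subsets.items()})
print(binary_metrics(["K", "K", "F", "F", "K"], ["K", "F", "F", "K", "K"]))
```

With these toy labels, TP = 2, FP = 1, FN = 1 and TN = 1. Precision, recall and F-score are therefore all 2/3, and accuracy is 0.6. The same function would be applied to the CNN, SVM and BP predictions on the shared test set.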