• Spectroscopy and Spectral Analysis
  • Vol. 38, Issue 9, 2763 (2018)
LI Wei1, LI Jin-long1, LI Wei-jun2, LIU Li-wei3, LI Hao-guang2, CHEN Chen1, and CHEN Shao-jiang1
Author Affiliations
  • 1[in Chinese]
  • 2[in Chinese]
  • 3[in Chinese]
    DOI: 10.3964/j.issn.1000-0593(2018)09-2763-07
    LI Wei, LI Jin-long, LI Wei-jun, LIU Li-wei, LI Hao-guang, CHEN Chen, CHEN Shao-jiang. Near Infrared Spectroscopy Analysis Based Machine Learning to Identify Haploids in Maize[J]. Spectroscopy and Spectral Analysis, 2018, 38(9): 2763

    Abstract

    Haploid identification is a very important part of doubled haploid technology in maize. In this research, we studied the near-infrared transmission spectra of a large number of haploid and heterozygous diploid seeds to establish an accurate model for haploid identification. A comparison of the average spectra of haploids and heterozygous diploids showed that the absorption peak positions of the two spectra were almost the same, but haploid absorbance was slightly higher than that of heterozygous diploids, with the larger differences at wavelengths of 940 to 1120 nm and 1180 to 1316 nm. Based on the near-infrared spectra of haploids and heterozygous diploids from three different source germplasms, several machine learning algorithms were used to construct haploid selection models, the accuracy of models developed with different spectral preprocessing methods was compared, and the effect of the dataset on model evaluation was also studied. Among the models compared, the partial least squares method and the neural network algorithm reached high haploid identification accuracies of 95.42% and 93.26%, respectively. The results on the testing set were consistent with the accuracy of the models, indicating that the two algorithms are suitable for large-scale screening of haploids. With the partial least squares model, smoothing gave the best accuracy among the spectral preprocessing methods. A comparison of models built on datasets of different sizes showed that enlarging the dataset within a certain range could improve model accuracy. When the proportion of haploids was high enough, the recall rate of haploid prediction reached up to 100%. In addition, haploids and heterozygous diploids that were difficult to identify by R1-nj color were selected to form a new dataset. The partial least squares model trained on this dataset reached an accuracy of 93.39%. This shows the advantage of the NIR machine learning method for haploid identification, which can achieve accurate identification independent of R1-nj color expression. NIR haploid identification based on machine learning has high accuracy and efficiency, and the method can be further optimized as more data become available. This research paves the way for the intelligent identification of haploids.
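    The abstract reports two different metrics, overall accuracy and the recall rate of haploid prediction. The following minimal Python sketch shows how the two differ. The label encoding, the function names, and the example labels are illustrative assumptions, not data or code from the paper.

```python
from typing import Sequence

# Assumed label encoding (not from the paper): 1 = haploid, 0 = heterozygous diploid.
HAPLOID = 1
DIPLOID = 0


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all seeds whose predicted class matches the true class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def haploid_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true haploids that the model also predicts as haploid."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must be of equal length")
    actual_haploids = sum(1 for t in y_true if t == HAPLOID)
    if actual_haploids == 0:
        raise ValueError("recall is undefined without any true haploids")
    found = sum(
        1 for t, p in zip(y_true, y_pred) if t == HAPLOID and p == HAPLOID
    )
    return found / actual_haploids


# Made-up labels for illustration only: 4 haploids, 6 diploids,
# with one diploid misclassified as a haploid.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

print(f"accuracy       = {accuracy(y_true, y_pred):.2f}")
print(f"haploid recall = {haploid_recall(y_true, y_pred):.2f}")
```

    In this made-up example every haploid is recovered, so recall is 100% even though one diploid is misclassified and overall accuracy is lower. This is why a recall of 100% and an accuracy below 100% can both hold for the same model.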