• Spectroscopy and Spectral Analysis
  • Vol. 40, Issue 5, 1495 (2020)
HU Yi-ran1, LI Jie-qing1, LIU Hong-gao2, FAN Mao-pan1,*, and WANG Yuan-zhong3
Author Affiliations
  • 1[in Chinese]
  • 2[in Chinese]
  • 3[in Chinese]
    DOI: 10.3964/j.issn.1000-0593(2020)05-1495-08
    HU Yi-ran, LI Jie-qing, LIU Hong-gao, FAN Mao-pan, WANG Yuan-zhong. Infrared Spectral Study on the Origin Identification of Boletus Tomentipes Based on the Random Forest Algorithm and Data Fusion Strategy[J]. Spectroscopy and Spectral Analysis, 2020, 40(5): 1495

    Abstract

    Boletus tomentipes Earle, a healthy food, is favored by many consumers. The nutrient accumulation of its fruiting body is affected by the growth environment (altitude, climate, etc.), so nutrient content differs significantly between producing regions. An accurate, rapid and inexpensive technique for identifying origin is therefore urgently needed. In this paper, a data fusion strategy combined with the random forest (RF) algorithm was used to identify the origin of B. tomentipes, and the effects of several feature-extraction methods on the classification performance of the RF models were compared. Fourier transform near-infrared and Fourier transform mid-infrared spectra of 87 samples from 4 producing areas (northern subtropical, northern temperate, southern subtropical and middle subtropical zones) were acquired and their spectral characteristics analyzed. The Kennard-Stone algorithm was used to divide the samples into a training set of two-thirds (58 samples) and a validation set of one-third (29 samples). RF identification models were built on 4 kinds of infrared spectra (near-infrared average spectra of stipes (N-b), near-infrared average spectra of caps (N-g), mid-infrared average spectra of stipes (M-b) and mid-infrared average spectra of caps (M-g)) and on three data fusion strategies (low-level, mid-level and high-level fusion), and the effects of different feature values (variable importance in projection (VIP), Boruta and latent variables (LVs)) on the classification results were compared. The optimal ntree and mtry were selected according to the out-of-bag (OOB) error. Classification performance was evaluated by specificity, sensitivity, training set accuracy and validation set accuracy, and the best method for identifying the origin of B. tomentipes was determined from these multiple evaluation indicators.
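The Kennard-Stone split used to form the training and validation sets can be sketched as follows. This is a minimal, hypothetical illustration in plain Python, assuming each sample is a preprocessed spectrum given as a list of floats and that Euclidean distance is used; the function name `kennard_stone_split` is ours, not the authors'. The algorithm starts from the two most distant samples and then repeatedly adds the sample farthest from everything already selected, so the training set spans the spectral space.

```python
import math
from typing import List, Sequence, Tuple


def kennard_stone_split(
    X: Sequence[Sequence[float]], n_train: int
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, validation_indices) chosen by Kennard-Stone.

    Hypothetical helper: assumes Euclidean distance between spectra.
    """
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(X)")

    # Pairwise Euclidean distances between all spectra (rows of X).
    dist = [[math.dist(a, b) for b in X] for a in X]

    # Seed the training set with the two most distant samples.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]

    # For each unselected sample, its distance to the nearest selected sample.
    min_d = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}

    while len(selected) < n_train:
        # Add the sample that is farthest from the current training set.
        nxt = max(remaining, key=lambda k: min_d[k])
        selected.append(nxt)
        remaining.remove(nxt)
        # Update nearest-selected distances with the newly added sample.
        for k in remaining:
            min_d[k] = min(min_d[k], dist[k][nxt])

    return selected, remaining
```

With 87 spectra and `n_train=58`, the 29 indices left over would form the validation set, matching the split sizes reported in the abstract.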
The results showed that: (1) both near-infrared and mid-infrared spectra could identify the origin of B. tomentipes; (2) discriminant models built from a single spectrum with RF were not ideal; (3) all three fusion strategies improved origin identification, with performance ranked from best to worst as high-level fusion, mid-level fusion and low-level fusion. Using the near-infrared and mid-infrared spectra of B. tomentipes, a high-level fusion strategy with LVs as feature values was combined with RF to build an identification model for samples from different regions; this model achieved high validation set accuracy (99.6%), sensitivity (0.969) and specificity (0.986). It is a reliable method for identifying the geographical origin of B. tomentipes quickly and accurately.
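High-level (decision-level) fusion combines the outputs of separately trained models rather than the spectra or extracted features themselves. The sketch below is a hypothetical plain-Python illustration, assuming each spectral block's RF model has already produced class probabilities for the validation samples and that blocks are combined by simple averaging; the abstract does not state the authors' combination rule. `high_level_fusion` and `evaluate` are illustrative names, and computing sensitivity and specificity as macro-averaged one-vs-rest values is also an assumption.

```python
from typing import Dict, List, Sequence

# probs[sample][class] produced by one spectral block's classifier.
ProbMatrix = Sequence[Sequence[float]]


def high_level_fusion(block_probs: Sequence[ProbMatrix]) -> List[int]:
    """Average class probabilities across blocks, then take the argmax.

    Averaging is an assumed fusion rule used here for illustration only.
    """
    n_blocks = len(block_probs)
    fused = []
    for s in range(len(block_probs[0])):
        n_classes = len(block_probs[0][s])
        avg = [
            sum(block[s][c] for block in block_probs) / n_blocks
            for c in range(n_classes)
        ]
        # Predicted class = index of the largest averaged probability.
        fused.append(max(range(n_classes), key=lambda c: avg[c]))
    return fused


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus macro-averaged one-vs-rest sensitivity and specificity."""
    classes = sorted(set(y_true))
    if len(classes) < 2:
        raise ValueError("need at least two classes in y_true")
    pairs = list(zip(y_true, y_pred))
    sens, spec = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        sens.append(tp / (tp + fn))  # tp + fn > 0: c occurs in y_true
        spec.append(tn / (tn + fp))  # tn + fp > 0: another class occurs too
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "sensitivity": sum(sens) / len(sens),
        "specificity": sum(spec) / len(spec),
    }
```

In the setting described above, each entry of `block_probs` would correspond to one spectral block's RF model (for example an NIR-based and an MIR-based model), and `evaluate` would be applied to the fused predictions on the validation set.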