• Chinese Journal of Lasers
  • Vol. 48, Issue 3, 0311002 (2021)
Qi Wang1, Wandan Zeng1,*, Zhiping Xia2,*, Zhiping Li2, and Han Qu2
Author Affiliations
  • 1College of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai 201418, China
  • 2Military Veterinary Institute, Changchun, Jilin 130062, China
    DOI: 10.3788/CJL202148.0311002
    Qi Wang, Wandan Zeng, Zhiping Xia, Zhiping Li, Han Qu. Recognition of Food-Borne Pathogenic Bacteria by Raman Spectroscopy Based on Random Forest Algorithm[J]. Chinese Journal of Lasers, 2021, 48(3): 0311002

    Abstract

    Objective Food and drug safety is of great concern to society. Food-borne pathogenic bacteria are bacteria that cause food poisoning or that use food as a vector of transmission. Quick and effective detection of these bacteria in food is therefore crucial to protecting public health. The culture separation method traditionally used to examine microorganisms depends on the medium used for culturing, separation, and biochemical identification. Detection generally requires five to seven days and involves a series of procedures such as pre-enrichment, selective enrichment, microscopic examination, and serological verification. Traditional detection methods are therefore insufficient for the timely prevention and control of food-borne pathogenic bacteria. Raman spectroscopy, in contrast, is a nondestructive method that can rapidly and accurately identify the functional groups present in molecules. In this study, 11 food-borne pathogenic bacteria samples were used to construct a recognition and classification model based on a random forest algorithm and Raman spectra. The model is intended to address the low classification accuracy and long detection time of traditional methods for detecting food-borne pathogenic bacteria. The results of this study will help to ensure public health safety by rapidly and effectively detecting pathogens in food and drugs.

    Methods All of the food-borne pathogenic bacteria in this study were purchased from the China Center of Industrial Culture Collection. First, Raman spectra of each bacterial sample were acquired over a Raman shift range of 500–1600 cm⁻¹. LabSpec 6.0 software was used for spectral collection, and each sample was measured 15 times. After screening, 132 Raman spectra were retained. In the spectral preprocessing stage, min-max normalization was applied to the Raman spectral data, mapping the intensities to the range [0, 1] for comparison. The Savitzky-Golay algorithm was used for smoothing and denoising to remove noise and fluorescence interference. Because the sample data are high-dimensional, principal component analysis (PCA) was used for feature dimensionality reduction to avoid the problems caused by excessively high dimensions. In the model evaluation stage, K-fold cross-validation was used to check whether the model balanced underfitting against overfitting and to evaluate model stability. By these criteria, the Raman spectral recognition model based on the random forest algorithm proposed in this study effectively distinguished the different food-borne pathogenic bacteria among the collected samples.
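The preprocessing and evaluation steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the spectra are synthetic (one Gaussian peak per class plus noise), and the Savitzky-Golay window length, polynomial order, number of principal components, and number of trees are assumed values that the abstract does not report. The helper names `min_max_normalize`, `preprocess`, and `make_synthetic_spectra` are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def min_max_normalize(spectra):
    """Scale each spectrum (row) so its intensities span [0, 1]."""
    lo = spectra.min(axis=1, keepdims=True)
    hi = spectra.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against flat spectra
    return (spectra - lo) / span


def preprocess(spectra, window_length=11, polyorder=3):
    """Normalize, then smooth along the Raman-shift axis (assumed settings)."""
    normalized = min_max_normalize(spectra)
    return savgol_filter(normalized, window_length, polyorder, axis=1)


def make_synthetic_spectra(n_classes=11, per_class=12, n_points=550, seed=0):
    """Synthetic stand-in data: 11 classes x 12 spectra = 132 spectra."""
    rng = np.random.default_rng(seed)
    shift = np.linspace(500.0, 1600.0, n_points)  # Raman shift axis, cm^-1
    X, y = [], []
    for label in range(n_classes):
        centre = 600.0 + 90.0 * label  # one made-up peak position per class
        peak = np.exp(-0.5 * ((shift - centre) / 15.0) ** 2)
        for _ in range(per_class):
            X.append(peak + 0.05 * rng.normal(size=n_points) + 0.1)
            y.append(label)
    return np.asarray(X), np.asarray(y)


X_raw, y = make_synthetic_spectra()
X = preprocess(X_raw)

# PCA sits inside the pipeline so it is refit on each training fold.
model = make_pipeline(
    PCA(n_components=10, random_state=0),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Placing PCA inside the pipeline is a design choice of this sketch: it keeps the held-out fold from influencing the projection, so the cross-validated score is not inflated by leakage.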

    Results and Discussions In this study, K-nearest neighbors (KNN), logistic regression, support vector machine (SVM), decision tree, and random forest models were used to classify the preprocessed Raman spectral data of the food-borne pathogenic bacteria (Table 4). Under 10-fold cross-validation, the random forest model was more accurate than the traditional machine learning algorithms. The decision tree model gave the worst results, with an accuracy of 82.63%. This is because a decision tree is a single weak learner, whereas the random forest combines the votes of multiple trees to form a strong learner (Fig. 5). The classification ability of the random forest algorithm is therefore higher than that of a single decision tree classifier. Compared with traditional machine learning algorithms, the random forest algorithm adds two elements of randomness to model construction: random sampling and random feature selection (Table 2). Because a random forest is composed of decision trees, higher correlation among the trees results in a higher error rate. Random sampling determines how much the correlation between the trees in the forest is reduced. At each split, a tree chooses the feature with the best splitting ability from a small, randomly selected subset of features and uses it to divide the node into left and right subtrees. This amplifies the effect of randomness and further enhances the robustness of the model. Because the two elements of randomness strongly reduce the variance of the model, the random forest generally does not need additional pruning; it generalizes better and is more resistant to overfitting. In addition, the Savitzky-Golay filtering algorithm was used for denoising in the preprocessing stage of the Raman spectral data (Fig. 3) to give the model good resistance to interference.
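To make the two elements of randomness concrete, the sketch below builds a small forest by hand from scikit-learn decision trees: each tree is fit on a bootstrap sample of the rows, and each split considers only a random subset of the features. It is an illustration under stated assumptions, not the model used in the study. The data come from `make_classification` rather than Raman spectra, the tree count and `max_features="sqrt"` are assumed settings, and `fit_forest` and `predict_forest` are hypothetical helper names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

N_CLASSES = 11


def fit_forest(X, y, n_trees=100, seed=0):
    """Fit unpruned trees, each on a bootstrap sample with random feature subsets."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        # Randomness 1: bootstrap sample of the rows (drawn with replacement).
        rows = rng.integers(0, len(X), size=len(X))
        # Randomness 2: each split only considers sqrt(n_features) candidates.
        tree = DecisionTreeClassifier(
            max_features="sqrt",
            random_state=int(rng.integers(0, 2**31 - 1)),
        )
        trees.append(tree.fit(X[rows], y[rows]))
    return trees


def predict_forest(trees, X, n_classes=N_CLASSES):
    """Combine the trees by majority vote over integer labels 0..n_classes-1."""
    votes = np.zeros((len(X), n_classes), dtype=int)
    for tree in trees:
        pred = tree.predict(X).astype(int)
        votes[np.arange(len(X)), pred] += 1
    return votes.argmax(axis=1)


# Synthetic stand-in data: 132 samples, 11 classes (not real spectra).
X, y = make_classification(
    n_samples=132, n_features=30, n_informative=10, n_redundant=0,
    n_classes=N_CLASSES, n_clusters_per_class=1, class_sep=2.0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

single_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = fit_forest(X_train, y_train)

tree_acc = (single_tree.predict(X_test) == y_test).mean()
forest_preds = predict_forest(forest, X_test)
forest_acc = (forest_preds == y_test).mean()
print(f"single tree: {tree_acc:.3f}  hand-built forest: {forest_acc:.3f}")
```

In practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally be used; the hand-built version only exposes where the sampling and feature randomness enter.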

    Conclusions Raman spectroscopy is a mature technology that is effective for the detection and classification of food-borne pathogenic bacteria. In this study, a Raman spectrometer was used to acquire spectral data for 11 food-borne pathogens. According to their spectral properties, the data were normalized, smoothed, and denoised in the preprocessing stage, which facilitated model construction and training. A method was thereby developed for identifying and analyzing food-borne pathogenic bacteria by Raman spectroscopy. The experimental results show that the proposed classification model, which combines PCA with the random forest algorithm, is more accurate on Raman spectral data than the single machine learning methods conventionally used for detecting food-borne pathogens. The new method is also faster than manual identification of Raman spectra. However, the random forest model was prone to overfitting on sample sets with high noise. Future work to improve the accuracy of the model could optimize denoising in the data preprocessing stage and use the random forest algorithm to optimize feature selection. Only 11 samples of food-borne pathogenic bacteria were used in this study. Additional samples could be introduced when constructing later models to build a more complete Raman spectral database.
