• Spectroscopy and Spectral Analysis
  • Vol. 42, Issue 7, 2148 (2022)
Ping JIANG¹, Hao-xiang LU², and Zhen-bing LIU²,*
Author Affiliations
  • 1. School of Computer and Information Technology, Guangxi Police College, Nanning 530028, China
  • 2. College of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
    DOI: 10.3964/j.issn.1000-0593(2022)07-2148-08
    Ping JIANG, Hao-xiang LU, Zhen-bing LIU. Drugs Identification Using Near-Infrared Spectroscopy Based on Random Forest and CatBoost[J]. Spectroscopy and Spectral Analysis, 2022, 42(7): 2148

    Abstract

    Drug quality bears directly on public health and national well-being, and rapid, effective identification of drug quality is extremely important as the economy and society develop rapidly. Spectral analysis technology is accurate, fast, and does not contaminate samples, and it is widely used in the chemical industry, petroleum, medicine, and other areas important to people’s livelihood. To address the low accuracy, slow identification speed, and poor stability of traditional drug identification models, a spectrometer was used to collect near-infrared spectroscopy data of drugs without contaminating the samples. Random forest and CatBoost were then combined to classify and identify drugs quickly and accurately. The proposed method first uses Random Forest (RF) to screen the spectral data for effective characteristic wavelengths, eliminating irrelevant wavelengths and retaining those that best characterize the sample properties. An Extreme Learning Machine (ELM) is then used as the CatBoost weak classifier to analyze the screened characteristic wavelengths and identify drug attributes. Because ELM contains only one hidden layer and requires no iterative optimization, the identification model runs faster, while CatBoost improves identification accuracy by integrating the weak classifiers. To evaluate the proposed drug identification model, training sets of different sizes were constructed by random selection from the drug spectral data, and experiments were carried out independently on each. The mean of 10 runs was taken as the final result. In addition, the proposed model was compared with Back Propagation (BP) with CatBoost, Support Vector Machine (SVM), BP, ELM, Summation Wavelet Extreme Learning Machine (SWELM), and Boosting. The classification results for training sets of different sizes show that, as the training set grows, the classification accuracy reaches a maximum of 100% and the standard deviation of the predictions tends toward 0. The experimental results show that the proposed RF-CatBoost identification model achieves higher classification accuracy, faster speed, and stronger robustness than the comparison methods on drug data sets of different sizes, and that it can be widely used for accurate identification of drug categories to support effective supervision of drug quality.
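
    The abstract notes that ELM has a single hidden layer and needs no iterative optimization, and that results are averaged over 10 runs on randomly selected training sets. The sketch below illustrates only those two ideas and is not the authors' implementation: the synthetic two-class data, the tanh activation, the hidden-layer size, the weight scaling, and the names `ELM` and `repeated_holdout` are all assumptions made for illustration. RF wavelength screening and the boosting ensemble are not shown.

```python
import numpy as np


class ELM:
    """Minimal single-hidden-layer extreme learning machine (illustrative only)."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden  # assumed size, not taken from the paper
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # tanh activation is an assumption; the abstract does not name one.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        n_features = X.shape[1]
        # Hidden weights and biases are drawn at random and never updated.
        self.W = self.rng.normal(size=(n_features, self.n_hidden)) / np.sqrt(n_features)
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # One-hot targets; output weights come from one least-squares solve
        # via the pseudo-inverse, so there is no iterative training loop.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]


def repeated_holdout(X, y, train_fraction, n_runs=10, seed=0):
    """Mean and standard deviation of test accuracy over random splits."""
    rng = np.random.default_rng(seed)
    n_train = int(train_fraction * len(y))
    accuracies = []
    for run in range(n_runs):
        order = rng.permutation(len(y))
        train_idx, test_idx = order[:n_train], order[n_train:]
        model = ELM(n_hidden=50, seed=run).fit(X[train_idx], y[train_idx])
        accuracies.append(np.mean(model.predict(X[test_idx]) == y[test_idx]))
    return float(np.mean(accuracies)), float(np.std(accuracies))


# Synthetic stand-in for screened spectra: 2 classes, 20 "wavelengths".
# These values are made up; they are not the drug data from the paper.
data_rng = np.random.default_rng(42)
n_per_class, n_wavelengths = 60, 20
X = np.vstack([
    data_rng.normal(loc=0.0, size=(n_per_class, n_wavelengths)),
    data_rng.normal(loc=1.5, size=(n_per_class, n_wavelengths)),
])
y = np.array([0] * n_per_class + [1] * n_per_class)

mean_acc, std_acc = repeated_holdout(X, y, train_fraction=0.7)
print(f"mean accuracy over 10 runs: {mean_acc:.3f}, std: {std_acc:.3f}")
```

    Calling `repeated_holdout` with several values of `train_fraction` would mimic, on toy data, the comparison across training-set sizes described above.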