• Journal of Geo-information Science
  • Vol. 22, Issue 9, 1799 (2020)
Mingjie LIU1,2, Zhuokui XU1,3, Yunbing GAO2,4,*, Jing YANG2,4, Yuchun PAN2,4, Bingbo GAO5, Yanbing ZHOU2,4, Wanpeng ZHOU2,6, and Ling WANG7
Author Affiliations
  • 1School of Traffic and Transportation Engineering, Changsha University of Science and Technology, Changsha 410114, China
  • 2Beijing Research Center for Information Technology in Agriculture, Beijing 100097, China
  • 3Engineering Laboratory of Spatial Information Technology of Highway Geological Disaster Early Warning in Hunan Province (Changsha University of Science & Technology), Changsha 410114, China
  • 4National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
  • 5China Agricultural University, Beijing 100083, China
  • 6Henan Polytechnic University, Jiaozuo 454003, China
  • 7Institute of Agricultural Resources and Environment, Hebei Academy of Agriculture and Forestry Sciences, Shijiazhuang 050051, China
    DOI: 10.12082/dqxxkx.2020.190441
    Mingjie LIU, Zhuokui XU, Yunbing GAO, Jing YANG, Yuchun PAN, Bingbo GAO, Yanbing ZHOU, Wanpeng ZHOU, Ling WANG. Estimating Soil Organic Matter based on Machine Learning Under Sparse Sample[J]. Journal of Geo-information Science, 2020, 22(9): 1799

    Abstract

    This study aims to improve the accuracy of soil organic matter estimation under sparse sampling by constructing predictive models with two machine learning methods, GRNN (Generalized Regression Neural Network) and RF (Random Forest). The 2007 soil organic matter sampling data for agricultural land in Daxing were thinned into eight sample sets of decreasing density (2703, 1352, 676, 339, 169, 85, 43, and 22 points) using the MMSD (Minimization of the Mean of the Shortest Distances) criterion. GRNN, RF, and Ordinary Kriging were each applied to make predictions at every sampling density, and cross-validation was used to assess the prediction accuracy for unknown samples at each density. As the sampling density decreases, the spatial correlation between sampling points gradually weakens; as a result, the fitting precision of the semivariogram deteriorates, the prediction error at the predicted points increases, and the confidence of the prediction decreases. When the data are thinned to 43 and 22 points, the spatial correlation between sampling points nearly disappears, the coefficient of determination of the fitted semivariogram is low, and the residual is large. Ordinary Kriging is clearly affected by changes in the number of sampling points, the sampling density, and the spatial structure of the samples: its prediction accuracy falls as the number of sampling points falls, and at 85 sampling points or fewer there is no significant correlation between the predicted and observed values. The prediction accuracy of GRNN and RF, by contrast, is almost independent of the sampling density. Their predicted values fluctuate within a limited range around the observed values and correlate well with them. At 85 sampling points and below, their prediction accuracy is greatly improved compared with Ordinary Kriging.
Ordinary Kriging is not suitable for spatial interpolation with sparse samples, especially when the spatial correlation is weak. The machine learning models can fully learn the environmental information and spatial proximity information of the soil sampling points. Because they combine attribute similarity with spatial correlation, they are more stable and adaptable, are not easily affected by the number, configuration, or density of sampling points, and can make stable and accurate predictions even when the spatial autocorrelation between sampling points is very weak.
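
As a rough illustration of the kind of model the abstract refers to, the sketch below implements a GRNN in its standard form, a Gaussian-kernel weighted average of the training targets, and scores it with leave-one-out cross-validation. It is not the authors' implementation. The feature layout (two coordinates plus one environmental covariate), the toy values, the smoothing parameter `sigma`, the choice of leave-one-out as the cross-validation scheme, and the function names `grnn_predict` and `loo_rmse` are all assumptions made for this example.

```python
import math


def grnn_predict(train_X, train_y, query, sigma):
    """GRNN estimate: Gaussian-kernel weighted average of training targets."""
    num = 0.0
    den = 0.0
    for x, y in zip(train_X, train_y):
        # Squared Euclidean distance between a training point and the query.
        d2 = sum((a - b) ** 2 for a, b in zip(x, query))
        w = math.exp(-d2 / (2.0 * sigma ** 2))
        num += w * y
        den += w
    # If every weight underflows to zero, fall back to the training mean.
    return num / den if den > 0.0 else sum(train_y) / len(train_y)


def loo_rmse(X, y, sigma):
    """Leave-one-out cross-validation RMSE (assumed validation scheme)."""
    sq_err = 0.0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        pred = grnn_predict(rest_X, rest_y, X[i], sigma)
        sq_err += (pred - y[i]) ** 2
    return math.sqrt(sq_err / len(X))


# Hypothetical toy data, not from the paper:
# each row is [x coordinate, y coordinate, environmental covariate].
X = [
    [0.0, 0.0, 0.2],
    [1.0, 0.0, 0.4],
    [0.0, 1.0, 0.3],
    [1.0, 1.0, 0.8],
    [0.5, 0.5, 0.5],
]
# Hypothetical soil organic matter values (g/kg).
y = [12.0, 14.0, 13.0, 18.0, 15.0]

if __name__ == "__main__":
    print(grnn_predict(X, y, [0.4, 0.6, 0.5], sigma=0.5))
    print(loo_rmse(X, y, sigma=0.5))
```

Because every prediction is a weighted average of observed values, the output always lies between the smallest and largest training value. In practice the features would be rescaled so that coordinates and covariates contribute comparably to the distance, and `sigma` would be tuned, for example by minimizing the cross-validation error.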