• Journal of Geographical Sciences
  • Vol. 30, Issue 5, 794 (2020)
Ahmed DERDOURI1,* and Yuji MURAYAMA2
Author Affiliations
  • 1Division of Spatial Information Science, Graduate School of Life and Environmental Sciences, University of Tsukuba, Tennodai, Tsukuba, Ibaraki, Japan
  • 2Faculty of Life and Environmental Sciences, University of Tsukuba, Tennodai, Tsukuba, Ibaraki, Japan
    DOI: 10.1007/s11442-020-1756-1
    Ahmed DERDOURI, Yuji MURAYAMA. A comparative study of land price estimation and mapping using regression kriging and machine learning algorithms across Fukushima prefecture, Japan[J]. Journal of Geographical Sciences, 2020, 30(5): 794
    Fig. 1. Fukushima prefecture and its administrative boundaries, topographic features, transportation lines, and evacuation zones after the Fukushima Daiichi Nuclear Plant disaster (as of September 2015)
    Fig. 2. Changes in land prices averaged by land type in Fukushima prefecture (2005-2018)
    Fig. 3. Methodological framework of the study
    Fig. 4. The distribution of land price samples in the study area
    Fig. 5. Fitted semi-variograms of the kriging models for the year 2015: (a) Exp: exponential, (b) Gau: Gaussian, and (c) Sph: spherical. The nugget, range, and sill values and the mathematical models are shown in the bottom right corner of each panel
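The three semi-variogram models in Fig. 5 can be written as functions of the lag distance h. The sketch below is a minimal Python version assuming a gstat-style parameterization (nugget, partial sill, range parameter a); the function name `semivariance` and the example parameter values are hypothetical and are not the fitted values shown in Fig. 5.

```python
import math


def semivariance(h, model, nugget, psill, rng):
    """Semivariance gamma(h) at lag distance h (same units as rng).

    model  -- "Exp", "Gau" or "Sph" (labels as used in Fig. 5)
    nugget -- jump at the origin
    psill  -- partial sill; the total sill is nugget + psill
    rng    -- range parameter a; for "Exp" and "Gau" it is a scale
              parameter, and the practical range is about 3a and
              sqrt(3)a respectively
    """
    if h < 0:
        raise ValueError("lag distance must be non-negative")
    if h == 0:
        return 0.0  # gamma(0) = 0; the nugget applies only for h > 0
    r = h / rng
    if model == "Exp":
        structured = 1.0 - math.exp(-r)
    elif model == "Gau":
        structured = 1.0 - math.exp(-r * r)
    elif model == "Sph":
        # Reaches the sill exactly at h = rng and stays flat beyond it.
        structured = 1.5 * r - 0.5 * r ** 3 if r < 1.0 else 1.0
    else:
        raise ValueError(f"unknown model: {model!r}")
    return nugget + psill * structured


# Hypothetical parameters: nugget 0.005, partial sill 0.02, range 10 km.
for m in ("Exp", "Gau", "Sph"):
    print(m, [round(semivariance(h, m, 0.005, 0.02, 10_000), 4)
              for h in (1_000, 5_000, 10_000, 30_000)])
```

Near the origin the Gaussian curve rises more slowly than the exponential one, which is why it implies smoother short-range variation in the kriged surface.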
    Fig. 6. Results of the regression kriging for the year 2015 using the exponential model (top), Gaussian model (middle), and spherical model (bottom). Left: estimated log-transformed land prices using regression kriging. Right: validation errors in the training samples. Capital letters denote major cities within Fukushima prefecture: A: Fukushima, B: Koriyama, C: Iwaki, D: Aizuwakamatsu, and E: Shirakawa
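Regression kriging, as used for Figs. 6 and 7, combines a regression trend on the explanatory variables with kriging of the trend residuals. The following is a simplified sketch, not the authors' gstat workflow: it assumes the trend coefficients are already estimated (for example by OLS, as in Table 6), applies ordinary kriging to the residuals with an exponential variogram, and solves the kriging system with a small hand-written Gaussian elimination. All function names and example values are hypothetical.

```python
import math


def exp_variogram(h, nugget=0.0, psill=1.0, rng=1.0):
    """Exponential semivariance; gamma(0) = 0 by definition."""
    return 0.0 if h == 0 else nugget + psill * (1.0 - math.exp(-h / rng))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(xy, values, target, gamma):
    """Ordinary kriging estimate at `target` from point samples."""
    n = len(xy)
    # Semivariances between samples, bordered by the unbiasedness constraint
    # (weights sum to 1) with a Lagrange multiplier in the last position.
    a = [[gamma(math.dist(xy[i], xy[j])) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target)) for p in xy] + [1.0]
    weights = solve(a, b)[:n]
    return sum(w * v for w, v in zip(weights, values))


def regression_kriging(xy, y, covariates, coef, target_xy, target_cov, gamma):
    """Linear trend (coef[0] = intercept) plus kriged trend residuals."""
    def trend(x):
        return coef[0] + sum(c * v for c, v in zip(coef[1:], x))

    residuals = [yi - trend(xi) for yi, xi in zip(y, covariates)]
    return trend(target_cov) + ordinary_kriging(xy, residuals, target_xy, gamma)


# Hypothetical example: four samples (projected metres) and one covariate.
xy = [(0.0, 0.0), (1_000.0, 0.0), (0.0, 1_000.0), (1_000.0, 1_000.0)]
y = [4.2, 4.5, 4.1, 4.6]                   # log-transformed prices
cov = [[300.0], [100.0], [800.0], [50.0]]  # e.g. distance to a station (m)
coef = [4.6, -5e-4]                        # intercept, slope (hypothetical)


def gamma(h):
    return exp_variogram(h, nugget=0.002, psill=0.02, rng=500.0)


print(regression_kriging(xy, y, cov, coef, (500.0, 500.0), [200.0], gamma))
```

Because the weights sum to one, a constant residual field is reproduced exactly, and with gamma(0) = 0 the estimate at a sample location equals the observed value.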
    Fig. 7. Land price maps for the year 2015 predicted from officially published land price observations using regression kriging based on three mathematical models (ordered from left to right): (1) krig.EXP: exponential model, (2) krig.GAU: Gaussian model, and (3) krig.SPH: spherical model
    Fig. 8. Boxplots of the performance of the machine learning methods in terms of MAE, RMSE, and R² for the year 2015
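The MAE, RMSE, and R² summarized in Fig. 8 and Table 8 can be computed as in the generic sketch below (function names and data are hypothetical). Some tools, for example R's caret package, report R² as the squared Pearson correlation between observed and predicted values rather than 1 - SS_res/SS_tot, so both variants are shown; which definition and which percentage scaling underlie the reported values is not specified here.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2_ss(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def r2_corr(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


# Hypothetical log-transformed prices for four testing samples.
obs = [4.0, 4.2, 4.5, 4.8]
pred = [4.1, 4.2, 4.4, 4.9]
print(mae(obs, pred), rmse(obs, pred), r2_ss(obs, pred), r2_corr(obs, pred))
```

The two R² variants coincide for an in-sample least-squares fit with an intercept, but can differ on testing samples, where `r2_corr` ignores any systematic bias in the predictions.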
    Fig. 9. Observed vs. predicted land prices for the year 2015 in the testing samples for the different machine learning methods (ordered left to right, top to bottom): (1) GLM: generalized linear model, (2) GAMS: generalized additive model using splines, (3) SVMLinear: support vector machines with linear kernel, (4) MARS: multivariate adaptive regression spline, (5) kNN: k-nearest neighbors, (6) SVMRadial: support vector machines with radial basis function kernel, (7) Cubist, (8) GBM: stochastic gradient boosting, and (9) RF: random forest
    Fig. 10. Land price maps for the year 2015 predicted from officially published land price observations using machine learning algorithms (ordered left to right, top to bottom): (1) GLM: generalized linear model, (2) GAMS: generalized additive model using splines, (3) SVMLinear: support vector machines with linear kernel, (4) MARS: multivariate adaptive regression spline, (5) kNN: k-nearest neighbors, (6) SVMRadial: support vector machines with radial basis function kernel, (7) Cubist, (8) GBM: stochastic gradient boosting, and (9) RF: random forest
    Fig. 11. Maps of the differences in the 2015 land prices between each of the best-performing machine learning algorithms, (1) RF: random forest, (2) Cubist, (3) MARS: multivariate adaptive regression spline, and (4) GAMS: generalized additive model using splines, and the kriging exponential model (krig.EXP). A1, A2, A3, and A4 show zoomed-in maps of Koriyama city and its outskirts
    Fig. 12. Percentage of area in each predefined land price range for the RF- and krig.EXP-based estimates for the year 2015, in Fukushima prefecture and its subregions
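Fig. 12 summarizes each estimated surface as the share of area falling into predefined price ranges. Assuming the predictions are stored on an equal-area grid, this reduces to counting cells per class, as in the sketch below. The function name, class breaks, and cell values are hypothetical, since the ranges used in Fig. 12 are not listed here; for a subregion, the same function can be applied to the cells inside that subregion.

```python
from bisect import bisect_right


def area_percentages(cell_values, breaks):
    """Percent of total area in each price class.

    breaks must be ascending; [b1, b2] defines the classes
    (< b1), [b1, b2) and (>= b2). Cells are assumed to have equal area.
    """
    counts = [0] * (len(breaks) + 1)
    for v in cell_values:
        counts[bisect_right(breaks, v)] += 1
    total = sum(counts)
    return [100.0 * c / total for c in counts]


# Hypothetical predicted land prices on four grid cells and hypothetical breaks.
print(area_percentages([5_000, 12_000, 30_000, 80_000], [10_000, 50_000]))
# [25.0, 50.0, 25.0]
```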
    Estimation approach | Study | Study area | Method(s) | Mapping | Objective | Highlighted results
    Hedonic models | (Löchl, 2006) | Canton Zurich, Switzerland | Hedonic regression | Yes | Developing an estimation model of rents and land prices | Two classified maps of land prices for residential and commercial uses
    Hedonic models | (Kim and Kim, 2016) | Seoul, South Korea | OLS and spatial regression models | No | Estimating land values using OLS and generalized regression models | The spatial error model (SEM) was the best of the tested models
    Hedonic models | (Hilal et al., 2016) | Côte-d’Or, France | OLS | No | Estimating the price of agricultural land at the cadastral level based on previous real estate transactions | Hedonic prices were calculated from a range of attributes influencing agricultural land prices, most notably time effects
    Geostatistical methods | (Luo and Wei, 2004) | Milwaukee, Wisconsin, USA | Kriging | No | Predicting urban land values for different land use categories using kriging models | Overall average standard error of 2%
    Geostatistical methods | (Chica-Olmo, 2007) | City of Granada, Spain | Kriging and cokriging | Yes | Estimating and mapping housing prices using kriging and cokriging | Cokriging had a lower standard error than kriging
    Geostatistical methods | (Inoue et al., 2007) | Tokyo 23 wards, Japan | Kriging | Yes | Mapping estimated land prices in Tokyo’s 23 wards from 1975 to 2004 | Kriging-based results were more accurate than OLS, with average errors ranging from 2% to 10%
    Geostatistical methods | (Tsutsumi et al., 2011) | Tokyo metropolitan area, Japan | Regression kriging | Yes | Developing a system to estimate and map residential land prices in the Tokyo metropolitan area | Average error ratio of 10% for the exponential model vs. 18.3% for the Gaussian model
    Geostatistical methods | (Kuntz and Helbich, 2014) | Metropolitan area of Vienna, Austria | Kriging and cokriging | Yes | Mapping predicted real estate prices | Universal cokriging performed best in cross-validation
    Geostatistical methods | (Chica-Olmo et al., 2019) | City of Granada, Spain | Regression and universal cokriging | Yes | Estimating spatiotemporal variations in housing prices, 1988-2005 | Regression cokriging was slightly better
    Geostatistical methods | (Palma et al., 2019) | Italy | Jackknife kriging | No | Predicting real estate prices from socioeconomic factors for 2014-2016 | Model accuracy improved when spatio-temporal correlation was considered
    Machine learning algorithms | (Gu et al., 2011) | A district of Tangshan city, China | Hybrid genetic algorithm and support vector machine model (G-SVM), Grey Model (GM) | No | Forecasting housing prices | G-SVM outperformed GM in many respects
    Machine learning algorithms | (Antipov and Pokryshevskaya, 2012) | Saint Petersburg, Russia | Machine learning algorithms | No | Estimating the values of residential apartments | Random forest was the most robust of all methods
    Machine learning algorithms | (Wang et al., 2014) | Chongqing city, China | SVM optimized by particle swarm optimization (PSO), BP neural network | No | Forecasting real estate prices with a PSO-optimized SVM compared with a BP neural network | PSO-SVM showed higher forecasting accuracy than the BP neural network
    Machine learning algorithms | (Park and Bae, 2015) | Fairfax County, Virginia, USA | Machine learning algorithms (C4.5, RIPPER, Naïve Bayesian, and AdaBoost) | No | Predicting housing prices using different machine learning methods | The RIPPER model outperformed the other methods
    Comparison of various approaches | (Bourassa et al., 2010) | Jefferson County, Kentucky, USA | OLS, nearest neighbors, geostatistical and trend surface models | No | Comparing several methods for estimating house prices | The geostatistical model gave the best results in terms of prediction errors
    Comparison of various approaches | (Sampathkumar et al., 2015) | Chennai metropolitan area, India | Multiple regression and neural network | No | Modeling and estimating land prices based on economic and social factors | Both methods performed well, with a slight advantage for the neural network
    Comparison of various approaches | (Hu et al., 2016) | Wuhan city, China | Empirical Bayesian kriging (EBK), GWR, OLS | Yes | Modeling and visualizing the dependency between urban residential land prices and influential variables | GWR outperformed OLS; its estimated coefficients showed that the effects of the variables on land prices vary with location
    Comparison of various approaches | (Schernthanner et al., 2016) | Potsdam, Germany | Hedonic regression, kriging, and random forest | Yes | Comparing rental prices estimated by three methods and visualizing the outcome | RF was the most accurate method
    Table 1.

    Descriptive list of reviewed literature regarding land price estimation/mapping grouped by estimation approach: (1) hedonic models, (2) geostatistical methods, (3) machine learning algorithms, and (4) comparison of various approaches

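    Category | Model | Abbreviation | R package
    Geostatistical | Universal kriging, exponential model | krig.EXP | gstat (Pebesma, 2004)
    Geostatistical | Universal kriging, Gaussian model | krig.GAU | gstat (Pebesma, 2004)
    Geostatistical | Universal kriging, spherical model | krig.SPH | gstat (Pebesma, 2004)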
    Table 2.

    The three mathematical models used for kriging and their abbreviations

    Category | Model | Abbreviation | R package
    Linear | Generalized linear model | GLM | base
    Linear | Generalized additive model using splines | GAMS | mgcv
    Linear | Support vector machines with linear kernel | SVMLinear | kernlab
    Nonlinear | Multivariate adaptive regression spline | MARS | earth
    Nonlinear | k-nearest neighbors | kNN | base
    Nonlinear | Support vector machines with radial basis function kernel | SVMRadial | kernlab
    Regression trees | Cubist | Cubist | Cubist
    Regression trees | Stochastic gradient boosting | GBM | gbm (Ridgeway, 2005)
    Regression trees | Random forest | RF | randomForest (Breiman, 2001)
    Table 3.

    Summary of the spatial prediction models used in this study. Linear, nonlinear, and regression tree models are grouped as proposed by Kuhn and Johnson (2013). The abbreviations are used to refer to each method throughout the manuscript
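As an illustration of one of the nonlinear methods in Table 3, k-nearest neighbors regression predicts a value as the mean response of the k training samples closest in covariate space. Because the covariates in Table 4 have very different scales (m² vs. persons/km²), the sketch standardizes them first. This is a minimal illustration with hypothetical function names, data, and an assumed k, not the configuration tuned in the study.

```python
import math


def standardize(rows):
    """Z-score each column; returns scaled rows plus the means and SDs."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]
    return scaled, means, sds


def knn_regress(train_x, train_y, query, k=5):
    """Mean response of the k training points nearest to `query`."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda t: math.dist(t[0], query))[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical covariates: [distance to station (m), population density (persons/km²)].
x = [[200, 3000], [1500, 800], [5000, 50], [400, 2500], [3000, 200]]
y = [4.7, 4.3, 3.8, 4.6, 4.0]  # hypothetical log-transformed prices
xs, means, sds = standardize(x)
# A new location must be scaled with the training means and SDs.
q = [(v - m) / s for v, m, s in zip([300, 2800], means, sds)]
print(knn_regress(xs, y, q, k=2))
```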

    Explanatory variable | Data | GIS function | Variable description | Abbreviation
    Distance to the nearest railway station (m) | Railway stations | Near | Calculated using the railway stations layer | Distance
    Area of rice fields (m²) | Land uses within a square kilometer | Spatial Join | Areas of the different land uses within one square kilometer, classified according to the National Land Numerical Information | Paddy
    Area of other agricultural land (m²) | as above | as above | as above | Agricultural
    Area of forests (m²) | as above | as above | as above | Forests
    Area of uncultivated land (m²) | as above | as above | as above | Uncultivated
    Area of roads (m²) | as above | as above | as above | Roads
    Area of railways (m²) | as above | as above | as above | Railways
    Area of other land uses (m²) | as above | as above | as above | Other uses
    Area of water bodies (m²) | as above | as above | as above | Water
    Area of seashore (m²) | as above | as above | as above | Seashore
    Area of the surface of the sea (m²) | as above | as above | as above | Sea
    Area of golf courses (m²) | as above | as above | as above | Golf
    Dummy variable for urbanization promoting area | Promoted urbanization areas | Spatial Join | Dummy variable: 1 if the point location falls inside the area, else 0 | Promotion
    Population density (persons/km²) | Population | Spatial Join | Calculated using the 2015 population data for every minor municipal district | Density
    Number of enterprises | Enterprises | Spatial Join | Statistical GIS data of 2015 for every minor municipal district | Enterprises
    Number of employees | Employees | as above | as above | Employees
    Elevation (m) | DEM | Extract Multi Values to Points | Elevation of the point location | Elevation
    Table 4.

    List of explanatory variables selected in this study with their data sources and the related abbreviations
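Two of the GIS operations in Table 4 are easy to mimic outside a GIS: the Near distance from a sample to the closest railway station, and the 0/1 urbanization promotion dummy from a point-in-polygon test. The sketch assumes projected coordinates in metres and simple polygons without holes; the function names and coordinates are hypothetical and are not the GIS tools named in Table 4.

```python
import math


def nearest_distance(point, stations):
    """Euclidean distance (map units, e.g. metres) to the closest station."""
    return min(math.dist(point, s) for s in stations)


def point_in_polygon(point, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices without holes."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def promotion_dummy(point, polygons):
    """1 if the sample falls inside any promotion polygon, else 0."""
    return int(any(point_in_polygon(point, poly) for poly in polygons))


stations = [(0.0, 0.0), (3_000.0, 4_000.0)]  # hypothetical projected coords (m)
zone = [(1_000.0, 1_000.0), (6_000.0, 1_000.0), (6_000.0, 6_000.0), (1_000.0, 6_000.0)]
sample = (2_500.0, 3_000.0)
print(nearest_distance(sample, stations), promotion_dummy(sample, [zone]))
```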

    Data layer | Source | Year
    Land price observations (published and prefectural) | National Land Numerical Information | 2015
    Railway stations | National Land Numerical Information | 2015
    Land use areas within each 1 km² grid cell | National Land Numerical Information | 2014
    Promoted urbanization areas | National Land Numerical Information | 2011
    Population of every minor municipal district | Statistics Bureau of Japan | 2015
    Number of enterprises and employees of every minor municipal district | Statistics Bureau of Japan | 2015
    DEM | USGS | -
    Table 5.

    Overview of datasets used in the study, their sources, and the year of release

    Variable | Unit | Coefficient estimate
    Intercept | - | 4.439***
    Distance to the nearest railway station | m | -2.09 × 10⁻⁵***
    Population density | persons/km² | 3.104 × 10⁻⁵***
    Area of rice fields | m² | -3.935 × 10⁻⁷***
    Area of other agricultural land | m² | -4.731 × 10⁻⁷***
    Area of forests | m² | -2.733 × 10⁻⁷***
    Area of uncultivated land | m² | -7.437 × 10⁻⁷ .
    Area of roads | m² | 7.211 × 10⁻⁷**
    Area of railways | m² | -3.301 × 10⁻⁸
    Area of other land uses | m² | -8.97 × 10⁻⁸
    Area of water bodies | m² | -3.086 × 10⁻⁷***
    Area of seashore | m² | -1.922 × 10⁻⁶
    Area of the surface of the sea | m² | -1.25 × 10⁻⁷
    Area of golf courses | m² | -5.843 × 10⁻⁸
    Dummy variable for urbanization promoting area | - | 1.819 × 10⁻¹***
    Elevation | m | -1.556 × 10⁻⁴**
    Number of enterprises | - | 3.363 × 10⁻⁴**
    Number of employees | - | -2.951 × 10⁻⁵*
    Number of samples = 1092; residual standard error = 0.1683; multiple R² = 0.7408; adjusted R² = 0.7349; F-statistic = 125.7; p-value < 2.2 × 10⁻¹⁶. *** significant at the 1% level; ** significant at the 5% level
    Table 6.

    Regression results with detailed explanatory variables and their estimated coefficients
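Assuming, as Fig. 6 suggests, that the dependent variable in Table 6 is the log-transformed land price, each coefficient translates into a multiplicative effect on price. The sketch below converts a coefficient and a change in one variable, with the other variables held fixed, into a percent change. The log base is an assumption (base 10 and the natural logarithm are both shown, since the transformation is only described as a log transform), and the function name is hypothetical.

```python
import math


def percent_change(coef, delta, log_base=10.0):
    """Percent change in price when one covariate rises by `delta`.

    Assumes log_b(price) = b0 + coef * x + ..., with the other covariates
    held fixed, so price is multiplied by log_base ** (coef * delta).
    """
    return (log_base ** (coef * delta) - 1.0) * 100.0


# Coefficients from Table 6.
DIST_COEF = -2.09e-5     # per metre of distance to the nearest station
DENSITY_COEF = 3.104e-5  # per person/km² of population density

for base, label in ((10.0, "log10"), (math.e, "ln")):
    print(label,
          round(percent_change(DIST_COEF, 1_000, base), 2),     # +1 km from station
          round(percent_change(DENSITY_COEF, 1_000, base), 2))  # +1,000 persons/km²
```

Under the base-10 assumption, a sample 1 km farther from a station is associated with a land price about 4.7% lower; under the natural logarithm, about 2.1% lower.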

    Mathematical model | Validation RMSE (%) | Cross-validation RMSE (%)
    Exponential | 15.32 | 15.1
    Gaussian | 15.86 | 15.57
    Spherical | 15.57 | 15.5
    Table 7.

    Prediction errors of validation and cross-validation tests for the three kriging models

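    Category | Method | MAE, 10-fold CV (%) | RMSE, 10-fold CV (%) | R², 10-fold CV (%) | R², testing samples (%) | Difference in R² (CV - test, %)
    Linear | GLM | 13.50 | 17.29 | 72.47 | 59.94 | +12.53
    Linear | GAMS | 12.03 | 15.37 | 78.13 | 68.72 | +9.41
    Linear | SVMLinear | 13.38 | 17.25 | 72.73 | 59.12 | +13.61
    Nonlinear | MARS | 12.11 | 15.52 | 77.90 | 70.78 | +7.12
    Nonlinear | kNN | 13.38 | 17.35 | 72.24 | 68.03 | +4.21
    Nonlinear | SVMRadial | 12.55 | 16.27 | 75.53 | 70.02 | +5.51
    Regression trees | Cubist | 12.19 | 15.60 | 77.72 | 72.74 | +4.98
    Regression trees | GBM | 12.16 | 15.68 | 77.40 | 70.83 | +6.57
    Regression trees | RF | 11.39 | 14.97 | 79.17 | 77.68 | +1.49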
    Table 8.

    Prediction errors and accuracy of machine learning methods
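Table 8 contrasts accuracy under 10-fold cross-validation with accuracy on held-out testing samples. The sketch below shows a simple, unstratified way to split n training samples into 10 folds (an assumption; the study's exact resampling setup is not described here) and then recomputes the R² gap from the Table 8 values to rank the methods. A small gap suggests that the cross-validated estimate carried over well to unseen data. Function and variable names are hypothetical.

```python
import random


def k_fold_indices(n, k=10, seed=1):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


folds = k_fold_indices(100)  # hypothetical number of training samples
for held_out in folds:
    train = [i for f in folds if f is not held_out for i in f]
    # Fit on `train`, score on `held_out`, then average the fold metrics.

# R² (%) from Table 8 as (10-fold cross-validation, testing samples).
R2 = {
    "GLM": (72.47, 59.94), "GAMS": (78.13, 68.72), "SVMLinear": (72.73, 59.12),
    "MARS": (77.90, 70.78), "kNN": (72.24, 68.03), "SVMRadial": (75.53, 70.02),
    "Cubist": (77.72, 72.74), "GBM": (77.40, 70.83), "RF": (79.17, 77.68),
}

gaps = {m: round(cv - test, 2) for m, (cv, test) in R2.items()}
ranked = sorted(gaps, key=gaps.get)  # smallest CV-to-test drop first
print(ranked)
# ['RF', 'kNN', 'Cubist', 'SVMRadial', 'GBM', 'MARS', 'GAMS', 'GLM', 'SVMLinear']
```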
