Abstract

While it remains the primary source of safe drinking and irrigation water in northwest Iran’s Maku Plain, the region’s groundwater is prone to fluoride contamination. Accordingly, modeling techniques to accurately predict groundwater fluoride concentration are required. The current paper advances several novel data mining algorithms, including lazy learners [Instance-Based K-nearest Neighbours (IBK); Locally Weighted Learning (LWL); and KStar], a tree-based algorithm (M5P) and a meta classifier algorithm [Regression by Discretization (RBD)], to predict groundwater fluoride concentration. Drawing on several groundwater quality variables (e.g., Ca²⁺, Mg²⁺, Na⁺, K⁺, HCO₃⁻, CO₃²⁻, SO₄²⁻ and Cl⁻ concentrations), measured in each of 143 samples collected between 2004 and 2008, several models predicting groundwater fluoride concentrations were developed. The full dataset was divided into two subsets: 70% for model training (calibration) and 30% for model evaluation (validation). Models were validated using several statistical evaluation criteria and three visual evaluation approaches (i.e., scatter plots, Taylor diagrams and violin diagrams). Although Na⁺ and Ca²⁺ showed the greatest positive and negative correlations with fluoride (r = 0.59 and -0.39, respectively), they were insufficient to reliably predict fluoride levels; therefore, other water quality parameters, including those weakly correlated with fluoride, should be considered as inputs for fluoride prediction. The IBK model outperformed other models in fluoride contamination prediction, followed by KStar, RBD, M5P, and LWL. The RBD and M5P models were the least accurate in terms of predicting peaks in fluoride concentration values. Results of the current study can be used to support practical and effective management of water and groundwater resources.
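
The workflow summarized above (a 70/30 calibration/validation split, an instance-based nearest-neighbour predictor, and statistical evaluation) can be illustrated with a minimal sketch. This is not the study's implementation: the nearest-neighbour function is a simplified stand-in for IBK, the data are synthetic placeholders rather than the Maku Plain samples, and every name here (`train_test_split`, `knn_predict`, `rmse`, `pearson_r`, the choice of `k = 3`) is an assumption made for illustration. The abstract also does not say which statistical criteria were used, so RMSE and Pearson's r are assumed examples.

```python
import math
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle rows and split them into calibration and validation subsets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, features, k=3):
    """Predict a target as the mean of the k nearest training targets.

    Simplified stand-in for an instance-based learner (Euclidean distance,
    unweighted mean); not the IBK implementation used in the study.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], features))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Synthetic placeholder data: 143 samples with 8 inputs scaled to [0, 1],
# standing in for Ca, Mg, Na, K, HCO3, CO3, SO4 and Cl concentrations.
# The target formula below is invented purely so the example has structure.
rng = random.Random(42)
samples = []
for _ in range(143):
    x = tuple(rng.random() for _ in range(8))
    fluoride = 0.5 + 2.0 * x[2] - 1.0 * x[0] + rng.gauss(0.0, 0.1)
    samples.append((x, fluoride))

train, test = train_test_split(samples, train_fraction=0.7)
observed = [target for _, target in test]
predicted = [knn_predict(train, features, k=3) for features, _ in test]

validation_rmse = rmse(observed, predicted)
validation_r = pearson_r(observed, predicted)
print(f"validation RMSE = {validation_rmse:.3f}, r = {validation_r:.3f}")
```

With 143 samples, this split yields 100 calibration and 43 validation samples. Because the inputs are synthetic, the printed scores say nothing about the study's results; they only show how calibration and validation data are kept separate when scoring a model.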

*Original abstract online at https://www.ncbi.nlm.nih.gov/pubmed/31736062