Highlights
- Three classifiers, Random Forest, Artificial Neural Network, and Logistic Regression, were employed to predict groundwater fluoride
- The variables influencing groundwater fluoride in the study area were evaluated
- Python 3.7 was used for the analysis of the three models
Abstract
Groundwater fluoride poses a health risk to humans, and analyzing groundwater quality is time-consuming and expensive. Statistical methods provide a valuable approach for studying the spatial distribution of groundwater fluoride. In this study, Random Forest (RF), Artificial Neural Network (ANN), and Logistic Regression (LR) were used to predict groundwater fluoride in the Datong Basin. Hydrochemical data from 482 groundwater samples were collected and used to evaluate the performance of the three statistical techniques and to identify the main factors controlling fluoride enrichment in groundwater. The data were split into two parts for the statistical analysis, 80% for training and 20% for testing. The chi-squared test was applied to select the most relevant variables, and TDS, Cl–, NO3–, Na+, HCO3–, SO42–, K+, Zn, Ca2+, and Mg2+ were selected as the best inputs for fluoride prediction. Models were evaluated using the confusion matrix and the area under the receiver operating characteristic curve (ROC AUC). The results show that, with these ten input variables, the accuracies of RF, ANN, and LR were 0.89, 0.85, and 0.76, respectively. The mean decrease in impurity (MDI) and permutation feature importance indicate that eight of the ten parameters, namely TDS, Cl–, NO3–, Na+, HCO3–, SO42–, Ca2+, and Mg2+, are the variables influencing groundwater fluoride in the study area. RF was the best-performing model, predicting groundwater fluoride contamination in the study area with high conformity and confidence.
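The workflow summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn (the abstract names only Python 3.7), uses a synthetic stand-in for the 482-sample table, assumes a binary fluoride label, and picks hyperparameters, the 15 candidate features, and the min-max scaling step (needed because the chi-squared score requires non-negative inputs) arbitrarily for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic placeholder data (NOT the study's measurements): 482 samples,
# 15 hypothetical candidate variables, binary label (assumed to mean
# high vs. low fluoride).
X, y = make_classification(n_samples=482, n_features=15, n_informative=8,
                           random_state=0)

# 80% training / 20% testing split, as stated in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# chi2 requires non-negative features, so scale to [0, 1] first (assumption).
scaler = MinMaxScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Keep the ten variables with the highest chi-squared scores.
selector = SelectKBest(chi2, k=10).fit(X_train_s, y_train)
X_train_k, X_test_k = selector.transform(X_train_s), selector.transform(X_test_s)

# Illustrative hyperparameters; the study's settings are not given here.
models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "ANN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    "LR": LogisticRegression(max_iter=1000),
}

# Evaluate each model with accuracy, the confusion matrix, and ROC AUC.
results = {}
for name, model in models.items():
    model.fit(X_train_k, y_train)
    pred = model.predict(X_test_k)
    proba = model.predict_proba(X_test_k)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "auc": roc_auc_score(y_test, proba),
        "confusion": confusion_matrix(y_test, pred),
    }
    print(name, round(results[name]["accuracy"], 2), round(results[name]["auc"], 2))

# Variable importance from the fitted Random Forest:
# mean decrease in impurity (MDI) and permutation importance on the test set.
rf = models["RF"]
mdi = rf.feature_importances_
perm = permutation_importance(rf, X_test_k, y_test, n_repeats=10, random_state=0)
```

With real data, `X` would hold the measured hydrochemical variables and `y` the fluoride class. The scores printed for the synthetic data bear no relation to the accuracies reported in the abstract.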
Keywords
*Original article online at https://www.sciencedirect.com/science/article/abs/pii/S0883292721001852