Geochemical and machine learning approaches to groundwater fluoride prediction in Karaga District, Northern Ghana
Abstract
Full-text original study online at
https://www.nature.com/articles/s41598-026-45867-6
Fluoride contamination of groundwater affects over 200 million people globally, with Africa serving as a primary hotspot. The Karaga District in Ghana’s Northern Region represents a critical fluoride hotspot, where 4 out of 10 children likely face exposure to concentrations exceeding 1.5 mg/L. Despite being identified as high-risk, the specific geochemical mechanisms controlling fluoride mobilization in the district’s Voltaian Supergroup aquifers remain inadequately understood, limiting the development of targeted mitigation strategies. This study aimed to develop and validate an integrated framework combining geochemical modelling, compositional data analysis, and machine learning to predict fluoride concentrations and elucidate mobilization mechanisms in Karaga District’s groundwater. Thirty-four groundwater samples from the Karaga District were collected and analyzed for hydrochemical parameters. The data were processed using PHREEQC for geochemical modelling and isometric log-ratio transformation for compositional analysis. Additionally, six supervised machine learning algorithms were trained on 152 archived samples from neighbouring districts and subsequently validated using the 34 newly collected groundwater samples. A mechanistic Mobility Index was developed using fluoride-independent components and entropy-based weighting. Fluoride concentrations ranged from 0.07 to 6.04 mg/L, with 17.6% exceeding WHO guidelines. Na-HCO3 waters dominated (64.7%), but Na-Cl waters exhibited the highest fluoride (mean 3.75 mg/L), suggesting that evaporite dissolution drives extreme contamination. Machine learning identified total dissolved solids and pH as primary predictors, demonstrating nonlinear fluoride behaviour. The Multilayer Perceptron model achieved R² of 0.668, while the Mobility Index demonstrated exceptional discrimination for WHO exceedance (AUROC 0.94), with robust spatial transferability across communities.
This integrated approach provides a mechanistically grounded, field-deployable framework for fluoride risk assessment. The Mobility Index enables cost-effective community screening using only basic measurements, supporting targeted intervention strategies in fluoride-endemic regions globally.
EXCERPTS
Introduction
Fluoride contamination of groundwater is a widespread global issue that affects over 200 million people in more than 100 countries, with Africa, Asia, and Latin America serving as primary hotspots for this environmental and public health challenge1. The severity and global distribution of this contamination are exemplified by regional statistics: in India, fluoride levels in groundwater can reach up to 48 mg/L, affecting over 90 million people with dental and skeletal fluorosis2, while in Mexico, 97% of tested groundwater samples exceeded fluoride limits, with concentrations reaching 8.8 mg/L3.
The health implications of fluoride exposure demonstrate clear dose-response relationships that vary according to concentration and duration of exposure. Chronic exposure to fluoride above 1.5 mg/L is directly linked to dental and skeletal fluorosis, while prolonged exposure can cause arthritis, kidney, liver, and neurological problems4 (Sunkari & Ambushe, 2024). Children represent a particularly vulnerable population, with studies in Thailand revealing a 54.3% prevalence of dental fluorosis among children exposed to groundwater containing > 1.5 ppm fluoride5. Beyond skeletal effects, elevated fluoride levels are associated with hypertension and cardiovascular impairment in endemic regions6, underscoring the critical importance of accurate prediction and risk assessment capabilities for fluoride-affected groundwater systems.
The Karaga District in Ghana’s Northern Region exemplifies the severity of geogenic contamination in West African sedimentary aquifers and represents a critical fluoride hotspot that demands urgent scientific attention. Recent country-wide hazard modelling identified the Karaga District as one of the most severely affected areas in Ghana, where 4 out of 10 children are potentially exposed to fluoride concentrations exceeding 1.0 mg/L7. The geological setting of the Karaga District is particularly problematic for fluoride contamination, as the weathering of fluoride-bearing rocks from the Voltaian Supergroup and dissolution of fluoride-rich minerals (fluorapatite, amphiboles, fluorite, biotite, and muscovite) create hydrogeochemical conditions conducive to elevated groundwater fluoride concentrations8, 2025a). The Voltaian formations contain fluoride-bearing minerals within their mudstones and sandstones, with documented fluoride concentrations that frequently exceed safe drinking water limits in neighbouring districts9. The broader Northern Ghana region, including the Karaga District, falls within an estimated 920,000 people at risk from fluoride contamination, with approximately 240,000 children (0–9 years) living in at-risk areas7.
The mobilization of fluoride in groundwater systems is governed by complex, interconnected geochemical processes that operate across multiple spatial and temporal scales. Primary control mechanisms include the presence and dissolution of fluoride-bearing minerals, with fluorite (CaF2), fluorapatite, biotite, muscovite, hornblende, and amphiboles serving as the principal geological sources10,11,12. Long-term water-rock interactions facilitate fluoride leaching from host minerals, with the weathering of granitic and basaltic aquifers being particularly effective in releasing fluoride through silicate dissolution processes10,13.
The solubility and mobility of fluoride in groundwater systems are strongly influenced by hydrogeochemical parameters including pH, calcium concentration, and alkalinity. High pH conditions enhance fluoride solubility by promoting desorption from mineral surfaces and facilitating fluorite dissolution11,14. Low calcium levels reduce fluorite saturation, enabling more fluoride to remain in solution rather than precipitating as calcium-fluoride minerals11,13. Alkalinity, represented primarily by bicarbonate (HCO3–) concentrations, exhibits positive correlations with fluoride levels by facilitating mineral dissolution and competitive desorption from mineral surfaces12.
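The calcium control described above can be illustrated with a saturation-index calculation, SI = log10(IAP / Ksp), where IAP = a(Ca2+) × a(F–)² for fluorite. The following is a minimal sketch and not the study's PHREEQC model: it assumes log Ksp = -10.6 for fluorite at 25 °C (a commonly tabulated value that should be checked against the thermodynamic database in use), approximates activity coefficients with the Davies equation, and ignores ion pairing and complexation. The two water compositions and the ionic strength are hypothetical.

```python
import math

LOG_KSP_FLUORITE = -10.6  # assumed value at 25 °C; verify against your database
A_DH = 0.509              # Debye-Hueckel A parameter for water at 25 °C

def davies_log_gamma(charge, ionic_strength):
    """log10 of the activity coefficient from the Davies equation."""
    sqrt_i = math.sqrt(ionic_strength)
    return -A_DH * charge ** 2 * (sqrt_i / (1 + sqrt_i) - 0.3 * ionic_strength)

def fluorite_si(ca_mg_l, f_mg_l, ionic_strength):
    """Fluorite saturation index from total Ca and F (ion pairing ignored)."""
    ca_mol = ca_mg_l / 1000 / 40.078  # mg/L -> mol/L
    f_mol = f_mg_l / 1000 / 18.998
    log_a_ca = math.log10(ca_mol) + davies_log_gamma(2, ionic_strength)
    log_a_f = math.log10(f_mol) + davies_log_gamma(1, ionic_strength)
    return log_a_ca + 2 * log_a_f - LOG_KSP_FLUORITE

# Hypothetical waters with the same fluoride (1.5 mg/L) at I = 0.01 mol/L
si_hard = fluorite_si(ca_mg_l=80.0, f_mg_l=1.5, ionic_strength=0.01)
si_soft = fluorite_si(ca_mg_l=5.0, f_mg_l=1.5, ionic_strength=0.01)
print(f"Ca-rich water SI = {si_hard:.2f}, Ca-poor water SI = {si_soft:.2f}")
```

At equal fluoride, the calcium-poor water sits further below fluorite saturation, so it can hold more fluoride in solution before precipitation becomes possible. This is the softening effect the paragraph attributes to low calcium levels.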
Competitive ion effects play a significant role in fluoride mobilization, particularly the presence of competing anions such as OH– and HCO3–, which compete with fluoride for adsorption sites on mineral surfaces including goethite and gibbsite, thereby enhancing fluoride mobility in alkaline environments11,14. Additionally, sodium-rich water types (Na-HCO3) are frequently associated with higher fluoride concentrations, likely due to cation exchange processes that alter the ionic strength and competitive equilibria within the groundwater system10.
Various methodological approaches have been employed to predict fluoride concentrations in groundwater, each offering distinct advantages while suffering from specific limitations that constrain their effectiveness in complex hydrogeological settings. Statistical methods including Random Forest (RF), Artificial Neural Networks (ANN), and Logistic Regression (LR) have demonstrated varying degrees of success, with RF achieving 89% accuracy, followed by ANN (85%) and LR (76%) in Chinese groundwater systems15. Kriging interpolation methods have proven effective for spatial prediction of fluoride concentrations based on point measurements in Pakistani aquifers16, though these geostatistical approaches assume stationarity and may oversimplify regional variability in complex geological settings.
Thermodynamic modelling approaches utilizing software such as PHREEQC have provided mechanistic insights into geochemical processes controlling fluoride enrichment, with simulations revealing that fluoride mobilization is often driven by fluorite dissolution and other fluoride-bearing mineral interactions, supported by specific pH and calcite precipitation conditions17. However, thermodynamic models require detailed chemical input parameters and accurate mineral assemblage data that are often unavailable under field conditions, limiting their applicability in data-sparse environments.
Machine learning applications have gained considerable popularity in hydrogeochemistry, with advanced algorithms including Extreme Learning Machine (ELM), Support Vector Machine (SVM), Multilayer Perceptron (MLP), and XGBoost demonstrating strong predictive capabilities. In Punjab, India18, ELM achieved impressive performance with R² of 0.95 and RMSE of 0.33, while in Turkey, XGBoost and Convolutional Neural Networks emerged as top performers for fluoride prediction using diverse water quality parameters19. In Pakistan, Random Forest modelling successfully mapped high-risk fluoride zones and estimated that approximately 13 million people were exposed to concentrations exceeding 1.5 mg/L20.
Compositional data analysis (CoDA) methods, particularly Principal Component Analysis (PCA) applied to hydrogeochemical datasets, have been employed to identify underlying processes influencing fluoride levels, revealing relationships between fluoride and parameters such as Na+, HCO3–, and total dissolved solids21,22 (Sunkari et al., 2025b). However, most fluoride modelling studies apply PCA heuristically without proper compositional data transformations, potentially introducing spurious correlations and misrepresenting relationships among chemical constituents. The application of rigorous CoDA frameworks, including isometric log-ratio (ilr) transformation, remains a notable gap in fluoride prediction literature.
Despite the diversity of approaches applied to fluoride prediction, several fundamental knowledge gaps and methodological limitations persist that constrain the development of robust, transferable prediction frameworks. Single method approaches frequently fail to capture the nonlinear, multivariate nature of fluoride mobilization and transport processes. Geostatistical methods such as kriging assume spatial stationarity and may oversimplify regional variability in heterogeneous geological settings15. Thermodynamic models, while mechanistically robust, rely heavily on accurate input parameters including saturation indices and detailed mineral presence data that are rarely available in field applications17.
This study advances fluoride risk assessment beyond ML-only or geochemistry-only workflows by (i) coupling PHREEQC-based thermodynamic interpretation with rigorously transformed compositional features (ilr/CoDA) to avoid spurious correlation artefacts; (ii) enforcing leakage-aware, fluoride-blind feature design for supervised prediction and validating models on an independently collected Karaga dataset; and (iii) introducing a mechanistically interpretable Mobility Index (MI) that translates geochemical controls into a screening-oriented risk signal, supported by calibration and discrimination testing. Related integrative groundwater-quality and entropy-based risk studies have demonstrated the value of combining indexing, multivariate structure, and health-risk framing (Kumar & Singh23,24,25); however, they typically do not integrate CoDA-consistent geochemical representation with externally validated fluoride prediction and interpretability in a single, reproducible workflow.
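The abstract states that the Mobility Index uses entropy-based weighting, but this excerpt does not list its components or the exact weighting formula. As an illustration only, the sketch below implements the generic entropy weight method, in which an indicator that varies more across samples (lower normalised Shannon entropy) receives a larger weight. The indicator columns (pH, TDS, and a constant placeholder) and all values are hypothetical, and the min-max scaling step is an assumption.

```python
import math

def entropy_weights(rows):
    """Return one weight per column using the entropy weight method."""
    n, m = len(rows), len(rows[0])
    divergence = []
    for j in range(m):
        col = [row[j] for row in rows]
        lo, hi = min(col), max(col)
        # Min-max scale to [0, 1]; a constant column is mapped to all ones
        scaled = [(v - lo) / (hi - lo) if hi > lo else 1.0 for v in col]
        total = sum(scaled)
        shares = [s / total for s in scaled]
        # Shannon entropy normalised by ln(n), so it lies in [0, 1]
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n)
        divergence.append(max(0.0, 1.0 - entropy))
    norm = sum(divergence)
    if norm == 0:  # no column discriminates between samples
        return [1.0 / m] * m
    return [d / norm for d in divergence]

# Hypothetical samples: [pH, TDS in mg/L, constant placeholder indicator]
samples = [
    [7.1, 310.0, 1.0],
    [7.8, 950.0, 1.0],
    [8.3, 1800.0, 1.0],
    [7.4, 520.0, 1.0],
]
weights = entropy_weights(samples)
print([round(w, 3) for w in weights])
```

The constant third column receives zero weight because it carries no information for separating samples, which is the property that makes entropy weighting attractive for a screening index built from routinely measured parameters.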
A particularly significant limitation in current fluoride research is the underutilization of model interpretability methods. Machine learning studies often focus exclusively on predictive accuracy while neglecting interpretability tools such as SHAP (SHapley Additive exPlanations) or comprehensive feature importance analysis, which are crucial for understanding key variables influencing fluoride levels19. This limitation severely constrains the ability to translate complex models into actionable public health insights or evidence-based policy decisions, particularly in fluoride-endemic areas where mechanistic understanding is essential for effective intervention strategies.
Studies increasingly advocate for the integration of machine learning approaches with geochemical knowledge to improve both predictive power and mechanistic interpretability. Combining ion-specific geochemical inputs with ensemble machine learning models has demonstrated superior fluoride risk mapping capabilities in Pakistan20. Such hybrid approaches leverage the explanatory strength of mechanistic models with the pattern recognition capabilities of artificial intelligence, particularly valuable in settings where detailed chemical datasets are limited18.
Compositional data analysis methods remain significantly underutilized in fluoride research, despite their ability to address the closure problem inherent in hydrochemical data where concentrations sum to a constrained total. Most studies apply PCA without appropriate compositional data transformations, potentially creating spurious correlations and misrepresenting relationships among chemical elements21,22. Proper application of CoDA frameworks, including isometric log-ratio transformation, represents a critical methodological gap that limits the robust interpretation of multivariate hydrogeochemical relationships in fluoride systems.
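To make the closure argument concrete, the sketch below computes isometric log-ratio coordinates using pivot coordinates, which correspond to one particular sequential binary partition: each part is balanced against the geometric mean of the parts that follow it. This is a generic illustration and not the partition used in the study, which is not specified in this excerpt. The four-part composition is hypothetical.

```python
import math

def ilr_pivot(parts):
    """Pivot-coordinate ilr transform of a strictly positive composition."""
    logs = [math.log(p) for p in parts]
    coords = []
    for i in range(len(parts) - 1):
        rest = logs[i + 1:]
        r = len(rest)
        mean_log_rest = sum(rest) / r  # log of the geometric mean of later parts
        coords.append(math.sqrt(r / (r + 1)) * (logs[i] - mean_log_rest))
    return coords

def clr(parts):
    """Centred log-ratio transform, used here only to check the isometry."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical four-part composition (e.g. major ions in meq/L)
x = [12.0, 1.5, 0.8, 6.0]
z = ilr_pivot(x)
z_rescaled = ilr_pivot([10.0 * v for v in x])  # same composition, other units

ilr_norm_sq = sum(v * v for v in z)
clr_norm_sq = sum(v * v for v in clr(x))
print([round(v, 3) for v in z])
```

A composition with D parts maps to D - 1 unconstrained coordinates, and rescaling every part by the same factor leaves them unchanged. Multivariate methods such as PCA can then be applied to the coordinates without the constant-sum constraint inducing spurious negative correlations.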
The complexity of fluoride behaviour in sedimentary aquifer systems, combined with the critical public health implications in regions such as the Karaga District, necessitates the development of integrated methodological frameworks that combine mechanistic understanding with advanced predictive capabilities. Despite being identified as a high-risk fluoride area through national modelling studies, detailed mechanistic understanding of fluoride mobilization processes in the Karaga District’s Voltaian Supergroup aquifers remains limited, with existing studies focusing primarily on regional hazard mapping rather than local geochemical controls and process-based prediction.
This study aims to address these critical knowledge gaps by developing and validating an integrated approach that combines geochemical modelling, compositional data analysis, and machine learning techniques for predicting fluoride concentrations and understanding mobilization mechanisms in the Karaga District groundwater system. The specific objectives are to: (1) characterize the hydrogeochemical controls on fluoride mobilization in Voltaian Supergroup aquifers through comprehensive water chemistry analysis and thermodynamic modelling; (2) develop and validate machine learning models for predicting fluoride concentrations and WHO guideline exceedance using geochemically-informed feature sets; (3) apply rigorous compositional data analysis techniques, including isometric log-ratio transformation and sequential binary partitioning, to identify fundamental geochemical processes controlling fluoride behaviour; (4) integrate SHAP analysis and other interpretability methods to translate complex model predictions into mechanistic insights and actionable risk assessment tools; and (5) establish critical thresholds and develop a composite fluoride mobility index for early warning and targeted intervention strategies.
The integrated methodology is designed to be portable to similar sedimentary aquifer contexts; however, broader transferability beyond Northern Ghana should be evaluated using additional seasons and independent regions. The findings will advance both scientific understanding of fluoride geochemistry in sedimentary systems and practical capabilities for risk assessment, targeted mitigation strategies, and evidence-based decision-making in regions where geogenic fluoride contamination threatens water security and public health.
Materials and methods
Study area
Geographic setting and climate
The Karaga District26 is located in the northeast of Ghana’s Northern Region, covering an area of 3,119.3 km² with a population of 114,225 as of 2021. The district capital, Karaga, is situated 24 km from Gushegu and 94 km from Tamale, the regional capital. Karaga District lies between latitudes 9°30’ South to North and longitudes 0° to 45’ West, bordering West and East Mamprusi to the north, Savelugu/Nanton to the west, and Gushegu to the south and east (Fig. 1a). The district experiences a tropical continental climate, with a rainy season from May to October and mostly dry conditions for the rest of the year. Annual rainfall ranges between 900 and 1000 mm, with the heaviest precipitation occurring in August and September. Temperatures remain high throughout the year, reaching a maximum of 36 °C or higher in March and April, while the lowest temperatures are observed between November and February. The district’s vegetation is characterized by typical Guinea Savannah, consisting of tall grasses interspersed with drought-resistant trees such as shea and dawadawa, which serve as a source of income for the local population26.
Fig. 1. (a) Study area map with sampling points and WHO-exceedance symbols; (b) geology.
Geology and Hydrogeology
The Karaga District is predominantly underlain by rocks of the Voltaian Supergroup, which covers most of northern Ghana and unconformably overlies the lower Proterozoic Birimian Supergroup, associated granitoids, and the lower to middle Proterozoic Tarkwaian Supergroup27. The Voltaian Supergroup, dating from the late Proterozoic to early Paleozoic era, is divided into Lower, Middle, and Upper formations, dominated by sandstones with minor mudstone, arkoses, feldspars, shales, graywackes, siltstones, evaporites, limestones, and conglomerates. The sediments were primarily sourced from a glacial period followed by prolonged marine invasions28,29. The supergroup is further classified into three lithostratigraphic groups: Oti/Pendjari, Kwahu/Bombouaka, and Tamale/Obosum beds (in chronological order)30. The study area is predominantly composed of rocks from the Oti/Pendjari group and the Tamale/Obosum beds (Fig. 1b), primarily consisting of mudstone and sandstones27. Figure 1b presents the geological map of the district, including the mapped lithostratigraphic units of the Voltaian Supergroup and major structural lineaments/fault traces, together with towns and sampling locations. The dominant surface units are mudstone–siltstone packages with interbedded arkosic/lithic sandstones (Bimbila Formation) and undifferentiated mudstone–siltstone–sandstone units (Obosum Group), with localized sandstone-dominated units (Bunya and Panabako formations) and minor tuffaceous/laminated intervals (Darebe Member of the Kodjari Formation).
The hydrogeology of the area is mainly controlled by secondary porosity31. Because the lithologies lack primary porosity, groundwater occurrence is mostly attributed to secondary porosity caused by jointing, shearing, fracturing, and weathering. The success rate of drilling boreholes in the Oti-Pendjari Group is about 56%, with yields ranging from 0.41 to 9 m³/h and a mean yield of approximately 6.2 m³/h32. The recharge rate of the Voltaian varies between 2.07 × 10⁻⁵ m/day and 2.85 × 10⁻⁴ m/day, corresponding to about 0.3% to 4.1% of the region’s annual precipitation33,34.
Within the Voltaian sedimentary sequence, spatial variability in mudstone–sandstone proportions and the presence of carbonate-bearing and evaporite-influenced horizons can plausibly generate the observed contrast between Na–HCO3 waters (dominant, lower mean fluoride) and Na–Cl waters (minority, highest fluoride). In particular, mudstone-rich intervals and clay-bearing units promote cation exchange and calcium depletion (natural softening), while saline end-member signatures (Na–Cl) are consistent with evaporite dissolution or saline mixing that further suppresses Ca²⁺ availability, thereby enhancing fluorite undersaturation and fluoride persistence in solution.
Sampling
A total of 34 groundwater samples were collected from active public boreholes drilled and maintained by World Vision International in the Karaga District, Northern Ghana (Fig. 1a). The samples were collected using 0.5 L polyethylene bottles in October 2022. The sample bottles were thoroughly precleaned with deionized water, 10% nitric acid, and distilled water to ensure they were free from contamination35. Before sampling, the boreholes were pumped for about 5 min to purge the aquifers and prevent cross-contamination. The groundwater samples were filtered using hand-held syringes with filter heads of 0.45 µm cellulose filter membrane. Physicochemical parameters such as pH, temperature, electrical conductivity (EC), and total dissolved solids (TDS) were monitored using water quality probes. The 0.5 L polyethylene bottles were tightly sealed, and to avoid chemical alterations, the samples were kept in an ice chest at 4 °C. The samples were then transported to the Ghana Atomic Energy Commission’s (GAEC) Laboratory for ion analysis. These 34 Karaga samples were reserved exclusively as an external validation set; model development used an independent archive of 152 groundwater samples from Bawku West, Garu, Gushegu, Kintampo South, Saboba, Savelugu, Talensi, West Gonja, and Zabzugu (details in “Machine learning framework”).
Discussion
Hydrogeochemical controls on fluoride enrichment
Weathering and water-rock interactions
The hydrogeochemical analysis indicates that fluoride enrichment in the groundwater of the Karaga District is primarily governed by water–rock interactions, especially silicate weathering processes within the Voltaian Supergroup formations. As outlined earlier (Section …, Fig. 1), the study area is underlain mainly by mudstone–sandstone rich units belonging to the Oti/Pendjari Group and the Tamale/Obosum beds. The predominance of the Na–HCO3 water type (64.7%) reflects silicate weathering44 and natural groundwater softening through cation exchange, which lowers Ca2+ concentrations. This reduction in Ca2+ activity promotes fluorite undersaturation, thereby enhancing fluoride mobilisation. The highest fluoride concentrations occur in Na–Cl waters, suggesting a saline end-member influence, likely linked to evaporite dissolution, that further suppresses calcium availability and increases ionic strength, intensifying fluoride persistence. Overall, these hydrochemical patterns point to lithological heterogeneity (variations in mudstone–sandstone ratios, evaporitic layers, limestones, and conglomerates) as a major control on the geochemical pathways that drive extreme fluoride levels in the area33.
Evidence for residence time effects on fluoride mobilization is reflected in the relationship between salinity indicators and fluoride enrichment. The Gibbs diagram analysis (Fig. 2b) revealed that higher fluoride concentrations clustered predominantly in the evaporation domain, suggesting that evaporative concentration not only increases total dissolved solids but also enhances fluoride mobilization45. This pattern indicates that longer residence times, allowing for both progressive mineral dissolution and evaporative concentration, are crucial for achieving elevated fluoride concentrations. The spatial heterogeneity observed in fluoride distributions, with concentrations ranging from 0.07 to 6.04 mg/L and a mean of 1.34 ± 1.31 mg/L (Table 2), suggests variable flow paths and residence times across the aquifer system1. Comparative analysis of fluoride concentrations in other regions highlights the severity of the issue in the Karaga District and the role of geological setting in controlling fluoride mobilization. In the Nubian Sandstone Aquifer System of North Africa, which shares similarities with the Voltaian Supergroup in terms of sedimentary rock dominance, fluoride concentrations range from 0.3 to 2.5 mg/L, with higher levels attributed to groundwater circulation in deeper, confined portions of the aquifer46. The Karaga District’s fluoride levels (0.07 to 6.04 mg/L) extend to even higher concentrations, potentially reflecting more intensive weathering or localized mineral enrichment. In the volcanic-hosted Main Ethiopian Rift aquifer system, fluoride concentrations up to 68 mg/L have been reported, with the highest levels associated with rhyolitic and basaltic lava flow dissolution47.
While the Karaga District’s fluoride levels are lower than these extreme values, they still exceed the concentrations found in many other sedimentary aquifers worldwide, such as the Rio Cuarto sedimentary aquifer in Argentina, where fluoride values ranged from 0.12 to 0.6 mg/L, attributed to Ca–HCO3-type groundwater with high calcium content that suppresses fluoride mobility through CaF2 precipitation48.
Geochemical mechanisms of fluoride mobilization
Water type influence
The relationship between hydrochemical facies and fluoride enrichment reveals distinct evolutionary pathways for fluoride mobilization across different water types. The Na-HCO3 water type, while dominant, showed relatively modest fluoride concentrations (Table S3), suggesting that basic silicate weathering processes alone are insufficient to generate severe fluoride contamination. In contrast, Na-Cl waters comprised 8.8% of samples and exhibited the highest fluoride concentrations, ranging from 1.40 to 6.04 mg/L (mean = 3.75 ± 2.32 mg/L), likely driven by evaporite dissolution and cation exchange that reduces calcium and enhances fluoride mobility49,50. This pattern suggests that the evolution toward more saline water types facilitates enhanced fluoride mobilization.
Ion exchange processes significantly impact fluoride behaviour, as evidenced by the compositional data analysis results. The moderate negative correlation between Na+ and K+ (r = -0.49) suggests ion-exchange processes, while the positive correlation between K+ and Mg2+ (r = 0.55) implies that fertilizer-impacted areas also exhibit elevated magnesium from parallel weathering processes (Fig. 3). The intermediate K-HCO3 water type shows elevated fluoride concentrations, ranging from 1.00 to 4.70 mg/L (mean = 2.54 ± 1.56 mg/L), likely indicating areas influenced by fertilizer inputs or K-feldspar weathering processes51. This suggests that ion exchange processes involving potassium release may create geochemical conditions favourable for enhanced fluoride mobilization.
The evolution of water chemistry along flow paths is clearly demonstrated by the principal component analysis results. PC1 represents a “Salinity & Weathering Intensity Axis” capturing a continuum from low-TDS, magnesium-carbonate-dominated waters (negative scores) to high-TDS, mixed evaporite/carbonate-weathering and agricultural waters (positive scores). The biplot analysis reveals that samples in the upper-right quadrant (high PC1, high PC2) combine high TDS/weathering with strong fluoride mobilization, while upper-left samples (low PC1, high PC2) represent low-salinity but fluoride-rich waters characteristic of geogenic fluoride release in dilute aquifers. This spatial organization suggests that different flow path evolutionary stages create distinct hydrogeochemical environments, with both fresh geogenic waters and evolved saline waters capable of supporting elevated fluoride concentrations through different mechanistic pathways.
The mixed water types provide evidence for hydrogeochemical mixing processes that influence fluoride behaviour. The Na-HCO3-Cl mixed type (5.9% of samples) showed intermediate characteristics with fluoride levels from 0.24 to 1.56 mg/L (mean = 0.90 ± 0.93 mg/L) (Table S3), indicating hydrochemical mixing between silicate weathering and evaporitic influences. This intermediate behaviour suggests that the transition between different water types may create transient geochemical conditions that either enhance or suppress fluoride mobilization, depending on the specific mixing ratios and competitive ion effects operating within the system.
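The facies percentages quoted in this section correspond to very small sample counts, which matters when interpreting the group means. With 34 samples, 5.9% is two samples and 8.8% is three. For the two-sample Na-HCO3-Cl group, the reported range therefore fixes both values, so the reported mean and standard deviation can be checked directly. The sketch below does this, assuming the ± values are sample standard deviations (n - 1 denominator). The middle Na-Cl value is inferred from the rounded mean and is only approximate.

```python
import statistics

n_total = 34
n_mixed = round(0.059 * n_total)  # Na-HCO3-Cl facies
n_nacl = round(0.088 * n_total)   # Na-Cl facies

# Two samples: the reported range (0.24 to 1.56 mg/L) gives both values
mixed = [0.24, 1.56]
mean_mixed = statistics.mean(mixed)
sd_mixed = statistics.stdev(mixed)  # sample SD, assumed

# Three samples: infer the middle value from the reported mean of 3.75 mg/L
middle_nacl = 3 * 3.75 - 1.40 - 6.04  # approximate, mean is rounded
nacl = [1.40, middle_nacl, 6.04]
sd_nacl = statistics.stdev(nacl)

print(n_mixed, round(mean_mixed, 2), round(sd_mixed, 2))
print(n_nacl, round(middle_nacl, 2), round(sd_nacl, 2))
```

Both checks reproduce the reported statistics (0.90 ± 0.93 mg/L and 3.75 ± 2.32 mg/L), which confirms the internal consistency of the figures and also shows that the facies-level contrasts for the minority water types rest on two or three boreholes each.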
Evaluation of machine learning approaches
Comparative performance of models
The multilayer perceptron achieved the highest cross-validated performance (R² = 0.668 ± 0.189; MAE = 0.654 ± 0.141), indicating moderate explanatory power given the limited sample size and the heteroscedastic, tail-heavy fluoride distribution. Rather than relying solely on predictive accuracy, the principal contribution of this framework is that it constrains modelling to fluoride-blind, geochemically interpretable predictors, quantifies mechanistic controls via SHAP/feature importance, and converts these controls into an operational screening tool (the Mobility Index) with strong discrimination for WHO exceedance. This finding aligns with several studies conducted in India52, Pakistan53, and China54, which have similarly demonstrated the nonlinear nature of factors influencing fluoride levels in groundwater. Also, non-linear ensemble methods, specifically histogram-based gradient boosting and random forest, provided strong secondary performance, with histogram-based gradient boosting achieving R² of 0.648 ± 0.208 and random forest yielding R² of 0.586 ± 0.156. In contrast, linear regression models performed poorly, with ridge and lasso yielding negative R² values that indicate substantial overfitting and failure to generalize across validation folds. XGBoost demonstrated moderate performance with R² of 0.350 ± 0.090. This stark performance hierarchy reveals a fundamental characteristic of fluoride behaviour in hydrogeochemical systems: the relationship between fluoride concentrations and predictive variables is inherently nonlinear and cannot be adequately captured by linear parameterizations. Model comparison results are reported under the nested cross-validation and Optuna tuning protocol described in Methods (Machine learning framework), with full optimisation specifications provided in Supplementary Methods S-ML1. Optuna convergence behaviour is illustrated in Fig. 5c, and the resulting best-parameter configurations are summarised in Table 4.
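The negative R² values reported for ridge and lasso can look surprising, since R² is often read as a fraction of variance explained. On held-out data it is computed as 1 - SS_res / SS_tot, so any model whose squared errors exceed those of simply predicting the fold mean scores below zero. The sketch below illustrates this with hypothetical fluoride values and predictions; it does not reproduce the study's models or folds.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observations and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical held-out fold (fluoride, mg/L) with one high-fluoride sample
observed = [0.3, 0.8, 1.2, 4.7]
fold_mean = sum(observed) / len(observed)

baseline = [fold_mean] * len(observed)  # always predict the fold mean
poor_model = [2.5, 0.1, 3.0, 1.0]       # hypothetical badly generalising model

r2_baseline = r_squared(observed, baseline)
r2_poor = r_squared(observed, poor_model)
mae_poor = mean_absolute_error(observed, poor_model)
print(round(r2_baseline, 3), round(r2_poor, 3), round(mae_poor, 3))
```

Because a single mispredicted high-fluoride sample dominates SS_res, R² in small, tail-heavy folds is volatile, which is consistent with the wide fold-to-fold standard deviations reported alongside the scores above.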
Feature importance analysis revealed that total dissolved solids dominated predictions with an importance score of 0.328, reflecting salinity control on fluoride mobility. pH (0.140) and saturation indices for calcite (0.139) and magnesite (0.135) captured the thermodynamic controls governing fluoride speciation and mineral dissolution. Individual ion concentrations, including chloride (0.071) and bicarbonate (0.035), showed secondary importance. This ranking validates our mechanistic understanding that bulk water properties and carbonate equilibrium are more influential than specific ion concentrations, emphasizing that fluoride mobilization is regulated by overall mineralization and pH-driven changes in calcium availability. The learning curve analysis indicated a high-variance regime in which validation R2 improves markedly with training set size but remains consistently below the training R2. This gap suggests that gains in predictive accuracy would likely emerge from collecting additional data and employing tail-aware or heteroscedastic modelling strategies to better capture extreme fluoride levels. Residuals showed increasing variance with fitted values and a slight negative bias at the upper range, reflecting underprediction of rare, high-fluoride samples. These patterns indicate that while the model generalizes reasonably well across the typical fluoride concentration range, its ability to predict extreme values remains limited by both data scarcity and inherent model limitations in handling the heteroscedastic noise structure in high-fluoride regimes.
Integration of geochemical and machine learning insights
Single-method approaches often fail to capture the nonlinear and multivariate nature of fluoride mobilisation in heterogeneous hydrogeochemical systems. Thermodynamic models are mechanistically grounded but depend on reliable mineralogical constraints and input chemistry, while purely statistical or geostatistical approaches may oversimplify spatial heterogeneity. In this study, we integrate PHREEQC-derived thermodynamic descriptors and compositional balances (CoDA) with machine-learning pattern recognition to connect predictive signals to chemically interpretable processes.
Interpretability analyses show that the model relies primarily on bulk mineralisation and carbonate-equilibrium indicators rather than any single dissolved ion. SHAP patterns indicate that higher total salinity (TDS) and ionic strength, together with higher pH and carbonate saturation behaviour (e.g., SI calcite and SI magnesite), consistently push fluoride predictions upward, in line with geochemical controls on calcium activity and fluorite solubility. Dependence plots further suggest nonlinear threshold behaviour, with carbonate-equilibrium indicators switching influence as waters evolve from clearly undersaturated toward near-saturation conditions, consistent with carbonate precipitation reducing free Ca2+ and favouring fluoride persistence [55]. Sample-level waterfall plots showed mechanistic coherence: concentrated, alkaline waters produced elevated predictions through elevated Mg2+ and positive magnesite saturation, while dilute, acidic waters remained suppressed due to low calcite saturation and low TDS [56, 57].
These model-derived patterns align with hydrochemical evolution observed in the Karaga dataset. The dominance of Na–HCO3 waters (64.7%) reflects silicate weathering and base-exchange processes typical of Voltaian Supergroup settings, whereas the Na–Cl facies (8.8%) exhibits the highest fluoride concentrations (1.40–6.04 mg/L), indicating that evolution toward more saline waters through saline mixing and/or evaporite-related inputs coupled with cation exchange corresponds to conditions that favour enhanced fluoride mobilisation. The intermediate K–HCO3 facies also shows elevated fluoride (mean 2.54 ± 1.56 mg/L), consistent with ion-exchange–driven geochemical shifts that can support fluoride liberation. Overall, the convergence between facies evolution and ML interpretability indicates that the model is encoding chemically coherent processes rather than spurious correlations.
Critical thresholds and nonlinear relationships
Fluorite undersaturation emerges as the critical mineral control governing fluoride mobilization, with a strong positive correlation (r = 0.75) between fluorite saturation indices and measured fluoride concentrations. All groundwater samples in the study area were undersaturated with respect to fluorite (SI ranging from -8.48 to -1.09, mean = -3.37 ± 1.51), confirming that fluorite dissolution is thermodynamically favoured throughout the Karaga District and provides the dominant fluoride source. Because every sample remains in the dissolution regime, the highest fluoride concentrations occur in waters positioned closest to the fluorite equilibrium boundary. Calcite saturation showed threshold behaviour, with SHAP dependence analysis revealing negative contributions when waters were clearly undersaturated (SI < -2.5) and positive effects approaching saturation (SI > 0), mechanistically linked to CaCO3 precipitation reducing free calcium and favouring fluoride persistence through reduced fluorite formation. pH exhibited a monotonic positive relationship across the full measured range, consistent with surface deprotonation and alkaline desorption mechanisms that mobilize fluoride from mineral surfaces. Salinity variables displayed critical synergistic thresholds: TDS showed minimal influence below ~1000 mg/L but sharp positive effects at elevated concentrations, particularly when coupled with pH > 7.5. Ionic strength behaved similarly, with threshold effects near 0.05 mol/L and tail effects revealing suppression at extreme levels coupled with low pH, attributed to charge shielding mechanisms [55, 57]. Water-type evolution toward the Na-HCO3 facies (64.7% of samples, mean fluoride 0.83 mg/L) created high-risk hydrogeochemical conditions, while the highest fluoride concentrations (1.40–6.04 mg/L) occurred in Na-Cl waters, which made up only 8.8% of samples; these compositional transitions mark critical thresholds for fluoride mobilization.
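For readers unfamiliar with saturation indices, the sketch below shows how a fluorite SI of the kind reported above can be approximated by hand as SI = log10(IAP / Ksp), with IAP = a(Ca2+) · a(F-)^2. It is a simplification, not the PHREEQC calculation used in the study: it assumes 25 °C, the Davies activity model, no ion pairing, and a commonly tabulated log Ksp of -10.6. The sample values are hypothetical.

```python
import math

LOG_KSP_FLUORITE = -10.6  # assumed log K at 25 C for CaF2 = Ca+2 + 2 F-
DAVIES_A = 0.509          # Debye-Hueckel A parameter for water at 25 C
MOLAR_MASS = {"Ca": 40.078, "F": 18.998}  # g/mol


def log_gamma_davies(charge, ionic_strength):
    """log10 activity coefficient from the Davies equation."""
    root = math.sqrt(ionic_strength)
    return -DAVIES_A * charge ** 2 * (root / (1.0 + root) - 0.3 * ionic_strength)


def fluorite_si(ca_mg_l, f_mg_l, ionic_strength):
    """Approximate fluorite saturation index, SI = log10(IAP / Ksp).

    Ignores ion pairing and temperature effects, which PHREEQC handles.
    """
    ca_mol = ca_mg_l / 1000.0 / MOLAR_MASS["Ca"]
    f_mol = f_mg_l / 1000.0 / MOLAR_MASS["F"]
    log_a_ca = math.log10(ca_mol) + log_gamma_davies(2, ionic_strength)
    log_a_f = math.log10(f_mol) + log_gamma_davies(1, ionic_strength)
    log_iap = log_a_ca + 2.0 * log_a_f
    return log_iap - LOG_KSP_FLUORITE


# Hypothetical sample: 20 mg/L Ca, 1.5 mg/L F, ionic strength 0.01 mol/L.
si = fluorite_si(ca_mg_l=20.0, f_mg_l=1.5, ionic_strength=0.01)
undersaturated = si < 0
```

Under these assumptions, a water at the WHO guideline value with modest calcium is still undersaturated, and SI rises as fluoride rises, which is the direction of the positive correlation described above.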
Public health and management implications
Risk assessment framework
The binary classification model provides a quantitative foundation for risk stratification, achieving excellent discrimination (AUROC = 0.94) but with diagnostic trade-offs relevant for operational deployment. On external Karaga samples, the classifier demonstrated 76.5% accuracy with high specificity (82.1%) and negative predictive value (88.5%), effectively ruling out exceedance cases. However, sensitivity was limited to 50.0%, missing half of true exceedances, which indicates a conservative decision threshold at a prevalence of 17.6%. This performance profile suits the model to first-pass screening that identifies wells unlikely to exceed the WHO standard of 1.5 mg/L; sensitivity could be improved by lowering the probability threshold or by class-weighted training, while monitoring the false-positive trade-off.
Geochemical characterization reveals actionable risk indicators for targeted intervention. High-risk communities exhibit Na-HCO3 or Na-Cl water types with high TDS (> 1000 mg/L), pH > 7.5, and low calcium concentrations (< 20 mg/L), particularly in locations like Tong, Tamaligu, Nyong Kuma, and Bagurugu Fulaniyili where fluoride concentrations ranged from 0.07 to 6.04 mg/L. Conversely, communities with Ca-Mg-HCO3 water types, low TDS (< 300 mg/L), and circumneutral pH represent lower-risk hydrogeochemical signatures.
The Mobility Index framework demonstrates robust geographic transferability for expanded deployment. Spatial cross-validation by community maintained discrimination (AUROC = 0.828 ± 0.069), with the modest standard deviation indicating consistent performance across communities. Transfer to external areas required intercept adjustment (a* = -1.601) to correct baseline risk while preserving discrimination, enabling operationalization in new communities through minimal local parameterization without compromising mechanistic validity.
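The diagnostic figures quoted for the external Karaga samples can be checked against a single confusion matrix. The counts below are inferred from the reported percentages (34 samples, six exceedances) rather than taken from a published table, so treat them as a reconstruction; the function name is illustrative.

```python
def screening_metrics(tp, fn, fp, tn):
    """Standard diagnostic metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
        "prevalence": (tp + fn) / total,
    }


# Counts inferred from the reported percentages (34 samples, 6 exceedances).
metrics = screening_metrics(tp=3, fn=3, fp=5, tn=23)
as_percent = {name: round(100 * value, 1) for name, value in metrics.items()}
# as_percent == {"accuracy": 76.5, "sensitivity": 50.0, "specificity": 82.1,
#                "npv": 88.5, "prevalence": 17.6}
```

With only six exceedance cases, each missed well changes sensitivity by about 17 percentage points, which is why the estimate is unstable and why threshold tuning should be revisited as more exceedance samples are collected.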
Mitigation strategies
Na-HCO3-type and Na-Cl waters present distinct mitigation challenges requiring targeted interventions. The dominant Na-HCO3 water type (64.7% of samples) with mean fluoride of 0.83 ± 0.39 mg/L represents alkaline, low-calcium conditions promoting fluoride mobility through calcium sequestration and alkaline desorption mechanisms. Na-Cl waters, comprising 8.8% of samples with the highest mean fluoride of 3.75 ± 2.32 mg/L, reflect halite or evaporite dissolution and potential saline intrusion that reduces calcium availability. For Na-HCO3-type waters, a multi-pronged approach is warranted: managed aquifer abstraction prioritizing Ca-Mg-HCO3 boreholes where available; water blending combining high-fluoride Na-HCO3 waters with fresher, higher-calcium sources to simultaneously reduce salinity and promote fluorite precipitation; and pH adjustment using calcium hydroxide to raise pH controllably while introducing Ca2+ ions that reduce fluoride solubility [58].
Point-of-use defluoridation performance can be sensitive to co-occurring salinity and competing ions; therefore, technology selection should explicitly account for the observed ionic strength/TDS range in Karaga groundwaters. Where higher salinity is present, pilot testing is recommended to confirm media performance under local water chemistry, and composite or modified sorbents may offer improved robustness depending on competing-ion loads.
Agricultural activities intensify fluoride mobilization through complex mechanisms. Nitrate shows a moderate positive correlation with fluoride (r = 0.34), suggesting deep flushing of fluoride-bearing minerals under intensive recharge from fertilizer use. The K+-Mg2+ correlation (r = 0.55) indicates that fertilizer-impacted areas exhibit concurrent geochemical changes.
Community-level interventions should prioritize: promoting organic or slow-release fertilizers in recharge zones; reducing abstraction intensity where agricultural demand drives deep drawdown; and constructing wetlands to attenuate agricultural runoff before recharge.
Early warning system development
The Mobility Index provides a cost-effective, field-operationalized early warning framework for community-level fluoride surveillance. Input parameters require only basic field instrumentation: electrical conductivity (EC) measured with portable probes, total dissolved solids derived from EC, pH measured with inexpensive meters, and routine major ion analysis available in regional laboratories.
In this study, MI computation is intended to be feasible once a standard major-ion dataset is available (field + routine laboratory chemistry), with implementation supported by a fixed-weight calculator so that end-users do not need to reproduce the full modelling workflow (Fig. 9).
Conceptual workflow linking hydrogeochemistry to operational screening. Field chemistry and hydrogeological context (major ions, pH, EC/TDS) are interpreted through PHREEQC-derived thermodynamic descriptors and compositional balances (CoDA/ilr). These fluoride-blind features support ML prediction and interpretability (feature importance/SHAP). Mechanistic components are aggregated into the fluoride-independent Mobility Index (MI), which is then calibrated for WHO exceedance screening to support targeted testing and intervention prioritisation.
These field-measurable inputs replace computationally intensive geochemical modelling (PHREEQC), which served a calibration role but need not accompany operational deployment. The framework requires only that entropy-weighted components computed during development remain locked during external application, preserving mechanistic interpretability while enabling practical implementation.
Karaga District’s tropical continental climate, with a rainy season from May to October and a dry season from November to April, creates seasonal fluoride variability amenable to temporal surveillance. Quarterly MI calculations capture seasonal transitions: dry season concentration effects potentially elevate fluoride through reduced recharge, while wet season dilution effects lower concentrations if recharge brings fluoride-poor waters. Threshold-based decision triggers categorize risk simply: MI < 0.33 indicates very low risk, 0.33–0.67 intermediate risk, and > 0.67 high risk (corresponding to > 80% probability of exceedance based on logistic calibration, AUROC = 0.976).
Institutional implementation requires minimal training: the District Health Directorate oversees monitoring and coordination, the Regional Water Authority links results to infrastructure maintenance, and Community Water Committees conduct quarterly measurements and alert authorities. Color-coded community maps using MI categories enable spatially explicit identification of high-risk clusters, prompting targeted intervention. Intercept adjustment (a* = -1.601) accommodates local prevalence shifts without recalibrating component weights, so the weights need not be re-estimated for each community. This framework complements rather than replaces direct fluoride analysis by serving as a cost-effective pre-screening tool, reducing analytical burden while prioritizing wells warranting immediate testing and treatment activation.
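A fixed-weight calculator of the kind described above could be organised as in the sketch below. It uses the standard entropy weight method and a weighted sum, which is an assumption: the study's exact components, scaling, and aggregation are not given here. Only the 0.33 and 0.67 category bands come from the text; the component names and all numbers in the example are hypothetical.

```python
import math


def entropy_weights(rows):
    """Entropy weights for columns of a non-negative development matrix.

    Columns whose values vary more across samples carry more information
    and receive larger weights.
    """
    n, m = len(rows), len(rows[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in rows]
        total = sum(column)
        shares = [v / total for v in column]
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    norm = sum(divergences)
    return [d / norm for d in divergences]


def mobility_index(components, weights):
    """Weighted sum of components already scaled to the 0-1 range."""
    return sum(w * c for w, c in zip(weights, components))


def risk_category(mi):
    """Map MI to the three screening bands quoted in the text."""
    if mi < 0.33:
        return "very low"
    if mi <= 0.67:
        return "intermediate"
    return "high"


# Hypothetical development data: rows are wells, columns are three
# scaled components (e.g., salinity, pH, carbonate saturation).
development = [
    [0.10, 0.40, 0.30],
    [0.20, 0.50, 0.35],
    [0.90, 0.55, 0.40],
    [0.60, 0.45, 0.30],
]
locked_weights = entropy_weights(development)  # computed once, then fixed
new_well = [0.85, 0.70, 0.60]
mi = mobility_index(new_well, locked_weights)
category = risk_category(mi)
```

Computing the weights once on development data and reusing them unchanged is what keeps the index comparable between communities; only the scaled component values change from well to well.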
Limitations and future directions
Limitations
While this study demonstrates strong internal validation and external validation within a single district, several limitations constrain broader spatial transferability. First, the Karaga external validation dataset comprises only 34 samples collected in a single season (October 2022), which may not capture seasonal variation in groundwater chemistry and fluoride concentrations. Multi-seasonal sampling would strengthen confidence in year-round model applicability. Second, validation was performed solely within the Karaga District, a specific geological setting (Voltaian Supergroup, semi-arid climate). Extension to other regions, particularly those with different geological formations (e.g., granitic aquifers, coastal aquifers) or climatic regimes (e.g., tropical, temperate), requires additional multi-region validation studies. The model parameters and feature relationships identified here may not apply directly to fundamentally different hydrogeochemical systems. Third, the archived regional dataset (n = 152) used for model training, while valuable, represents a static snapshot and may not account for temporal trends in groundwater chemistry. Fourth, the Mobility Index has been validated specifically for predicting WHO guideline exceedance (1.5 mg/L threshold) and may require recalibration for other decision thresholds or health-based standards used in different jurisdictions. Despite these limitations, the methodological framework integrating CoDA, PHREEQC modeling, and interpretable ML is generalizable and can be adapted to other regions through similar multi-stage validation protocols.
Uncertainties
Uncertainty arises from (i) sampling limitations (single-season sampling and limited exceedance cases in the external set), (ii) measurement and modelling uncertainty in derived thermodynamic descriptors (e.g., saturation indices and ionic strength from PHREEQC inputs), (iii) structural uncertainty because localized controls (e.g., clay–water interactions and micro-scale mineral heterogeneity) may not be captured by major-ion chemistry alone, and (iv) decision uncertainty because screening sensitivity depends on the selected exceedance threshold and probability cut-off. These uncertainties primarily affect confidence in the upper tail (rare high-fluoride events) and motivate expanded multi-season sampling, targeted mineralogical/trace-element constraints, and validation in additional regions.
Future research directions
Advancing fluoride prediction in the Karaga District requires a strategic program integrating expanded data collection, process-based investigations, and methodological refinements. Currently, the external test set contains only six exceedance cases, creating unstable estimates and limiting sensitivity to 50%. Deliberately targeting the 26 communities with hazard quotient values exceeding 1.0 to increase exceedance samples from six to 20–30 cases would reduce confidence interval width and better capture underprediction bias at high concentrations. Concurrent systematic sampling across all identified water types and distinct Voltaian Formation members would ensure adequate representation of the full compositional space and clarify geological controls independent of regional patterns. Temporal monitoring through bi-monthly sampling at sentinel wells over 24 months would quantify seasonal fluoride variability and enable development of state-space predictive models incorporating temporal dynamics. Companion investigations should apply rigorous compositional data analysis to trace elements including iron, aluminium, and silicon, which control fluoride behaviour through adsorption onto goethite, gibbsite, and clay minerals, processes not captured by major-ion analysis alone. Laboratory experiments examining fluorite dissolution kinetics and fluoride adsorption on local Voltaian sediments would provide rate expressions necessary for reactive transport modelling. Isotope hydrology using δ2H and δ18O, combined with stable fluorine isotope analysis (δ19F), would distinguish geogenic fluoride sources and link mineral origins to observed concentration patterns. Machine learning enhancements incorporating quantile regression and heteroscedastic neural networks would provide uncertainty quantification critical for public health decision-making, while SHAP interaction analysis would reveal mechanistic feature combinations driving extreme outcomes.
These interconnected approaches would establish a robust, transferable framework for fluoride prediction applicable across Ghana and similar geological settings globally.
Conclusion
Fluoride contamination in the Karaga District poses a critical public health threat, with 17.6% of groundwater samples exceeding the WHO drinking water standard of 1.5 mg/L and affecting thousands of residents, particularly children vulnerable to dental and skeletal fluorosis. Although national hazard assessments identified the Karaga area as a high-risk zone, the specific geochemical mechanisms driving fluoride mobilization in the region’s Voltaian Supergroup aquifers have remained poorly understood. This study addressed that gap through an integrated framework combining geochemical modelling, compositional data analysis, and machine learning to uncover both the drivers and predictability of fluoride contamination at the local scale. Our findings revealed three pivotal insights. First, geochemical analysis showed that while Na-HCO3 waters dominated (64.7% of samples), the most severe contamination occurred in Na-Cl waters, which reached concentrations as high as 6.04 mg/L, indicating that evaporite dissolution and cation exchange fundamentally control enhanced fluoride mobilization. Second, machine learning analysis exposed nonlinear relationships governing fluoride behaviour, with total dissolved solids and pH, rather than individual ion concentrations, emerging as primary predictors. Third, we developed the Mobility Index, a mechanistic tool that successfully identified high-risk waters while remaining independent of measured fluoride, achieving exceptional discrimination (AUROC of 0.94) for WHO guideline exceedance. The strength of our approach lies in integrating mechanistic understanding with predictive power. Single-method frameworks cannot capture fluoride’s nonlinear behaviour; geochemical models alone require extensive lab data, while machine learning alone lacks interpretability. Our hybrid strategy addressed both limitations, encoding genuine hydrogeochemical relationships into a model that learns from data patterns.
Operationally, the Mobility Index requires only field-measurable parameters (electrical conductivity, pH, and routine ion analysis), making it immediately deployable by district health authorities and community water committees for cost-effective screening and targeted intervention. Beyond Karaga, this framework is transferable to similar sedimentary aquifer settings globally, opening pathways for expanded validation and adaptation in other fluoride-endemic regions worldwide.
This study is novel in three specific ways. First, it couples PHREEQC-based thermodynamic modelling with rigorously transformed compositional features (isometric log-ratio/CoDA) to avoid the spurious correlations that arise when standard statistics are applied to constrained hydrochemical data, an approach that remains largely absent from fluoride prediction literature. Second, it enforces a leakage-aware, fluoride-blind feature design throughout model development and validates the resulting models on a fully independent, externally collected dataset from the Karaga District, providing a more rigorous test of generalisability than internal cross-validation alone can offer. Third, it introduces the Mobility Index, a mechanistically interpretable screening tool derived entirely from fluoride-independent components, which translates complex geochemical controls into a practical, field-deployable risk signal without requiring measured fluoride as an input.
Data availability
The authors declare that the data supporting the findings of this study are available within the paper.
References
-
Shaji, E. et al. Fluoride contamination in groundwater: A global review of the status, processes, challenges, and remedial measures. Geosci. Front. 15 (2), 101734. https://doi.org/10.1016/j.gsf.2023.101734 (2024).
-
Bera, B. et al. Fluoride dynamics in precambrian hard rock terrain of North Singhbhum Craton and effect of fluorosis on human health and society. Groundwater Soc. Appl. Geospatial Technol., 319–348. (2021).
-
Padilla-Reyes, D. A. et al. Arsenic and fluoride in groundwater triggering a high risk: Probabilistic results using Monte Carlo simulation and species sensitivity distribution. Chemosphere 359, 142305. https://doi.org/10.1016/j.chemosphere.2024.142305 (2024).
-
Dar, F. A. & Kurella, S. Utilization of organic waste from Chinar leaves as sustainable and eco-friendly adsorbent for fluoride removal. Environ. Sci. Pollut. Res. 1–24. https://doi.org/10.1007/s11356-024-35147-z (2024).
-
Rojanaworarit, C. et al. Hydrogeogenic fluoride in groundwater and dental fluorosis in Thai agrarian communities: a prevalence survey and case–control study. BMC Oral Health. 21 (1), 1–16. https://doi.org/10.1186/s12903-021-01902-8 (2021).
-
Varol, E. & Varol, S. Does fluoride toxicity cause hypertension in patients with endemic fluorosis? Biol. Trace Elem. Res. 150 (1–3), 1–2. https://doi.org/10.1007/s12011-012-9499-1 (2012).
-
Araya, D., Podgorski, J., Kumi, M., A Mainoo, P. & Berg, M. Fluoride contamination of groundwater resources in Ghana: Country-wide hazard modeling and estimated population at risk. Water Res. 212 (September 2021), 118083. https://doi.org/10.1016/j.watres.2022.118083 (2022).
-
Sunkari, E. D., Adams, S. J., Okyere, M. B. & Bhattacharya, P. Groundwater fluoride contamination in Ghana and the associated human health risks: Any sustainable mitigation measures to curtail the long term hazards? Groundw. Sustain. Dev. 16, 100715. https://doi.org/10.1016/j.gsd.2021.100715 (2022).
-
Apambire, W. B., Boyle, D. R. & Michel, F. A. Geochemistry, genesis, and health implications of fluoriferous groundwaters in the upper regions of Ghana. Environ. Geol. 33 (1), 13–24. https://doi.org/10.1007/s002540050221 (1997).
-
Alam, N. et al. Geochemistry of fluoride mobilization in the hard-rock aquifers of central India: Implication for fluoride-safe drinking water supply. Appl. Geochem. 171, 106106. https://doi.org/10.1016/j.apgeochem.2024.106106 (2024).
-
Ali, W. et al. Elucidating various geochemical mechanisms drive fluoride contamination in unconfined aquifers along the major rivers in Sindh and Punjab, Pakistan. Environ. Pollut. 249, 535–549. https://doi.org/10.1016/j.envpol.2019.03.043 (2019).
-
Aravinthasamy, P., Karunanidhi, D., Subramani, T., Srinivasamoorthy, K. & Anand, B. Geochemical evaluation of fluoride contamination in groundwater from Shanmuganadhi River basin, South India: implication on human health. Environ. Geochem. Health. 42 (7), 1937–1963. https://doi.org/10.1007/s10653-019-00452-x (2020).
-
Chen, K., Liu, Q., Yang, T., Ju, Q. & Yu, H. Geochemical characteristics, influencing factors and health risk assessment of groundwater fluoride in a drinking water source area in North Anhui Plain, Eastern China. Stoch. Env. Res. Risk Assess. 37 (10), 3879–3891. https://doi.org/10.1007/s00477-023-02485-2 (2023).
-
Luo, W., Gao, X. & Zhang, X. Geochemical processes controlling the groundwater chemistry and fluoride contamination in the yuncheng basin, China—an area with complex hydrogeochemical conditions. PLoS ONE. 13 (7), e0199082. https://doi.org/10.1371/journal.pone.0199082 (2018).
-
Nafouanti, M. B., Li, J., Mustapha, N. A., Uwamungu, P. & AL-Alimi, D. Prediction on the fluoride contamination in groundwater at the Datong Basin, Northern China: Comparison of random forest, logistic regression and artificial neural network. Appl. Geochem. 132, 105054. https://doi.org/10.1016/j.apgeochem.2021.105054 (2021).
-
Ahmad, M., Mustafa, G., Ali, N. & Laiq, M. Statistical Prediction of Fluoride Concentration in Groundwater of District Multan, Pakistan, Using Kriging Methods. Fluoride 56 (2), 156–168 (2023).
-
Liu, X. & Chen, K. Characterization, formation mechanism, and human health risk assessment of fluoride in shallow groundwater of Suzhou city, East China. Water Supply. 24 (9), 3196–3207. https://doi.org/10.2166/ws.2024.202 (2024).
-
Kerketta, A., Kapoor, H. S. & Sahoo, P. K. Groundwater fluoride prediction modeling using physicochemical parameters in Punjab, India: a machine-learning approach. Front. Soil. Sci. 4 https://doi.org/10.3389/fsoil.2024.1407502 (2024).
-
Demir Yetiş, A., İlhan, N. & Kara, H. Integrating deep learning and regression models for accurate prediction of groundwater fluoride contamination in old city in Bitlis province, Eastern Anatolia Region, Türkiye. Environ. Sci. Pollut. Res. 31 (34), 47201–47219. https://doi.org/10.1007/s11356-024-34194-w (2024).
-
Ling, Y. et al. Monitoring and prediction of high fluoride concentrations in groundwater in Pakistan. Sci. Total Environ. 839, 156058. https://doi.org/10.1016/j.scitotenv.2022.156058 (2022).
-
Liu, J., Peng, Y., Li, C., Gao, Z. & Chen, S. A characterization of groundwater fluoride, influencing factors and risk to human health in the southwest plain of Shandong Province, North China. Ecotoxicol. Environ. Saf. 207, 111512. https://doi.org/10.1016/j.ecoenv.2020.111512 (2021).
-
Adimalla, N. Assessment and Mechanism of Fluoride Enrichment in Groundwater from the Hard Rock Terrain: A Multivariate Statistical Approach. Geochem. Int. 58 (4), 456–471. https://doi.org/10.1134/S0016702920040060 (2020).
-
Kumar, A. & Singh, A. Pollution source characterization and evaluation of groundwater quality utilizing an integrated approach of Water Quality Index, GIS and multivariate statistical analysis. Water Supply. 24 (10), 3517–3539. https://doi.org/10.2166/ws.2024.213 (2024b).
-
Kumar, A. & Singh, A. Geospatial mapping and entropy-based analysis for groundwater evaluation with estimation of potential health risks due to nitrate and fluoride exposure. Environ. Sci. Pollut. Res. 31 (59), 66953–66976. https://doi.org/10.1007/s11356-024-35691-8 (2024a).
-
Kumar, A. & Singh, A. Entropy-based groundwater quality evaluation with multivariate analysis and Sobol sensitivity for non-carcinogenic health risks in mid-Gangetic plains, India. Environ. Geochem. Health. 47 (6), 186. https://doi.org/10.1007/s10653-025-02495-9 (2025).
-
Ghana Statistical Service (GSS). Ghana 2021 Population and Housing Census General Report (Ghana Statistical Service Accra, 2021).
-
Ghana Geological Survey (GGS). Geological Map of Ghana – Scale 1:1 000 000 (Geological Survey Department (GSD), 2009).
-
Achcampong, S. Y. & Hess, J. W. Hydrogeologic and hydrochemical framework of the shallow groundwater system in the southern Voltaian Sedimentary Basin, Ghana. Hydrogeol. J. 6 (4), 527–537. https://doi.org/10.1007/s100400050173 (1998).
-
Anani, C. Sandstone petrology and provenance of the Neoproterozoic Voltaian group in the southeastern Voltaian Basin, Ghana. Sed. Geol. 128 (1–2), 83–98. https://doi.org/10.1016/S0037-0738(99)00063-9 (1999).
-
Abu, M., Sunkari, E. D. & Şener, M. Untapped Economic Resource Potential of the Neoproterozoic to Early Paleozoic Volta Basin, Ghana: A Review. Nat. Resour. Res. 28 (4), 1429–1445. https://doi.org/10.1007/s11053-019-09478-5 (2019).
-
Menyeh, A. & Sarpong Asare, V. D. Geo-Electrical Investigation Of Groundwater Resources And Aquifer Characteristics In Some Small Communities In The Gushiegu And Karaga Districts Of Northern Ghana. Int. J. Sci. Technol. Res. 2, 25–35 (2013).
-
Dapaah-Siakwan, S. & Gyau-Boakye, P. Hydrogeologic framework and borehole yields in Ghana. Hydrogeol. J. 8 (4), 405–416. https://doi.org/10.1007/PL00010976 (2000).
-
Sunkari, E. D., Hudu, A., Fosu, S., Gyimah, E. & Oppong, O. Hydrogeochemistry, sources, enrichment mechanism and human health risk assessment of groundwater fluoride in Saboba District in the Oti sub-basin of the Volta River Basin, northern Ghana. Groundw. Sustain. Dev. 25, 101132. https://doi.org/10.1016/j.gsd.2024.101132 (2024).
-
Sunkari, E. D., Zango, M. S. & Korboe, H. M. Comparative Analysis of Fluoride Concentrations in Groundwaters in Northern and Southern Ghana: Implications for the Contaminant Sources. Earth Syst. Environ. 2 (1), 103–117. https://doi.org/10.1007/s41748-018-0044-z (2018).
-
Sunkari, E. D. & Abu, M. Hydrochemistry with special reference to fluoride contamination in groundwater of the Bongo district, Upper East Region, Ghana. Sustain. Water Resour. Manage. 5 (4), 1803–1814. https://doi.org/10.1007/s40899-019-00335-0 (2019).
-
Shelton, J. L., Engle, M. A., Buccianti, A. & Blondes, M. S. The isometric log-ratio (ilr)-ion plot: A proposed alternative to the Piper diagram. J. Geochem. Explor. 190 (February), 130–141. https://doi.org/10.1016/j.gexplo.2018.03.003 (2018).
-
Parkhurst, D. L. & Appelo, C. A. J. PHREEQC Version 3 (3). https://doi.org/10.3133/tm6A43 (2021).
-
Müller, M., Parkhurst, D. & Charlton, S. Programming PHREEQC Calculations with C++ and Python: A Comparative Study. MODFLOW and More 2011: Integrated Hydrological Modeling, 632–636. http://docplayer.net/7809789-Programming-phreeqc-calculations-with-c-and-python-a-comparative-study.html (2011).
-
Kontogeorgis, G. M., Maribo-Mogensen, B. & Thomsen, K. The Debye-Hückel theory and its importance in modeling electrolyte solutions. Fluid. Phase. Equilibria. 462, 130–152. https://doi.org/10.1016/j.fluid.2018.01.004 (2018).
-
Hem, J. D. Study and interpretation of the chemical characteristics of natural water. In US Geol. Surv. Water-Supply Paper (Vol. 2254). https://doi.org/10.3133/wsp2254 (1985).
-
Parkhurst, D. L. & Appelo, C. A. J. User’s Guide to PHREEQC (Version 2): A Computer Program for Speciation, Batch-Reaction, One-Dimensional Transport, and Inverse Geochemical Calculations. Water-Resources Investigations Report 99-4259. https://doi.org/10.3133/wri994259 (1999).
-
Salih, A. M. et al. A Perspective on Explainable Artificial Intelligence Methods: SHAP and LIME. Adv. Intell. Syst. 7 (1). https://doi.org/10.1002/aisy.202400304 (2024).
-
Gibbs, R. J. Mechanisms controlling world water chemistry. Science 170 (3962), 1088–1090. https://doi.org/10.1126/science.170.3962.1088 (1970).
-
Yadav, A., Kumari, N., Kumar, R., Kumar, M. & Yadav, S. Fluoride distribution, contamination, toxicological effects and remedial measures: a review. Sustain. Water Resour. Manage. 9 (5), 150. https://doi.org/10.1007/s40899-023-00926-y (2023).
-
Hossain, S., Hosono, T., Yang, H. & Shimada, J. Geochemical Processes Controlling Fluoride Enrichment in Groundwater at the Western Part of Kumamoto Area, Japan. Water Air Soil. Pollut. 227 (10), 385. https://doi.org/10.1007/s11270-016-3089-3 (2016).
-
Mosaad, S., Eissa, M. & Alezabawy, A. K. Geochemical modeling and geostatistical categorization of groundwater in Nubian Sandstone Aquifer, El Bahariya Oasis, Egypt. Environ. Earth Sci. 81 (17), 421. https://doi.org/10.1007/s12665-022-10524-4 (2022).
-
Rango, T. et al. Groundwater quality and its health impact: An assessment of dental fluorosis in rural inhabitants of the Main Ethiopian Rift. Environ. Int. 43 (1), 37–47. https://doi.org/10.1016/j.envint.2012.03.002 (2012).
-
Blarasin, M. et al. Arsenic and Fluoride in Groundwater of the Sedimentary Aquifer in. IOSR J. Environ. Sci. 12 (4), 71–77. https://doi.org/10.9790/2402-1204017177 (2018).
-
Rena, V. et al. Hydrogeological investigation of fluoride ion in groundwater of Ruparail and Banganga basins, Bharatpur district, Rajasthan, India. Environ. Earth Sci. 81 (17), 430. https://doi.org/10.1007/s12665-022-10520-8 (2022).
-
Saini, A., Kanwar, P., Kumar, S., Tembhurne, S. & Roy, I. A study on the hydrogeochemical mechanisms controlling groundwater fluoride enrichment in Jaipur: a semi-arid terrain in India. Int. J. Environ. Anal. Chem. 103 (20), 8825–8845. https://doi.org/10.1080/03067319.2021.1998473 (2023).
-
Tyagi, S. & Sarma, K. Expounding major ions chemistry of groundwater with significant controlling factors in a suburban district of Uttar Pradesh, India. J. Earth Syst. Sci. 130 (3), 169. https://doi.org/10.1007/s12040-021-01629-8 (2021).
-
De, A. et al. Investigating spatial distribution of fluoride in groundwater with respect to hydro-geochemical characteristics and associated probabilistic health risk in Baruipur block of West Bengal, India. Sci. Total Environ. 886 https://doi.org/10.1016/J.SCITOTENV.2023.163877 (2023).
-
Rashid, A. et al. Geochemical modeling, source apportionment, health risk exposure and control of higher fluoride in groundwater of sub-district Dargai, Pakistan. Chemosphere 243, 125409. https://doi.org/10.1016/j.chemosphere.2019.125409 (2020).
-
Tian, J. et al. Hydrochemical characteristics, driving factors and health risk of fluoride in groundwater from the northwestern Ordos Basin, China. Geosci. Front. 16 (5). https://doi.org/10.1016/J.GSF.2025.102123 (2025).
-
Ayub, M. et al. Hydrogeochemical properties, source provenance, distribution, and health risk of high fluoride groundwater: Geochemical control, and source apportionment. Environ. Pollut. 362. https://doi.org/10.1016/J.ENVPOL.2024.125000 (2024).
-
Lone, S. A., Jeelani, G. & Mukherjee, A. Hydrogeochemical controls on contrasting co-occurrence of geogenic Arsenic (As) and Fluoride (F-) in complex aquifer system of Upper Indus Basin, (UIB) western Himalaya. Environ. Res. 260. https://doi.org/10.1016/J.ENVRES.2024.119675 (2024).
-
Paikaray, S. & Mahajan, T. Hydrogeochemical processes, mobilization controls, soil-water-plant-rock fractionation and origin of fluoride around a hot spring affected tropical monsoonal belt of eastern Odisha, India. Appl. Geochem. 148. https://doi.org/10.1016/J.APGEOCHEM.2022.105521 (2023).
-
Narsimha, A., Venkatayogi, S. & Geeta, S. Hydrogeochemical data on groundwater quality with special emphasis on fluoride enrichment in Munneru river basin (MRB), Telangana State, South India. Data Brief. 17, 339–346. https://doi.org/10.1016/J.DIB.2018.01.059 (2018).
-
Liu, Y., Zhou, K. & Carranza, E. J. M. Compositional balance analysis for geochemical pattern recognition and anomaly mapping in the western Junggar region, China. Geochem. Explor. Environ. Anal. 18 (3), 263–276. https://doi.org/10.1144/geochem2017-050 (2018).
-
Oh, J., Kim, K. H., Kim, H. R., Park, S. & Yun, S. T. Using isometric log-ratio in compositional data analysis for developing a groundwater pollution index. Sci. Rep. 14 (1), 12196. https://doi.org/10.1038/s41598-024-63178-6 (2024).
-
Sauro Graziano, R., Gozzi, C. & Buccianti, A. Is Compositional Data Analysis (CoDA) a theory able to discover complex dynamics in aqueous geochemical systems? J. Geochem. Explor. 211, 106465. https://doi.org/10.1016/j.gexplo.2020.106465 (2020).
-
Scealy, J. L., de Caritat, P., Grunsky, E. C., Tsagris, M. T. & Welsh, A. H. Robust principal component analysis for power transformed compositional data. J. Am. Stat. Assoc. 110 (509), 136–148. https://doi.org/10.1080/01621459.2014.990563 (2015).
-
Zhou, X., Ma, Y. & Wu, W. Statistical depth for point process via the isometric log-ratio transformation. Comput. Stat. Data Anal. 187, 107813. https://doi.org/10.1016/j.csda.2023.107813 (2023).
Acknowledgements
The authors gratefully acknowledge all those who contributed to the fieldwork and other aspects of this study, whose efforts significantly enhanced its quality. The first author thanks the University of Johannesburg, South Africa, for its continuous support as a Senior Research Associate at the Department of Chemical Sciences. The first author is also grateful to the Centre of Excellence in Environmental Science and Sustainability at Sir Padampat Singhania University, India, for the enabling environment provided for this research. All authors thank the editor and anonymous reviewers for their invaluable critiques, which improved the quality of this study.
Ethics declarations
Competing interests
The authors declare no competing interests.
Sampling permissions
Groundwater samples used in this study were collected from active public boreholes drilled and maintained by World Vision International in the Karaga District, Northern Ghana. These boreholes are publicly accessible and do not fall within restricted or privately owned land. Therefore, no special permissions from landowners were required for sample collection.