Abstract

  • Off-the-shelf water quality test kits for measuring iron, copper, manganese, and fluoride were evaluated.
  • Test kits showed variable performance.
  • Many test kits performed well in deionized water but performed poorly when measuring concentrations in tap or river water.
  • While many test kits were not accurate, they were still able to inform potential users of above or below regulatory limits.

Water consumers in the United States may want to test their drinking water using at-home commercially available test kits rather than a certified laboratory due to convenience and affordability. However, while numerous do-it-yourself test kits are available for purchase online or at local stores, these kits are unregulated and lack data on their performance. We evaluated off-the-shelf home drinking water test kits that measure iron, copper, manganese, and fluoride concentrations to investigate whether these kits could reliably provide meaningful results. We evaluated their performance in three water matrices: deionized water (DI), tap water, and river water, and with laboratory-trained personnel compared to untrained users. Our results showed highly repeatable but variable performance in the test kits’ ability to detect potential contaminants in the water. Most kits performed best in the DI water matrix with no interference. Our results suggest that there are concerns about their accuracy and usefulness and that whether the results can be relied on depends on which parameter is being measured in which water with which kit and for which purpose.

EXCERPTS:

INTRODUCTION

While the United States reports high access to safely managed drinking water, recent analyses have highlighted disparities in communities’ access to safe drinking water. While the majority of the population in the United States is served by a piped public water system, which is regulated under the Safe Drinking Water Act, another 20 million houses in the United States are estimated to be served by private water sources, such as wells, which are largely unregulated (National Research Council 1997; US EPA 2015a; Murray et al. 2021). Even among public water systems that are monitored regularly for water quality, not every service connection is tested, and water from premise plumbing is largely excluded from sampling (except for some parameters regulated at the tap, such as lead and copper). Lack of trust in tap water is reported across the United States and has been recognized as a public health concern (Patel & Schmidt 2017; Pierce & Gonzalez 2017; Pierce et al. 2019). Consumers of unregulated sources such as private wells often want to test their well water by sending samples to a private lab, participating in a program that facilitates testing, or buying home test kits (Flanagan et al. 2015).

State agencies recommend that if a homeowner wants to test their drinking water – whether from a public water supplier or a private well – they use a state-certified lab, with fees set by the individual labs. However, numerous do-it-yourself test kits are available for private individuals to purchase online or at local stores, and consumers looking for a more affordable option may elect to use them. These commercially sold do-it-yourself kits do not undergo any formal certification or accreditation process to ensure their accuracy. Prior studies have evaluated the ability of off-the-shelf (OTS) test kits, including test strips and colorimetric vials, to accurately and precisely measure lead (Kriss et al. 2021), chlorine residual (Murray & Lantagne 2015), arsenic (George et al. 2012; Powers et al. 2019; Reddy et al. 2020), and nitrate (Nielsen et al. 2008; Aukema & Wackett 2019), among others. Overall, wide variability has been reported in the ability of these test kits to measure these water quality parameters accurately or to correctly classify a water sample as above or below a threshold (whether that be a detection limit or a regulatory limit or guideline). However, to date, evaluations of test kits for home users have been reported in the literature for only a limited set of water quality parameters (lead, nitrate, and arsenic; Nielsen et al. 2008; Reddy et al. 2020; Kriss et al. 2021), and often the test kits evaluated have been single-parameter methods, while much of what is available on the market and marketed to consumers is multi-parameter tests. While these test kits are widely available, consumers who already do not know the status of their drinking water and may lack knowledge of water quality also do not know whether to trust the results of these test kits. In particular, consumers may want to measure water quality parameters that may originate from the distribution system and premise plumbing systems; for example, consumers may want to measure and understand the source of discolored water, which often originates from elevated iron or manganese and may not be a health risk but is an esthetic concern important for water consumers (Tang et al. 2018; Vidmar et al. 2023).

In this study, we evaluated whether available OTS test kits could accurately measure several water quality parameters (copper, iron, fluoride, and manganese) in different water matrices and by non-specialist users to identify how well currently available methods perform.

Water matrices

We performed experiments using three water matrices: deionized (DI) water, tap water, and river water. DI water was used to represent a control of high purity with no background interference. Tap water was used to represent water that consumers may test; we obtained the tap water from a tap at the University of Massachusetts Amherst campus, supplied by the Amherst, MA, public water system, which supplies treated surface water. River water was obtained from the Mill River (Amherst, MA), a nearby untreated source of water. We sought to represent different water matrices by including both treated and untreated surface water to evaluate the impact of organic matter and other ions on measurement. We originally intended to use a phosphate buffer to hold the pH of each solution at high and low pHs but the addition of the buffer interfered with the iron solutions. Instead, we recorded the pH and temperature for each solution but did not adjust either.

Parameters for analysis

We selected four drinking water constituents for analysis: iron, copper, manganese, and fluoride. In the United States, these are regulated by the US EPA under the Safe Drinking Water Act as primary or secondary standards (maximum contaminant levels (MCLs) or secondary maximum contaminant levels (SMCLs)) (US EPA 2015b). Iron has an SMCL of <0.3 mg/L; concentrations above the SMCL do not pose a health risk but can negatively affect the esthetics or taste of water and cause infrastructure damage. Copper has an MCL of <1.3 mg/L and an SMCL of <1.0 mg/L, as concentrations above the SMCL will cause a metallic taste and blue staining. Copper in drinking water systems is regulated by the EPA under the lead and copper rule, as copper can enter the water due to corrosion in premise plumbing. Manganese has an SMCL of <0.05 mg/L; at higher concentrations, consumers will notice a black or brown color, black staining, and a bitter taste. Fluoride is both a primary and secondary contaminant with an MCL of <4.0 mg/L and an SMCL of <2.0 mg/L. While no adverse health effects are expected between 2.0 and 4.0 mg/L, prolonged exposure may cause tooth discoloration. Concentrations >4.0 mg/L can cause fluorosis (bone disease) (Srivastava & Flora 2020).
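As an illustration of how these limits can be applied to a measured concentration, the following minimal Python sketch stores the values quoted above in a lookup table and reports which limits a reading exceeds. The names `LIMITS_MG_L` and `exceeded_limits` are hypothetical and are not part of the study; the numbers are simply those stated in this section.

```python
# Limits in mg/L as quoted in the text above (primary MCL and secondary SMCL).
LIMITS_MG_L = {
    "iron": {"SMCL": 0.3},
    "copper": {"MCL": 1.3, "SMCL": 1.0},
    "manganese": {"SMCL": 0.05},
    "fluoride": {"MCL": 4.0, "SMCL": 2.0},
}


def exceeded_limits(parameter, concentration_mg_l):
    """Return the sorted names of the limits that a concentration exceeds."""
    limits = LIMITS_MG_L[parameter]
    return sorted(
        name for name, value in limits.items() if concentration_mg_l > value
    )


# Illustrative (made-up) readings, not data from the study.
for parameter, reading in [("fluoride", 3.0), ("copper", 1.5), ("iron", 0.1)]:
    print(parameter, reading, exceeded_limits(parameter, reading))
```

A fluoride reading between 2.0 and 4.0 mg/L exceeds only the secondary limit, which matches the distinction drawn above between esthetic and health-based thresholds.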

Selection of test kits

We purchased OTS test kits from a major online retailer. While consumers may purchase test kits from local stores (e.g. hardware stores), we sought nationally available kits and therefore purchased from a nationally available retailer (www.amazon.com). Kits were selected based primarily on those that appeared as top-ranked kits in the supplier algorithm at the time of the search, as a consumer might decide to purchase a kit. We selected kits measuring single or multiple (‘multiparameter’) constituents simultaneously: four multiparameter kits, four iron-only kits, two copper-only kits, and one manganese-only kit (Table 1). There are hundreds of test kits on the market in the United States at any given time, and the availability of what is on the market or in stock can change daily; therefore, we did not seek to conduct a thorough study of all kits available on the market, but rather to use a process of a consumer attempting to identify a test kit to use and following through with whether this kit may provide reliable water quality results. Costs for each test kit were recorded at the time of purchase and retailer (2019) to provide overall context and comparability between selected kits.

Our results showed highly variable performance in the test kits’ ability to detect potential contaminants in the water. Across repeated measurements, results were largely consistent; therefore, test kit results were generally highly precise but not always accurate. Accuracy was often affected by the water matrix: in general, there were few noticeable differences in performance between tests performed in tap water compared to river water, while differences were frequently observed when comparing results in DI water to those in tap or river water. Notably, no one brand or kit performed consistently well across multiple water quality parameters; the same brand in some cases performed well for one parameter and poorly for another, or the same multiparameter kit was able to adequately measure one parameter but not another. In general, the test kits that measured only single parameters performed better than the multiparameter kits when measuring iron or copper. Two of the iron-only test kits that performed well included a reducing agent, which would reduce any ferric iron to ferrous iron. It is important to consider the intervals at which each kit measures: a concentration measured by a test kit that was markedly different from the laboratory-obtained concentration may have been limited not by the intensity of color it produced but by the design of the OTS kit, which could report results only in measurement bins (e.g. 0–5 mg/L rather than the ability to differentiate 1 vs. 4 mg/L).
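The effect of measurement bins can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical kit whose color chart has bin edges at 0, 5, 10, and 25 mg/L; the edges and the function name `reading_bin` are invented for the example and do not describe any kit tested in the study.

```python
from bisect import bisect_right

# Hypothetical bin edges (mg/L) for an imaginary kit's color chart.
HYPOTHETICAL_EDGES = [0, 5, 10, 25]


def reading_bin(concentration_mg_l, bin_edges):
    """Return the (low, high) bin that a concentration falls into."""
    if not bin_edges[0] <= concentration_mg_l <= bin_edges[-1]:
        raise ValueError("concentration outside the kit's range")
    # Clamp so that the top edge is assigned to the last bin.
    i = min(bisect_right(bin_edges, concentration_mg_l), len(bin_edges) - 1)
    return bin_edges[i - 1], bin_edges[i]


# 1 mg/L and 4 mg/L land in the same bin and cannot be told apart.
print(reading_bin(1.0, HYPOTHETICAL_EDGES))
print(reading_bin(4.0, HYPOTHETICAL_EDGES))
```

Under these assumed edges, a kit could respond perfectly to the sample and still appear inaccurate against a laboratory value, because the finest result it can report is the bin itself.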

Our results are consistent with those of studies focused on the measurement of other contaminants, such as nitrate or lead, which have also identified challenges that home users had or would have in obtaining accurate results and showed that false negatives or low user-measured results were common (Nielsen et al. 2008; Kriss et al. 2021). The observed differences in performance are likely due to test kit chemistry and water matrix interference. While the test strip kit chemistries are not known (information was not disclosed by manufacturers), our results demonstrated that the iron measurement test kits involving reducing agents performed better, as they could then likely detect both ferrous and ferric iron (similar challenges with metal speciation detection via home test kits have been previously observed with the measurement of lead (Kriss et al. 2021)). Multi-parameter test kits often performed worse than single-parameter kits, potentially because they tried to optimize for multiple analytes at once, many of which may need a different pH for optimum performance. Also, as noted in the literature on colorimetric methods, interference from the water matrices likely affected kit performance; in our experiments, performance in tap and river water was poorer than in DI water, suggesting that interference from organic matter and other ions likely occurred. However, notably, our laboratory-determined values also relied on colorimetric methods. Additionally, we selected sulfate salts as the source compounds for iron, copper, and manganese; to facilitate future standardization for evaluating commercial test kit performance, future work could compare kit performance with the addition of different anions to mimic those found in drinking water, particularly where water matrices may influence results, and could better characterize speciation and its effect on results.

Some of the multiparameter kits outperformed the single-parameter kits if the goal was a binary outcome of above or below the SMCL; while the copper-only kits may be more accurate, the sensitivity analysis revealed that some of the multiparameter kits outperformed the copper-only kits when assessing whether a sample was above or below the SMCL. Therefore, the selection of a test kit should consider its application: for a water consumer, the ability of a test kit to meaningfully provide information about a water sample’s compliance with an MCL may be more important than accuracy or the ability to detect >0 mg/L.
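The difference between numeric accuracy and binary classification can be made concrete with a short Python sketch. It assumes that the sensitivity analysis refers to sensitivity and specificity in the diagnostic sense (the excerpt does not give the exact calculation), treats the laboratory value as the reference, and uses made-up paired readings; the function name and all data are hypothetical.

```python
def sensitivity_specificity(lab_values, kit_values, limit):
    """Score kit readings against lab readings for an above/below-limit call.

    Returns (sensitivity, specificity); either is None if it is undefined
    because no sample falls in the corresponding reference class.
    """
    tp = fn = tn = fp = 0
    for lab, kit in zip(lab_values, kit_values):
        truly_above = lab > limit   # reference classification
        flagged = kit > limit       # kit classification
        if truly_above and flagged:
            tp += 1
        elif truly_above:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else None
    specificity = tn / (tn + fp) if tn + fp else None
    return sensitivity, specificity


# Made-up copper readings in mg/L, scored against the 1.0 mg/L SMCL.
lab = [0.5, 0.8, 1.2, 1.5, 2.0]
kit = [0.0, 1.0, 2.0, 2.0, 1.0]  # coarse kit values, numerically off
print(sensitivity_specificity(lab, kit, limit=1.0))
```

In this invented data set, several kit readings differ noticeably from the lab values, yet most samples are still classified on the correct side of the limit, which is the kind of outcome the paragraph above describes.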

This study had limitations that also suggest future directions. Our ‘true value’ test was another colorimetric method; while a method such as ICP-MS would have yielded more accurate results for our true value, we selected the single-parameter colorimetric method as our basis of comparison because many water utilities and scientific studies use this method, and this is what the test strips are trying to emulate. For example, the kit testing manganese was a direct modification of the Hach PAN LR spec method, with the steps and reagents closely matching each other. Future work could also compare to results from ICP-MS. For these experiments, tap and river water were sometimes collected on different days due to logistical constraints, which would result in different background concentrations of parameters of interest or constituents that could potentially interfere with results; however, these sources were not expected to vary by much over the few days in which water was collected, and we measured the background concentration of the parameter of interest on each day water was collected. Future studies should also incorporate testing in different waters, particularly groundwater, as those are the sources that many US homeowners use, as well as with varying pH and temperature, which would be expected to significantly affect results. Future work should investigate the constituents in source waters that could be causing interference with the test kits, such as through the use of synthetic water and testing potential interference (e.g. organic matter, alkalinity, hardness, and pH) and conducting more detailed analysis of water matrix chemistry. While consistent lab personnel were used in this study, factors such as time of day, lighting conditions, colorblindness, and eyesight may affect results, although the kit instructions did not mention these factors. A procedure for consistently reading color charts could help eliminate this large source of error, similar to other efforts that use a mobile phone to read the results of colorimetric methods. Future work should also focus on varying some of the key method procedures to reproduce mistakes likely made by potential users, such as waiting more or less time than instructed when dipping the strip in the water or matching the color. Finally, we tested only a very small fraction of the test kits that are available on the market, and the selection was nonrandom; we sought to replicate the procedure a home user may use when selecting test kits and acknowledge that manufacturer and kit variability changes daily. Future studies could focus on more holistically evaluating kits on the market and other procedures for selecting test kits, such as by kit chemistry or aligned result bins, as well as evaluating the many other parameters that users may want to measure.

Overall, these test kits are being sold and used by consumers across the United States. However, our results suggest that there are concerns about their accuracy and usefulness and that whether the results can be relied on depends on which parameter is being measured in which water with which kit and for which purpose. While many of these studies have focused on the assessment of commercially available test kits, recent efforts have highlighted opportunities to co-design test kits with users such as middle and high school teachers (Haynes et al. 2019). Previous studies focused on preparing and validating field test kits, particularly for use in low- and middle-income countries to measure chlorine or fecal indicator bacteria, have shown that, with method development and evaluation of user experiences, such kits can be reliable and accurate; however, many of these are still focused on use by those with some training in water quality rather than home users (Bain et al. 2012; Khush et al. 2013; Murray & Lantagne 2015; Bain et al. 2021). The results of our study suggest a need to develop standardized protocols for evaluating the appropriateness and validity of these home test kits given different goals and water sources. There are significant gaps in our knowledge of the performance of these test kits and how to engage with potential users in selecting appropriate parameters to measure and in interpreting the results into actionable steps. Considering both the reliability and accuracy of the results, as well as how potential users may interact with the kits, is important for home users, community groups, or citizen science initiatives that may be relying on these results.

FUNDING

We acknowledge funding from the University of Massachusetts Amherst Civil and Environmental Engineering.

Supplementary data: see original article for link

FULL-TEXT STUDY ONLINE AT https://iwaponline.com/jwh/article/23/3/350/107444/Evaluation-of-drinking-water-quality-test-kits-for