Hydrometeorological Variables Predict Fecal Indicator Bacteria Densities in Freshwater: Data-driven Methods for Variable Selection
Jones, Rachael M.
Statistical models of microbial water quality inform risk management for water recreation. Current research focuses on resource-intensive, location-specific data collection and water quality modeling, but this approach may be cost-prohibitive for risk managers responsible for numerous recreation sites. As an alternative, we tested the ability of two data-driven models, tree regression and random forests with conditional inference trees, to select readily available hydrometeorological variables for use in linear mixed effects (LME) models predicting bacterial density. The study included the Chicago Area Waterway System (CAWS) and Lake Michigan beaches and harbors in Chicago, Illinois, at which Escherichia coli and enterococci were measured seasonally in 2007-2009. Tree regression node variables reduced data dimensionality by > 50%. Variable importance ranks from random forests were used in a forward-step selection based on R² and root mean squared prediction error (RMSPE). We found two to three variables explained bacteria densities well relative to random forests with all variables. LME models with tree- or forest-selected variables performed reasonably well (0.335 < R² < 0.658). LME models for Lake Michigan had good prediction accuracy with respect to the single sample maximum standard (72-77%), but limited sensitivity (23-62%). Results suggest that our alternative approach is feasible and performs similarly to more resource-intensive approaches.
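The accuracy and sensitivity figures reported above compare predicted and observed bacterial densities against a regulatory threshold. The following is a minimal sketch of how such metrics could be computed, not the authors' code. The function name `exceedance_metrics`, the sample values, and the threshold of 235 CFU/100 mL are assumptions made for illustration; the abstract does not state the threshold value or the data.

```python
from typing import Sequence, Tuple


def exceedance_metrics(
    observed: Sequence[float],
    predicted: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (accuracy, sensitivity) for threshold-exceedance prediction.

    accuracy    = share of samples where prediction and observation agree
                  on whether the threshold is exceeded
    sensitivity = share of observed exceedances that were also predicted
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one sample is required")

    tp = tn = fp = fn = 0
    for obs, pred in zip(observed, predicted):
        obs_exceeds = obs > threshold
        pred_exceeds = pred > threshold
        if obs_exceeds and pred_exceeds:
            tp += 1
        elif obs_exceeds and not pred_exceeds:
            fn += 1
        elif pred_exceeds:
            fp += 1
        else:
            tn += 1

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Sensitivity is undefined when no observed sample exceeds the threshold.
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    return accuracy, sensitivity


# Hypothetical densities in CFU/100 mL, invented for illustration only.
observed = [50.0, 300.0, 120.0, 500.0, 240.0, 80.0]
predicted = [60.0, 250.0, 100.0, 200.0, 180.0, 90.0]
ASSUMED_THRESHOLD = 235.0  # assumption, not taken from the abstract

accuracy, sensitivity = exceedance_metrics(observed, predicted, ASSUMED_THRESHOLD)
print(f"accuracy={accuracy:.3f}, sensitivity={sensitivity:.3f}")
```

In this invented sample the model misses two of three observed exceedances while classifying every non-exceedance correctly, which shows how accuracy can remain fairly high while sensitivity is low, the same pattern the abstract reports for the Lake Michigan models.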
combined sewer overflow