The authors have declared that no competing interests exist.

Threats to public health and environmental quality from septic systems are more prevalent in areas with poorly draining soils, high water tables, or frequent flooding. Significant research gaps exist in assessing these systems’ vulnerability and evaluating factors associated with higher rates of septic systems replacement and repair. We developed a novel GIS-based framework for assessing septic system vulnerability using a database of known septic system specifications and a modified Soil Topographic Index (STI) that incorporates seasonal high groundwater elevation to assess risks posed to septic systems in coastal Georgia. We tested the hypothesis that both the modified STI and septic system specifications such as tank capacity per bedroom and drainfield type would explain most of the variance in septic system repair and replacement using classification inference tree and generalized logistic regression models. Our modeling results indicate that drainfield type (level vs. mounded) is the most significant variable (

Septic systems consist of a tank and a soil treatment area or a drainfield [

Globally, 4.2 billion people lack safely managed sanitation [

According to the USEPA, septic system functionality is defined by the system’s ability to remove settleable solids, nutrients, and pathogens from wastewater discharges [

No state has directly measured its septic systems failure rate and definitions of failure vary [

Assessments of septic system failures are very local, generally limited to a single city, county, or neighborhood, and the results vary widely and are not transferable to other regions [

Significant research gaps exist in assessing septic system vulnerability and the factors associated with higher rates of septic system replacement and repair. Kohler et al. (2016) assessed the association of “Onsite Wastewater Treatment System (OWTS) fragility” (the degree to which a system loses functionality) with local temperature, rainfall, and streamflow conditions over a range of time scales for 225 septic systems in Boulder County, Colorado, USA, for which reports of repair frequency were available. Their generalized linear regression model showed that high temperature, the frequency of wetter-than-normal months, and the magnitude of peak streamflow in the watershed affect the complete loss of septic system functionality [

Because septic effluent treatment relies on hydrologic, microbial and chemical processes, wastewater treatment in the drainfield area is sensitive to changes in soil moisture [

In this study, we use a novel application of the STI incorporating groundwater levels to develop methods and create a GIS-based framework for assessing septic system vulnerability. Many of the conditions associated with system failure are common in coastal areas, which makes coastal regions problematic for siting and maintaining septic systems [

To assess the degree of risk posed to septic systems under current groundwater elevations, we used a newly developed GIS database of recorded septic systems in Bryan County, Georgia. This database contains numerous data fields describing septic system attributes. We hypothesize that both the modified STI and septic system specifications, such as tank capacity per bedroom and drainfield type, would explain most of the variance in septic system failure resulting in repair or replacement. We use a classification inference tree and generalized logistic regression models to analyze a binary qualitative variable, namely whether a septic system installation is new or replaces an existing system, as a function of explanatory variables such as the modified STI, whether the septic drainfield is installed in a mound, and septic tank capacity per bedroom. Based on the results of the statistical analyses, we create a map depicting the relative risk of septic system vulnerability in southern Bryan County, Georgia. With a methodology to assess the vulnerability of septic systems to failure, we discuss the policy, planning, and management implications and opportunities presented by the availability of these data.

Coastal Georgia remains among the least developed coastlines in the US and is characterized by extensive marshes, estuaries, and barrier islands, many of which remain undeveloped. The success in preserving Georgia’s coastal resources is fueling increasing pressure for residential and commercial growth in the coastal region. Bryan County, Georgia was selected for this research project both because it potentially faces significant impacts from septic systems affected by rising sea levels and because its septic data were robust and well maintained, which facilitated their use in this study.

Bryan County, Georgia, USA, is located in the Georgia Coastal Plain in the Ogeechee River basin near the center of the Georgia Bight and southwest of Savannah (^{2}, with approximately four percent of the total area is water [

Adapted from Georgia Department of Natural Resources, Coastal Resources Division [

Since 2010, Bryan County has been the second fastest growing county in Georgia; the population increased by 31%, from 30,233 residents in 2010 to 39,627 residents in 2019 [

Most of the Georgia Coastal Plain has very low topographic relief. The LiDAR DEM ranges from 0 m adjacent to intercoastal rivers and tidally influenced marsh areas to a maximum value of 16 m (North American_1983 datum) in the southern or lower part of the county (

The surficial aquifer system in Bryan County is composed of Holocene to Pliocene aged sediments and ranges in thickness from over 5 m in the southeastern portion of the county to approximately 43 m in the northern portion of the county [

Long term water table data were utilized to construct the initial groundwater model and evaluate seasonal influences in the surficial aquifer from the University of Georgia Center for Research and Education at Wormsloe (CREW) located in Chatham County, Georgia immediately to the northeast of Bryan County (

Long-term water level data was used from the Wormsloe Site located to the NE to model seasonal variations in the elevation of the water table [

The seasonal timing of the SHWT was estimated based on the long-term water level data from the background groundwater monitoring site and has been observed to occur during the late winter months when evapotranspiration rates are lowest (December–February). The water table elevation data from the abovementioned location from 2016 to 2018, and LiDAR DEM were used to develop the initial groundwater surface model. Verification of the surface of the water table was performed by using a MALA Ground Penetrating Radar (GPR) system and a controller paired with 160 MHz and 450 MHz antennae. These are shielded antennae that incorporate both transmitter and receiver in one unit at fixed spacings. The GPR data were post-processed for Time-Zero adjustment, spatial interpolation, background removal, 2D spatial filtering, amplitude correction and bandpass filtering.

The depth to groundwater was measured via GPR at 58 accessible locations in lower Bryan County. The locations were chosen to provide an even spatial distribution and to cover a range of elevations so that topographic influences could be evaluated (

GPR profile locations were recorded using a Trimble Geo 7X (Model 88161, Trimble Corp., CA) Global Positioning System (GPS) accurate to within 0.15 m. The elevation of the water table at each GPR location was calculated as the LiDAR surface elevation minus the depth to groundwater indicated by the GPR data (GW_{EL} = Surface_{EL} − GW_{Depth}). Groundwater elevation and LiDAR surface elevation at the GPR locations were linearly related, with an R^{2} of 0.99, as expected in an area of low topographic relief. Therefore, it was assumed that the water table surface closely follows the low-relief surface topography in the area. A water table elevation raster was generated to model the seasonal low and high water table elevations by applying the linear equation to the LiDAR DEM value of each pixel in lower Bryan County. A depth-to-groundwater raster for each pixel in the southern part of Bryan County was then generated from the LiDAR DEM and the calculated water table elevation (
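
The elevation arithmetic and the linear fit described above can be sketched as follows. This is a minimal illustration, not the authors' workflow: the function names and the five sample readings are hypothetical, and an ordinary least squares fit is assumed for the "linear equation" relating groundwater elevation to LiDAR surface elevation.

```python
from statistics import mean


def groundwater_elevation(surface_el_m, gw_depth_m):
    """GW_EL = Surface_EL - GW_Depth, all values in metres."""
    return surface_el_m - gw_depth_m


def fit_line(x, y):
    """Ordinary least squares fit y = slope * x + intercept, plus R^2."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical GPR readings (NOT the study's data): LiDAR surface elevation
# and GPR-derived depth to groundwater, both in metres.
surface = [2.0, 4.0, 6.0, 8.0, 10.0]
depth = [0.5, 1.0, 1.5, 2.0, 2.5]

gw = [groundwater_elevation(s, d) for s, d in zip(surface, depth)]
slope, intercept, r2 = fit_line(surface, gw)


def depth_to_water(dem_m):
    """Depth to the modelled water table for one DEM pixel value (metres)."""
    water_table_el = slope * dem_m + intercept
    return dem_m - water_table_el
```

Applying `depth_to_water` to every DEM pixel would give a depth-to-groundwater raster of the kind described; in practice this step would be done with raster tools rather than a per-pixel Python loop.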

The original STI is derived from widely used Topographic Wetness Index (TWI) [_{sat} is the mean saturated hydraulic conductivity of the soil (m day ^{-1}), and

In _{wt} is depth to the seasonal high water table (m) from ground surface obtained from known LiDAR DEM data and the calculated water table elevation. We used the multiple flow direction D-infinity algorithm [^{2} (10 acre) and the mean of modified STI minus one standard deviation of pixels within parcels with the size equal or greater than 0.04 km^{2} (10 acre).
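
As a rough per-pixel sketch of the index, the function below assumes the form of the STI commonly given in the literature, ln(a / (tan β · K_sat · D)), with the soil depth term D replaced by the depth to the seasonal high water table D_wt. That substitution and the parameter names are assumptions made for illustration; the authors' exact formulation may differ.

```python
import math


def modified_sti(upslope_area_per_contour_m, slope_tan, k_sat_m_per_day, d_wt_m):
    """Assumed form: ln(a / (tan(beta) * K_sat * D_wt)).

    upslope_area_per_contour_m: upslope contributing area per unit contour
        length (m), e.g. from a D-infinity flow accumulation.
    slope_tan: local surface slope expressed as tan(beta).
    k_sat_m_per_day: mean saturated hydraulic conductivity (m/day).
    d_wt_m: depth to the seasonal high water table (m).
    """
    if min(upslope_area_per_contour_m, slope_tan, k_sat_m_per_day, d_wt_m) <= 0:
        raise ValueError("all inputs must be positive")
    return math.log(
        upslope_area_per_contour_m / (slope_tan * k_sat_m_per_day * d_wt_m)
    )


# Hypothetical pixel: a shallower water table gives a larger (wetter) index.
deep = modified_sti(100.0, 0.01, 1.0, 1.0)
shallow = modified_sti(100.0, 0.01, 1.0, 0.5)
```

Very flat pixels (tan β close to zero) would need a minimum slope value before the logarithm is taken; how that is handled is not specified here.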

We obtained the septic system inventory record for the southern part of Bryan County from the Georgia Department of Public Health (GDPH) online system, which is a publicly accessible database [

To obtain septic system characteristics (e.g., septic tank capacity, year that the septic system was installed, year that structures were built, number of bedrooms, and depth to the water table or restrictive layer) for each septic system point feature from the septic system inspection report and property report, we used the Joins and Relates feature in ArcGIS to join data spatially or by table attributes. In total, 3,343 septic point features with available inspection and characteristics reports were extracted (

In the original inspection report, the condition of septic systems was reported in three groups: new (n = 3295), repair (n = 44), and addition (n = 4). ‘Addition’ refers to the installation of a larger septic tank, required by law, when a house is extended. A partial replacement was reported as a ‘repair’ system and a full system replacement was reported as a ‘new’ system.

The average time required for a septic system to malfunction is typically more than 10 years [

In soil that is generally unsuitable for siting a septic system due to conditions such as low permeability, rock formations, or high groundwater elevations, septic systems with mounded drainfields may be considered [

Septic tank capacity per bedroom was calculated as the tank capacity of each septic system (L) divided by the total number of bedrooms.
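
A small sketch of this calculation, including the litre/US gallon conversion used when capacities are quoted in both units. The function name and the example tank are hypothetical.

```python
LITRES_PER_US_GALLON = 3.785411784  # exact definition of the US liquid gallon


def tank_capacity_per_bedroom(tank_capacity_l, bedrooms):
    """Tank capacity (L) divided by the total number of bedrooms."""
    if bedrooms <= 0:
        raise ValueError("number of bedrooms must be positive")
    return tank_capacity_l / bedrooms


# Hypothetical example: a 1000 US gallon tank serving a 4-bedroom house.
capacity_l = 1000 * LITRES_PER_US_GALLON
per_bedroom_l = tank_capacity_per_bedroom(capacity_l, 4)
per_bedroom_gal = per_bedroom_l / LITRES_PER_US_GALLON
```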

Recursive binary partitioning methods are a popular statistical tool for regression analysis. These methods provide an alternative to generalized linear models for categorical responses. Conditional inference trees estimate a regression relationship by binary recursive partitioning in a conditional inference framework. Classification tree models in R (e.g., CART) use internal cross-validation to balance model complexity against goodness of fit, and these models are subject to overfitting. In contrast, conditional inference trees (e.g., ctree) use unbiased variable selection and are less prone to overfitting. Also, the prediction accuracy of conditional inference trees is equivalent to the prediction accuracy of optimally pruned trees, and no “pruning” or cross-validation is needed [

The algorithm involves two steps; in the first step, the algorithm tests the global null hypothesis of independence between any of the input variables and the response variables. The algorithm stops if the null hypothesis cannot be rejected. Otherwise, the algorithm selects the input variable with the strongest association to the response variable. This association is measured by a

In the second step, the algorithm implements a permutation test framework to find the optimal binary split in the selected covariate in step one. Then, these two steps repeat recursively [
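
The first step (testing independence and selecting the covariate most strongly associated with the response) can be illustrated with a simplified Monte Carlo permutation test. This is a sketch under stated assumptions, not the `ctree` implementation: the association statistic, the Bonferroni adjustment, the number of permutations, and the toy data are all choices made here for illustration.

```python
import random


def association_stat(x, y):
    """Absolute centred cross-product between a covariate and a 0/1 response."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return abs(sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)))


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Share of response permutations at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = association_stat(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if association_stat(x, y_perm) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_split_variable(covariates, y, alpha=0.05):
    """Return (selected covariate or None, Bonferroni-adjusted p-values)."""
    raw = {name: permutation_p_value(x, y) for name, x in covariates.items()}
    adjusted = {name: min(1.0, p * len(raw)) for name, p in raw.items()}
    best = min(adjusted, key=adjusted.get)
    # Stop (return None) when the global null hypothesis cannot be rejected.
    return (best if adjusted[best] < alpha else None), adjusted


# Toy data (hypothetical): response is 0 = new, 1 = replace.
response = [0] * 10 + [1] * 10
covariates = {
    "drainfield_mounded": [0] * 10 + [1] * 10,  # perfectly associated
    "noise": [0, 1] * 10,                       # unrelated to the response
}
selected, adjusted = select_split_variable(covariates, response)
```

The second step, searching for the best binary split within the selected covariate, would then be applied to `selected`, and both steps repeated on each resulting subset.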

We built a conditional inference tree with ‘septic system’s New or Replace’ as the response (dependent) variable and septic tank capacity per bedroom, modified STI, and septic system drainfield types (

We assessed the strength of the relationship between a binary response variable _{0} is the intercept, the _{n} independent variables, and _{n} their coefficients. We used the maximum likelihood method to estimate _{0,} _{1},…,_{n}. The continuous probability

The _{n}) to an estimate of its standard error [
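
For reference, the standard form of a logistic regression model and of the Wald statistic can be written as below. The symbols follow common convention ($p$ for the replacement probability, $x_i$ for the independent variables, $\beta_i$ for their coefficients) and may not match the authors' notation exactly; the squared form of the Wald statistic is an assumption that is consistent with the magnitudes of the coefficients, standard errors, and Wald values reported in the results.

```latex
\ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n
\qquad\Longleftrightarrow\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}

W_i = \left(\frac{\hat{\beta}_i}{\mathrm{SE}(\hat{\beta}_i)}\right)^{2}
```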

We evaluated the models’ performance with two accuracy measures: the overall accuracy from the confusion matrix and the area under the Receiver Operating Characteristic (ROC) curve (AUC). For each model (conditional inference tree and logistic regression), we calculated the overall accuracy from the confusion matrix, a table that contrasts predicted and observed classifications, for both the training and validation datasets. The accuracy on the validation dataset was used to assess the ability of each model to predict septic system replacement.

Overall accuracy is calculated from the confusion matrix as the sum of the classifier’s true positives and true negatives divided by the total number of observations in the training or validation dataset. Values close to one indicate that the model predicts the binary response more accurately [
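
The two measures can be sketched in plain Python as follows. The 0.5 classification threshold, the function names, and the eight example observations are assumptions for illustration; the AUC is computed with the rank-based (Mann-Whitney) formulation, which is equivalent to the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def overall_accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def true_positive_rate(y_true, y_pred):
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def auc(y_true, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical validation set: 1 = replace, 0 = new.
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
predicted = [1 if p >= 0.5 else 0 for p in predicted_prob]

accuracy = overall_accuracy(observed, predicted)
sensitivity = true_positive_rate(observed, predicted)
area = auc(observed, predicted_prob)
```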

The calculated modified STI for all parcels is approximately normally distributed and ranges from 5 (drier) to 13 (wetter) (

For parcels smaller than 0.04 km^{2} (10 acres), the modified STI was calculated as the mean of the modified STI of the pixels within the parcel. For parcels equal to or greater than 0.04 km^{2} (10 acres), it was calculated as the mean of the modified STI of the pixels within the parcel minus one standard deviation. Small values represent drier soil conditions and large values represent wetter soil conditions.
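
The parcel-level aggregation rule can be sketched as below. The function name and pixel values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev

LARGE_PARCEL_KM2 = 0.04  # roughly 10 acres


def parcel_modified_sti(pixel_sti_values, parcel_area_km2):
    """Aggregate per-pixel modified STI values to one value per parcel.

    Small parcels: mean of the pixel values.
    Large parcels (>= 0.04 km^2): mean minus one standard deviation.
    """
    avg = mean(pixel_sti_values)
    if parcel_area_km2 < LARGE_PARCEL_KM2:
        return avg
    # Assumption: population standard deviation of the pixels in the parcel.
    return avg - pstdev(pixel_sti_values)


# Hypothetical pixel values for one small and one large parcel.
small_parcel = parcel_modified_sti([8, 10, 12], parcel_area_km2=0.01)
large_parcel = parcel_modified_sti([8, 10, 12], parcel_area_km2=0.05)
```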

Based on the methodology described in the Materials and Methods section, the total number of ‘new’ systems in the revised version of the inspection report was 1479 and the total number of ‘replace’ systems, including partial repairs, additions, and full replacements, was 1864 (

We illustrate the relationship between the response variable (septic systems Replace or New) and input variables with a dichotomous tree diagram for the training dataset (

Values and letters (

Each oval node represents the splitting variable and branches that connect nodes. Each rectangular box in the conditional classification tree represents a terminal node that represents the final groups. The “n” in each rectangular box represents the sample size. The

Similarly, following the left branch of the tree (

The logistic regression analysis showed the nonlinear relationships between the independent variables and septic system replacement probability (response variable). The model regression coefficients, standard errors of the slope coefficients,

Model | Variable | Coefficient (β) | Standard error | Wald | p-value
---|---|---|---|---|---
1 | Drainfield.type | -4.66 | 0.16 | 827.73 | <0.0001
2 | Modified.STI | 0.17 | 0.05 | 11.20 | <0.001
3 | Tank.capacity | 4 × 10^{−4} | 1 × 10^{−4} | 13.02 | 0.001
4 | Drainfield.type | -4.66 | 0.16 | 790.77 | <0.001
4 | Modified.STI | 0.12 | 0.08 | 2.71 | 0.1
5 | Drainfield.type | -4.67 | 0.16 | 793.28 | <0.0001
5 | Tank.capacity | 5 × 10^{−4} | 2 × 10^{−4} | 6.25 | 0.002
6 | Modified.STI | 0.18 | 0.05 | 12.09 | 0.0005
6 | Tank.capacity | 4 × 10^{−4} | 1 × 10^{−4} | 11.61 | 0.001
7 | Drainfield.type | -4.66 | 0.17 | 790.40 | <0.0001
7 | Modified.STI | 0.13 | 0.08 | 3.03 | <0.05
7 | Tank.capacity | 5 × 10^{−4} | 2 × 10^{−4} | 8.73 | <0.001

The

The intercept for our final model was 1.53. Based on the model 7 in
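
To show how a fitted logistic model of this kind turns into a replacement probability and into odds ratios (e^β), the sketch below plugs in the reported intercept (1.53) together with the three-variable coefficients from the last rows of the coefficient table. Several things are assumptions made for illustration: that those rows correspond to the final model, that drainfield type is coded 0 = level and 1 = mounded, and that tank capacity is expressed in the same units used when the model was fitted. The example inputs are hypothetical.

```python
import math

INTERCEPT = 1.53
COEFFICIENTS = {
    "drainfield_type": -4.66,  # assumed coding: 0 = level, 1 = mounded
    "modified_sti": 0.13,
    "tank_capacity": 5e-4,     # units as used in model fitting (assumed)
}


def replacement_probability(drainfield_type, modified_sti, tank_capacity):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = (
        INTERCEPT
        + COEFFICIENTS["drainfield_type"] * drainfield_type
        + COEFFICIENTS["modified_sti"] * modified_sti
        + COEFFICIENTS["tank_capacity"] * tank_capacity
    )
    return 1.0 / (1.0 + math.exp(-z))


# Odds ratio e^beta: > 1 raises the odds of replacement, < 1 lowers them.
odds_ratios = {name: math.exp(beta) for name, beta in COEFFICIENTS.items()}

# Hypothetical parcel, evaluated with a level and with a mounded drainfield.
p_level = replacement_probability(0, modified_sti=9.0, tank_capacity=1000.0)
p_mounded = replacement_probability(1, modified_sti=9.0, tank_capacity=1000.0)
```

Under these assumptions, switching the same parcel from a level to a mounded drainfield lowers the predicted replacement probability sharply, which matches the direction of the negative drainfield coefficient.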

To explore both conditional inference tree and logistic regression models’ performance, the AUCs were quantitatively compared for validation datasets (

Classification model | Overall accuracy ^{a} | True Positive ^{a} | AUC
---|---|---|---
Conditional inference tree | 0.87 | 0.80 | 0.90
Logistic regression | 0.86 | 0.76 | 0.88

^{a} Overall accuracy and True Positive were obtained from the confusion matrix for the validation dataset.

Overall accuracy was calculated from the confusion matrix as the sum of true positives and true negatives divided by the total number of validation data points (here n = 668). The overall rate of correct classification for the validation dataset was 87% for the conditional inference tree and 86% for the logistic regression model, indicating low misclassification error rates, with a lower error rate for the conditional inference tree model (12%) than for the logistic regression model (14%). The true positive rate (sensitivity) is the proportion of septic systems that actually experienced replacement and were correctly classified as such. The true positive rate is 80% and 76% for the conditional inference tree model and logistic regression model, respectively (

To visualize this, because the AUC and true positive values for both models are not significantly different, averaging the output probabilities is an effective ensemble method to get the final septic system replacement probability for each parcel. Then the calculated probability for system replacement was classified using natural breaks as low (≤ 18%, first quartile), moderate (19% - 28%, median), high (29% - 96%, third quartile), and very high (97% - 100%, maximum) (
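
The averaging and classification step can be sketched as follows, using the class boundaries stated above. The function names are hypothetical, and rounding the averaged probability to a whole percent before classification is an assumption made so that the integer boundaries apply cleanly.

```python
def ensemble_probability(p_tree, p_logistic):
    """Simple ensemble: average the two models' replacement probabilities."""
    return (p_tree + p_logistic) / 2.0


def vulnerability_class(probability):
    """Map a replacement probability (0 to 1) to the four reported classes."""
    pct = round(probability * 100)  # assumption: classify on whole percents
    if pct <= 18:
        return "low"
    if pct <= 28:
        return "moderate"
    if pct <= 96:
        return "high"
    return "very high"


# Hypothetical parcels: (tree probability, logistic regression probability).
parcels = [(0.10, 0.20), (0.30, 0.20), (0.90, 0.80), (0.99, 0.97)]
classes = [vulnerability_class(ensemble_probability(a, b)) for a, b in parcels]
```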

We used classification inference and logistic regression models to predict the probability of septic system replacement. Both models have high AUC values (0.90 and 0.88 for classification tree and logistic regression, respectively) (

The results of the classification inference tree indicate that septic systems with a level drainfield (

We observed the lowest replacement rate of 4% in septic systems with mounded drainfield and septic tank capacity per bedroom of 1136 L (300 US gallon per bedroom) (

A larger replacement rate (16%) was expected for septic systems with mounded drainfield and tank capacity of 946 L (250 US gallon per bedroom) or less (Node 8, terminal Node 9) than systems with larger tank capacity per bedroom (Node 8, terminal Node 10) (

The results of the final generalized linear regression model (^{β} > 1 the variable increases the expectation of response variable and ^{β} < 1 decreases the expectation [

Once regarded as a temporary solution for wastewater management in areas where centralized sewage treatment would eventually be installed, septic systems are now recognized as a permanent component of community infrastructure in many areas [

Few communities actively monitor the functioning of systems or mandate regular system maintenance. Recognizing that poorly sited and improperly designed systems can cause significant problems, septic system research and regulation has focused on optimizing their performance through system design and siting. In Georgia, for instance, the state has implemented regulatory measures increasing the required distance between the septic drainfield and the groundwater table or impermeable layers, and requiring dual-chambered septic tanks that help prevent wastewater solids from clogging the drainfield [

Developing better local data on the location and condition of septic systems promises to make management and regulation of installed septic systems more feasible and beneficial. The methodology described here provides the ability to assess the functionality of septic systems over time and at scale, which potentially allows for much more sophisticated and effective septic system management. For instance, systems might be prioritized for additional attention if they are deemed highly likely to fail, or additional siting requirements could be instituted to account for variance in the STI. In addition, this analysis could be used to target outreach efforts regarding septic system maintenance and operation, or to direct water quality monitoring efforts based on likely concentrations.

In addition to improving septic system maintenance and regulation, this methodology can also benefit future planning and development decisions in coastal Georgia and other coastal areas with similar geophysical characteristics. The use of the digitized and geo-located septic records in this process allows the analysis of how hydrological factors interact with other variables such as tank capacity per bedroom, drainfield type, housing characteristics, system age, and other elements to influence system performance. In addition, this process allows planners and regulators to examine the impacts of septic systems at landscape or watershed scales. Past efforts to report on the impacts of septic systems to environmental quality and public health have relied on testing individual systems to identify failures and extrapolate the effects of those failures across larger areas [

We encountered some limitations in this project. First, the septic system location data identified parcels with septic systems, but the drainfield location was not marked within the parcel. As a result, the parcel centroid was used as the drainfield location. For most parcels, this was not a significant issue because they are smaller than 0.04 km^{2} (10 acres), which means the actual location was nearby. For larger parcels this is more problematic, though they represent only 2.5% of the systems assessed, and the efforts to create a generalized assessment of the parcel ameliorated some of the impacts of this limitation. Another notable limitation is the difficulty of translating the condition and maintenance data fields into repair and replacement data, because many of the systems were reported as “new” even though it is clear that they are new installations of replacement systems. The process used to identify replacement systems based on the year the associated structure was constructed resolved much of this issue, but it is likely that some system failures and replacements were missed as a result. Despite these constraints, the overall results of the analysis provided valuable information that improved the understanding of the hydrological factors associated with septic system vulnerability in coastal areas.

By demonstrating how modified STI values can be related to the vulnerability of septic systems to failure, we create a valuable metric that could substantially improve the way septic systems are managed. Current septic system siting and management decisions are based on a single assessment of site conditions at the time of installation. There are no dynamic variables that account for changes in soil moisture due to changes in the groundwater table or other climatic changes. This novel application of the STI combines the baseline seasonal high groundwater elevation with known septic system specifications in a GIS-based framework for septic system vulnerability, using classification inference tree and generalized logistic regression models, and yields a metric that can be used to evaluate how a site’s suitability for a septic system changes over time. This process is based on the Bryan County Health Department’s newly developed database of septic system characteristics such as septic tank capacity, year the system was installed, year that the attached structures were built, number of bedrooms in those structures, and depth to the water table or restrictive layer. The results confirmed our hypothesis that both the modified STI and septic system specifications, such as tank capacity per bedroom and drainfield type, explained most of the variance in septic system repair and replacement. The model validation outputs showed that both models can be used to predict the septic system replacement rate, although the conditional inference tree, with its higher true positive rate, predicted the replacement rate better than the logistic regression model.

Overall, we found that septic system drainfield type (level vs. mounded) is a significant variable in predicting septic systems vulnerability. Systems with a

The septic system vulnerability data and maps will be valuable tools to aid decision-making with respect to existing system maintenance and operation, as well as future site selection, design specifications, and even future land use planning. The modeling tool may also serve as the basis of septic system policy and management decisions at larger scales, such as the county and watershed scales.

Future investigations that are possible with this methodology include improving our understanding of how social, economic, and demographic variables relate to septic system viability by linking additional data sources, such as census data or a social vulnerability index, with our model. Furthermore, coastal areas are particularly vulnerable to climate change, specifically the hydrological changes caused by sea-level rise. Future research that integrates plausible sea-level rise scenarios will help to assess the impacts of sea-level rise on groundwater elevation, soil saturation, and septic system performance and viability.

The authors would like to acknowledge Jessica E. Alcorn, Ph.D., and Courtney M. Balling, Ph.D. candidate at the University of Georgia Marine Extension and Georgia Sea Grant, for their assistance with accessing and interpreting the septic system location data. Thanks are also due to Skip Youmans and Michael May with the Bryan County Health Department for their assistance with the septic data and understanding conditions in the field. We also thank Sarah Ross, Director of the University of Georgia Center for Research and Education at Wormsloe (CREW) for access to the groundwater monitoring well network at Wormsloe and logistical support, and Tim Herold and Albert Killingsworth for their assistance with the GPR field work.