Edited by: Jianfang Chen, Ministry of Natural Resources, China
Reviewed by: Haiyan Jin, Ministry of Natural Resources, China; Michael S. Wetz, Texas A&M University Corpus Christi, United States
*Correspondence: Dante M. L. Horemans,
This article was submitted to Coastal Ocean Processes, a section of the journal Frontiers in Marine Science
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Aquaculturists, local beach managers, and other stakeholders require forecasts of harmful biotic events so that they can assess and respond to health threats when harmful algal blooms (HABs) are present. To meet this need, we are developing empirical habitat suitability models for a variety of Chesapeake Bay HABs that forecast their occurrence from a set of physical-biogeochemical environmental conditions, starting with the dinoflagellate
Harmful algal blooms (HABs) manifest themselves when aquatic algal species grow to such levels that they negatively affect humans, fish, or other aquatic organisms. Examples of such harmful effects are a critical reduction of the oxygen concentration (
Various modeling techniques have been proposed to forecast and model HABs [see
Numerous metrics can be used to assess the performance or goodness-of-fit of such statistical models (
In this contribution, we apply various statistical models to forecast the likelihood of occurrence of high concentrations or ‘blooms’ of the dinoflagellate
Approximately 3,600
The Chesapeake Bay is more than 300 km long, approximately 50 km across at its widest point, and has an average water depth of approximately 7 m, ranging from 0 to 40 m (
The Chesapeake Bay and its bathymetry and main tributaries, located on the East Coast of the U.S.A.
The Chesapeake Bay has been systematically sampled within the Chesapeake Bay Program, resulting in long-term, biweekly or monthly,
The nineteen environmental variables used to train the
| Variable | Definition | Arithmetic mean | Standard deviation | Units |
|---|---|---|---|---|
| S | Salinity | 14.2 | 6.7 | / |
| T | Water temperature | 17.6 | 8.0 | °C |
| pH | Water acidity | 8.10 | 0.37 | NBS scale |
| Si | Silica concentration | 7.5×10^{-1} | 7.1×10^{-1} | mg L^{-1} |
| swrad | Solar irradiance at the water surface | 192 | 54 | W m^{-2} |
| TON | Total organic nitrogen concentration | 5.1×10^{-1} | 2.4×10^{-1} | mg L^{-1} |
| TDP | Total dissolved phosphorus concentration | 1.6×10^{-2} | 1.4×10^{-2} | mg L^{-1} |
| O_{2} | Dissolved oxygen saturation | 104 | 16 | % |
| TDN | Total dissolved nitrogen concentration | 4.8×10^{-1} | 3.2×10^{-1} | mg L^{-1} |
| TN | Total nitrogen concentration | 7.0×10^{-1} | 3.9×10^{-1} | mg L^{-1} |
| gradS | Vertical gradient of the salinity | 3.3×10^{-1} | 2.6×10^{-1} | m^{-1} |
| TDN : TDP | Molar ratio of the total dissolved nitrogen to total dissolved phosphorus concentration | 106 | 117 | / |
| TP | Total phosphorus concentration | 3.9×10^{-2} | 3.0×10^{-2} | mg L^{-1} |
| gradT | Vertical gradient of the water temperature | 1.1×10^{-1} | 1.8×10^{-1} | °C m^{-1} |
| NH4 | Ammonium concentration | 2.8×10^{-2} | 4.2×10^{-2} | mg L^{-1} |
| depth | Total water depth | 16.4 | 6.8 | m |
| wind | Magnitude of the wind velocity | 3.3 | 1.6 | m s^{-1} |
| rain | Precipitation | 2.0×10^{-5} | 4.6×10^{-5} | kg m^{-2} s^{-1} |
| TSS | Total suspended solids concentration | 9.5 | 7.0 | mg L^{-1} |
/, dimensionless.
We only use observations in the surface waters (i.e., < 1 m water depth), with the exception of the vertical gradient of salinity and water temperature. The reason is that
In this section, we briefly introduce the binomial distribution and the two statistical models that we apply: generalized linear models (GLMs) and generalized additive models (GAMs).
Before presenting the GLM and GAM, we introduce some concepts of the binomial distribution that are required to understand the core assumptions made to construct these models: probability
Our bloom data are binary: a bloom either occurs or it does not. Therefore, we assume that the bloom data follow a Bernoulli distribution or, because multiple Bernoulli trials are considered, a binomial distribution. The probability mass function of the latter distribution in 1D reads as
in which
with
The expected value
Summation of Eq. (2) from
where
The latter function is also known as the logit or log-odds function.
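As a rough illustration of the quantities introduced above, the following sketch evaluates the standard binomial probability mass function, its expected value, and the logit (log-odds) function with its inverse. The function names and the symbols `k`, `n`, and `p` are assumptions chosen for this example and are not necessarily the notation used in the equations of this section.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of k successes (blooms) in n Bernoulli trials with success probability p."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def binomial_mean(n: int, p: float) -> float:
    """Expected number of successes, obtained by summing k * pmf(k) over k = 0..n."""
    return sum(k * binomial_pmf(k, n, p) for k in range(n + 1))


def logit(p: float) -> float:
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(x: float) -> float:
    """Map a log-odds value x back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical numbers: 10 trials with a bloom probability of 0.3.
mean_blooms = binomial_mean(10, 0.3)           # close to n * p
round_trip = inverse_logit(logit(0.3))         # recovers the probability
```

The logit maps probabilities in (0, 1) onto the whole real line, which is why a linear predictor can be used on the log-odds scale without producing probabilities outside that interval.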
A GLM assumes that
The variables
in which
Once we estimated
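A minimal sketch of a binomial GLM with a logit link is given below, assuming the bloom probability depends on a linear combination of the predictors on the log-odds scale. The coefficients are estimated by plain gradient ascent on the log-likelihood, which is an assumption made to keep the example dependency-free; it is not claimed to be the fitting procedure used in this study. The data, the function names, and the learning-rate settings are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_glm(X, y, learning_rate=0.1, n_iter=5000):
    """Fit intercept and slopes of a logit-link GLM by gradient ascent.

    X is a list of predictor rows (without intercept column), y a list of 0/1 labels.
    Returns the coefficients with the intercept first.
    """
    n, m = len(X), len(X[0])
    beta = [0.0] * (m + 1)
    for _ in range(n_iter):
        grad = [0.0] * (m + 1)
        for row, yi in zip(X, y):
            z = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
            err = yi - sigmoid(z)          # gradient of the log-likelihood w.r.t. z
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        beta = [b + learning_rate * g / n for b, g in zip(beta, grad)]
    return beta


def predict_probability(beta, row) -> float:
    """Predicted bloom probability for one row of predictors."""
    return sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))


# Hypothetical, standardized single predictor with bloom (1) / no-bloom (0) labels.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 1, 0, 1, 1]
beta = fit_logistic_glm(X, y)
```

In practice a statistics library with iteratively reweighted least squares would be the more usual choice; the loop above only makes the structure of the model explicit.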
GAMs are an extension of GLMs in which we allow for nonlinear (interaction) terms:
where the coefficients
in which λ
The
We computed the goodness-of-fit of the statistical models using three quantities: the AIC, the accuracy of forecasting a bloom
in which
To avoid overfitting the GLM, an optimal combination of up to five environmental variables was identified based on the AIC (
in which
Of the nineteen variables examined, S, T, and pH were found to be the optimal predictors in forecasting a
Ranking the nineteen variables based on their effectiveness in forecasting
| Variable | Definition | Probability selected | Correlation to bloom |
|---|---|---|---|
| S | Salinity | 0.93 | 1.00 |
| T | Water temperature | 0.58 | 1.00 |
| pH | Water acidity | 0.57 | 1.00 |
| Si | Silica concentration | 0.38 | 0.96 |
| swrad | Solar irradiance at the water surface | 0.30 | 0.82 |
| TON | Total organic nitrogen concentration | 0.26 | 0.98 |
| TDP | Total dissolved phosphorus concentration | 0.21 | 0.99 |
| TDN : TDP | Molar ratio of the total dissolved nitrogen to total dissolved phosphorus concentration | 0.19 | 0.90 |
| O_{2} | Dissolved oxygen saturation | 0.18 | 0.23 |
| TDN | Total dissolved nitrogen concentration | 0.17 | 0.21 |
| TN | Total nitrogen concentration | 0.16 | 0.75 |
| gradS | Vertical gradient of the salinity | 0.14 | 0.92 |
| gradT | Vertical gradient of the water temperature | 0.14 | 0.53 |
| TP | Total phosphorus concentration | 0.13 | 0.52 |
| NH4 | Ammonium concentration | 0.12 | 0.44 |
| depth | Total water depth | 0.13 | 0.80 |
| wind | Magnitude of the wind velocity | 0.12 | 0.81 |
| TSS | Total suspended solids concentration | 0.10 | 0.61 |
| rain | Precipitation | 0.09 | 0.40 |
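The search space behind this ranking can be sketched by enumerating every combination of up to five of the nineteen variables. Whether single-variable models were part of the search is an assumption made here; under that assumption the count is consistent with the "more than 16,000 variable combinations" reported in this article. The function name and the spelling of the variable labels are illustrative.

```python
from itertools import combinations

# The nineteen environmental variables, using the abbreviations from the table above.
VARIABLES = [
    "S", "T", "pH", "Si", "swrad", "TON", "TDP", "TDN:TDP", "O2", "TDN",
    "TN", "gradS", "gradT", "TP", "NH4", "depth", "wind", "TSS", "rain",
]


def variable_combinations(variables, max_size=5):
    """Yield every unordered combination of 1 up to max_size variables."""
    for size in range(1, max_size + 1):
        yield from combinations(variables, size)


# Each combination would be used to train one candidate model and score it (e.g., by AIC).
n_combinations = sum(1 for _ in variable_combinations(VARIABLES))
```

Because the number of candidates grows combinatorially with the maximum combination size, capping the size at five keeps the exhaustive search tractable.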
Based on the AIC, the optimal fivevariable combination is {T, S, pH, swrad, TON} (
Optimal variable combination with and without consideration of biogeochemical environmental variables to train the generalized linear model and the corresponding goodness-of-fit.

| Variable combination | AIC^{†} | Accuracy, bloom^{‡} | Accuracy, no-bloom^{‡†} |
|---|---|---|---|
| Considering all variables | | | |
| T, S, pH, swrad, TON | 2848 ± 46 | 78.7 ± 2.4 | 77.8 ± 0.9 |
| T, S, pH, swrad | 2922 ± 40 | 76.0 ± 2.6 | 77.7 ± 1.0 |
| T, S, pH | 3025 ± 45 | 77.7 ± 3.0 | 76.4 ± 1.0 |
| Considering physical variables only | | | |
| T, S, swrad, depth, wind | 3148 ± 38 | 79.7 ± 3.1 | 73.5 ± 1.1 |
| T, S, swrad, depth | 3179 ± 43 | 79.6 ± 2.2 | 73.2 ± 1.1 |
| T, S, swrad | 3214 ± 45 | 80.8 ± 2.9 | 73.2 ± 1.2 |

† Akaike Information Criterion.
‡ Accuracy of forecasting a bloom.
‡† Accuracy of forecasting a no-bloom occurrence.
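The three goodness-of-fit quantities reported in this table can be sketched as follows. The AIC is written in its standard form, 2k − 2 ln L. The two accuracies are interpreted here as the fraction of observed blooms forecast as blooms and the fraction of observed no-bloom occurrences forecast as no-bloom; this interpretation, the 0.5 probability threshold, and all function names are assumptions of this example and may differ from the definitions used in the study.

```python
import math


def bernoulli_log_likelihood(y_true, p_pred) -> float:
    """Log-likelihood of binary observations given predicted bloom probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(y_true, p_pred)
    )


def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L); lower values indicate a better trade-off."""
    return 2.0 * n_parameters - 2.0 * log_likelihood


def bloom_accuracies(y_true, p_pred, threshold=0.5):
    """Return (accuracy for observed blooms, accuracy for observed no-bloom occurrences)."""
    forecasts = [1 if p >= threshold else 0 for p in p_pred]
    n_bloom = sum(1 for y in y_true if y == 1)
    n_no_bloom = len(y_true) - n_bloom
    if n_bloom == 0 or n_no_bloom == 0:
        raise ValueError("both bloom and no-bloom observations are required")
    hits_bloom = sum(1 for y, f in zip(y_true, forecasts) if y == 1 and f == 1)
    hits_no_bloom = sum(1 for y, f in zip(y_true, forecasts) if y == 0 and f == 0)
    return hits_bloom / n_bloom, hits_no_bloom / n_no_bloom


# Hypothetical observations (1 = bloom) and predicted bloom probabilities.
y_obs = [1, 1, 1, 0, 0]
p_hat = [0.9, 0.6, 0.2, 0.1, 0.7]
acc_bloom, acc_no_bloom = bloom_accuracies(y_obs, p_hat)
# Two parameters assumed here: one intercept and one slope.
score = aic(bernoulli_log_likelihood(y_obs, p_hat), n_parameters=2)
```

Reporting the two accuracies separately is useful when blooms are much rarer than no-bloom occurrences, since a single overall accuracy would then be dominated by the no-bloom class.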
Model accuracy of forecasting a bloom and no-bloom occurrence varies in both space and time for the optimal variable combination {T, S, pH, swrad, TON} (
To better understand the temporal and spatial variability in the accuracy of forecasting
Goodness-of-fit corresponding to the generalized additive model with and without consideration of interaction terms.

| Variable combination | AIC^{†} | Accuracy, bloom^{‡} | Accuracy, no-bloom^{‡†} |
|---|---|---|---|
| Without considering interaction terms | | | |
| T, S, pH, swrad, TON | 2287 ± 38 | 79.1 ± 2.4 | 81.9 ± 0.9 |
| T, S, swrad | 2428 ± 56 | 77.9 ± 3.9 | 79.6 ± 1.6 |
| Including interaction terms | | | |
| T, S, pH, swrad, TON | 1697 ± 74 | 82.7 ± 2.5 | 86.1 ± 1.2 |
| T, S, swrad | 1986 ± 66 | 84.2 ± 2.6 | 82.9 ± 1.0 |

† Akaike Information Criterion.
‡ Accuracy of forecasting a bloom.
‡† Accuracy of forecasting a no-bloom occurrence.
Inclusion of the interaction terms clearly improves model performance when we analyze the temporal and spatial variability of the model accuracy. For example, when we compare the accuracy with and without interactions (
Accuracy of forecasting a
Including nonlinear effects in the GAM allows the model to have a range of environmental conditions in which the probability of a
The nonlinear interaction terms capture the seasonal pattern of the probability of bloom occurrence and result in an increase in goodness-of-fit and a more consistent model accuracy over time. Specifically, the swrad-T interaction term partly captures the seasonality of bloom occurrence, that is, only a few blooms were observed in fall and most blooms occurred in spring (
By training statistical models using more than 16,000 variable combinations of nineteen physical and biogeochemical variables, we showed that salinity, water temperature, pH, total organic nitrogen (TON), and solar irradiance form the optimal variable combination to forecast
Adding nonlinear (interacting) dependencies between predictor variables results in valuable insights into the impact of these dependencies on the probability of a
Several assumptions and limitations are imposed to construct our habitat suitability models, which may impact model performance and its application. Though we compared
The relatively small number of coincident observations of
Forecasts of harmful algal blooms (HABs) are highly desired by stakeholders, such as coastal managers and aquaculturists, so they are able to assess risks associated with the presence of HABs and respond accordingly. With this in mind, our objective is to add HABs into our suite of forecasts available through the Chesapeake Bay Environmental Forecasting System (CBEFS) (
Publicly available datasets were analyzed in this study. This data can be found here:
DH: Conceptualization, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing: original draft preparation, review and editing, Data curation. MF: Conceptualization, Investigation, Writing: review and editing. PSL: Conceptualization, Investigation, Resources, Writing: review and editing. RH: Conceptualization, Investigation, Writing: review and editing. CB: Conceptualization, Investigation, Writing: review and editing. All authors contributed to the article and approved the submitted version.
This paper is the result of research funded by the National Oceanic and Atmospheric Coastal Ocean and Modeling Testbed Project under award NA21NOS0120167 to VIMS. Chris Brown was supported by the NOAA Center for Satellite Applications and Research.
The authors acknowledge William & Mary Research Computing (
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Supplementary Material for this article can be found online at: