Understanding the relationship between remotely sensed snow disappearance and seasonal water supply may become vital in coming years to supplement limited ground-based, in situ measurements of snow in a changing climate. For the period 2001–2019, we investigated the relationship between satellite-derived Day of Snow Disappearance (DSD), the date at which snow has completely disappeared, and the seasonal water supply, i.e., the April–July total streamflow volume, for 15 snow-dominated basins across the western U.S. A Monte Carlo framework was applied, using linear regression models to evaluate the predictive skill (defined here as a model’s ability to accurately predict seasonal flow volumes) of varied predictors, including DSD and in situ snow water equivalent (SWE), across a range of spring forecast dates. In all basins there is a statistically significant relationship between mean DSD and seasonal water supply (
Across much of the western United States, a substantial amount of total water supply originates as mountain snowpack in the colder, winter months. Especially at higher elevations, many regions are snow dominated, with estimates of up to 70% of runoff originating as snow ablation [
With relatively low summer precipitation, western U.S. water supplies are dependent on water from snow ablation, with streamflow peaking during April, May, June, and July (AMJJ). One major challenge in the skillful prediction of AMJJ total streamflow volume, defined here as seasonal water supply, is the spatially limited network of in situ measurements of SWE and related characteristics, such as incremental precipitation and snow density. Measurements are sparse and preferentially located, often in areas of easy access, which leads them to undersample a variety of terrains while oversampling more accessible elevations. This relatively homogeneous snow sampling makes estimates of watershed-scale hydrology difficult, reducing the accuracy of in situ measurements in regions located in the midlatitudes [
Since as early as 1906, in situ snow observations have been collected to support hydrologic prediction [
In the last few decades, the emergence of satellite-based snow cover data has allowed for new insights into the state of snow in the West. Satellite-based data is spatiotemporally continuous, in contrast to the limited spatial coverage of SNOTEL data. Several satellite snow products exist, including Landsat’s fractional Snow Covered Area (fSCA) dataset [
In the western U.S., the Natural Resources Conservation Service (NRCS) has primarily used SNOTEL SWE as a predictor in water supply forecasts [
Although a major source of predictability used in water supply forecasting can be attributed to the SNOTEL network, satellite data provides potentially complementary information that may offer unique skill to predictions. Other studies have explored combining in situ and satellite data within computationally demanding, physically based models. For example, the combination of MODIS snow-covered area and in situ data was used to estimate daily SWE within the Sierra Nevada and to improve predictions of streamflow and hydrologic models in the Upper Colorado River Basin using physically based models [
A description of relevant data is included in
A total of 15 basins were selected across the western U.S. to span a range of snow-dominated conditions. Multiple screening criteria followed Modi et al., 2022, resulting in the selected basins and their attributes shown in
MODIS-derived raster data of satellite snow cover timing for the western U.S. was obtained from Heldmyer et al., 2021, spanning the years 2001–2019 at a 500 m resolution. They found that applying a 10% fractional snow covered area (fSCA) threshold to obtain a binary snow cover series from MODIS snow covered area minimized errors in calculating the day of snow disappearance (DSD), reducing some of the uncertainty associated with canopy masking of snow cover. The basin-wide mean DSD, referred to here as the “mean DSD”, is a central predictor variable in this analysis and was calculated for all pixels that experienced complete snow ablation (n_{valid}) during a one-year period. The pixelwise means for all locations within the basin were then averaged to obtain the annual basin-wide mean DSD. From the same pixelwise data, we further calculated the daily snow-free fraction (SFF), or the fraction of a given basin for which snow has ablated by a specific date. This value was calculated with Equation (1), by determining the fraction of pixels that had DSD values less than or equal to the forecast date (n_{DSD ≤ day}) with respect to n_{valid}. The forecast dates chosen for this analysis began on 1 April and continued biweekly (15 April, 1 May, 15 May), ending on 1 June.
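The two satellite predictors described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the pixel DSD values are invented, and NaN is assumed to mark pixels that never recorded complete snow ablation (and are therefore excluded from n_{valid}). The snow-free fraction follows the verbal definition given for Equation (1), i.e., SFF = n_{DSD ≤ day} / n_{valid}.

```python
import math


def valid_dsd(dsd_values):
    """Keep pixels with complete snow ablation (assumption: NaN marks the rest)."""
    return [d for d in dsd_values if not math.isnan(d)]


def mean_dsd(dsd_values):
    """Basin-wide mean day of snow disappearance over the valid pixels."""
    valid = valid_dsd(dsd_values)
    return sum(valid) / len(valid) if valid else math.nan


def snow_free_fraction(dsd_values, forecast_doy):
    """Fraction of valid pixels whose DSD is on or before the forecast day of year."""
    valid = valid_dsd(dsd_values)
    if not valid:
        return math.nan
    n_snow_free = sum(1 for d in valid if d <= forecast_doy)
    return n_snow_free / len(valid)


# Hypothetical pixel DSD values (day of year) for one basin and one year.
pixels = [80, 91, 100, 120, math.nan]
print(mean_dsd(pixels))                # 97.75
print(snow_free_fraction(pixels, 91))  # 0.5 (two of four valid pixels are snow-free on 1 April)
```

In practice the same operations would more likely be applied to the 500 m raster with an array library; plain lists are used here only to keep the sketch dependency-free.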
Two sources of in situ data were also used: USGS stream gage data, as well as data from the NRCS’ Snow Telemetry (SNOTEL) sites. The stream gage data, initially accessed as an average daily flow rate, was accumulated to a single value of seasonal water supply volume for that year. For each basin, one SNOTEL site was manually selected to represent in situ SWE at a daily timestep, using beginning of day measurements.
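As a rough illustration of how a daily mean flow series can be accumulated into a single seasonal volume, the sketch below sums April–July daily flows for one year. The function name, the dictionary input, and the choice of m^3/s as the flow unit are assumptions made for this example; USGS daily values are commonly distributed in cubic feet per second and would need converting first.

```python
from datetime import date, timedelta

SECONDS_PER_DAY = 86_400


def amjj_volume(daily_flow, year):
    """Sum mean daily flow (assumed m^3/s) from 1 April to 31 July into a volume (m^3).

    daily_flow maps datetime.date -> mean daily flow; a missing day raises KeyError
    rather than being silently skipped.
    """
    day, end = date(year, 4, 1), date(year, 7, 31)
    total = 0.0
    while day <= end:
        total += daily_flow[day] * SECONDS_PER_DAY
        day += timedelta(days=1)
    return total


# Hypothetical record: a constant 10 m^3/s for every day of 2010.
flows = {date(2010, 1, 1) + timedelta(days=i): 10.0 for i in range(365)}
print(amjj_volume(flows, 2010))  # 105408000.0 (122 days x 86,400 s x 10 m^3/s)
```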
To determine the strength of the linear relationship between satellite variables (DSD and SFF) and seasonal water supply, the correlation between annual mean water supply and mean DSD was calculated under the assumption that each pixel behaves as a spatially discrete snow pillow. The correlations shown in
Historically, 1 April has been an important date for water supply forecasting, as SWE is near its maximum annual value [
For each basin, a Monte Carlo (MC) approach was used to portray the central tendency of the predictions. Following previous studies, which used MC simulations to estimate probabilistic flood hydrographs, 10,000 random MC simulations were utilized [
To resemble the setup of a true forecast, only data that would have been available on or before the corresponding forecast date was considered for each experiment. For example, on 1 April (the 91st day of the year), DSD values larger than 91 meant that snow had not yet disappeared for those grid cells. Therefore, these values were dropped before computing the spatial mean DSD for each year. A similar procedure was used in calculating SFF (in Equation (1)) to ensure that the model was only provided data that was available on or prior to the forecast date. A linear ordinary least squares (OLS) regression via the scikit-learn and statsmodels Python packages [
For each of the 10,000 MC simulations, model fit and error statistics were calculated for all linear regression models. After each model was fit, the estimated β values were used in conjunction with the reserved testing data to evaluate the model. The intercept, as well as the relative root mean squared error (rRMSE), Pearson’s correlation (R), and percent bias (PBIAS), were reported. These metrics provide insight into different facets of the model fit. rRMSE, a normalized form of the root mean squared error of the model residuals, was chosen to represent the overall model fit, while PBIAS was chosen as an operational metric describing how accurate the predictions were with respect to total volume, as well as the direction of the errors. After 10,000 iterations, the median scores and interquartile range (IQR) were reported to capture the central tendency and uncertainty associated with each model. This process was repeated for each basin on each of the five forecast dates (1 April, 15 April, 1 May, 15 May, 1 June). Equations (4) and (5) describe the chosen model fit and error statistics:
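A minimal sketch of the resampling loop described above is given here for orientation. It is not the study's implementation: it swaps the scikit-learn/statsmodels fit for a plain-Python single-predictor least squares, and several details are assumptions made for illustration only, namely the number of held-out years (`n_test`), rRMSE taken as RMSE divided by the mean observed volume, and PBIAS taken as 100 × Σ(predicted − observed) / Σ(observed). All function names and the synthetic data are hypothetical.

```python
import math
import random
import statistics


def fit_ols(x, y):
    """Single-predictor least squares; returns (intercept, slope)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def rrmse(obs, pred):
    """RMSE normalized by the mean observation (assumed normalization)."""
    mse = statistics.fmean([(p - o) ** 2 for o, p in zip(obs, pred)])
    return math.sqrt(mse) / statistics.fmean(obs)


def pbias(obs, pred):
    """Percent bias; positive means over-prediction (assumed sign convention)."""
    return 100.0 * sum(p - o for o, p in zip(obs, pred)) / sum(obs)


def pearson_r(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var = sum((ai - ma) ** 2 for ai in a) * sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(var) if var > 0 else math.nan


def monte_carlo_skill(x, y, n_iter=10_000, n_test=5, seed=0):
    """Repeated random train/test splits; returns median and IQR of each metric."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    scores = {"rRMSE": [], "R": [], "PBIAS": []}
    for _ in range(n_iter):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        b0, b1 = fit_ols([x[i] for i in train], [y[i] for i in train])
        obs = [y[i] for i in test]
        pred = [b0 + b1 * x[i] for i in test]
        scores["rRMSE"].append(rrmse(obs, pred))
        scores["R"].append(pearson_r(obs, pred))
        scores["PBIAS"].append(pbias(obs, pred))
    summary = {}
    for name, vals in scores.items():
        q1, med, q3 = statistics.quantiles(vals, n=4)
        summary[name] = {"median": med, "iqr": (q1, q3)}
    return summary


# Synthetic 19-year record: later mean DSD loosely implies larger seasonal volume.
rng = random.Random(42)
dsd = [100 + rng.uniform(-20, 20) for _ in range(19)]
volume = [50 + 2.0 * d + rng.gauss(0, 10) for d in dsd]
print(monte_carlo_skill(dsd, volume, n_iter=1000))
```

Reporting the median and quartiles of each metric, rather than a single split, is what lets the short 19-year record be summarized by a central tendency and a spread.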
When considering all years of available data, a significant relationship (
While this relationship differs in strength within each basin, approximately half of the yearly variance in seasonal water supply can be explained by the annual mean DSD. The mean correlation for all basins is 0.46 with a median correlation of 0.50. R^{2} values are expected to be larger later in the season, as the DSD information for each year has captured more snow disappearance. Other variables, such as the SFF or SNOTEL SWE, may increase the R^{2} values within these basins. By discussing the relative predictive skill of each model, the following Sections (
The predictive strength of the satellite-derived snow information varies throughout the forecasting season. In the case of the example study basin, the East R. (
The same generally holds true when considering the other basins. The simplest satellite model (Sat_DSD) has a median 1 April PBIAS of −0.78, whereas the 1 June PBIAS has a median of −0.57. In terms of the overall model fit, the rRMSE varies most substantially from model to model, with a median 1 April rRMSE of 0.33 when considering DSD mean and SFF, and 0.42 when considering only DSD mean. In some basins, changes in model skill are not always positive, as can be seen in
In the San Juan R. basin, in all 10,000 simulations, the residual sum of squares (RSS) for the training set was always reduced when both DSD mean and SFF were introduced as independent variables, i.e., the Sat_combo case. In other basins plotted in
For the East R. gage, the Phys_SWE model outperforms all combinations of the satellite models from 15 April to 15 May with respect to PBIAS (
When considering the full set of study basins in
Using satellite snow timing data compiled by Heldmyer et al., 2021 (detailed further in
In our attempt to answer whether remotely sensed snow disappearance can explain seasonal water supply variability, a few features appeared. First, due to a lack of snow disappearance at the beginning of the forecast period, the initial hypothesis (that linear models depending on DSD and SFF data improve in their predictive skill over time, as snow disappears) cannot be rejected in most basins. In some basins, such as the East R. and Pacific Cr. basins, PBIAS skill did not consistently improve from 1 April to 1 June. However, the central tendency of other error metrics, such as the rRMSE, did improve over the course of the forecast period. In those same basins, as well as seven others (Walker R., Lamar R., Stehekin R., Carson R., Little Wood R., Santiam R., and Sevier R.), there was at least one instance in which the rRMSE of the Sat_DSD and Sat_combo models increased from one date to the next. In five of the nine basins that saw increases in rRMSE over time, this increase came before the date of mean DSD.
Interestingly, only a weak correlation exists between basin SWE/P ratio and model error, as shown in
The Sat_DSD and Sat_SFF models varied in predictive skill across basins, as presented in
The relative skill of satellite and in situ models also varies with time. A Mann–Whitney U test (also known as a Wilcoxon rank-sum test), a nonparametric comparison of two distributions, was used to compare the distribution of predictive metrics from the satellite model with the lowest median PBIAS (i.e., Sat_DSD, Sat_SFF, or Sat_combo) to that of the Phys_SWE model, to identify when and where the skill is significantly different.
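To make the comparison concrete, the sketch below implements a two-sided Mann–Whitney U test in plain Python, using the normal approximation with tie and continuity corrections. It is a hedged illustration rather than the study's code: the function name and the sample values are hypothetical, and in practice a library routine such as `scipy.stats.mannwhitneyu` would normally be preferred.

```python
import math


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample a, approximate p-value).
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    combined = sorted([(v, 0) for v in a] + [(v, 1) for v in b])

    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        t = j - i
        midrank = (i + 1 + j) / 2.0  # average of 1-based ranks i+1..j
        n_from_a = sum(1 for k in range(i, j) if combined[k][1] == 0)
        rank_sum_a += midrank * n_from_a
        tie_term += t ** 3 - t
        i = j

    u_a = rank_sum_a - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_a, 1.0
    # Continuity correction; clamp at zero so p never exceeds 1.
    z = max((abs(u_a - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    return u_a, math.erfc(z / math.sqrt(2.0))


# Hypothetical absolute-PBIAS samples from two models.
sat = [4.1, 5.3, 6.0, 7.2, 8.8, 9.1]
phys = [9.5, 10.2, 11.7, 12.4, 13.0, 15.6]
u, p = mann_whitney_u(sat, phys)
print(u, p)
```

A small p-value indicates that the two error distributions differ; the sign of the difference still has to be read from the medians.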
A visualization of the mean difference in absolute PBIAS distribution for all 10,000 iterations is shown in
Another important perspective on the question posed in this paper is that, given limited SNOTEL coverage across the western U.S., satellite-derived snow information alone shows promise for providing predictive information to less well-served water communities that may not have a SNOTEL station in their basin. Similarly, changes in storm track and overall climate could imperil the relationship between SWE observed at SNOTEL locations and seasonal water supply, such that remote sensing is likely to become increasingly important as a supplement to in situ observations in future predictions.
The availability of DSD data used here was constrained to data obtained and processed by Heldmyer et al., 2021; as a result, each model was only fit on data from 2001–2019 (19 years of available data). This limited sample size restricted the maximum number of predictor variables used in the models to avoid model overfit. In addition, despite efforts towards a large degree of resampling, e.g., 10,000 iterations of model fitting, the short record may still lead to model overfitting, as there are fewer training samples for each prediction, as well as an increased possibility for outliers to skew model fit. On a physical level, a key limitation of the satellite variables, as previously mentioned, is that they do not provide a direct estimate of water volume, only an indirect measurement of the timing of snow presence. Moreover, we acknowledge that all variables used in this study (remotely sensed and in situ) are subject to observational errors. In particular, the estimation of snow disappearance from satellite-derived spectral data is an indirect measurement of ‘true’ snow disappearance and is expected to introduce additional error to the study. While the errors associated with uncertainties in satellite-derived DSD have been explored with respect to errors in spatially distributed snow reconstructions, the effect of these uncertainties on aggregated data has not been studied [
To answer the posed research question(s), this study considered relatively simple linear models to portray general relationships between variables. However, emerging machine learning techniques, which may capture nonlinear interactions, as well as models considering a broader suite of predictor variables, could be used in future analyses. As time progresses and the remotely sensed record length grows, the overarching question of this manuscript should be revisited and reanalyzed with updated data records to understand how these trends may change in a future climate and under a broader set of conditions, especially with respect to anomalous years beyond what has already been observed in the historical record.
In summary, we revisit the question posed in the title of the paper—can remotely sensed snow disappearance predict seasonal water supply? We can affirmatively answer, with all 15 of the study basins showing a statistically significant (
The following supporting information can be downloaded at:
Conceptualization and methodology, K.B., B.L. and J.M.P.; data curation, N.R.B. and P.M.; writing—original draft preparation, K.B.; writing—review and editing, B.L., J.M.P., P.M. and N.R.B. All authors have read and agreed to the published version of the manuscript.
Maps of remotely sensed snow timing data can be accessed at
The authors declare no conflict of interest.
The 15 study basins are spread across nine western states. Correlation values between DSD mean and yearly seasonal water supply are calculated at the basin-wide level and marked at the outlet location for each basin.
Absolute correlation between V_{AMJJ} and DSD for the years of 2001–2019 spatially illustrated for the example basins of the East R. at Almont, CO, (
The median relative RMSE (rRMSE), correlation (R), and percent bias (PBIAS) error statistics for three different satellite-based water supply forecasts in the East R. basin, with interquartile range (25th/75th percentiles) shaded. Vertical dotted lines denote the days chosen for forecasting: 1 April, 15 April, 1 May, 15 May, and 1 June.
The median percent bias (PBIAS) statistics for three different satellite-based water supply forecasts, with interquartile range (25th/75th percentiles) shaded, for all 15 study basins. Vertical dotted lines denote the days chosen for forecasting: 1 April, 15 April, 1 May, 15 May, and 1 June.
The median relative RMSE (rRMSE), correlation (R), and PBIAS statistics for three different satellite- and in situ-based water supply forecasts in the East R. basin, with interquartile range (25th/75th percentiles) shaded. Vertical dotted lines denote the days chosen for forecasting: 1 April, 15 April, 1 May, 15 May, and 1 June.
The median percent bias statistics for three different satellite- and in situ-based water supply forecasts, with interquartile range shaded, for all 15 study basins. Vertical dotted lines denote the days chosen for forecasting: 1 April, 15 April, 1 May, 15 May, and 1 June.
The differences between the quartile coefficient of dispersion (QCD) for DSD mean and SNOTEL SWE data through time. Negative values indicate that the spread of data relative to its first and third quartile is higher for SNOTEL SWE. Blank values indicate that the Q1 value is equal to zero, i.e., there is no remaining SWE measured at the basin’s SNOTEL site by this date in at least 25% of the study years.
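For readers unfamiliar with the statistic, the quartile coefficient of dispersion is conventionally defined as (Q3 − Q1) / (Q3 + Q1). The sketch below computes it and the DSD-minus-SWE difference under assumptions made for illustration: the function names are hypothetical, quartiles use the default method of Python's `statistics.quantiles` (which may differ from the quartile definition used for the figure), and a Q1 of zero is returned as `None` to mirror the blank cells.

```python
import statistics


def qcd(values):
    """Quartile coefficient of dispersion; None when Q1 is zero (blank cell)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    if q1 == 0:
        return None
    return (q3 - q1) / (q3 + q1)


def qcd_difference(dsd_values, swe_values):
    """QCD of mean DSD minus QCD of SNOTEL SWE; None if either is undefined."""
    q_dsd, q_swe = qcd(dsd_values), qcd(swe_values)
    if q_dsd is None or q_swe is None:
        return None
    return q_dsd - q_swe


print(qcd([1, 2, 3, 4, 5, 6, 7]))  # 0.5, since Q1 = 2 and Q3 = 6
```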
A comparison of model skill among satellite- and in situ-based models over time. The satellite model with the lowest median PBIAS at a given date was selected for comparison with the Phys_SWE model.
Description of study basins and relevant attributes. SWE/P is defined as the average ratio of 1 April SWE to cumulative precipitation, as recorded at each SNOTEL station, for the water years 1985–2020. The USGS gage names have been abbreviated for clarity.
Basin Name  USGS Gage Name  USGS ID  Gage Latitude, Longitude  Gage Elevation (m)  Basin Area (km^{2})  SNOTEL Station  SNOTEL Elevation (m)  SWE/P Ratio

Walker R.  W Walker River near Coleville, CA  10296000  38.38,  2008  471  575  2191  0.84
Carson R.  E F Carson River near Markleeville, CA  10308200  38.71,  1646  718  697  2358  0.82
East R.  East River at Almont, CO  09112500  38.66,  2440  750  380  3109  0.92
Crystal R.  Crystal River near Redstone, CO  09081600  39.23,  2105  434  618  2674  0.82
San Juan R.  San Juan River at Pagosa Springs, CO  09342500  37.27,  2148  727  840  3091  0.80
Little Wood R.  Little Wood River near Carey, ID  13147900  43.49,  1621  655  805  2329  0.75
Swan R.  Swan River near Bigfork, MT  12370000  48.02,  933  1753  562  1448  0.76
Bruneau R.  Bruneau River at Rowland, NV  13161500  41.93,  1372  988  746  2240  0.68
Sandy R.  Sandy River near Marmot, OR  14137000  45.40,  0  711  655  1241  0.41
Santiam R.  North Santiam River near Detroit, OR  14178000  44.71,  485  553  614  789  0.24
Blacksmith Fork  Blacksmith Fork near Hyrum, UT  10113500  41.62,  1530  681  634  2722  0.98
Sevier R.  Sevier River at Hatch, UT  10174500  37.65,  2094  864  390  2928  0.74
Lamar R.  Lamar River near Tower Falls Ranger Station, YNP  06188000  44.93,  1829  1741  683  2865  0.96
Pacific Cr.  Pacific Creek at Moran, WY  13011500  43.85,  2048  407  314  2152  0.96
Stehekin R.  Stehekin River at Stehekin, WA  12451000  48.33,  335  839  681  1402  0.86
Description of model inputs, units, and naming scheme. ‘Sat’ describes satellite-based measurements of snow, whereas ‘Phys’ refers to physical, in situ measurements.
Model Classifier  Input Variables  Units 

Sat_DSD  Day of Snow Disappearance (DSD)  Day of year 
Sat_SFF  Snow free fraction (SFF)  Percentage (%) 
Sat_combo  DSD and SFF  Day of year; percentage (%) 
Phys_SWE  SNOTEL snow water equivalent (SWE)  mm
SatPhys_combo  DSD, SFF, and SNOTEL SWE  Day of year; percentage (%); mm 
Relationships between mean DSD, center of water supply volume, and V_{AMJJ} for the 15 selected study basins. R^{2} and
Basin  Mean DSD (day of year)  Center of Water Supply Volume (day of year)  DSD–V_{AMJJ} R^{2}  p-value

East R.  130  152  0.82  8.30 × 10^{−8} 
San Juan R.  114  143  0.80  2.20 × 10^{−7} 
Crystal R.  136  157  0.80  2.50 × 10^{−7} 
Sevier R.  95  143  0.57  1.80 × 10^{−4} 
Pacific Cr.  141  147  0.60  1.10 × 10^{−4} 
Walker R.  135  150  0.79  4.50 × 10^{−7} 
Lamar R.  140  152  0.46  1.50 × 10^{−3} 
Carson R.  119  141  0.74  2.10 × 10^{−6} 
Little Wood R.  107  142  0.41  3.00 × 10^{−3} 
Blacksmith Fork  105  139  0.48  1.00 × 10^{−3} 
Bruneau R.  83  130  0.38  4.80 × 10^{−3} 
Swan R.  111  153  0.39  4.00 × 10^{−3} 
Santiam R.  102  135  0.79  4.50 × 10^{−7} 
Stehekin R.  150  153  0.60  9.30 × 10^{−5} 
Sandy R.  75  132  0.52  4.90 × 10^{−4} 
Correlation between SWE/P ratio and median PBIAS values for all basins calculated for each model and each forecast date. Values with a significant relationship (





Mean  Median  

−0.07  −0.28  −0.15  −0.03  0.03  −0.10  −0.07 

−0.21  −0.29  −0.16  −0.35  0.27  −0.15  −0.21 

−0.11  −0.20  0.22  −0.26  0.33  0.00  −0.11 

−0.17  −0.26  −0.57 *  −0.64 *  0.11  −0.31  −0.26 

−0.28  −0.23  −0.09  −0.19  0.24  −0.11  −0.19 
Mean  −0.17  −0.25  −0.15  −0.29  0.20  
Median  −0.17  −0.26  −0.15  −0.26  0.24 