We have assembled a comprehensive and publicly accessible U.S. Geological Survey (USGS) streamflow measurement data set, called HYDRoSWOT, from a USGS National Water Information System archive of acoustic Doppler current profiler river discharge measurements collected from a wide range of rivers throughout the United States. The data set provides a wealth of information on the range of hydraulic characteristics of river cross sections in the United States. Preliminary exploration of the data set, filtered for quality control, indicates that rivers tend toward consistent and predictable forms as discharge increases. The ratio of maximum‐to‐mean depth is highly predictable and is remarkably consistent across all river sizes and discharges. Distributions of hydraulic characteristics provide a large‐scale perspective on the general hydraulic characteristics of rivers. The data set affords the opportunity to analyze hydraulic relations for individual rivers as a function of stage, geomorphic setting, and energy environments and, combined with additional information contained in this data set, might yield predictive relations that could help constrain and parameterize river hydraulic models.

To support the development of emerging streamflow gaging methods and strategies, including remote sensing‐based measurements of river hydraulic characteristics and discharge ratings, we have assembled a comprehensive U.S. Geological Survey (USGS) streamflow measurement data set from the USGS National Water Information System archive of acoustic Doppler current profiler (ADCP) river discharge measurements (Mueller & Wagner, 2009) made throughout the United States. ADCP measurements made by the USGS are not necessarily restricted to channel constrictions or ground access points because ADCP technology allows for boat‐based measurements at many suitable locations in a river channel. The ADCP measurements have been made on a wide variety of rivers ranging from less than 6.5 m wide to the largest rivers in the United States. The data, referred to as HYDRoSWOT, are publicly accessible via the USGS ScienceBase (Canova et al., 2016) and are organized in a Microsoft Excel format that allows creation of data subsets.

The HYDRoSWOT data set can be linked to other data sets using the streamgage identifier or the latitude and longitude of the streamgage. These data could contribute to better understanding of the physical characteristics of cross sections and relations among hydraulic variables in natural rivers. Such insight could lead to improved strategies for developing discharge ratings as well as algorithms to estimate river discharge and cross‐sectional geometry from remote sensing platforms that measure channel width and slope but not depth and velocity directly (Bjerklie et al., 2018; Durand et al., 2016; Garambois & Monnier, 2015). Additionally, the data could support research into the fundamental physical laws governing turbulent flow in natural rivers and allow the mapping of these data to explore spatial and temporal trends. In this paper, we introduce the data set and provide some initial insights into the hydraulic characteristics of natural rivers provided by the data.

The HYDRoSWOT data set currently includes over 220,000 individual measurements made at over 10,000 streamgages (Figure 1). The mean number of measurements at each gage is 22, and the median is 8, with a 25th percentile of 2 and 75th percentile of 23. The HYDRoSWOT records consist of all USGS‐approved streamflow measurement data collected by the USGS using ADCPs archived by state in the USGS National Water Information System through November 2014. At present, the data are static, with no mechanism for updating or adding to the data set. However, the data set by intention is easy to update and append with other sources and types of data. The measurement data include summary information from individual discharge measurements taken at USGS streamgages across the United States. The HYDRoSWOT data set (Canova et al., 2016) can be accessed at the USGS ScienceBase website (

Given that each measurement record has an associated latitude and longitude, various derived relations can be mapped in order to investigate spatial patterns in any specific hydraulic quantity. Each data record also has an associated altitude and a contributing drainage area. These data fields allow for the hydraulic characteristics, including width, depth, velocity, and associated hydraulic parameters (such as the width‐to‐depth ratio, the maximum‐to‐mean depth ratio, and the maximum‐to‐mean velocity ratio) to be associated with the spatial location, altitude, and size of the watershed contributing to the gage. The data could also be associated with other spatially varying climatic, topographic, and geologic characteristics that are part of other existing data sets such as the National Hydrography Dataset (U.S. Geological Survey, 2018a, 2018b).

Although the data are approved by the USGS, many of the records have some fields that are not populated. For example, nearly 15% of the flow records do not report the stream width, nearly 30% do not report the drainage area, and only approximately 25% report both the mean and maximum for both the depth and velocity fields. Additionally, the data set has not been exhaustively controlled for outliers or data that might have been entered into the data set incorrectly. For example, some records show a mean velocity greater than the maximum velocity, and some show a measured discharge more than 5% larger or smaller than the discharge calculated by multiplying the reported cross‐sectional area by the reported mean velocity.

The measurements were not necessarily taken from the same location along the stream reach at each streamgage (i.e., not the same cross section every time). As such, for any single gaging station where there are multiple measurements, there may be more variance than if all of the measurements were made at a single cross section, but the relations between variables for the gaging site might be more representative of the reach as a whole. Because of these deficiencies in the data set, the query capabilities available within Excel should be used to screen for specific data quality characteristics needed for individual projects.

Here, we explore some basic relationships between variables using complete records only (i.e., all hydraulic measurements and key associated fields are populated). This stipulation requires that each record include the station identifier, latitude and longitude, the altitude of the gage at the station, the drainage area contributing to flow at the station, and a measured discharge, width, mean velocity, and mean depth. With this requirement in place, the available data were reduced from a total of 223,023 records to 140,949 records (a reduction of 37%). We also stipulate that each record includes both the maximum velocity and the maximum depth, which reduced the available data to 43,038 records. To avoid the possible effects of localized obstructions in small channels that might cause highly variable hydraulic conditions, we restricted the analysis to streamflow measurements with top width greater than 15 m and contributing drainage area greater than 13 km^{2}. Additionally, as quality control measures, we excluded negative flows from tidal effects and any records where (1) the maximum depth or velocity is less than the mean depth or velocity; (2) the ratio of maximum‐to‐mean velocity exceeds 10 (under the assumption that if the ratio is greater than this, the cross section may contain extreme irregularities or multiple channels); and (3) the reported discharge is more than 5% greater or less than the discharge calculated by multiplying the reported width, mean depth, and mean velocity. Figure 1 shows those gages that were used in the analysis as blue dots and those that were not used as red dots.
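
The screening rules described above can be sketched as a single predicate applied to each record. This is a minimal illustration, not the code used for the analysis: the field names (`station_id`, `width_m`, and so on) are hypothetical and do not correspond to the actual HYDRoSWOT column headers, and treating a zero discharge as invalid is an added assumption.

```python
# Hypothetical field names; the real HYDRoSWOT column headers differ.
REQUIRED_FIELDS = (
    "station_id", "latitude", "longitude", "altitude_m", "drainage_area_km2",
    "discharge_m3s", "width_m", "mean_velocity_ms", "mean_depth_m",
    "max_velocity_ms", "max_depth_m",
)

def passes_screening(rec):
    """Return True if a record (a dict) meets the completeness and QC rules."""
    # Complete records only: every required field must be populated.
    if any(rec.get(name) is None for name in REQUIRED_FIELDS):
        return False
    q = rec["discharge_m3s"]
    w = rec["width_m"]
    y, y_max = rec["mean_depth_m"], rec["max_depth_m"]
    v, v_max = rec["mean_velocity_ms"], rec["max_velocity_ms"]
    # Exclude negative flows (tidal effects); zero and non-positive means are
    # also rejected here (an assumption) so the ratios below are defined.
    if q <= 0 or y <= 0 or v <= 0:
        return False
    # Size thresholds: top width > 15 m and drainage area > 13 km^2.
    if w <= 15.0 or rec["drainage_area_km2"] <= 13.0:
        return False
    # (1) The maximum must not be smaller than the mean.
    if y_max < y or v_max < v:
        return False
    # (2) The maximum-to-mean velocity ratio must not exceed 10.
    if v_max / v > 10.0:
        return False
    # (3) Reported discharge must be within 5% of width * depth * velocity.
    q_calc = w * y * v
    return abs(q - q_calc) <= 0.05 * q_calc
```

A list of records could then be screened with `[r for r in records if passes_screening(r)]`.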

With these restrictions in place, the subset of data used in the analysis includes 20,625 records, representing 3,519 gaging stations, with an average of 6 and median of 4 measurements at each station, and first and third quartiles of 2 and 8, respectively. Summary statistics for these data are listed in Table 1. The marked difference between the mean and median values for the discharge and width indicates that these variables are skewed rather than normally distributed across the rivers in the data set.

Four fundamental hydraulic ratios derived from the data subset were summarized. These ratios are often used to describe the characteristics and state of flow in a measurement cross section (Table 2) (Blodgett, 1986; Leopold et al., 1960). The width‐to‐depth ratio (*W*/*Y*) is an index of the relative influence of the channel sides and bottom on resistance and thus the velocity distribution. The ratio of the maximum‐to‐mean velocity (*V*max/*V*) is a measure of the vertical velocity profile of the flow, which might be related to the frictional resistance of the channel bed. The ratio of maximum‐to‐mean depth (*Y*max/*Y*) is related to the cross‐sectional channel shape. The Froude number (*F* = *V*/(*gY*)^{0.5}, with *g* the acceleration due to gravity) is a measure of the relative importance of inertial and gravitational forces and indicates the energy state of the flow at the cross section.
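
As a minimal sketch of how the four ratios defined above follow from the fields of a single measurement record, the function below computes them directly. The function and argument names are illustrative, and the value of *g* (9.81 m/s^{2}) is an assumption because the text does not state the value used.

```python
import math

G = 9.81  # acceleration due to gravity in m/s^2 (assumed value)

def hydraulic_ratios(width, mean_depth, mean_velocity, max_depth, max_velocity):
    """Return the four dimensionless ratios for one measurement (SI units)."""
    return {
        "W/Y": width / mean_depth,                        # width-to-depth ratio
        "Vmax/V": max_velocity / mean_velocity,           # max-to-mean velocity
        "Ymax/Y": max_depth / mean_depth,                 # max-to-mean depth
        "F": mean_velocity / math.sqrt(G * mean_depth),   # Froude number
    }
```

For example, a section 100 m wide with a mean depth of 2 m, a maximum depth of 3 m, a mean velocity of 1 m/s, and a maximum velocity of 1.5 m/s gives *W*/*Y* = 50, *Y*max/*Y* = 1.5, *V*max/*V* = 1.5, and *F* of about 0.23.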

Note that the limits of the ratio values listed in Table 2 are constrained by the quality control measures used to develop the data set.

A summary of the ratio values is provided in Table 2. The relations between the maximum depth and the mean depth, and the maximum velocity and the mean velocity are shown in Figures 2a and 2b, along with the best fit line representing the mean correlation between the maximum and mean values for each. Because of the large amount of data, a two‐dimensional histogram format is used for Figure 2, with each point shown on the graph representing the density of points within a specified region of the plot area indicated by a color scale. The best fit line is estimated using the line of organic correlation (LOC) regression (Kritsky & Menkel, 1968; Doornkamp & King, 1971; Helsel & Hirsch, 1992, p. 276). The LOC is used because it does not assume that either variable is dependent on the other. The LOC minimizes errors in both the *x* and *y* directions. In Table 2, the ratio of *Y*max to *Y* shows the smallest coefficient of variation, followed by the ratio of *V*max to *V*, indicating that these measures are the most stable across the various flows and rivers.
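
For readers unfamiliar with the LOC, the sketch below shows the standard form of the fit, in which the slope is the ratio of the standard deviations of *y* and *x* and takes the sign of their correlation. The function name is illustrative, and the text does not say whether the fits in Figure 2 were made on raw or log‐transformed values, so any transformation is left to the caller.

```python
import math

def line_of_organic_correlation(x, y):
    """Return (slope, intercept) of the LOC for paired samples x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Slope magnitude is s_y / s_x; its sign follows the sign of the covariance.
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    # The line passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Unlike ordinary least squares, swapping the roles of *x* and *y* yields the reciprocal slope, so the fitted line does not depend on which variable is treated as the predictor.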

The cross‐sectional geometry characteristic, indexed by the ratio of maximum‐to‐mean depth (*Y*max/*Y*, Figure 2a), varies over a relatively small range across all gaging stations and discharges and is remarkably consistent across the wide range of rivers and discharges in the data set. This finding was also seen by Moramarco et al. (2013) for gaged river sites in the Tiber basin, central Italy. As noted above, the slope of the LOC regression for the maximum‐to‐mean depth shows that *Y*max/*Y* is nearly constant, close to the value of 1.5, which implies that the cross sections tend toward a shape with a maximum‐to‐mean depth ratio consistent with a parabola (Natural Resources Conservation Service, 2019). A ratio of maximum‐to‐mean depth of 1.5 is characteristic and a constant for all depths within a parabolic cross section (given by the equation *y* = *ax*^{2}), with the mean depth computed by dividing the area of the cross section by the top width. Comparatively, if the exponent of the parabolic equation were 3 instead of 2, the ratio would be constant with a value of 1.33, and if the exponent were 4, it would be constant at 1.25, and so forth. Similarly, the ratio of maximum‐to‐mean depth for all depths in a triangular cross section is constant with a value of 2, whereas for both trapezoidal and elliptical cross sections, the maximum‐to‐mean depth ratio is not constant but varies with depth (Natural Resources Conservation Service, 2019).
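
The constant ratios quoted above can be checked with a short derivation. The sketch below assumes a symmetric power‐law section with bed elevation *y* = *a*|*x*|^{*n*} filled to a maximum depth *D*; the symbols *D*, *n*, *W*, and *A* are notation introduced here for illustration, and the absolute value is added so that odd exponents also describe a symmetric channel.

```latex
\begin{align}
  W &= 2\left(\frac{D}{a}\right)^{1/n}
    && \text{(top width, from } a\,(W/2)^{n} = D\text{)} \\
  A &= \int_{-W/2}^{W/2} \left(D - a\,|x|^{n}\right)\,dx
     = W D - \frac{2a}{n+1}\left(\frac{W}{2}\right)^{n+1}
     = \frac{n}{n+1}\,W D \\
  Y &= \frac{A}{W} = \frac{n}{n+1}\,D
  \quad\Longrightarrow\quad
  \frac{Y_{\max}}{Y} = \frac{D}{Y} = \frac{n+1}{n}
\end{align}
```

The result is independent of both *a* and *D*, which is why the ratio is the same at every stage: *n* = 2 gives 1.5, *n* = 3 gives about 1.33, *n* = 4 gives 1.25, and *n* = 1 (a triangle) gives 2.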

The LOC regression for the maximum‐to‐mean velocity (Figure 2b) shows greater variability, with the variability increasing with increasing velocity, indicating that the velocity is not as consistently organized across all flow conditions as depth.

Figure 3 shows two‐dimensional histograms of the values of the dimensionless ratios as a function of discharge. All of the ratios converge toward a constant at high discharges, with wide variation at lower discharges. As discussed earlier, the ratio *Y*max/*Y* is relatively constant over most of the discharge range. An examination of each of the plots in Figure 3 shows that for discharges above approximately 10,000 m^{3}/s, the convergence toward a constant is quite evident for all of the ratios. There are 218 discharges above 10,000 m^{3}/s in the data subset, associated with 26 gaging stations in six large river systems including the Mississippi, Missouri, Ohio, Yukon, Columbia, and Susquehanna Rivers. However, in all cases examined here, the convergence from high variability toward a constant appears to be a continuous function with increasing discharge.

Depending on the method used to define the converging trend, the *W*/*Y* ratio tends toward a value between 40 and 50, the *V*max/*V* ratio tends toward a value between 2 and 3, the *Y*max/*Y* ratio tends toward a value that is approximately 1.5, and *F* tends toward a value between 0.10 and 0.15. The convergence of the ratios indicates that as the channel size increases, the flow hydraulics become less complex and less variable. This apparent simplification could be due to reduced influence of the bed and banks on the flow. Additionally, the convergence indicates that large discharges in large rivers would be easier to predict accurately from limited information on the depth of flow, provided that surface velocity and width can be measured.

For the flow conditions measured by ADCPs in this data set, the Froude number never exceeds a value of 1 (Table 2). The ADCP measurement method affords the field hydrologists making the flow measurements the freedom to collect depth and velocity information at any location that is accessible from the bank, and therefore, the hydrologists are not restricted to making measurements at bridges, cableways, relatively narrow sections, or only wadeable sections. As such, the ADCP measurement data set provides information that is more likely to represent typical hydraulic conditions in the river reach than would data collected using other methods. Consequently, the range of Froude numbers contained in these data indicates that in natural rivers with width greater than 15 m, super‐critical flow (Froude number greater than 1) is not common.

The HYDRoSWOT data set presented in this paper affords the opportunity to examine hydraulic relations for individual rivers at varying flow levels and for sets of rivers in particular geomorphic settings and energy environments. Based on our preliminary analysis of the data set, the ratio *Y*max/*Y* in cross sections from relatively small to the largest rivers is very consistent, indicating that rivers of all sizes tend to exhibit a maximum‐to‐mean depth ratio characteristic of a parabolic cross‐sectional shape. Other fundamental hydraulic ratios, including *F*, *V*max/*V*, and *W*/*Y*, are highly variable at low discharge, but all tend toward a constant value at large discharges.

The HYDRoSWOT data set consists of direct measurements of the variables that, when multiplied, yield the discharge at a cross section (width, average depth, and average velocity), as well as variables that can be used to infer the effect of drag and flow resistance on the depth and velocity distribution in the cross section (the maximum depth and the maximum velocity). Because the data set includes measurements from gaging stations on thousands of rivers, flow dynamics associated with stream velocity and depth can be explored, for example, by developing at‐a‐station hydraulic geometry relationships.
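
As one illustration of such an analysis, at‐a‐station hydraulic geometry is commonly expressed as power laws of discharge (for example, width = *a*·*Q*^{*b*}). That functional form is an assumption here and is not stated in this paper. The sketch below fits the coefficient and exponent for one station by ordinary least squares on log‐transformed values; the function name is illustrative.

```python
import math

def fit_power_law(discharge, values):
    """Fit values = coefficient * discharge**exponent by log-log least squares.

    Both inputs must contain only positive numbers for a single station.
    """
    log_q = [math.log(q) for q in discharge]
    log_v = [math.log(v) for v in values]
    n = len(log_q)
    mean_q = sum(log_q) / n
    mean_v = sum(log_v) / n
    sqq = sum((a - mean_q) ** 2 for a in log_q)
    sqv = sum((a - mean_q) * (b - mean_v) for a, b in zip(log_q, log_v))
    exponent = sqv / sqq                      # slope in log-log space
    coefficient = math.exp(mean_v - exponent * mean_q)
    return coefficient, exponent
```

Applying the same function to width, mean depth, and mean velocity at a station gives three exponents that should sum to approximately 1, because the product of the three variables is the discharge.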

Supplementing HYDRoSWOT with other large data sets that include static channel morphology and hydrologic information (Allen & Pavelsky, 2018; Andreadis et al., 2013; Frasson et al., 2019; Lin et al., 2019) would provide an unprecedented, continental‐scale data set. This information on river channel hydraulic characteristics could be used to develop general models for predicting velocity, depth, and discharge in rivers over large regions based on limited remotely sensed variables and could also provide ground truth for assessing the accuracy of remote sensing observations. This ability would be particularly pertinent for the upcoming NASA Surface Water and Ocean Topography (SWOT) mission (Biancamaria et al., 2016), which is tasked with providing global estimates of river discharge based on satellite observations of river water surface area (width), water surface slope, and water surface height. Also, the data presented in HYDRoSWOT can be mapped and explored as a function of the physiographic characteristics of a watershed. As the data set is linked to hydrographic information, including topographic slope, geologic materials, climatic regime, and ground cover characteristics, the hydraulic flow characteristics contained in this data set might yield relations that can help constrain and parameterize large‐scale river hydraulic models.

This project was partially funded by a grant from the National Aeronautics and Space Administration Surface Water and Ocean Topography (SWOT) Mission Science Team program (NNH15ZDA001N‐SWOT). The authors would like to thank Robert W. Dudley, USGS New England Water Science Center, for an insightful and extremely helpful technical review and Glenn A. Hodgkins, USGS New England Water Science Center, for help in organizing and editing this report. The HYDRoSWOT data set (Canova et al., 2016) can be accessed at the USGS ScienceBase website (