It is becoming increasingly popular to use continuously collected acoustic or optical data to estimate abundance or biomass of fish and invertebrates. However, data from such systems are typically highly spatially autocorrelated and zero‐inflated, and thus simple design‐based estimation techniques are not applicable. Model‐based estimation methods can be used to extrapolate observations along the observed track to larger areas. We tested the precision and accuracy of three model‐based methods using both simulations and field data: Ordinary kriging (OK), Generalized Additive Models with kriged model residuals (GAM + OK), and Generalized Additive Mixed Models with kriged model residuals (GAMM + OK), along with a design‐based method (stratified mean, SM). The GAMM + OK method treats small‐scale variations as random effects, whereas the other approaches aggregate nearby data to reduce autocorrelation and random errors. We found that the GAM + OK method with relatively small aggregation lengths generally gave the best performance of the model‐based methods in terms of both accuracy and precision, followed by GAMM + OK. SM estimates were more accurate and precise than the model‐based estimates in the simulations, but only when the study region was stratified accurately. Based on the simulation and field data analysis results, we selected the GAM + OK method to estimate scallop abundance and biomass for the Georges Bank and the Mid‐Atlantic Bight regions for the years 2011–2015. We also provided SM estimates based on careful stratifications to validate the model‐based estimates.

Many types of survey data are collected continuously, such as acoustic data and photographs from towed or autonomous underwater vehicles. Because the samples produced from these “belt transect” surveys are not random, simple design‐based estimators of the population mean and variance are not directly applicable. For this reason, model‐based estimation is often used to estimate population abundances (Petitgas ; Simard et al. ; Maravelias et al. ; Simard and Lavoie ; Páramo and Roa ; Hedley and Buckland ; Mello and Rose ; Georgakarakos and Kitsiou ; Rivoirard et al. ; Williams et al. ) although design‐based methods have also been used (Jolly and Hampton ; Brandt et al. ; Singh et al. ).

The fundamental difference between design‐based and model‐based approaches is that for the design‐based methods, the population is regarded as fixed and the survey data are the measured characteristics of this population, whereas for the model‐based methods, the observed population is only one realization of a stochastic process (Särndal et al. ; Smith ). Design‐based methods based on stratifications require no assumptions regarding the underlying population, but the samples drawn from each stratum must be randomized so that they are independent and identically distributed. Failure to meet the randomness requirement may cause bias in variance estimates (Cochran ). On the other hand, although model‐based estimation methods do not require random sampling, they are typically based on strong assumptions regarding the nature of the underlying population. For example, one popular model‐based method, ordinary kriging (OK), is based on the assumptions that the population means and covariances are spatially stationary. Real populations may not satisfy these assumptions, which can cause the model‐based estimators to be biased.

The purpose of this paper is to evaluate a number of potential methods to estimate the population abundance or biomass of non‐stationary populations, based on surveys that collect abundance and biomass data continuously, using both simulations and real data. These include OK, and two variations of regression kriging (RK) that can take into account large‐scale trends and covariates in the data: Generalized Additive Models on spatially aggregated data with kriged model residuals (GAM + OK), and Generalized Additive Mixed Models where small‐scale variations are treated as random effects, combined with kriged model residuals (GAMM + OK). In addition, a design‐based method based on stratified means (SM) is also evaluated.

Although the methods described here should be applicable to a wide variety of continuously collected optical and acoustic data, we focus here on observations of sea scallops (*Placopecten magellanicus*) collected using the vessel‐towed underwater digital habitat mapping camera system (HabCam) as an example case. HabCam was developed through collaborations between scientists at the Woods Hole Oceanographic Institution, the Northeast Fisheries Science Center (NEFSC), and with commercial fishermen to survey benthic communities, and to map sea floor habitats (Howland et al. ; Taylor et al. ; NEFSC ). The cameras on HabCam take rapid‐fire still photos of the sea floor (typically 6/s) as it is towed at speeds between 5 and 7 knots at about 2 m above the bottom. Region‐scale HabCam surveys for sea scallops were conducted on Georges Bank (GB) in 2011, and on both GB and the Mid‐Atlantic Bight (MAB) in 2012–2015. Scallop data from HabCam are highly spatially autocorrelated and zero‐inflated (i.e., a high percentage of the data are zeros; Table ; Fig. ), reflecting the patchiness of scallop distributions and the continuous nature of the observations.

The simulated area was 50 km in longitude and 100 km in latitude (similar in shape and size to the Hudson Canyon South rotational management area in the MAB; NEFSC ) with a 100 m grid size. Scallop spatial distributions are non‐stationary due to the influences of the physical and biological environment, such as substrate, depth, temperature, and predator distributions (Brand ; Hart ). The simulated scallop populations are therefore assumed to vary non‐randomly according to large‐scale trends, termed the "first‐order effect." For simplicity in the simulations, these trends are assumed to be (non‐linear) functions of longitude only. In reality, depth and other environmental factors may be important predictors of the trend; longitude is treated as a surrogate for depth and other environmental factors in the simulations. Stationary "second‐order effects," representing small‐scale spatially autocorrelated variability, were added to the first‐order trends. Various first‐order and second‐order effects were simulated to test whether the abundance and biomass estimation methods are robust to the type of spatial distribution of the underlying population.

We simulated the first‐order trend using a double logistic function of longitude, where *a* is a scaling parameter and *i*_{max} is the east boundary of the longitudes. The simulated first‐order effects are greatest in the middle and decrease logistically toward the left and right edges of the simulation domain (Fig. ); this pattern mimics the observed distribution of sea scallops in the MAB, where scallop densities are greatest at intermediate depths (Hart ). We simulated two types of first‐order effects: one in which the population is concentrated in the middle of the domain, and a second in which it is more spread out (Fig. ).

We simulated the second‐order effects using stationary Gaussian random fields with spherical isotropic covariance structures (Cressie ):

$$\gamma(h) = \begin{cases} c_0 + c\left(\dfrac{3h}{2a} - \dfrac{h^3}{2a^3}\right), & 0 < h \le a \\ c_0 + c, & h > a \end{cases}$$

where *h* is the distance between two points, *c*_0 is the nugget, *c* is the partial sill, and *a* is the range. The nugget/sill ratio and the range were varied to generate the different types of second‐order effects.
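As an illustration of this simulation step (not the authors' actual code), a stationary Gaussian random field with a spherical covariance can be drawn on a small grid by building the full covariance matrix and taking its Cholesky factor; all function names and parameter values below are illustrative.

```python
import numpy as np

def spherical_cov(h, sill=1.0, rng=10.0, nugget=0.1):
    """Spherical covariance function: decays to zero at distances beyond the
    range; the nugget adds unstructured variance at distance zero."""
    h = np.asarray(h, dtype=float)
    c = np.where(h < rng, sill * (1.0 - 1.5 * h / rng + 0.5 * (h / rng) ** 3), 0.0)
    return c + nugget * (h == 0)

def simulate_field(nx, ny, dx=1.0, sill=1.0, rng=10.0, nugget=0.1, seed=0):
    """One realization of a stationary second-order effect on an nx-by-ny grid,
    drawn via the Cholesky factor of the covariance matrix (fine for small grids)."""
    xg, yg = np.meshgrid(np.arange(nx) * dx, np.arange(ny) * dx)
    pts = np.column_stack([xg.ravel(), yg.ravel()])
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    cov = spherical_cov(d, sill, rng, nugget)
    L = np.linalg.cholesky(cov + 1e-9 * np.eye(len(pts)))  # small jitter for stability
    z = L @ np.random.default_rng(seed).standard_normal(len(pts))
    return z.reshape(ny, nx)
```

For the 100 m grid over a 50 km × 100 km domain used in the paper, a direct Cholesky factorization would be far too large; large-grid simulations typically use spectral or circulant-embedding methods instead.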

Scallop distributions are patchy, resulting in data that are highly zero‐inflated (Table ; Fig. ). To reflect the extent of the zero inflation observed in the actual data, only 10% of the locations with the highest sums of first‐ and second‐order effects in the simulations were taken to have non‐zero scallop densities; the other 90% of the sites were set to zero density (Fig. ).
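The thresholding step described above can be sketched as follows, assuming the summed first‐ and second‐order effects are held in a numpy array (a hypothetical helper, with the 10% cutoff taken from the text):

```python
import numpy as np

def zero_inflate(effects, keep_frac=0.10):
    """Keep only the top `keep_frac` of locations (those with the highest summed
    first- and second-order effects) as non-zero densities; set the rest to zero."""
    cutoff = np.quantile(effects, 1.0 - keep_frac)
    return np.where(effects > cutoff, effects, 0.0)
```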

(Table : numbers of annotated images and of images with scallops.)

Eight types of population distributions, from two types of first‐order and four types of second‐order effects, were simulated (Fig. ). We generated 30 realizations for each population type, and then scaled each realization so that all realizations had the same total abundance and biomass. Each simulated population was surveyed using 30 different tracks (where the starting point and first turn of the track were varied). The shape and direction of the simulated tracks were designed to mimic the actual HabCam survey design, in which the long transects run approximately in the direction of the density gradient. The lengths of the long transects alternate: one long transect extends to the boundary of the survey area, followed by a shorter transect extending only to the edge of the middle high‐density area (NEFSC 2014). This design covers the middle, higher‐density areas more intensively than the marginal areas toward the edges of the domain, in order to improve survey efficiency. Additionally, it places cross‐transects near the high‐density middle portion of the domain, which facilitates estimation of anisotropy. By contrast, a simpler design in which each main transect was the same length would have all its cross‐transects at the edges of the domain, where densities are close to zero, and would thus give less information on the directional structure of the population.

We used model‐based and design‐based methods to estimate total biomass and abundance for the simulated populations. These methods were evaluated using relative bias (RBias) and relative root mean square error (RRMSE):

$$\mathrm{RBias} = \frac{1}{\theta}\left(\frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_i - \theta\right), \qquad \mathrm{RRMSE} = \frac{1}{\theta}\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{\theta}_i - \theta\right)^2},$$

where $\theta$ is the true total abundance or biomass, $\hat{\theta}_i$ is the estimate from the *i*th simulation, and *n* is the number of simulations.
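These two criteria translate directly into code; a minimal sketch, with the true total and a set of estimates from repeated simulations as inputs:

```python
import numpy as np

def rbias(estimates, truth):
    """Relative bias: mean deviation of the estimates from the true value,
    scaled by the true value."""
    estimates = np.asarray(estimates, dtype=float)
    return (estimates.mean() - truth) / truth

def rrmse(estimates, truth):
    """Relative root mean square error, scaled by the true value."""
    estimates = np.asarray(estimates, dtype=float)
    return np.sqrt(np.mean((estimates - truth) ** 2)) / truth
```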

Kriging is one of the most widely used geostatistical methods for spatial interpolation (Webster and Oliver ). We tested the performance of three different kriging methods, OK, GAM + OK, and GAMM + OK, on the simulated scallop populations. OK is a standard kriging method based on the assumptions of stationary means and covariances (Webster and Oliver ; Hengl ). In some cases, the population may be anisotropic, that is, its variability may be directionally dependent. Based on the assigned first‐order effects, the simulated populations should have the largest variations along the horizontal axis, due to the strong longitudinal (depth) effects. Therefore, we built both isotropic and anisotropic models (0° and 90°) and tested four types of commonly used variogram models: spherical, exponential, Gaussian, and Matérn (Cressie ). Of these models, the one that minimized the root mean square error (RMSE, the square root of the mean squared deviation of the model predictions from the observed values) was selected. Total abundance or biomass was then estimated by summing the kriged predictions over all grid cells and multiplying by the grid cell area.
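The model-selection step — fit several candidate variogram models and keep the one with the lowest RMSE — can be sketched as follows. This is a simplified brute-force fit (fixed sill, gridded candidate ranges, no nugget, and no Matérn model), not the authors' fitting procedure:

```python
import numpy as np

# Candidate semivariogram models gamma(h) with sill c and (practical) range a.
MODELS = {
    "spherical":   lambda h, c, a: np.where(h < a, c * (1.5 * h / a - 0.5 * (h / a) ** 3), c),
    "exponential": lambda h, c, a: c * (1.0 - np.exp(-3.0 * h / a)),
    "gaussian":    lambda h, c, a: c * (1.0 - np.exp(-3.0 * (h / a) ** 2)),
}

def select_variogram(h, gamma_emp, ranges):
    """For each candidate model, fix the sill at the empirical maximum and choose
    the range minimizing RMSE against the empirical semivariogram; return the
    best (model name, range, rmse)."""
    c = float(np.max(gamma_emp))
    best = None
    for name, f in MODELS.items():
        for a in ranges:
            rmse = float(np.sqrt(np.mean((f(h, c, a) - gamma_emp) ** 2)))
            if best is None or rmse < best[2]:
                best = (name, a, rmse)
    return best
```

In practice the sill, range, and nugget would all be estimated jointly (e.g., by weighted least squares), but the selection-by-RMSE logic is the same.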

RK extends OK to account for a potentially non‐linear global trend. This trend can be estimated using a generalized regression model (e.g., GLM or GAM), potentially with a series of ancillary predictors, and then OK is performed on the residuals of the regression model to model the second‐order effects (Odeh et al. ; Hengl ). The final RK predictions are obtained by summing the regression predicted values and the kriged residuals.

Because HabCam scallop data are zero‐inflated (as is common with population counts), we used a two‐stage "hurdle" model to estimate the first‐order effects, where presence/absence and the level of biomass or abundance at a non‐zero site are modeled separately and then combined to derive the final estimates (Barry and Welsh ; Smith et al. ; Zuur et al. ). The hurdle model is given by the distribution:

$$P(Y = y) = \begin{cases} p, & y = 0 \\ (1 - p)\,f(y), & y > 0 \end{cases}$$

where *p* is the probability of a zero observation and *f* is the probability distribution of the positive observations; the expected density is then $(1 - p)\,\mu$, where $\mu$ is the mean of the positive distribution.
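The way the two stages combine into a density estimate can be sketched with a hypothetical helper: the expected density at a site is the probability of presence times the conditional mean given presence.

```python
import numpy as np

def hurdle_density(p_zero, mu_positive):
    """Expected density under a hurdle model: Pr(non-zero) times the mean of
    the positive part. Arguments may be scalars or arrays of per-site model
    predictions (e.g., from the presence/absence and positive submodels)."""
    return (1.0 - np.asarray(p_zero, dtype=float)) * np.asarray(mu_positive, dtype=float)
```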

A “quasi‐likelihood” is an assumption of the relationship between the variance and mean of the observations (Wedderburn ); in the quasi‐Poisson, the variance is assumed to be proportional to the mean, whereas in the quasi‐binomial, the variance is proportional to *p*(1‐*p*), where *p* is the mean. Quasi‐likelihoods allow fitting of the GAM(M)s without assuming a specific probability distribution. In particular, they can account for overdispersion, where the variance increases faster than the mean, which is commonly observed in aquatic populations in general, and scallops in particular.

We estimated the first‐order effects using a two‐dimensional spline function of latitude and longitude in both the GAM and GAMM models. The spatial residuals obtained from the large‐scale model were used to estimate fine‐scale spatial patterns using OK, with the same estimation process described above. Total abundance and biomass for the GAM + OK and GAMM + OK models were then estimated by summing, over all grid cells, the first‐order model predictions plus the kriged residuals, multiplied by the grid cell area.

In order to reduce the extent of the zero inflation and the autocorrelation among nearby data points, the data were blocked into segments of a fixed length along the tracks. For OK and the hurdle GAM, data within each segment were aggregated into a single data point, with its position taken to be the average of the locations of the images, weighted by the field of view of the images in that segment. The hurdle GAMM, by contrast, uses each data point individually, but treats within‐segment variations as random effects. Treating the data within each segment in one of these ways is necessary because nearby data are autocorrelated, which causes the effective sample size to be well below the total number of data points. In particular, GAMs (without random effects) are based on the assumption that data points are independent, an assumption that would be strongly violated if the data were not aggregated. Aggregation is a common technique when analyzing fisheries acoustic data (Mello and Rose ). The segment length should be long enough to reduce the degree of random variability and spatial autocorrelation of the data, while at the same time short enough to preserve spatial structure (Mello and Rose ). There is no prior knowledge of what segment length should be used for Atlantic sea scallop, nor of how sensitive this type of analysis is to the choice of segment length. We therefore evaluated the effects of the segment length used to average the data or to define the random effects along the tracks. Scallop aggregations tend to occur at scales of around 1 km (NEFSC 2010), so we tested three segment lengths: 0.75 km, 1.5 km, and 2.25 km. The segment length used in the analysis is equivalent to the grid size *A* used for interpolation.

We tested a SM method to estimate the total abundance and biomass from the simulated data. Only the horizontal transects (along lines of latitude in the simulations) were used in the SM estimation. Because some transects do not extend to the low‐density edges of the domain, while others do, it is necessary to post‐stratify the horizontal transects into two strata based on high and low first‐order effects (Fig. ). We calculated the mean and its variance of the simulated scallops within each stratum, and combined them using standard stratified sampling estimators, weighting each stratum by its relative area (Cochran ).
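The stratified estimators referenced here are the standard ones (Cochran): a weighted mean across strata, with variance $\sum_h W_h^2 s_h^2 / n_h$. A minimal sketch, assuming equal‐probability samples within each stratum:

```python
import numpy as np

def stratified_mean(values, strata, weights):
    """Stratified mean and its (naive) variance. `values` are the sample
    observations, `strata` the stratum label of each observation, and `weights`
    maps each stratum to its relative area W_h (the W_h should sum to 1)."""
    values = np.asarray(values, dtype=float)
    strata = np.asarray(strata)
    mean, var = 0.0, 0.0
    for h, W in weights.items():
        y = values[strata == h]
        mean += W * y.mean()
        var += W ** 2 * y.var(ddof=1) / len(y)  # s_h^2 / n_h per stratum
    return mean, var
```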

The simulation domain was well‐stratified based on the first‐order trend and the length of the short transects. However, the same precise information may not be available when dealing with the real data. For this reason, we tested the sensitivity of SM estimates to post‐stratification error by widening (SMW) and narrowing (SMN) the central high abundance or biomass stratum by 20% (Fig. ) and then estimated the SMs based on these less perfect stratifications.

The HabCam data were collected during 2011–2015 on GB and 2012–2015 in the MAB. We divided the GB and MAB stock regions into 14 subregions based on geographic characteristics and management areas, and analyzed them separately because their topology, orientation, and covariance structures differ. Images taken at altitudes higher than 4 m were excluded from the analysis because of their poor image quality. Only scallops with measured shell height larger than 40 mm were used in the analysis because of concerns about full detectability of very small scallops. We converted shell height to meat weight using a shell height/meat weight relationship that depends on depth *D* (in meters) and latitude *L* (Hennen and Hart ). Total count and weight in an image were standardized into abundance and biomass per m^{2} by dividing by the field of view of the image. A summary of the HabCam data used by year is listed in Table .
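Shell height/meat weight relationships of this kind generally take the allometric form ln *W* = *a* + *b* ln(SH) with depth and latitude effects; a sketch of that general form is below. The coefficients are placeholders for illustration only, not the fitted values of Hennen and Hart.

```python
import numpy as np

def meat_weight_g(shell_height_mm, depth_m, lat_deg,
                  a=-9.0, b=2.8, c=-0.01, d=0.0):
    """Illustrative shell height/meat weight conversion of the general form
    ln(W) = a + b*ln(SH) + c*D + d*L. All coefficients are hypothetical."""
    return np.exp(a + b * np.log(shell_height_mm) + c * depth_m + d * lat_deg)
```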

For model‐based estimations on the real data, we enlarged each subregion by 1 km and used the data within this expanded area to build the subregional models. The average weight or count (*t*) per unit area for each segment (*i*) along the tracks, weighted by the field of view (*f*) of each image (*j*), was calculated as:

$$\bar{t}_i = \frac{\sum_j t_{ij}}{\sum_j f_{ij}}.$$
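This field‐of‐view‐weighted aggregation can be sketched with a hypothetical helper (`t` can equally hold counts for abundance or weights for biomass):

```python
import numpy as np

def aggregate_segment(t, fov, lon, lat):
    """Collapse the images in one segment into a single data point: density is
    total count (or weight) over total field of view, and the position is the
    FOV-weighted mean of the image locations."""
    t, fov = np.asarray(t, dtype=float), np.asarray(fov, dtype=float)
    density = t.sum() / fov.sum()
    pos = (float(np.average(lon, weights=fov)), float(np.average(lat, weights=fov)))
    return density, pos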

Hurdle GAMs and GAMMs were fitted using quasi‐binomial quasi‐likelihoods for the presence/absence model and quasi‐Poisson quasi‐likelihoods for the positive model to estimate the first‐order trend with respect to latitude, longitude, and depth. Depth is correlated with latitude and/or longitude. To prevent potential problems caused by this collinearity, latitude and longitude were transformed into composite variables (e.g., latitude plus longitude, or half of one coordinate plus the other). We built models including depth plus either latitude, longitude, or one of the latitude/longitude combinations. Depth was included in all of the candidate models because it is one of the most important variables affecting scallop distributions. The maximum number of knots for each term in the GAM and GAMM was limited to 15 for the interaction terms (reduced to 10 for some of the subregions) and 10 for the single terms to prevent overfitting. The final first‐order model was selected using the RMSE from a 10‐fold cross validation. We then performed OK on the model residuals, tested isotropic and a series of anisotropic (from 0° to 180° by 20°) residual OK models, and selected the final OK model using the median squared error (MedSE) of the predictions.
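The 10‐fold cross‐validation used for first‐order model selection can be sketched generically; `fit` is any function that returns a prediction function, and the two toy models below (constant mean and linear) stand in for the candidate GAMs:

```python
import numpy as np

def kfold_rmse(x, y, fit, k=10, seed=0):
    """k-fold cross-validated RMSE. `fit(x_train, y_train)` must return a
    prediction function; squared errors are pooled over all held-out folds."""
    idx = np.random.default_rng(seed).permutation(len(x))
    sq = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        predict = fit(x[train], y[train])
        sq.extend((predict(x[fold]) - y[fold]) ** 2)
    return float(np.sqrt(np.mean(sq)))

# Two toy candidate models standing in for the candidate first-order GAMs:
def fit_const(xt, yt):
    return lambda xn: np.full(len(xn), yt.mean())

def fit_linear(xt, yt):
    coef = np.polyfit(xt, yt, 1)
    return lambda xn: np.polyval(coef, xn)
```

The candidate with the lowest cross-validated RMSE would be retained, mirroring the selection rule in the text.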

The 2013 HabCam data were used to evaluate the performance of the three model‐based methods on actual data, using a range of segment lengths (0.5–1.75 km, in increments of 0.25 km) for estimating the total biomass. The first‐order effects were estimated using a smooth function of depth and the selected latitude/longitude combination for both the GAM and GAMM approaches. Model performance was evaluated by comparing model predictions to observations from other surveys that were not used in the estimation model, including dredge surveys from the NEFSC (Hart and Rago ) and the Virginia Institute of Marine Sciences, and video drop camera surveys from the School for Marine Science and Technology at the University of Massachusetts, Dartmouth (Stokesbury et al. ). Dredge data were expanded using dredge efficiencies of 0.41 on sand substrates and 0.27 on rougher gravel/cobble substrates (NEFSC 2014). Stations from these other surveys were not typically located on the HabCam transects, and thus model interpolations from the HabCam data at the station locations could be compared to the actual survey data. We also used out‐of‐sample HabCam data, typically center lines that were not part of the basic survey design and were not used to estimate the models, for the same purpose. The MedSE criterion was used to determine the model that best predicts these other survey observations.
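The MedSE criterion itself is simple; a sketch, assuming paired arrays of model predictions and independent survey observations:

```python
import numpy as np

def medse(pred, obs):
    """Median squared error of predictions against independent observations;
    the median is less dominated by a few extreme errors than the mean, which
    matters for highly skewed, zero-inflated density data."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.median((pred - obs) ** 2))
```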

For the SM on the real data, each transect was split into segments, and the data within each segment were aggregated to help mitigate autocorrelation. We first separated the transects into segments at locations where the direction of the transects changed between parallel and perpendicular to the depth contours. These segments were further separated by depth strata, or at locations where the distance between any two points in a segment was larger than 2 km. Segments longer than 10 km were broken into smaller pieces (*see* Fig. for an example).

We estimated thresholds for the depth strata from a maximum likelihood based change‐point analysis (Killick et al. ), using the depth partial residuals from GAMs of abundance or biomass with respect to depth constructed for each subregion. The thresholds were detected based on changes in mean and/or variance of the partial residuals. Each subregion was post‐stratified into a maximum of three depth strata separately for each year and for abundance and biomass data.
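A minimal sketch of detecting a single mean‐shift changepoint by least squares (equivalent to the maximum‐likelihood split under Gaussian errors with constant variance); the cited methods (Killick et al.) handle multiple changepoints and changes in variance as well:

```python
import numpy as np

def single_changepoint(y):
    """Return the index k that best splits y into two constant-mean segments
    y[:k] and y[k:], by minimizing the total within-segment sum of squares."""
    y = np.asarray(y, dtype=float)
    best_k, best_cost = None, np.inf
    for k in range(1, len(y)):
        cost = (((y[:k] - y[:k].mean()) ** 2).sum()
                + ((y[k:] - y[k:].mean()) ** 2).sum())
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k
```

Applied to depth partial residuals ordered by depth, the detected split would serve as a candidate depth-stratum threshold.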

We estimated the mean abundance (or biomass) and its variance by segment and stratum using Eqs. and . An example of the calculated means and CVs for each segment for the 2015 biomass data is shown in Fig. . These means and variances were then weighted by the total field of view in each segment and stratum to obtain the stratified estimates and their variances.

The proportion of converged model runs was 99% for GAM + OK and OK but only 52–72% for GAMM + OK (Table ). The type of simulated population and survey track did not affect the optimal model or segment length; results were thus not separated by these factors. Among the three model‐based methods, GAM + OK produced the least biased and most precise abundance and biomass estimates, followed by GAMM + OK and then OK (Table ). GAM + OK also produced the least biased CV estimates, followed by OK and then GAMM + OK (Table ). The RBias and RRMSE of the abundance and biomass estimates tended to increase with increasing segment length (Table ); the increase was greatest for the OK models and smallest for the GAMM + OK models. The RBias of the CV estimates tended to increase with increasing segment length for the GAM + OK and GAMM + OK models and to decrease for the OK models (Table ). Taking both bias and precision into account, GAM + OK with 0.75 km segments was the best performing model‐based method, and it also produced the least biased CV estimates for both biomass and abundance.

(Table : RBias and RRMSE of the mean, RBias of the CV, nugget/sill (N/S) ratio, and percentage of converged runs, by model type and segment length, for both abundance and biomass.)

The RBias and RRMSE for the properly stratified SM estimates were smaller than all the model‐based estimates but the CVs were highly underestimated (Table ). Additionally, SM estimates were sensitive to the quality of post‐stratification; SMW and SMN estimates were more biased and less precise than most of the model‐based estimates (Table ).

Both the model fittings and the validations showed that no single modeling approach was always superior, but GAM + OK generally performed best, followed by GAMM + OK and then OK (Table ). The models performed slightly better when the segment length was small.

(Table : validation results against the NEFSC dredge, SMAST drop camera, and VIMS dredge surveys, by segment length and model type.)

Based on the simulation and the 2013 field data analysis results, the GAM + OK method with a 0.75 km aggregation length was used to estimate total abundance and biomass for each subregion in GB and the MAB for 2011–2015. We also provided SM estimates with careful stratifications (although the CVs of the SM estimates are probably understated), as well as stratified mean estimates from the NEFSC and VIMS dredge surveys, to validate the model‐based estimates (Hart and Rago ; NEFSC 2014). GAM + OK, SM, and dredge abundance and biomass estimates and their CVs for both stocks for 2011–2015 are given in Table , and an example of the interpolation surface for 2015 biomass is shown in Fig. . GAM + OK estimates agreed well with SM estimates, except for one subregion each in 2014 and 2015. Simple linear regressions of GAM + OK estimates against SM estimates for all years and subregions without the two outliers gave intercepts and slopes of −123.93 and 1.13 for the abundance estimates and 260.14 and 1.01 for the biomass estimates (with the two outliers, the intercepts and slopes were −282.03 and 1.34 for the abundance estimates and −1820.47 and 1.21 for the biomass estimates). GAM + OK and SM annual estimates for both stocks are similar to the dredge estimates, except for the MAB in 2015. The calculated CVs of the SM estimates were lower than the CVs of GAM + OK for both abundance and biomass.

Our work highlights the importance of incorporating large‐scale trends in spatial distribution modeling when such trends exist. The large‐scale trends that occur in both the simulated and real data violate one of the basic assumptions of OK, namely that the population is (weakly) spatially stationary. As a result, the simulations and field data analysis demonstrated that OK estimates were biased and imprecise because they do not account for the large‐scale trends. RK, in our case using GAM + OK or GAMM + OK models, can account for these trends and performed better than OK. Similar conclusions were reached in studies of fish (Yu et al. ), soil (Knotters et al. ; Odeh et al. ), and solar radiation distributions (Alsamamra et al. ). The GAM + OK approach usually performed better than GAMM + OK, but the reasons for this are unclear; one possibility is that the algorithms used to estimate GAMMs do not give estimates as stable or reliable as those of GAMs using only fixed effects.

The OK method was the most sensitive to the aggregation length, followed by GAM + OK and then GAMM + OK. The reduced sensitivity of the GAMM models is probably because they treat small‐scale variations as random effects, rather than simply averaging the data within each segment. Consistent with this, the estimated nugget/sill ratios decreased with increasing segment length more for the GAM + OK and OK models than for the GAMM + OK models (Table ). Averaging the data can reduce the resolution of local information and smear small‐scale spatial structure. This can misrepresent local conditions when the segment length is larger than the natural aggregations of the target species, especially when the aggregations are relatively small and dense. Indeed, the increase in bias and decrease in precision with increasing segment length were more severe for the simulated populations with smaller ranges and denser patches (Table ). This is a particular problem for OK, since the large‐scale trends are not modeled.

Anisotropy had relatively little influence on the biomass and abundance estimates compared with the model type and the segment length. Much of the anisotropy in the data was induced by the large‐scale trends; for this reason, the OK method was more sensitive to anisotropy than the RK approaches. Ignoring anisotropy can produce inaccurate and imprecise abundance and biomass estimates, especially when OK alone is used. For example, the anisotropic OK models for biomass in the northern flank area of GB in 2013 varied by as much as 40%.

Although the RK approach performed well in our analysis, it was criticized by Cressie () and Lark et al. () because the variogram estimates of the random component of the spatial variation are theoretically biased. Generalized least squares and residual maximum likelihood–empirical best linear unbiased prediction are two potential solutions (Lark et al. ). However, Kitanidis () and Minasny and McBratney () showed that while these methods are theoretically preferable to RK, they did not substantially improve model predictions.

The spatial autocorrelation inherent in the data can be handled by the model‐based methods by incorporating the spatial structure into the modeling framework using, e.g., kriging or Gaussian random fields. Autocorrelation is more problematic for design‐based estimators, however, because the samples are not randomized. Even after the samples were aggregated into stratified segments, successive segments in each stratum were still somewhat correlated. Positive correlation between successive observations causes the naïve variance estimator of the sample mean, which assumes the samples to be independent, to be biased low, with the degree of underestimation depending on the strength of the correlation (Cochran ). Williamson () suggested using cluster sampling variance estimators (Hansen and Hurwitz ), but this method is an approximation and is only effective when the CV for the cluster is lower than 0.2. Because more than half of the CVs of the segments of the HabCam data were larger than 0.2 in most years, we used the naïve variance estimates for this study. Even though the tracks were carefully segmented to break down the spatial autocorrelation, the simulation results indicated that the sample variance was still underestimated.
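The degree of underestimation can be illustrated with the standard first‐order autoregressive (AR(1)) approximation for the effective sample size of positively correlated observations; this illustrates the principle only and is not a calculation from the paper:

```python
def neff_ar1(n, rho):
    """Approximate effective sample size of n equally spaced observations with
    AR(1) autocorrelation rho: n_eff ~= n * (1 - rho) / (1 + rho). The naive
    variance of the mean, s**2 / n, should really use n_eff in place of n, so
    positive autocorrelation makes the naive estimator biased low."""
    return n * (1.0 - rho) / (1.0 + rho)
```

For example, 100 observations with rho = 0.5 carry roughly the information of 33 independent ones, so the naive variance of the mean is about a third of its true value.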

The simulations indicated that the design‐based estimator (SM) performed as well as or even better than the best model‐based estimates when the study area was well‐stratified. However, because the higher‐density central portion was sampled at a higher intensity than the lower‐density areas, incorrect stratification (i.e., making the central stratum too wide or too narrow) will cause the stratified mean to be biased high. Seriously inaccurate stratifications that are inconsistent with the survey design resulted in biased and imprecise estimates that were worse than almost all of the model‐based estimates. This could be avoided by surveying all areas at the same rate, but at the cost of reduced precision, since more time would be spent in low‐density areas that contribute little to the overall mean.

In practice, SM estimates generally agreed well with the GAM + OK model‐based estimates, except for two cases, where the SM estimates were 20–40% higher than those from GAM + OK. Most of the survey transects in these cases were placed in areas with high scallop densities, with few in the marginal habitats, which made it difficult to properly stratify the area. The influence of stratification on design‐based estimates has been documented in several studies. For example, Brandt et al. () showed a 20% difference between horizontally and vertically stratified abundance estimates based on acoustic data for pelagic fish in Lake Michigan.

Our model‐based and SM estimates from the HabCam data generally agreed well with the estimates from the dredge data, with the exception of the MAB in 2015, where the HabCam estimates were substantially higher than the dredge estimates. A very large year class of juvenile scallops was observed that year (Figs. ). Dredge efficiency may be reduced in the presence of such high densities, thereby inducing underestimates in the dredge survey.

Our results indicate that RK using GAM models, with a relatively short aggregation length, is an accurate and precise method for estimating abundance and biomass from HabCam data or other similar data sets. The stratified mean approach is also effective, provided that the (post‐)stratification can be done precisely, for example if accurate stratification can be built into the survey design. Using OK alone is not recommended if there are substantial large‐scale trends in the data.

We thank the NEFSC Ecosystems Survey Branch, S. Gallager, A. York, R. Taylor, K. Boles, and N. Vine for collection of the HabCam data, D. Rudders and K. Stokesbury for providing dredge and drop camera survey data, and D. Hennen, M. Simpkins and an anonymous reviewer for their constructive comments on the manuscript.

None declared.