**Appendix S1.** Further details on preferential sampling model development and simulations.

**Appendix S2.** Additional details on preferential sampling simulation results.

Finally, we illustrate our approach by analysing aerial survey counts of bearded seals (*Erignathus barbatus*) in the eastern Bering Sea. As expected, models with a preferential sampling effect led to lower estimates of abundance than those without. However, several lines of reasoning (better predictive performance, higher biological realism) led us to prefer models without a preferential sampling effect for this dataset.

Ecologists often fit statistical models to data from surveys of natural populations (e.g. plants, animals) to explain variation in species distributions and to predict presence or abundance in unsampled areas. Such models may serve as the basis for testing ecological theory (Hilborn & Mangel ; Austin ), for directed conservation and management (Williams, Nichols & Conroy ; Nichols & Williams ), or for monitoring biological diversity (Yoccoz, Nichols & Boulinier ; Pereira & Cooper ). Depending on the application, data quality can vary widely, ranging from scientific survey data to opportunistic records of species presence reported by the public.

Where possible, researchers often use sound survey design principles (e.g. randomization) to ensure unbiased inference when implementing scientific surveys. For instance, under protocols such as simple, systematic, or stratified random sampling, design‐based estimators of average density or presence‐absence over large survey regions are guaranteed to be unbiased and avoid making assumptions about the factors influencing density or distribution (Cochran ; Buckland, Goudie & Borchers ). However, spatial resolution is typically poor for such estimators, limiting the breadth of ecological and management questions that can be addressed. Further, for logistical reasons, many ecological surveys simply cannot afford to be this rigorous in their design. For example, even the North American Breeding Bird Survey uses roads and other non‐random features that necessitate use of alternate estimation procedures (Sauer & Link ).

Over the last decade, there has thus been a huge surge in research describing model‐based procedures for estimating abundance, density, or presence‐absence from population survey data. In contrast to design‐based procedures, model‐based procedures make explicit assumptions about how populations vary over space, both in terms of explanatory variables and often in terms of distribution. For instance, in such applications, it is common to use habitat or environmental covariates together with spatial effects (e.g. via trend surfaces or spatial random effects) to predict density or distributions across the landscape. This approach can be useful for fine‐scale management, identifying range shifts, and comparing the fit (or lack thereof) of alternative models that express different ecological hypotheses (Austin ). We shall refer to the amalgam of model‐based approaches for making spatially explicit inference about natural populations as ‘SMPDs’. Ultimately, the quality of inference from SMPDs depends on several factors, including (i) how well the fitted model approximates reality, and (ii) the quality of the data (Buckland, Goudie & Borchers ).

The radiation of model‐based approaches has proceeded at different rates and along different pathways in different ecological modelling subfields, which are often linked to the data being modelled. For instance, species distribution models (SDMs; Elith & Leathwick ) are often used to predict species presence as a function of habitat variables and encompass both statistical (e.g. generalized linear or generalized additive models) and machine learning algorithms. The term SDM also connotes analysis of presence‐only data (as available from museum collections or citizen science programs, for instance). In the field of population assessment, statistical models are used almost exclusively, often with observation models tailored to the sampling approach (e.g. presence‐absence surveys, distance sampling, quadrat samples, point counts). In this paper, our perspective is one shaped more heavily by the population assessment literature, though we expect our general approach to be applicable to a wider variety of SMPDs.

One of the main advantages of using SMPDs is that one can potentially use data from non‐randomized designs or opportunistic sampling to make inferences about animal populations (Johnson, Laake & Ver Hoef ). Several authors have shown that non‐randomized designs can have poorer predictive ability than randomized designs when used to estimate species distributions (e.g. Edwards *et al*. ; Albert *et al*. ), but that the optimal design may depend on the question of interest. For instance, the optimal design may differ depending on whether the inferential goal is to maximize predictive ability or to estimate the relationship between species presence and important habitat covariates (Albert *et al*. ).

Less precise predictions may be a sacrifice one is willing to make if it is the only cost incurred when using data at non‐random locations to make inferences about species distributions. After all, such data are often all that are available for certain taxa and locations (e.g. from citizen science programs), and are certainly less expensive to acquire than data from randomized, scientific surveys. However, recent research suggests that one may be opening oneself up to bias as well, which is potentially much more concerning than loss of precision. In a recent paper, Diggle, Menezes & Su () emphasized that spatially explicit statistical models can easily provide biased estimates when sampling disproportionately targets locations where the response of interest is higher (or lower) than expected given a particular set of explanatory covariates. In the context of SMPDs, this might occur if sampling disproportionately occurs in locations where animals are known to be present or of high abundance. For example, if volunteer inventory participants have access to multiple sites with similar covariate values, bias might arise if they consistently choose sites where species are thought or known to be present. Bias might also arise if survey effort is higher near bases of operations or features such as roads, particularly if animal abundance is higher (or lower) near these locations than elsewhere in the landscape. In our view, this phenomenon, termed ‘preferential sampling’, is potentially much more problematic, because estimates of animal abundance and occurrence from SMPDs could be severely overstated.

In this article, we explore potential for bias in SMPDs resulting from preferential sampling (hereafter, PS) and describe several model‐based approaches for detecting and correcting for such biases. We start by describing a common currency for notation and basic model structures considered in this paper. Second, we review PS bias in a mathematical light, and describe prior approaches to coping with its effects. Third, we introduce a novel generalization of previously proposed PS models, allowing the investigator to jointly model animal encounter data and the locations chosen for sampling, including possible dependence structure between these two types of observations. Fourth, we conduct a simulation study to examine the performance of traditional SDMs and our newly developed PS model when data are gathered preferentially. Finally, we demonstrate our modelling approach by analysing aerial survey counts of bearded seals (*Erignathus barbatus*) in the Bering Sea.

We focus here exclusively on discrete space (areal) models for population survey data, although we note that PS is likely to affect analyses similarly regardless of the choice of spatial domain. We suppose that the investigator intending to fit a SMPD to survey data breaks their study area up into *S* survey units (label these *U*_{1}, *U*_{2}, …, *U*_{S}), of which *n* are sampled. Each survey unit *i* is assigned a vector of covariates, **x**_{i}, and an indicator *R*_{i} that takes on the value 1·0 if location *i* is sampled and 0·0 otherwise. The process model can be written as

*Z*_{i} ~ *f*(*g*^{−1}(μ_{i})), (eqn 1)

where *f*() denotes a probability distribution, *Z*_{i} denotes the state variable of interest (e.g. occupancy or abundance), *g*() is a link function (e.g. probit or logit for occupancy, log for count data), and μ_{i} is a link‐scale intensity value which can itself be written as a function of habitat covariates, regression coefficients, and spatially autocorrelated random effects. Spatially autocorrelated random effects are often included to allow populations to vary smoothly in space and to account for residual patchiness not explained by predictive covariates (Legendre ; Lichstein *et al*. ). In applications described in this paper, we write the intensity as

μ_{i} = β_{0} + **x**_{i}**β** + δ_{i}, (eqn 2)

where β_{0} is an intercept parameter, **x**_{i} is a row vector of *m* predictive covariates associated with site *i*, **β** = {β_{1}, β_{2}, …, β_{m}} is a column vector of *m* regression parameters, and **δ** = {δ_{1}, δ_{2}, …, δ_{S}} are spatially autocorrelated random effects. For occupancy, *f*() would typically be Bernoulli, while the Poisson or negative binomial are typical choices for analysis of count data; common forms for δ_{i} include geostatistical specifications (Cressie ; Diggle, Tawn & Moyeed ), Gaussian Markov random fields (e.g. conditionally autoregressive models; Rue & Held ), or low rank alternatives such as predictive process (Banerjee *et al*. ; Latimer *et al*. ) or restricted spatial regression models (Reich, Hodges & Zadnik ; Hughes & Haran ).
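As a minimal sketch of eqns 1 and 2 (with a log link and Poisson *f*(), the combination used later in the simulation study), one might simulate the process model as below. This is illustrative Python rather than the archived R/TMB code, all parameter values are invented, and iid noise stands in for the spatially autocorrelated δ_{i}.

```python
import numpy as np

rng = np.random.default_rng(42)
S = 625                                  # number of survey units (e.g. a 25 x 25 grid)

# Illustrative (not the paper's) parameter values
x = rng.normal(size=S)                   # one habitat covariate per unit
beta0, beta1 = 1.0, 0.5                  # intercept and slope on the link (log) scale
delta = rng.normal(scale=0.3, size=S)    # iid stand-in for spatially correlated delta_i

mu = beta0 + beta1 * x + delta           # eqn 2: link-scale intensity
Z = rng.poisson(np.exp(mu))              # eqn 1 with log link and Poisson f()
```

A real application would replace `delta` with draws from one of the spatial specifications named above (e.g. a Matérn geostatistical model or a conditionally autoregressive model).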

The model for *Z*_{i} describes variation in the process of interest and is often described as the ‘process’ model. However, it is usually impossible to conduct a complete census of all individuals present in a surveyed location, so it is customary to include an observation model describing incomplete detection. For occupancy (presence/absence) studies, the response variable is *Y*_{i} = 1 if the species of interest is detected and is 0 otherwise, and is modelled with a Bernoulli distribution (Royle & Dorazio ):

*Y*_{i} ~ Bernoulli(*Z*_{i}*p*_{i}), (eqn 3)

where *p*_{i} is possibly a function of survey and observer specific covariates (e.g. visibility, observer experience). Replicate surveys of the same sampling unit provide the necessary information to estimate *p*_{i} (MacKenzie *et al*. ). For count surveys, a possible model is

*Y*_{i} ~ Binomial(*Z*_{i}, *A*_{i}*p*_{i}), (eqn 4)

where *Y*_{i} now represents the count of animals obtained while surveying unit *i*, *A*_{i} denotes the proportion of sample unit *i* that is surveyed, and *p*_{i} gives detection probability. Additional information will often be needed to estimate *p*_{i} in this context, such as data from double observers, distance observations, or double sampling (see e.g. Buckland *et al*. ; Royle, Dawson & Bates ; Borchers *et al*. ; Conn *et al*. ).
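The two observation models can be sketched as follows (illustrative Python with invented values; in eqn 3, detection requires presence, while in eqn 4 each animal is counted independently with probability *A*_{i}*p*_{i}):

```python
import numpy as np

rng = np.random.default_rng(1)
S = 100
Z = rng.poisson(5.0, size=S)      # true abundance in each unit (illustrative)
A = np.full(S, 0.5)               # proportion of each unit actually surveyed
p = np.full(S, 0.8)               # detection probability

# Count survey (eqn 4): Y_i | Z_i ~ Binomial(Z_i, A_i * p_i)
Y_count = rng.binomial(Z, A * p)

# Occupancy survey (eqn 3): the species can only be detected where present
# (Z_i > 0), and then with probability p_i
occupied = (Z > 0).astype(float)
Y_occ = rng.binomial(1, occupied * p)
```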

For the remainder of this treatment, we use bold symbols to denote vector‐valued quantities or matrices. We also use ‘|’ to specify that a distribution is conditional; for instance, if [**Y**] denotes the marginal probability mass function for **Y**, then [**Y**|**Z**] represents the conditional distribution of **Y** given **Z**. We use **μ** and **ν** to indicate the link‐scale process of interest and the logit of the probability of sampling, respectively. For population abundance, this would mean [*Z*_{i}] = *f*(exp(μ_{i})), whereas the distribution of sampling locations can be represented as [*R*_{i}] = *f*(logit^{−1}(ν_{i})), where μ_{i} and ν_{i} can be written as functions of regression coefficients, habitat covariates, and spatially autocorrelated random effects. We use the notation *Z*_{i} when describing the process model in general terms, but often switch to the conventional notation *N*_{i} when animal abundance is the explicit focus of interest.

Design‐based estimation requires that surveys adhere to a pre‐planned survey design selected probabilistically from an underlying sampling frame to ensure unbiasedness (Cochran ). Although randomization can reduce potential for bias, researchers often view SMPDs in a different light, where the data gathering procedure is not important as long as the models being fitted are formulated well and produce reasonable predictions (see Discussion for more on this topic). Under this view, investigators can reallocate effort if weather or logistics preclude surveying in a desired location and still generate model‐based estimates of abundance or occupancy. This can be a crucial advantage in surveys covering large areas with frequent inclement weather. It also opens the door for using presence‐only, citizen science, or opportunistically collected data for estimation.

However, the manner in which effort is allocated can potentially have a profound influence on SMPD estimator performance. Preferential sampling (PS) arises when the locations being sampled and the process of interest (e.g. density, occupancy) are conditionally dependent given modelled covariates (Diggle, Menezes & Su ). For instance, PS can occur when the investigator tends to sample more in places where a species has already been observed, or is known to be abundant. Diggle, Menezes & Su () showed that this type of PS can lead to bias when this extra information is not included in models for the state variable of interest. Specifically, PS arises when we consider the set of sampled locations as stochastic and when [**R**, **Z**|**x**] ≠ [**R**|**x**][**Z**|**x**], where **R** is an indicator vector whose elements *R*_{i} are 1·0 if sampling unit *i* is sampled and are zero otherwise (see e.g. Fig. ). We use this definition of PS throughout the rest of the manuscript, noting that it differs somewhat from definitions sometimes used in the SDM literature. For instance, Merckx *et al*. () use the term ‘preferential sampling’ to refer to the process of visiting some sites more often than others, while Manceur & Kühn () define it as occurring when the locations selected for sampling are a function of an environmental covariate. Neither of these latter conditions is problematic outside of the specialized field of presence‐only modelling.
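A quick numerical illustration of why the dependence [**R**, **Z**|**x**] ≠ [**R**|**x**][**Z**|**x**] is biasing (illustrative Python; the logistic form linking inclusion probability to abundance is invented for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
S = 10_000
Z = rng.poisson(3.0, size=S)               # true abundance surface

# Preferential sampling: inclusion probability rises with abundance itself,
# not with any modelled covariate (hypothetical functional form)
nu = -2.0 + 1.0 * np.log1p(Z)
prob_sample = 1.0 / (1.0 + np.exp(-nu))
R = rng.binomial(1, prob_sample)

naive_mean = Z[R == 1].mean()              # mean count over sampled units only
true_mean = Z.mean()
# naive_mean exceeds true_mean because sampled units over-represent
# high-abundance cells
```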

Diggle, Menezes & Su () demonstrated PS with an environmental monitoring problem, whereby pollutant monitoring stations were more highly clustered around urban areas with high concentrations of pollutants than in rural areas with comparably low levels of pollutants. Fitting simple geostatistical models without fixed effects led to positively biased estimates of landscape‐level pollutant concentrations. Presumably (and as noted by discussants of the article) including a fixed effect associated with a relevant covariate (e.g. a development index) would likely reduce or eliminate bias. However, the primary point of Diggle, Menezes & Su () is well taken: inclusion of spatially autocorrelated random effects in a statistical model is insufficient to remove the potentially biasing effects of PS.

As in the pollution example, having good explanatory covariates may also reduce bias when fitting SMPDs to population survey data under PS (e.g. Fig. ). However, in many ecological applications, predictive covariates explain only a small portion of variation present in the data. If the locations selected for sampling are a function of some unmodelled factor related to abundance (intentionally or unintentionally), bias may still occur. Despite the clear potential for bias in SMPDs, there are few examples where PS (sensu Diggle, Menezes & Su ) is discussed with regard to SMPDs. One exception is Chakraborty *et al*. (), who acknowledged the likely presence of PS when fitting SMPDs to data obtained using non‐randomized designs. However, they did not attempt to account for PS in their models.

In design‐based sampling, unequal sampling intensity is often accommodated via stratification or unequal probability sampling, as with Horvitz–Thompson‐like estimators where the probability of inclusion varies by sampling unit (Cochran ). However, in the case of PS, this inclusion probability also depends on the value of the response associated with the sampling unit. Evidently, any approach to account for PS should also account for the dependence between the process of interest and the locations chosen for sampling.

Several authors have attempted model‐based corrections for PS in the statistical literature. For Gaussian models on a continuous spatial domain, Diggle, Menezes & Su () and Pati, Reich & Dunson () jointly modelled the process of interest and the locations chosen for sampling. In particular, they expressed sampled locations as an inhomogeneous Poisson point process where the underlying log‐scale intensity at a given location depended linearly on the process of interest at that location. For instance, writing observations of the spatial process (random field) at a location *i* as

*Y*_{i} = μ_{i} + ε_{i}, (eqn 5)

the log‐scale sampling intensity at location *i* (κ_{i}) could be written as

κ_{i} = ξ_{i} + *b*μ_{i}, (eqn 6)

where μ_{i} and ξ_{i} can both be written as a function of predictive covariates. Here, the parameter *b* describes the level of PS; *b* = 0 implies no PS, *b* > 0 implies a greater level of sampling in locations where the spatial process (e.g. population density) is high, and *b* < 0 implies greater sampling where the spatial process is low. Importantly, when explanatory covariates are used in models for μ_{i} and ξ_{i}, Pati, Reich & Dunson () show that accounting for preferential sampling ‘... is only necessary when there is an association between the spatial surface of interest and the sampling density that cannot be explained by the shared spatial covariates’ (see e.g. Fig. ). Pati, Reich & Dunson () also consider a simpler, plug‐in based estimator in which log sampling density (specifically, a two‐dimensional kernel density estimate) is used as an additional fixed effect in eqn 5, finding that this approach helped reduce bias associated with PS, but did not perform as well as the full joint model.
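The behaviour of the linear PS model in eqn 6 can be sketched as follows (illustrative Python; `xi` and `mu` are iid stand-ins for quantities that would in practice be functions of covariates and spatial random fields):

```python
import numpy as np

rng = np.random.default_rng(7)
S = 2000
xi = np.full(S, -3.0)            # baseline log sampling intensity xi_i (invented value)
mu = rng.normal(size=S)          # spatial process (iid stand-in for a random field)

b = 1.5                          # preferential sampling parameter (b > 0)
kappa = xi + b * mu              # eqn 6: kappa_i = xi_i + b * mu_i
intensity = np.exp(kappa)        # intensity of the inhomogeneous Poisson point process

# With b > 0, sampling intensity is positively associated with the process;
# with b = 0, the association disappears
corr = np.corrcoef(intensity, mu)[0, 1]
```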

The models considered by Diggle, Menezes & Su () and Pati, Reich & Dunson () are a useful first step in addressing and modelling PS. However, they are somewhat limited in that they are specific to continuous spatial domains, continuous data (as opposed to presence/absence or count data), and Gaussian error distributions. Also, they require the linear predictor of the PS model to be written as a simple linear function of the spatial process. In real‐world applications, we can envision cases where sampling is strongly preferential in certain areas of the landscape and not in others. For instance, sampling may be more strongly preferential close to bases of operations (e.g. landing strips in the case of aerial surveys), but less so in areas that are harder to reach.

Given these limitations, our present task is to generalize PS models to the types of data more typical of SMPDs, and to allow the degree of PS to vary across the landscape. Like Diggle, Menezes & Su () and Pati, Reich & Dunson (), we impose a joint model for the process of interest (abundance or occurrence) and the locations chosen for sampling. For the abundance process model, we start with eqn 1 as a general formulation for non‐Gaussian data, writing the link‐scale expectation as in eqn 2. Next, recalling that *R*_{i} is a binary indicator taking on the value 1·0 if survey unit *i* is sampled, and is 0·0 otherwise, we model *R*_{i} using a Bernoulli distribution:

*R*_{i} ~ Bernoulli(*h*^{−1}(ν_{i})), (eqn 7)

where *h*() denotes a link function appropriate for binary data (e.g. logit, probit). We then write the intensity for this model as

ν_{i} = β*_{0} + **x**_{i}**β**^{*} + (**Bδ**)_{i} + η_{i}, (eqn 8)

where sampling intensity is a function of an intercept (β*_{0}), regression coefficients (**β**^{*}), and spatially autocorrelated random effects (**η** and **δ**). The predictive covariates **x**_{i} from eqn 2 and **δ** are included in both eqns 2 and 8, allowing for dependency in the two models, with the matrix **B** describing the strength and type of dependence between sampling intensity and the process of interest. The spatially autocorrelated random effects **η** are assumed independent of the **δ**.

The formulation in eqn 8 is similar (but not identical) to one proposed by Royle & Berliner () for hierarchical multivariate models with spatial dependence. There are multiple ways of structuring **B** depending on the complexity of spatial dependence desired for the PS process (Royle & Berliner ). For instance, setting **B** = **0**_{S×S} (i.e. to an all‐zero matrix) corresponds to an absence of spatial dependence (and thus no PS). Setting **B** = *b***I**, where *b* is an estimated parameter and **I** is an (*S*×*S*) identity matrix, corresponds to the linear PS model suggested by Diggle, Menezes & Su () and Pati, Reich & Dunson (). Alternatively, we could allow the degree of PS to vary across the landscape. For instance, one could contemplate a trend surface model for PS by specifying a diagonal matrix for **B**, with diagonal entries given by

*B*_{ii} = *b*_{0} + *b*_{1}lat_{i} + *b*_{2}long_{i},

and *B*_{ij} = 0 for *i* ≠ *j*. Here, *b*_{0}, *b*_{1}, and *b*_{2} are estimated parameters and lat_{i} and long_{i} give latitude and longitude of survey unit *i*, respectively (Royle & Berliner ). To our knowledge, the identifiability of higher order models (such as trend surfaces) has not been investigated.
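The three structures for **B** can be sketched in a few lines (illustrative Python; the grid coordinates stand in for real latitudes and longitudes, and the function name `make_B` is invented):

```python
import numpy as np

S = 625
lat = np.repeat(np.arange(25.0), 25)     # survey-unit latitudes (grid indices here)
lon = np.tile(np.arange(25.0), 25)       # survey-unit longitudes

def make_B(form, b=1.0, b0=0.0, b1=0.0, b2=0.0):
    """Construct the S x S matrix B linking sampling intensity to delta."""
    if form == "none":       # B = 0: no preferential sampling
        return np.zeros((S, S))
    if form == "constant":   # B = b*I: linear PS (Diggle et al.; Pati et al.)
        return b * np.eye(S)
    if form == "trend":      # diagonal trend surface: B_ii = b0 + b1*lat_i + b2*lon_i
        return np.diag(b0 + b1 * lat + b2 * lon)
    raise ValueError(f"unknown form: {form}")
```

All three forms are diagonal or zero, so the degree of PS at unit *i* depends only on δ_{i}; richer off-diagonal structures are possible in principle but, as noted above, their identifiability is untested.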

A comparison of the performance of models with different sets of constraints on **B** can serve as a test of PS. In particular, if one can demonstrate that models with **B** = **0** perform similarly or better than models with **B** ≠ **0**, then PS is likely not worth modelling and inference can proceed using standard SMPDs.

To illustrate PS and demonstrate that our proposed model has reasonable performance, we conducted a small simulation experiment. For each of 500 simulations, we generated abundance of a hypothetical species over a 25 × 25 grid as

*N*_{i} ~ Poisson(exp(μ_{i})),

where *i* indexes survey unit and μ_{i} is determined according to eqn 2. Abundance was generated as a function of a single spatially autocorrelated landscape covariate, as well as residual spatial autocorrelation (δ_{i}) and overdispersion (Fig. ). Specific details of data generation procedures are provided in Appendix S1, Supporting Information.

For each simulated landscape we generated three virtual count surveys using eqns 7 and 8. Each survey had **β**^{*} = η_{i} = 0 (that is, no covariate or spatially autocorrelated random effects on the probability of sampling), but the surveys differed in how the matrix **B** was parameterized. In the first, we set **B** = **0**, so that surveyed locations were randomly selected (i.e. independent of abundance). For the second and third, we set **B** to be a diagonal matrix with entries *b* = 1 and *b* = 3, respectively, so that the probability of sampling a given survey unit (grid cell) was explicitly dependent on the spatial abundance process in that unit. We refer to these scenarios as moderate and pathological PS, respectively (see Fig. ). Simulations were configured so that *n* = 50 of the 625 survey units were sampled; each survey covered half of each target cell.
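The three survey scenarios can be mimicked by weighting unit selection by exp(*b*μ_{i}) (illustrative Python; the weighted-draw scheme is a simplification of the Bernoulli sampling model in eqns 7 and 8, and `mu` is an iid stand-in for the simulated abundance surface):

```python
import numpy as np

rng = np.random.default_rng(3)
S, n = 625, 50
mu = rng.normal(size=S)                # link-scale abundance surface (iid stand-in)

def sample_units(b):
    """Select n of S units with inclusion weights exp(b * mu); b = 0 is random."""
    w = np.exp(b * mu)
    return rng.choice(S, size=n, replace=False, p=w / w.sum())

random_sites = sample_units(0.0)       # B = 0: random selection
moderate_sites = sample_units(1.0)     # moderate PS (b = 1)
pathological_sites = sample_units(3.0) # pathological PS (b = 3)
```

Comparing the mean of `mu` across the three sets of sampled units shows the increasing over-representation of high-abundance cells as *b* grows.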

We fitted two different models to each count dataset. In the first model, the elements of **B** in eqn 8 were all set to zero. In this case, sampling intensity is assumed to be independent of underlying abundance, as implicitly assumed by most SMPDs in the ecological literature. In the second model, we included an explicit connection between the abundance distribution and the sampling process by setting **B** = *b***I**, where *b* is an estimated parameter, and **I** is an identity matrix. For ease of exposition, we set *p*_{i} = 1·0 for all simulations (i.e. all animals present were assumed to be detected). Each model included the same habitat covariate that was used to generate the data as a log‐linear predictor of abundance.

We applied our modelling technique to counts of bearded seals obtained on aerial transects flown over the eastern Bering Sea from 10–16 April 2012 (Fig. ). These counts were gathered as part of a larger survey designed to estimate abundance of four species of ice‐associated seals; the survey is described in greater detail elsewhere (Conn *et al*. ). The survey area considered here consists of 25 × 25 km grid cells bordered to the north by the Bering Strait, to the west by the International Date Line, to the south by the maximal April ice extent, and to the east by Alaska. Here, we limit counts to those gathered within a one‐week period so that relative abundance would remain relatively constant throughout the study area. Our primary focus in this application is to diagnose PS (rather than to estimate absolute abundance). As such, we do not attempt to correct for nuisance processes such as incomplete detection or species misclassification, which require models of increased sophistication (Conn *et al*. ).

Our choice to model bearded seal counts, as opposed to one of the other seal species, is based on the observation that bearded seal densities tend to be highest in the northern portion of the study area. This is also the location of one of the primary airports used to prosecute surveys (Nome, AK, USA). Although we would not anticipate a large effect, higher survey coverage in areas of high bearded seal density could potentially lead to positive bias in apparent abundance owing to PS.

To test for such an effect, we conceptualized bearded seal counts as arising according to the formulation

*Y*_{i} ~ Binomial(*N*_{i}, *A*_{i}), *N*_{i} ~ Poisson(*a*_{i}exp(μ_{i})),

where *a*_{i} gives the proportion of grid cell *i* that is composed of salt water habitat, *A*_{i} defines the proportion of salt water habitat in grid cell *i* that is sampled, and μ_{i} is defined in eqn 2. We modelled the grid cells that were chosen for sampling using eqns 7 and 8.

We fitted a total of six models (*M*_{cov = 0,b = 0}, *M*_{cov = 0,b = 1}, *M*_{cov = 0,b = ts}, *M*_{cov = 1,b = 0}, *M*_{cov = 1,b = 1}, *M*_{cov = 1,b = ts}) to bearded seal count data using the same estimation framework as in the simulation study. Models varied by (i) whether or not habitat and landscape variables were used as predictors of bearded seal density (cov = 1 and cov = 0, respectively), and (ii) the form of PS (*b* = 0 indicates no PS; *b* = 1 indicates **B** = *b***I**; *b* = *ts* indicates a trend surface specification for PS, see eqn 12). When habitat and landscape variables were included, we used the following log‐linear predictors for the abundance process: linear and quadratic functions of sea‐ice concentration, linear and quadratic functions of depth, distance from land, and distance from the southern ice edge. In contrast, distance from the Alaska mainland was the only predictor used for sampling intensity (**ν**). Remotely sensed sea‐ice data were obtained at a 25 × 25 km resolution from the National Snow and Ice Data Center, Boulder, CO, USA, as described by Conn *et al*. (). All measurements were made relative to grid cell centroids. Models for μ_{i} and ν_{i} both utilized spatially autocorrelated random effects with a Matérn covariance function (see Appendix S1 for further details).

To compare and contrast models, we calculated marginal AIC (having marginalized over spatial random effects; see below) and mean square prediction error. Prediction error was calculated using a Monte Carlo cross‐validation procedure. For a total of 100 replicates, we randomly selected 40 of 394 counts to be withheld for testing. Estimation proceeded using the remaining 354 counts for each of the six estimation models; mean squared error of observed and predicted counts was used as the scoring function for calculation of prediction error with each test dataset (Hooten & Hobbs ). Note that in each cross‐validation analysis we used *R*_{i} values from the full dataset (i.e. count predictions were made assuming these cells had been sampled).

We used maximum marginal likelihood to conduct statistical inference, where the random effects (**η** and **δ**) are integrated out of the likelihood. For increased numerical tractability, we also used the result that the marginal distribution of a Poisson‐binomial mixture is itself a Poisson distribution to formulate the likelihood. In particular, we replaced the binomial observation model (eqn 4) with the following unconditional specification:

*Y*_{i} ~ Poisson(*a*_{i}*A*_{i}exp(μ_{i})).

Abundance in each cell *i* and attendant standard errors were then estimated with respect to *N*_{i} ~ Poisson(*a*_{i}exp(μ_{i})).
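The Poisson‐binomial marginalization result can be checked by simulation: if *N* ~ Poisson(λ) and *Y* | *N* ~ Binomial(*N*, *p*), then marginally *Y* ~ Poisson(λ*p*), so the mean and variance of *Y* should coincide at λ*p* (illustrative Python with invented λ and *p*):

```python
import numpy as np

rng = np.random.default_rng(5)
lam, p, reps = 20.0, 0.4, 200_000

# Hierarchical draws: N ~ Poisson(lam), then Y | N ~ Binomial(N, p)
N = rng.poisson(lam, size=reps)
Y = rng.binomial(N, p)

# Marginally Y ~ Poisson(lam * p), so mean and variance should both be ~ lam * p = 8
m, v = Y.mean(), Y.var()
```

Because the marginal is a standard Poisson, the binomial thinning layer never has to be evaluated explicitly in the likelihood, which is what makes the marginal specification computationally convenient.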

We used Template Model Builder (TMB; Kristensen *et al*. ), interfaced with the R programming environment, to conduct maximization. The TMB software uses a Laplace approximation to integrate out random effects (**η** and **δ**), and a bias correction algorithm (Tierney, Kass & Kadane ; Thorson & Kristensen ) to obtain abundance estimates and standard errors that properly account for nonlinear transformations of random effects. This approach resulted in a facile implementation and fast computing times, allowing us to conduct simulation and model testing with greater efficiency than would have been possible with Bayesian simulation. Further details on statistical methods are provided in Appendix S1; requisite R and TMB code has been archived (Conn, Thorson & Johnson ) and is also available at

A substantial proportion of simulation replicates included models that were unstable (i.e. either did not converge or had *b* estimated on a boundary; Appendix S2). In particular, 0·6% of spatial models fitted with *b* fixed to zero were unstable, compared to 20·5% when *b* was estimated and the true value of *b* was either zero (no preferential sampling) or 1·0 (moderate preferential sampling). Under pathological preferential sampling (*b* = 3·0), 78·8% of simulations failed to produce reliable results when *b* was estimated. Here, we only present results for cases where models produced stable estimates.

Estimates of cumulative animal abundance across simulated landscapes were relatively unbiased (−4% for the canonical SMPD and 5% for the joint model) in the scenario where survey locations were selected randomly (i.e. independent of abundance; Fig. ). Under this scenario, abundance estimates had higher variance when *b* was estimated. Under moderate PS (*b* = 1), estimation of the PS parameter *b* continued to result in unbiased estimates of abundance (−1%), while the canonical SMPD model ignoring preferential sampling had a mean bias of 48%. Under pathological PS (*b* = 3), the canonical SMPD model was extremely biased (mean bias of 159%) while the joint model had a relatively small positive mean bias of 6% (Fig. ). Monte Carlo error on bias of abundance estimates was 1–2% for each scenario. In contrast to abundance, the slope of the regression parameter relating abundance to the simulated habitat covariate was unbiased for all scenarios (Appendix S2).

In models where a preferential sampling effect was estimated, it was large and positive (e.g. the *M*_{cov = 1,b = 1} model). An estimate of this magnitude implies that we effectively sampled all locations where bearded seals were present, which seems biologically implausible given that survey effort was allocated to maximize spatial coverage rather than to target specific pockets of high bearded seal density. Further, these point estimates well exceeded estimates from the simulation study where moderate or even pathological levels of preferential sampling were employed. Thus, despite better marginal AIC scores for models with a preferential sampling effect than for models without one (Table ), we regard results of the joint preferential sampling model as unreliable in this example. This view is further supported by examination of cross‐validated mean square prediction error, which suggests better predictive performance for models without a preferential sampling effect (Table ). Given these findings, we suggest that researchers employ multiple metrics to diagnose the existence of PS, with more weight given to cross‐validatory predictive performance than to AIC (see Discussion for further suggestions on diagnosing PS).

Predictive performance was roughly similar for models with and without landscape and habitat covariates (Table ). However, this does not necessarily indicate that such covariates are ecologically unimportant, as spatial random effects and habitat effects can be confounded in spatial regression models (Hodges & Reich ).

Finally, we note that under our estimation approach, estimates of bearded seal abundance in individual cells can become negative when the preferential sampling effect is strongly positive. This is because the bias correction algorithm (Tierney, Kass & Kadane ; Thorson & Kristensen ) used to correct for the nonlinear transformation of log‐scale abundance does not prevent estimates from becoming negative. *Post hoc* experimentation with different priors on the preferential sampling parameter *b* indicated that the frequency and magnitude of negative estimates declined as estimates of *b* decreased towards biologically plausible values. This phenomenon could be avoided altogether by conducting a fully Bayesian analysis.

In this study, we showed that preferential sampling can have a profound impact on the quality of estimates (e.g. population abundance) when sampling is non‐randomized. In simulations, naive estimators were increasingly positively biased as the strength of PS increased. When PS was present, we were able to reduce bias substantially by conducting estimation under a framework in which the state variable of interest and the sites chosen for sampling were jointly modelled with a dependent covariance structure. In the absence of PS, simulations indicated that models with a PS effect produced less precise estimates. Thus, there was a real cost to attempting to account for PS within a model‐based framework. Fortunately, PS only appeared to be a problem when density or abundance was of interest; estimates of slope parameters governing the relationship of relative abundance to habitat and landscape covariates did not appear to be affected (Appendix S2).
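The first of these simulation findings, that naive estimators become increasingly positively biased as PS strengthens, can be reproduced with a toy Python simulation (assuming a simple inclusion model with probability proportional to exp(*b·z*), not the full spatial model used in the paper):

```python
import numpy as np

rng = np.random.default_rng(42)
n_cells = 2000
z = rng.normal(0.0, 1.0, n_cells)        # latent log-density surface
counts = rng.poisson(np.exp(z))          # true counts in every cell
true_total = counts.sum()

ratios = {}
for b in [0.0, 1.0, 2.0]:                # preferential sampling strength
    p = np.exp(b * z)                    # inclusion weight rises with density
    p = np.clip(0.25 * p / p.mean(), 0.0, 1.0)   # ~25% of cells sampled
    sampled = rng.random(n_cells) < p
    # Naive design-based expansion: sampled mean times number of cells
    ratios[b] = counts[sampled].mean() * n_cells / true_total
print(ratios)
```

With *b* = 0 the naive expansion is roughly unbiased (ratio near 1); as *b* grows, high‐density cells are oversampled and the naive total increasingly overstates true abundance.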

Bias attributed to PS may seem counterintuitive, especially given the maxim in survey sampling to allocate more effort to strata for which population density is high (at least when the goal is to maximize precision of population estimates). For instance, in large‐scale line transect surveys under stratified sampling, the optimal amount of effort that should be allocated to stratum *s* is proportional to *A*_{s}√*D*_{s}, where *A*_{s} is the area of *s* and *D*_{s} is the anticipated density (Buckland *et al*. ; eqn 7.7). Thus, there are theoretical reasons to sample more in high density areas than in low‐density areas. One solution is to sample more in locations where covariates suggest abundance will be higher; in this case, sampling will not be preferential as long as the covariate used to select sampling locations is also used as an explanatory covariate in the SMPD (Fig. ). Another approach is to account for variation in sampling intensity with explanatory covariates or *post hoc* stratification. However, it is not always clear how to perform *post hoc* stratification when effort is allocated in a subjective manner.
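A minimal Python sketch of the stratified allocation rule referenced above (assuming effort proportional to *A*_{s}√*D*_{s}, the form we take eqn 7.7 of Buckland *et al*. to imply), scaled so the stratum efforts sum to a fixed budget:

```python
import numpy as np

def optimal_effort(areas, densities, total_effort):
    """Allocate survey effort among strata in proportion to
    A_s * sqrt(D_s), rescaled so allocations sum to the budget."""
    raw = np.asarray(areas) * np.sqrt(np.asarray(densities))
    return total_effort * raw / raw.sum()

# Two strata of equal area; the stratum with 4x the density receives
# only 2x the effort (effort scales with the square root of density).
print(optimal_effort([100.0, 100.0], [4.0, 1.0], 300.0))
```

The square‐root scaling is why effort should be concentrated in, but not restricted to, high‐density strata, which is precisely the tension with PS discussed above.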

When applied to bearded seal count data, models with a preferential sampling effect produced considerably lower abundance estimates than models without a PS effect (Table ). However, such estimates were biologically unreasonable despite being favored by AIC (Table ). In particular, estimates of preferential sampling parameters were so extreme as to indicate that we had visited all survey units with substantial numbers of bearded seals. This was clearly unreasonable given the manner in which survey effort had been allocated, which sought to maximize spatial coverage while ignoring the expected spatial distribution of animals at finer scales. Given this finding, we caution against relying on likelihood‐based model selection criteria when choosing between models with and without preferential sampling effects. Metrics based on predictive ability (e.g. cross‐validated mean square prediction error) seem more useful for this purpose; in our bearded seal example, models without a preferential sampling effect had better predictive ability (Table ). We also caution ecologists to be alert for biologically implausible estimates of *b*; in our example, even an informative prior was not sufficient to remedy instability (a *post hoc* model with a Normal(0,1) prior on *b* still produced implausible estimates).

We fitted models to bearded seal data where the degree of PS changed over the landscape, as Royle & Berliner () suggested might be possible (but never demonstrated) for multivariate spatial models. Such models led to similar inferences to models with a simple, linear effect for PS. However, given the potential for PS model instability, we currently suggest limiting initial consideration to models with a single, estimated *b* parameter as a composite adjustment to abundance or occupancy. Further research into the viability of including spatial variation in the **B** matrix certainly appears warranted. It would also be worthwhile to investigate whether the reliability of inference varies as a function of data type (e.g. binary vs. count data), survey type (e.g. line transects vs. point counts), and whether parameters remain identifiable when nuisance processes such as incomplete detection are modelled.

The models we have developed here are specific to spatial models with discrete support, as when data are gathered at the plot level, or aggregated prior to analysis. However, it should be possible to extend our approach to continuous space. One approach would be to model sampling locations as realizations from a spatial point process in a manner similar to Warton & Shepherd (). Another possible extension would be to consider models for the sampling process where sampling occurs without replacement for a fixed sample size. For instance, the Bernoulli sampling model makes the implicit assumption that sample size is random. If, instead, a fixed number of locations are sampled, the Bernoulli model is somewhat misspecified. Our simulations suggest some robustness to this misspecification, as the Bernoulli model performed reasonably well when sampling was conducted with a fixed sample size (Fig. ). A more precise treatment would require extending the hypergeometric distribution to have variable inclusion probabilities when formulating the sampling model; this extension is non‐trivial.
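The distinction drawn above between Bernoulli selection (random sample size) and fixed‐sample‐size selection can be sketched in Python; note that `rng.choice` without replacement implements successive sampling, whose realized inclusion probabilities only approximate the target Bernoulli probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sites, target_n = 1000, 250
w = rng.uniform(0.5, 2.0, n_sites)      # relative inclusion weights
p = target_n * w / w.sum()              # Bernoulli inclusion probabilities (< 1 here)

# Bernoulli sampling: realized sample size is random, with mean near target_n
sizes = np.array([(rng.random(n_sites) < p).sum() for _ in range(500)])
print(sizes.mean(), sizes.std())

# Fixed sample size: exactly target_n sites, drawn without replacement
# (successive sampling, used here only to mimic a fixed-size design)
fixed = rng.choice(n_sites, size=target_n, replace=False, p=w / w.sum())
print(fixed.size)
```

The non‐zero spread of `sizes` is the misspecification in question when a fixed number of units was actually surveyed; our simulations suggest the Bernoulli model is fairly robust to it.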

We have focused here on selection of survey units at a rather coarse scale. If survey units are large relative to the area that is sampled, ecologists also need to pay attention to how sampling is conducted within survey units selected for sampling. In particular, the areas visited within selected sample units should be representative (or random) with respect to available habitat within the survey unit; otherwise, additional bias can result (e.g. Fig. ).

Our conception of PS is related to, but not equivalent to, ‘sample selection bias’ (e.g. Phillips *et al*. ) in presence‐only models. In such models, species absences are never directly observed. To draw inference about space use, it is thus necessary to produce a background sample representing the range of locations and habitats that could have been sampled. Sample selection bias then results if the characteristics of sites selected for sampling (e.g. by a volunteer or museum collector) differ systematically from the assumed background sample. In our case, we use PS to refer to the case where absences are available, but where the probability of sampling depends on some unknown factor that is also related to abundance or presence of the target species.

Model‐based approaches to estimation of abundance or occurrence have become popular in recent years. We (the authors) have noticed a tendency for analysts to assume that inclusion of spatial covariates or random effects into predictive models will make the underlying sampling design ignorable. Although this may be reasonable for some goals (e.g. describing relative influence of different environmental and habitat predictors), we have shown that estimates of absolute quantities (abundance, density, or occupancy) may be biased under preferential sampling. We have also shown that it is possible to diagnose and adjust for preferential sampling by jointly modelling dependence between the data collection mechanism and the process of interest (e.g. abundance or occupancy). However, such models can be considerably less precise and have greater instability than models without a preferential sampling parameter. Where possible, we suggest that survey planners incorporate design‐based elements (e.g. random or systematic sampling) into their survey designs to increase robustness and reduce the need for model‐based triage.

P.B.C. led the project and conceived of the study. All authors contributed to the formulation of preferential sampling models and to study design. J.T.T. coded initial estimation routines in TMB. P.B.C. edited estimation code and implemented simulation and bearded seal analyses. P.B.C. drafted the paper with contributions from J.T.T. and D.S.J. All authors contributed to manuscript revisions.

We thank M. Kery, J. Laake, B. McClintock, B. O’Hara and several anonymous reviewers for comments that helped strengthen this paper. Funding for aerial surveys was provided by the U.S. National Oceanic and Atmospheric Administration and by the U.S. Bureau of Ocean Energy Management. Views expressed are those of the authors and do not necessarily represent findings or policy of any government agency. Use of trade or brand names does not indicate endorsement by the U.S. government.

R scripts and bearded seal survey data necessary to recreate analyses have been collated and archived in an R package, PrefSampling (Conn, Thorson & Johnson ), which has been assigned NMFS InPort ID 45879 (see