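Fig. (caption partly lost in extraction). Panels: (a) true *N*_{e} = 10^{4}, (b) true *N*_{e} = 10^{5} and (c) true *N*_{e} = 10^{6}.

Fig. (caption partly lost in extraction). Panels: (a) true *N*_{e} = 10^{5} and (b) true *N*_{e} = 10^{6}. ‘inf’ = infinity.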

Theory and empirical estimates agree that the ratio of effective size (*N*_{e}) to census size (*N*) […] *N*_{e}/*N* […] 10^{−3} or smaller, marine fish populations that are quite large could be at genetic risk. Based on a recently improved understanding of factors that influence *N*_{e} and *N*_{e}/*N* […] *N*_{e} is very large (≥10^{6}), a substantial fraction of point estimates of *N*_{e}/*N* […] 10^{−3} or smaller. These results mean that tiny, genetically based point estimates of *N*_{e}/*N* […]

Marine species have long captured our imaginations, and this has been true of scientific investigations as well as the popular media. Even before the first large allozyme studies revealed dramatically higher‐than‐expected levels of genetic diversity in humans (Harris, ) and *Drosophila* (Hubby & Lewontin, ), genetic methods were being used to gain insights into ecological and evolutionary processes in marine fish populations (Frydenberg *et al*., ; Waples *et al*., ). Genetic studies of marine species have explored population structure (Cross & Payne, ; Burton & Feldman, ), larval dispersal and gene flow (Strathmann, ; Johnson & Black, ), natural selection (Tracey *et al*., ; Koehn *et al*., ; DiMichele & Powers, ; Gaffney, ), sampling and kinship (Hansen *et al*., ; Buston *et al*., ), fishery management (Ryman & Utter, ), unusual life histories (Koehn & Williams, ; Aarestrup *et al*., ), mating systems (Bierne *et al*., ; Bekkevold *et al*., ), fecundity and maternal age (Berkeley *et al*., ; Hixon *et al*., ), phylogeography (Grant & Bowen, ), speciation (Palumbi, ) and even needle‐in‐a‐haystack parentage assignments (Christie *et al*., ). These studies have led to many surprises and, collectively, greatly enriched the understanding of how natural populations function in the real world.

One important topic that remains controversial is whether marine species with high fecundity can have effective population sizes (*N*_{e}) that are many orders of magnitude smaller than the census size (*N*). Conventional evolutionary theory holds that the *N*_{e}/*N* ratio should not deviate too far from 0·5 and rather special circumstances are required to produce *N*_{e}/*N* as small as 0·1 (Nunney, ). Hedgecock (), however, proposed that, through a variation of Hjort's () larval mismatch hypothesis, *N*_{e}/*N* in highly fecund marine species could be very small if a typical year class of surviving offspring is not derived randomly from the huge number of adults, but instead from only a few families that by chance happen to produce eggs and larvae that end up at the right place and time to allow them to survive. This idea has been referred to as the sweepstakes reproductive success (SRS) hypothesis. Subsequently, a number of empirical studies using indirect genetic methods obtained estimates of *N*_{e}/*N* in marine species ranging from 10^{−3} to 10^{−6} or even smaller. This topic was reviewed by Hauser & Carvalho () and Hedgecock & Pudovkin (), and Hedrick () used some simple theoretical models to identify scenarios that could potentially produce very small *N*_{e}/*N*.

This study extends these previous analyses in two ways. First, the analytical models considered by Hedrick () are extended to account for age structure and overlapping generations, with the goal of identifying life history traits that can and cannot be expected to produce low *N*_{e}/*N*. Second, the conditions under which commonly used genetic methods can be expected to produce tiny estimates of the *N*_{e}/*N* ratio, even when *N*_{e} is large and *N*_{e}/*N* is close to 1, are evaluated. Finally, experimental procedures that can evaluate hypotheses regarding estimates of *N*_{e}/*N* in marine fishes are discussed.

Consider a marine species with a large population size (*N* ≥ 10^{6}). The focus is on scenarios where *N*_{e}/*N* is very small. For purposes of this study, a tiny estimate of *N*_{e}/*N* is defined as one that is ≤10^{−3}. The choice of which individuals to include in *N* can strongly affect the estimated *N*_{e}/*N*. The analyses below use the definition that is most widely accepted in the literature: *N* = the number of mature adults (Nunney & Elam, ). In species with fixed age at maturity, this can be calculated as the number in all age classes from age at maturity to the maximum age, *ω*.

Hedgecock () proposed that *N*_{e}/*N* could be arbitrarily small if only a small fraction of all *N* adults successfully reproduced (in this context, successful means production of at least one offspring that survives to be an adult). Let *N*_{p} be the number of these successful parents, so the focus is on scenarios in which *N*_{p} << *N*. Hedrick () quantitatively evaluated some simple scenarios of this type and here this idea is expanded using the parentage analysis without parents (PWOP) approach of Waples & Waples (). In the PWOP formulation, the standard discrete‐generation formula for inbreeding effective size is recast as:

$$N_e = \frac{\sum k_i - 1}{\sum k_i^2 / \sum k_i - 1},$$

where *k*_{i} is the number of offspring produced by the *i*th parent.

If the population is stable, then the *N* total adults in generation 1 produce *N* adults in generation 2 and so on. For diploid species, each of the adults must on average contribute half the genes to each of two offspring, so overall Σ*k*_{i} = 2*N*.

This analysis can be generalized by allowing *V*_{kp} to be any multiple (*α*) of the mean reproductive success of the *N*_{p} successful parents, which is 2*N*/*N*_{p}. Then Σ(*k*_{i}^{2}) = 2*N*(*α* + 2*N*/*N*_{p}) and

$$N_e = \frac{2N - 1}{\alpha + 2N/N_p - 1}.$$

Scenarios in which *N*_{p}/*N* is very small are of primary interest, in which case the 2*N*/*N*_{p} term in the denominator will be very large, so the −1 term in the denominator also can be ignored, producing:

$$N_e \approx \frac{2N}{\alpha + 2N/N_p}.$$

Unless *α* is very large, it also will be dwarfed by the other term in the denominator, which again will produce the result that *N*_{e}/*N* ≈ *N*_{p}/*N*. For example, even if *α* is as large as 2*N*/*N*_{p} (*i.e*. the variance in reproductive success is the square of the mean), *N*_{e} is only reduced by 50%, so *N*_{e}/*N* ≈ 0·5*N*_{p}/*N*.

Hedrick () considered a variation of this scenario in which each of the *N*_{p} successful parents produced exactly the same (large) number of progeny. In this case, *α* = 0, which again produces *N*_{e}/*N* ≈ *N*_{p}/*N*. Hedrick () also considered a third class of parents (those who produce exactly two offspring each), with *y* denoting the fraction of all *N* adults that produce exactly two offspring. Using some numerical examples, Hedrick () showed that *N*_{e}/*N* remains quite low in the presence of this third class of parents unless they make up a large fraction of the population.

To summarise, under the discrete generation model, when a small number (*N*_{p}) of parents dominate reproduction, the *N*_{e}/*N* ratio will be close to *N*_{p}/*N* regardless of how reproductive success is partitioned among the successful parents and regardless of whether some other parents manage to contribute small numbers of offspring.
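The discrete‐generation result can be checked numerically. The sketch below assumes the closed form *N*_{e} = (2*N* − 1)/(*α* + 2*N*/*N*_{p} − 1), which follows from Σ*k*_{i} = 2*N* and Σ(*k*_{i}^{2}) = 2*N*(*α* + 2*N*/*N*_{p}); the function name and the example values of *N* and *N*_{p} are illustrative and do not come from the original analyses.

```python
def ne_discrete(n_adults, n_parents, alpha=0.0):
    """Discrete-generation N_e when only n_parents of n_adults reproduce.

    alpha is the ratio of the variance to the mean of reproductive success
    among the successful parents (alpha = 0: identical family sizes;
    alpha = 1: Poisson-like variance).
    """
    mean_k = 2 * n_adults / n_parents      # mean offspring per successful parent
    return (2 * n_adults - 1) / (alpha + mean_k - 1)


N, Np = 10**6, 10**3                       # hypothetical census size and successful parents
for alpha in (0.0, 1.0, 2 * N / Np):       # the last value makes variance = mean squared
    ne = ne_discrete(N, Np, alpha)
    print(f"alpha = {alpha:g}: Ne = {ne:.1f}, Ne/N = {ne / N:.2e}, Np/N = {Np / N:.2e}")
```

With these inputs *N*_{e} stays close to *N*_{p} for small *α* and drops to about half of *N*_{p} only when *α* reaches 2*N*/*N*_{p}.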

An important limitation of the above analyses is that they implicitly assume discrete generations and fail to consider age structure, whereas most species (and all of those for which very small *N*_{e}/*N* ratios have been reported), have overlapping generations. Overlapping generations and reproduction in >1 year or season (iteroparity) have some important consequences for small *N*_{e}/*N* ratios that have not been quantitatively evaluated before. After considering the general model developed by Hill (), which assumes a constant population size, stable age structure and independence of birth and death rates over time, the consequences of some violations of these assumptions are evaluated.

To account for overlapping generations, the discrete‐generation formula for *N*_{e} can be modified as follows (Hill, ):

$$N_e = \frac{4 N_1 T}{V_{k\bullet} + 2},$$

where *N*_{1} is the number of offspring produced each time period, *T* is generation length and *V*_{k•} is lifetime variance in reproductive success of the *N*_{1} individuals in a cohort. Any age up to the age of first reproduction can be used to enumerate the individuals in a cohort, provided that reproductive success and *V*_{k•} are also based on production of offspring of that same age. To see the effect of age at maturity on *N*_{e}/*N*, consider two hypothetical species, species A with age at maturity = 1 year and species B with age at maturity = 1 + *z* years. Further, assume that both species have adult lifespan = *L*_{A} years, the same annual adult survival and the same pattern of age‐specific fecundity. Because species B delays onset of reproduction, generation length for species B will be longer by *z* years, while *V*_{k•} will not be affected. As a consequence, *N*_{e}/*N* is higher in species B than in species A. In fact, *N*_{e}/*N* can be >1 in species that delay maturity for many years or reproductive cycles (Waples *et al*., ). Because interest here is in factors that can produce tiny *N*_{e}/*N* ratios, it is assumed that age at maturity = 1 and that *N*_{1} is the number of individuals in a cohort that survive to age 1 year. This does not mean that species with delayed maturity cannot have tiny *N*_{e}/*N*, just that it is a little harder than it is for species that mature at age 1 year.

Apart from age at maturity, the other key demographic traits that affect the *N*_{e}/*N* ratio are: adult lifespan or longevity, which is determined by the annual survival rate; age‐specific patterns of change in survival (*s _{x}*) and especially fecundity (

In age‐structured species, some individuals live longer than others and hence have more opportunities to reproduce. This increases lifetime variance in reproductive success; because *V*_{k•} appears in the denominator of Hill's equation, this (all else being equal) reduces *N*_{e} and *N*_{e}/*N*. Increasing the adult lifespan, however, also increases generation length (which appears in the numerator of Hill's equation) as well as the number of adults in the population (which appears in the denominator of *N*_{e}/*N*), so all these factors must be considered jointly to assess overall effects on the effective size–census size ratio. Effects of longevity on *N*_{e}/*N* are isolated by considering a population that has constant fecundity with age and constant Φ_{x} = 1, which means that (for example) all age 7 year males have random reproductive success among themselves. This hypothetical population also has a constant adult survival rate that produces total life spans (and maximum ages,

Although many species (such as many birds and mammals) have vital rates that are approximately constant across their adult lifespan, the same is not true for most marine ectotherms with indeterminate growth. In these species, older individuals are larger and generally have higher fecundity; this increases the reproductive payoff for individuals that survive to reproduce many times and further increases *V*_{k•}. In addition, older females not only have more eggs, they may produce better eggs that have a higher chance of producing a viable offspring–the ‘big old fat fecund female fish’ (BOFFFF) hypothesis (Berkeley *et al*., ; Hixon *et al*., ). The same could potentially be true of males. To the extent that such effects occur, they would further enhance the reproductive disparities associated with increased longevity and fecundity that increases with age. Because only offspring that survive to age 1 year are considered, both increased number of eggs and increased survival of eggs for females of older ages can be accommodated by appropriate scaling of effective age‐specific fecundity.

One simple way to do this is to assume that effective fecundity (in terms of production of offspring that survive to age 1 year) increases linearly with age, which is not uncommon in long‐lived marine fishes (Fig. ). In the first evaluation of this general scenario, relative fecundity was assumed to be proportional to age, while the other vital rates were constant at *s _{x}* = 0·891 and Φ

If increasing fecundity with age creates larger disparities in lifetime reproductive success between those individuals that do and do not survive to old age, why does this not have a larger effect on the *N*_{e}/*N* ratio? The answer is straightforward: shifting more and more reproduction to older age classes also increases generation length. Because *T* appears in the numerator of Hill's equation and *V*_{k•} appears in the denominator, the effects on *T* and *V*_{k•} largely cancel each other, leading to only modest net changes in *N*_{e}/*N*.

This brings us to the final major factor that determines *N*_{e} and *N*_{e}/*N*: Φ* _{x}*. Φ

The final series of analyses evaluated how large Φ* _{x}* (assumed to be fixed) must be to produce very low

This effect can be illustrated using the above example with maximum age = 40, fecundity proportional to age and Φ fixed at 1. For this scenario, if *N*_{1} is set at production of 10 000 age 1 year recruits year^{−1}, adult *N* = 90 835, *T* = 15·5 years and *V*_{k•} = 14·4, leading to *N*_{e} = 37 760 and *N*_{e}/*N* = 0·416. Substituting into equation to evaluate effects of large Φ produces this: *N*_{e} ≈ 3·4 × 10^{−3}, in good agreement with the exact value shown in Fig. calculated using equation . It is easy to see that *N*_{e}/*N* in species with overlapping generations can be arbitrarily small if Φ is assumed to be arbitrarily large. A good order‐of‐magnitude approximation is the following: *N*_{x}) is very large. For long‐lived iteroparous species, Φ generally will be constrained to be ≤
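The baseline numbers quoted for the maximum age = 40 example (Φ = 1) can be reproduced approximately with a few lines of arithmetic. This sketch assumes that Hill's formula has the form *N*_{e} = 4*N*_{1}*T*/(*V*_{k•} + 2), that adults are all individuals aged 1 to 40 years, and that annual survival is the constant 0·891 quoted earlier; *T* and *V*_{k•} are taken from the text as rounded values rather than recomputed, and the function names are illustrative.

```python
def adult_census(n1, survival, max_age):
    """Adults summed over ages 1..max_age with constant annual survival."""
    return sum(n1 * survival ** (age - 1) for age in range(1, max_age + 1))


def hill_ne(n1, gen_length, vk_lifetime):
    """Assumed form of Hill's overlapping-generation formula."""
    return 4 * n1 * gen_length / (vk_lifetime + 2)


n1 = 10_000                                  # age 1 recruits per year (from the text)
N = adult_census(n1, survival=0.891, max_age=40)
ne = hill_ne(n1, gen_length=15.5, vk_lifetime=14.4)
print(f"adult N = {N:.0f}, Ne = {ne:.0f}, Ne/N = {ne / N:.3f}")
```

Because *T* and *V*_{k•} are entered to only three significant figures, the computed *N*_{e} differs from the quoted 37 760 by roughly 0·1%, while the ratio still rounds to 0·416.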

A simple example illustrates the biological meaning of large values of Φ. Consider a large marine fish population that each year produces *N*_{1} = 10^{6} recruits that survive to age at maturity. If annual mortality is constant at *d* = 0·15, the total number of adults will be approximately *N* = *N*_{1}/*d* = 6·67 × 10^{6}, so the mean genetic contribution in each time period by all adults will be *N _{x}* =

Variable recruitment: although Hill's model assumes that population size is constant and age structure is stable, the method is robust to random demographic stochasticity (Waples *et al*., , ). Furthermore, Felsenstein () showed that his related model is still accurate if a population increases or declines at a steady rate. Some long‐lived marine species, however, have highly variable recruitment, with little or no successful reproduction in many years and occasional large pulses of strong recruitment. If successful recruitment occurs less frequently than once per generation, the population is not likely to be viable in the long term. Taken to a plausible extreme, therefore, this type of scenario can be evaluated by assuming that the population has a strong recruitment once per generation, with zero successful reproduction in the intervening years. But this is just a discrete‐generation model, with consequences as discussed above. Therefore, variable recruitment by itself is not likely to lead to tiny *N*_{e}/*N* ratios, although it could if the *N*_{p} individuals responsible for the successful recruitment are a tiny fraction of all adults.

Persistent individual differences: the assumption in Hill's model that survival and reproduction are independent across time is necessary to make the analysis tractable but unrealistic for many populations. Intuitively, *N*_{e} and *N*_{e}/*N* should be reduced if the same individuals are consistently good or bad at reproducing across multiple time periods, and Lee *et al*. () showed that this indeed is the case, although the effect was rather modest: in the most extreme scenario they considered, *N*_{e}/*N* was reduced by less than one order of magnitude (from about 0·5 to 0·1–0·2). Is this issue a weak link in the SRS hypothesis; *i.e*. is it necessary to assume that the same very few individuals are sweepstakes winners year after year after year? This seems highly implausible, given Hedgecock's () concept of the sweepstakes winners being the parents that (by luck) just happened to deliver their families of eggs or larvae to one of the few places in the ocean where they could survive and grow. This is not a serious limitation for the SRS hypothesis, however, for the following reason.

Consider two scenarios for a long‐lived species in which only a small number of parents successfully reproduce each year: first, the same *N*_{p} parents are successful every year across a generation; second, each year the *N*_{p} successful parents are randomly chosen from the population as a whole. In scenario 1 the effective size per generation will be approximately *N*_{p}, in which case the consequences for *N*_{e}/*N* are the same as they are in the discrete generation model. In scenario 2, assuming the total population of adults is very large, the successful parents that are randomly chosen each year are expected to be nearly or completely non‐overlapping (*i.e*. the chance that any individual will win the sweepstakes more than once is very small). In that case, the number of successful parents over a period of a generation (and hence the approximate effective size per generation) will be approximately *TN*_{p}. Given the typical range of generation lengths for long‐lived marine species (10–20 years or so), assuming a complete turnover of successful parents each year would only increase *N*_{e}/*N* by roughly one order of magnitude. This would not preclude tiny *N*_{e}/*N* values, provided that the fraction of successful spawners each year is sufficiently small.

Intermittent breeding: the converse of positive correlations between reproductive success of individuals across time is inverse correlations caused by intermittent or skip breeding, which occurs when energetic costs of reproduction (including any associated migrations) reduce the chances that an individual that reproduces in 1 year will reproduce in a subsequent year (Shaw & Levin, ). It is common for females of large mammal species to skip breeding for one or more cycles after giving birth, and the same can be true for many other species and, occasionally, for males as well. Although skip breeding reduces the number of adults available to breed in any given year [and hence can sharply reduce the effective number of breeders per year (*N _{b}*) in some species], this phenomenon serves to reduce lifetime

To summarise, all else being equal, increasing longevity reduces *N*_{e}/*N*, but by itself the effect is rather modest. Stronger reductions can occur if a long adult lifespan is coupled with fecundity and Φ* _{x}* that increase with age, but plausible scenarios of this type, even those that incorporate the BOFFFF hypothesis, are still unlikely to produce

In a computer model, it is easy to count the number of individuals in the population, but that is not the case in the real world, particularly for large marine populations that can include millions or even billions of individuals that are (at best) difficult to observe directly. Also, calculation of adult *N* ideally would account for the fraction in each age class that are sexually mature. Again, this is easy in a computer model but much more challenging in real‐world populations and this contributes uncertainty to estimates of *N*. Most of what follows focuses on estimates of *N*_{e}, but it is important to remember that estimating *N*_{e}/*N* involves separate estimation of two parameters, each of which presents major challenges for marine species.

All of the tiny estimates of contemporary *N*_{e}/*N* for marine species are from indirect genetic methods that use a genetic index that is expected to be a function of 1/*N*_{e}. Most of the estimates are from the temporal method (which requires two or more samples spaced in time) or the linkage disequilibrium (LD) method, which uses single samples. The respective genetic indices and their expected values are as follows:

$$E(F) \approx \frac{t}{2N_e} + \frac{1}{S}$$

$$E(r^2) \approx \frac{1}{3N_e} + \frac{1}{S},$$

where *S* is the number of individuals in a sample, *F* is the standardized variance in allele frequency between two temporal samples, *t* is the number of generations between temporal samples and *r*^{2} is the squared correlation of alleles at different gene loci. The basic approach of these moment‐based methods is straightforward: (1) develop theoretical expectations for contributions of drift and sampling error to the genetic index of interest (as in the equations above); (2) compute the overall index; (3) subtract from that the expected contribution from sampling error; (4) use the result to estimate *N*_{e} using a simple rearrangement of the expectation for *F* (temporal method; Waples, ) or for *r*^{2} (LD method; Hill, ).

Inspection of the expected values of *F* and *r*^{2} makes it clear why estimating effective size in large populations is very challenging with indirect genetic methods. For a sample size that is common for marine species (*S* = 50), the contribution of random sampling error to the genetic index will dwarf the signal from drift unless true *N*_{e} is very low (Fig. ). Vastly increasing sample size to *S* = 5000 can substantially improve performance if true *N*_{e} is no larger than about 10^{4}, but even such large samples do little to relieve the signal‐to‐noise problem if true *N*_{e} is as large as 10^{6} (Fig. ). When *N*_{e} >> *S*, the crucial step in the estimation procedure is (3), because a small error in correcting for sampling error can have a huge effect on the estimate of *N*_{e}. For example, to obtain an LD estimate within one order of magnitude of a true *N*_{e} = 10^{6}, after accounting for sampling error a user must find a mean value of *r*^{2} between 0·00000003 and 0·00000333. For estimating *N*_{e} in very large populations, this means that the genetic index must be based on very large amounts of data (samples of individuals or gene loci) to provide any hope of high precision.
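The signal‐to‐noise problem can be made concrete with a short calculation. The sketch assumes the standard approximations in which drift contributes *t*/(2*N*_{e}) to *F* and 1/(3*N*_{e}) to *r*^{2} (unlinked loci), while sampling contributes about 1/*S* to each index; the function names and the choice *t* = 3 are illustrative.

```python
def drift_signal_ld(ne):
    """Approximate drift contribution to mean r^2 for unlinked loci."""
    return 1.0 / (3.0 * ne)


def drift_signal_temporal(ne, t):
    """Approximate drift contribution to F after t generations."""
    return t / (2.0 * ne)


def signal_fraction(signal, s):
    """Share of the expected index that is due to drift, with sampling noise ~ 1/S."""
    return signal / (signal + 1.0 / s)


for ne in (1e2, 1e4, 1e6):
    for s in (50, 5000):
        ld = signal_fraction(drift_signal_ld(ne), s)
        tm = signal_fraction(drift_signal_temporal(ne, t=3), s)
        print(f"Ne = {ne:.0e}, S = {s}: drift share LD = {ld:.2e}, temporal = {tm:.2e}")

# Range of drift r^2 that maps to estimates within tenfold of a true Ne = 1e6
low, high = drift_signal_ld(1e7), drift_signal_ld(1e5)
print(f"r^2 window: {low:.2e} to {high:.2e}")
```

Under these assumptions the *r*^{2} window matches the two bounds quoted in the text, and for *S* = 50 the drift share of the index falls from about one‐seventh at *N*_{e} = 100 to a few parts in 10^{5} at *N*_{e} = 10^{6}.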

These challenges can be illustrated using simulated data for populations with a wide range of true *N*_{e} values and applying the estimators and amounts of data that were commonly used to generate published tiny estimates of *N*_{e}/*N*. One hundred diallelic, single nucleotide polymorphism (SNP) loci were simulated in ideal populations of constant size *N*_{e} = *N* = 100 to 1 000 000 for a burn‐in period of five generations (enough to reach equilibrium for LD; Waples, ). An initial sample of *S* = 50 individuals was taken using Plan II sampling (Nei & Tajima, ; Waples, ) and a final temporal sample was taken after simulating an additional three generations of drift. The second temporal sample was also used for the LD estimates. For some scenarios, 1000 loci were simulated, or sample sizes of *S* = 200, 1000 or 5000 were used. Simulations using *N*_{e} = 10^{7} were also conducted for the temporal method only (memory limitations precluded generating such large matrices for LD analyses). Finally, infinite *N*_{e} was modelled by randomly choosing genotypes for LD samples (equivalent to having an infinite number of parents) and by taking two replicate samples from the same population for the temporal estimates. [Simulations were conducted in R (

Diallelic loci are easier to simulate than the microsatellites used in most of the published studies, but the 100 SNP loci modelled here provide approximately the same amount of information as 10 moderately variable microsatellites [the 11 microsatellite studies reviewed by Hauser & Carvalho () used a mean of eight loci]. Samples were used to estimate *N*_{e} using both the standard temporal method (using Nei and Tajima's estimator of *F*) and the bias‐corrected LD method (Waples, ); in addition, a combined estimate was computed as the harmonic mean of the two estimates.

For small populations (true *N*_{e} = 100), both methods are largely unbiased and have good precision, as has been reported elsewhere. Almost no estimates were below 50 and very few were above 200 [Fig. (a)]. The situation changes dramatically, however, if true *N*_{e} is 100 times as large (*N*_{e} = 10 000): about half the estimates from each method are infinitely large and most of the remainder are <25% of the true value [Fig. (b)]. This bimodal pattern is even more pronounced when true *N*_{e} is 10^{6}: over half the estimates are infinite, while most of the rest are in the hundreds or low thousands, downwardly biased by three to four orders of magnitude [Fig. (c)]. Almost no estimates are within an order of magnitude of the true *N*_{e}. It might be expected that combining estimates from the two methods would lead to greater precision, but there was little difference in performance of the combined and the LD estimates in this example.

Note that as true *N*_{e} gets larger, the lower estimates stay in about the same range (hundreds to low thousands with *S* = 50; Table and Fig. S1, Supporting Information). This means that a large fraction of point estimates of *N*_{e}/*N* can be expected to be increasingly small as true *N*_{e} becomes increasingly large. These results do not reflect a large systematic bias in the estimates; instead, they reflect the increasingly bimodal pattern of the estimates as effective size increases, such that virtually all estimates are either infinity or very small compared to true *N*_{e}.
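The bimodal pattern can be reproduced with a very small simulation of the infinite‐*N*_{e} case for the temporal method, in which two replicate samples are drawn from the same population. The sketch below is written in Python rather than R and is not the code used for the analyses: it assumes 100 diallelic loci with allele frequencies drawn uniformly between 0·2 and 0·8, *S* = 50 diploid individuals per sample, a nominal *t* = 3 generations, Nei and Tajima's *F* and the Plan II sampling correction of 1/(2*S*) per sample.

```python
import random
import statistics


def sample_freq(p, n_ind, rng):
    """Allele frequency in a sample of n_ind diploids (2 * n_ind gene copies)."""
    copies = 2 * n_ind
    return sum(rng.random() < p for _ in range(copies)) / copies


def temporal_estimate(n_loci, n_ind, t, rng):
    """Temporal-method estimate from two samples of one unchanging population."""
    f_values = []
    for _ in range(n_loci):
        p = rng.uniform(0.2, 0.8)            # assumed allele-frequency range
        x = sample_freq(p, n_ind, rng)
        y = sample_freq(p, n_ind, rng)
        denom = (x + y) / 2 - x * y
        if denom > 0:                        # skip loci fixed in both samples
            f_values.append((x - y) ** 2 / denom)
    drift = statistics.fmean(f_values) - 1 / n_ind   # remove expected sampling noise
    return t / (2 * drift) if drift > 0 else float("inf")


rng = random.Random(1)
estimates = [temporal_estimate(100, 50, 3, rng) for _ in range(200)]
finite = [e for e in estimates if e != float("inf")]
frac_inf = 1 - len(finite) / len(estimates)
print(f"infinite: {frac_inf:.2f}, median finite estimate: {statistics.median(finite):.0f}")
```

Although true *N*_{e} is infinite here, roughly half of the replicates are expected to return a finite estimate, and those finite values cluster in the hundreds to low thousands because they are set by the sampling variance of *F* rather than by drift.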

It is interesting to note that this bimodal pattern also occurs with small‐sample estimates based on the true pedigree. The simulations kept track of the parents of offspring that appeared in the samples of 50 individuals each and *N*_{e} was estimated for each sample using the PWOP formula for inbreeding effective size. With true *N*_{e} = 10^{6}, one does not expect to find any siblings in a sample of only 50 offspring, and that was the case for 499 of the 500 replicates. When each offspring has two unique parents, Σ*k*_{i} = Σ*k*_{i}^{2} (every parent has *k*_{i} = 1), so the denominator of the formula is zero and the pedigree‐based estimate of *N*_{e} is infinite.
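The pedigree‐based calculation is simple enough to sketch directly. The example assumes the inbreeding formula *N*_{e} = (Σ*k*_{i} − 1)/(Σ*k*_{i}^{2}/Σ*k*_{i} − 1) applied to the parents of a sample of 50 offspring; the parent identifiers and the two sibling scenarios are hypothetical and are not taken from the simulations.

```python
from collections import Counter


def pedigree_ne(parent_pairs):
    """Inbreeding N_e from the (mother, father) pair of each sampled offspring."""
    k = Counter()
    for mother, father in parent_pairs:
        k[mother] += 1
        k[father] += 1
    sum_k = sum(k.values())
    sum_k2 = sum(v * v for v in k.values())
    denom = sum_k2 / sum_k - 1
    return (sum_k - 1) / denom if denom > 0 else float("inf")


# 50 offspring, no shared parents: every parent has k = 1
unrelated = [(f"m{i}", f"f{i}") for i in range(50)]

# 50 offspring, one maternal half-sib pair (mother m0 appears twice)
half_sibs = [("m0", "f0"), ("m0", "f1")] + [(f"m{i}", f"f{i + 1}") for i in range(1, 49)]

# 50 offspring, one full-sib pair (both m0 and f0 appear twice)
full_sibs = [("m0", "f0")] * 2 + [(f"m{i}", f"f{i}") for i in range(1, 49)]

for label, sample in (("unrelated", unrelated), ("half sibs", half_sibs), ("full sibs", full_sibs)):
    print(label, pedigree_ne(sample))
```

Under this formula a sample with no siblings gives an infinite estimate, whereas a single half‐sib or full‐sib pair among 50 offspring gives an estimate of about 4950 or 2475 respectively, regardless of how large the true *N*_{e} is.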

Fortunately, two approaches can help mitigate this rather gloomy prospect for obtaining reliable estimates of effective size and *N*_{e}/*N* when true *N*_{e} is large. First, all indirect methods for estimating *N*_{e} have ways of placing confidence intervals (CIs) around the point estimates […] *χ*^{2}‐distribution, as detailed by Waples (, ). As would be expected for methods that are largely unbiased, most of the small point estimates for *N*_{e} = 10^{4} and 10^{6} depicted in Fig. had upper bounds that included the true *N*_{e}. This illustrates the importance of considering not only the point estimates but also the upper bounds of the CIs. […] (*e.g*. discrete generations, closed populations and selective neutrality), nor do they account for biases associated with the nearly impossible task of achieving a completely random sample from a large wild population. Although model misspecifications like these were not explicitly evaluated (collectively, they encompass an enormous parameter space), the likely consequences are easy to predict. Given that, when true *N*_{e} is large and all model assumptions are met, a large fraction of point estimates will be orders of magnitude smaller, even modest downward biases in the estimates could easily cause the upper CI bound to fall below the true *N*_{e}. Finally, […] *N*_{e}/*N* without also explicitly considering uncertainty in the estimate of *N* and the covariance of *N*_{e} and *N*, but that is seldom done.

To the extent that it is feasible, one can increase precision by obtaining more data (larger samples of individuals and genetic markers). Although the number of individuals and loci included in the simulations were comparable with those actually used in many of the studies that have reported tiny *N*_{e}/*N* (summarized by Hauser & Carvalho, ), it is now relatively easy to obtain many thousands of SNPs even for non‐model species, and considerably larger samples of individuals are possible for some species. With true *N*_{e} = 10^{6}, increasing sample size from *S* = 50 to 200 improved performance of the estimators somewhat (fewer estimates were in the hundreds), but the distribution of estimates […] 10^{4} and would produce estimates of *N*_{e}/*N* in the range 10^{−2}–10^{−3} [Table and Fig. S2(b), Supporting Information]. In some cases, very large samples (up to 5000 individuals or so) can be collected from marine species (MacBeth *et al*., ). With true *N*_{e} = 10^{6}, increasing sample size by two orders of magnitude (from *S* = 50 to 5000) also shifted the bulk of the finite estimates by two orders of magnitude (from 10^{2}–10^{4} to 10^{4}–10^{6}), with most of the remainder still being infinite (Table ). Under these conditions, no estimates for either method were as much as two orders of magnitude lower than the true effective size. When modelled *N*_{e} was infinitely large, however, the distribution of estimates using *S* = 5000 remained unchanged (Table ). This means that, even with very large sample sizes, tiny estimates of *N*_{e}/*N* can occur if true *N*_{e} is large enough.

Memory limitations precluded simulating >100 SNP loci with *N*_{e} = 10^{6}, but results for 1000 SNPs and *N*_{e} = 10^{5} also produced fewer estimates in the hundreds and slightly fewer infinite estimates [Fig. S1(a), Supporting Information]. Still, about half of the finite estimates were in the low thousands, which would produce estimates of *N*_{e}/*N* in the range 10^{−1}–10^{−2}.

It is important to note here that all of the simulations evaluating performance of genetic estimates of *N*_{e} used discrete generations and ideal populations in which *N*_{e} = *N*. Even so, all scenarios with large true *N*_{e} produced a large fraction of estimates of *N*_{e}/*N* that were orders of magnitude too small. But *N*_{e} will generally be <*N* in real populations, even without invoking extreme variance in reproductive success (Frankham, ; Palstra & Fraser, ; Fig. ). If, for example, *N*_{e} = 10^{6} and *N* = 10^{7} (so that true *N*_{e}/*N* = 0·1), then the ∼50% of estimates of *N*_{e} in Fig. (c) that fall between 10^{2} and 10^{4} would produce estimates of *N*_{e}/*N* in the range 10^{−3}–10^{−5} rather than 10^{−2}–10^{−4}.

In summary, genetic methods for estimating contemporary *N*_{e} are sensitive to a signal proportional to 1/*N*_{e}, which is very small for populations with large true effective sizes. As a consequence, when true *N*_{e} is large and only moderate amounts of data are available, the distribution of estimates becomes increasingly bimodal the larger true *N*_{e} gets. This means that when true *N*_{e} is ≥10^{6} and true *N*_{e}/*N* is ∼0·1, a large fraction (perhaps up to 50%) of the point estimates can be at least three to five orders of magnitude too small. The larger the population, the greater the scope for downward bias in the subset of finite

Accounting for iteroparity and overlapping generations shows that plausible patterns of change in age‐specific fecundity cannot by themselves produce tiny *N*_{e}/*N* ratios (*N*_{e}/*N* ≤ 10^{−3}); these are only possible if individuals of the same age and sex have greatly over‐dispersed variance in reproductive success (Φ* _{x}* ≈ 10

To date, little published information has been available regarding performance of estimators of contemporary *N*_{e} when effective size is very large. The largest effective sizes evaluated in some key papers were 100 (Nei & Tajima, ), 200 (Wang, ), 500 (Waples, ) and 1000 (Wang & Whitlock, ). Waples & Do () and Gilbert & Whitlock () evaluated *N*_{e} up to 5000 and Ovenden *et al*. () and MacBeth *et al*. () simulated some scenarios with *N*_{e} = 10^{4}, but this is still orders of magnitude smaller than effective sizes that might characterize large marine populations. This range was expanded to arbitrarily large populations in the present study. Simulated data using samples of individuals and genetic markers comparable to those used in most published estimates of tiny *N*_{e}/*N* demonstrate that, when true *N*_{e} is large (10^{5}–10^{6} or higher), the distribution of estimates is strongly bimodal, with the finite estimates falling far below true *N*_{e}. If true *N*_{e} is large, genetic estimators have a characteristic sweet spot where almost all of the finite estimates land. For *S* = 50 and 100 diallelic loci, the range of this sweet spot is the hundreds to low thousands, and this range does not change appreciably no matter how large true *N*_{e} is (Table and Fig. S1, Supporting Information). Thus, this sweet spot also represents a blind spot with respect to the true *N*_{e}. Huge increases in sample size shift the sweet spot to higher values (10^{4}–10^{6} for *S* = 5000), but again this distribution of estimates does not change appreciably no matter how large true *N*_{e} is (Table ).

These results mean that tiny, genetically based point estimates of *N*_{e}/*N* in large marine populations are expected to be quite common, even when the true *N*_{e}/*N* ratio is normal (*c*. 0·1 or higher). Notably, this pattern of spuriously low estimates of *N*_{e}/*N* agrees almost exactly with the three characteristics Hauser & Carvalho () identified as typical of empirical estimates for marine species: first, most point estimates are in the hundreds or low thousands; second, resulting estimates of *N*_{e} are two to five orders of magnitude lower than estimates of *N*; third, the estimated *N*_{e}/*N* ratio decreases as population size increases (Fig. S1, Supporting Information). Of course, this does not mean that any particular published estimate of a tiny *N*_{e}/*N* is wrong. The fact that false positives for tiny *N*_{e}/*N* are expected to be quite common when true *N*_{e} is large, however, argues for considerable caution in interpreting genetically based estimates for large populations.

Results presented here have several practical implications for evaluation of estimates of *N*_{e}/*N* that are unusually small. First, this scenario is likely to produce publication bias toward small estimates, as noted by Hauser & Carvalho (), but the problem is not simply that small estimates of *N*_{e}/*N* are interesting and provocative, while normal estimates with *N*_{e} is small and even moderate amounts of data are available [Fig. (a)].

Because a large fraction of false‐positives for low *N*_{e}/*N* can be expected when using genetic methods to estimate effective size in large marine populations, careful attention must be paid to the upper bounds of *N*_{e}/*N* that are consistent with the point estimate. As noted above, however, the *N*_{e}, *N*_{e} is very large. Gradually, some of the potential biases associated with violations of simplistic assumptions of genetic methods for estimating contemporary *N*_{e} are being critically evaluated (Waples & Yokota, ; Waples & England, ; Neel *et al*., ; Waples *et al*., ; Gilbert & Whitlock, ; Wang, 2016) and it will be important to carefully consider these results in interpreting low estimates of *N*_{e}/*N*.

Analysis of the simulated data used the moment‐based temporal and LD methods, which are easy to calculate and which have been among the most commonly used methods to generate tiny estimates of *N*_{e}/*N* (Hauser & Carvalho, ; Hedgecock & Pudovkin, ). Likelihood‐based or approximate‐Bayesian‐computation (ABC) methods (Wang & Whitlock, ; Tallmon *et al*., ) have the potential to reduce biases and increase precision, but they have not been rigorously evaluated with very large populations; furthermore, most require one to specify an upper limit to *N*_{e}, which can be problematical for very large populations. Although the combined estimates computed here as a simple, unweighted harmonic mean of *N*_{e} (Waples, in press) and hence should provide largely independent information about effective size. The most robust results can be expected when multiple methods produce comparable estimates (Hauser *et al*., ). Unfortunately, results for parentage analyses based on the known pedigree show that, even if parents of each offspring can be assigned with complete certainty, relatively small samples from very large populations will not provide much useful information about *N*_{e} unless effective size is very small compared with *N*. This means that the single‐sample sibship method (Wang, ) is not likely to be useful for evaluating populations with large *N*_{e}, although it could help confirm events of SRS in which the individuals sampled could include many siblings.

Increasing the number of genetic markers can help improve precision, but only to a certain degree (Table and Fig. S2, Supporting Information). An important caveat applies to evaluating the benefits of using many thousands of genetic markers, as it is now possible to obtain even for non‐model species. In a computer it is easy to simulate arbitrarily large numbers of loci that assort independently and hence provide non‐redundant information (up to 1000 such loci were modelled here). In the real world, however, loci must be packaged into a relatively small number of chromosomes, and physical linkage creates dependencies among the markers that reduce the overall information content. This means that increasing the number of loci by a factor of 10 [as in Fig. S2(a), Supporting Information] will not increase precision by the same proportion. This is an important topic that merits more rigorous study, but preliminary results (Jones *et al*., ; Waples *et al*., 2016) indicate that, for the LD method, the effective number of loci (in terms of information content) can be much lower than the actual number. Therefore, although using large numbers of SNP markers will increase precision (Hoffman *et al*., ), this by itself is unlikely to solve all problems associated with estimating *N*_{e} in large populations. A rigorous evaluation of this issue will require conducting simulations using linked markers in populations with very large *N*_{e}.

Empirical studies to test the hypothesis of tiny *N*_{e}/*N* are tricky, for several reasons apart from statistical behaviour of the genetic estimators. Results presented above demonstrate that, after accounting for life‐history traits typical of many marine species (iteroparity and long adult lifespan), it remains the case that tiny *N*_{e}/*N* require some type of SRS, whereby only a small fraction of adults successfully reproduce in any given season. The converse is not necessarily true. Not all types of sweepstakes reproduction produce tiny *N*_{e}/*N*; the outcome depends on the spatial and temporal scale on which SRS occurs. In some cases, SRS can produce chaotic genetic patchiness (Johnson & Black, ; Broquet *et al*., ) without permanent population structure or small overall *N*_{e}. Selkoe *et al*. () and Buston *et al*. () suggested that any effects of SRS are likely to be ephemeral and disappear when individual cohorts are integrated into the population as a whole. Whether this is true, however, depends on the nature of the SRS. As demonstrated above, if entire cohorts of a long‐lived species are consistently produced by SRS (*e.g*. if Φ* _{x}* is consistently very high for all ages of adults), then

The concept of *N*_{e} applies most directly to a full generation in a single, completely isolated population. Life histories of many marine species pose a major challenge in this regard. Many marine fishes (and some marine invertebrates) are highly vagile as adults and many have long larval stages that provide opportunities for dispersal. As a consequence, marine populations are often ill‐defined spatially and in at least some cases better fit one‐ or two‐dimensional isolation‐by‐distance models with continuous distributions than they do models with semi‐discrete subpopulations. If one wants to draw inferences about *N*_{e} and *N*_{e}/*N* for a metapopulation rather than a single isolated population, then spatially varying productivity could affect the result (as proposed by Turner *et al*., ), but a detailed treatment of that scenario is beyond the scope of this paper.

Hedgecock () proposed some tests of the sweepstakes‐reproduction hypothesis, and these have been discussed by many subsequent authors (Selkoe *et al*., ; Hauser & Carvalho, ; Taris *et al*., ). One of the predictions is that genetic diversity within larval cohorts should be reduced (and LD increased) compared with adults. As subsequently noted by Hedgecock & Pudovkin (), this qualitative prediction is tricky to evaluate. Robust tests must await a quantitative treatment that fully accounts for all the sampling issues involved and the different expectations for different measures of genetic diversity (*e.g*. number of alleles *v*. heterozygosity). Another of Hedgecock's () predictions, that genetic differences among cohorts should be large compared with samples from the adult population, has more direct relevance to evaluation of *N*_{e}/*N* ratios. In iteroparous species, if effective size actually is relatively small, then the magnitude of allele frequency differences among cohorts can be used to estimate *N*_{e}, using the method of Jorde & Ryman () and extended by Jorde ().

The most robust way to quantitatively evaluate the hypothesis that SRS leads to tiny *N*_{e}/*N* ratios is to implement a sampling programme that combines both spatial and temporal replication (akin to the Lagrangian and Eulerian frames of reference discussed by Hedgecock & Pudovkin, ). The spatial scale should be broad enough to identify population boundaries (if they exist) and to account for effects of immigration. Providing the appropriate temporal dimension to the data collected is likely to be more challenging, because at least two temporal components must be considered. First, many marine species spawn over extended periods of time (and tropical species can spawn throughout the year), so it is important to have a way to integrate all reproduction events across each season. Second, the strongest evidence for tiny *N*_{e}/*N* attributable to sweepstakes reproduction will be to demonstrate temporal stability of geographic patterns across multiple generations. If these patterns are not dynamically stable, the signal could be one that reflects ephemeral patterns of reproduction of local groups of parents (chaotic genetic patchiness) rather than small effective size of the entire population across a generation. Evaluating this will be difficult in long‐lived species unless historic samples (*e.g*. archived scales) are available.

Finally, point estimates of *N*_{e} and *N*_{e}/*N* for marine species can be evaluated in the context of the species' life history and other genetic analyses, as suggested by Hedgecock & Pudovkin (). *N*_{e}/*N* cannot be tiny unless individuals are capable of producing at least thousands of offspring that survive to age at maturity, so tiny estimates for species with low fecundity would be unlikely to be valid. If *N*_{e} of a large marine species really is in the hundreds to low thousands and this pattern has persisted through time, it should be reflected in low overall genetic diversity, shallow coalescent structure and a star phylogeny of DNA sequences.
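The expected link between persistently small *N*_{e} and low genetic diversity can be made concrete with a rough sketch. It assumes neutral markers at mutation–drift equilibrium under the infinite‐alleles model, for which expected heterozygosity is *H* = θ/(1 + θ) with θ = 4*N*_{e}*μ*. The function name and the mutation rate of 10^{−6} per locus per generation are hypothetical values chosen only for illustration, and the relevant *N*_{e} here is the long‐term value rather than a contemporary estimate.

```python
def expected_heterozygosity(ne, mu):
    """Equilibrium expected heterozygosity under the infinite-alleles model."""
    theta = 4 * ne * mu
    return theta / (1 + theta)


if __name__ == "__main__":
    mu = 1e-6  # assumed per-locus mutation rate (illustrative only)
    for ne in (10**3, 10**4, 10**5, 10**6):
        print(f"Ne = {ne:>9,d}  expected H = {expected_heterozygosity(ne, mu):.4f}")
```

Under these assumptions, a population whose long‐term *N*_{e} is in the low thousands would be expected to show far less diversity than one with *N*_{e} near 10^{6}, which is why high observed diversity sits uneasily with a tiny point estimate of *N*_{e}.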

I appreciate the invitation to prepare this manuscript for the symposium edition of the Journal. I thank S. Sogard for useful information about BOFFFFs and M. Hare, L. Hauser and two anonymous reviewers for useful comments on an earlier draft.