**Fig. S1** Relationship between observed and expected fractions of siblings in simulated populations.

**Fig. S2** Proportions of all pairwise relationships that are full siblings and half siblings, for three different mating models and random and non‐random samples.

**Fig. S3** Relationship between true *N*_{e} and *N*_{e} estimated from the parentage‐analysis‐without‐parents (

**Fig. S4** Relationship between *N*_{e} calculated using

**Fig. S5** Relative

**Fig. S6** Performance of the

**Appendix S1** Detailed methods.

Interest has surged recently in removing siblings from population genetic data sets before conducting downstream analyses. However, even if the pedigree is inferred correctly, this has the potential to do more harm than good. We used computer simulations and empirical samples of coho salmon to evaluate strategies for adjusting samples to account for family structure. We compared performance in full samples and sibling‐reduced samples of estimators of allele frequency (

Random sampling is a convenient theoretical and statistical construct. It is easy to implement in computer simulations but difficult or impossible to achieve in the real world. A truly random sample has two important features: *equal opportunity*—every individual in the focal population must have the same probability of appearing in the sample; and *independence*—the probability that an individual will be sampled does not depend on whether any other individuals are in the sample. A violation of the second criterion occurs when relatives are likely to be sampled together. Recent attention related to nonrandom sampling has focused on three key points:

*et al*. ; Goldberg & Waits ; Whiteley *et al*. );*et al*. ; Jones & Wang ; Almudevar & Anderson ).

Based on these three observations, many researchers (e.g. Hess *et al*. ) now routinely remove all but one member of a putative sibling group before using population genetic data in downstream analyses, an approach that is now often regarded as ‘best practice’ (Peterman *et al*. ). At first glance, the premises seem sound and the conclusion seems logical: what could go wrong?

Actually, a lot can go wrong. At least three major problems can arise when one attempts to purge putative siblings from samples. (i) Siblings occur naturally in all populations at frequencies that are inversely related to effective population size (*N*_{e}), and this fact forms the basis of the sibship method for estimating *N*_{e} (Wang ). Indiscriminate removal of putative siblings thus risks erasing part of the evolutionary signal of small populations, making them look more like large or infinitely large ones. (ii) Even if sibling removal is effective in reducing the appearance of nonrandom sampling, it also reduces the sample size, which sets up an inevitable trade‐off with respect to precision and statistical power that should be formally considered. (iii) Methods for sibling inference are not infallible, particularly for identification of half‐siblings (HS) or other more distant relatives. Results of sibship reconstruction often differ depending on the method used (Ringler *et al*. ) and, for a given method, can differ depending on the type of markers used (Linløkken *et al*. ). Therefore, researchers interested in pursuing removal of putative siblings should consider the consequences of imperfect ability to identify relatives.

As far as we can determine, the only analyses for which sibling removal has been convincingly demonstrated to improve performance are Bayesian clustering methods such as *et al*. ), for which family groups can be mistaken for separate ‘populations’ (Anderson & Dunham ; Rodríguez‐Ramilo & Wang ). Whether this reflects a shortcoming of the method itself or rather a problem of interpretation is an open question. Rodríguez‐Ramilo *et al*. () found that some other clustering methods that do not make assumptions about Hardy–Weinberg equilibrium and linkage equilibrium were much less sensitive to family groups than

Because related individuals in a sample create a mismatch between the variance of estimated allele (or genotype or gamete) frequencies and the variance predicted under the assumption of random, independent sampling, it is clear that the presence of siblings can bias results of statistical tests that compare the observed deviations in statistics (such as allele frequencies) with their expectations under random sampling theory. Accordingly, researchers should remain vigilant about effects of siblings on statistical hypothesis tests. However, this does not necessarily mean that eliminating putative siblings from samples is a sound strategy in general. For example, although sibling removal has been advocated to improve point estimates of allele frequency (Goldberg & Waits ), it can actually reduce performance of the estimator. This potential downside to sibling removal indicates that it is risky to routinely remove putative siblings from genetic data sets without a clear understanding of the consequences.

Here, we use computer simulations to evaluate the practical consequences of removing siblings from both random and nonrandom samples, generated under several mating models. We focus on two single‐locus metrics (allele frequency and *F*_{ST}) and one two‐locus metric [linkage disequilibrium (LD) between pairs of loci, which is often used to estimate *N*_{e}]. Removing putative siblings before estimating *N*_{e} is less common than for other downstream analyses, but it does occur (e.g. Peterman *et al*. ). Results indicate that even if family relationships are known without error, removing siblings can degrade precision of estimates of allele frequency and *F*_{ST} and bias estimates of *N*_{e}. The results argue for caution by researchers who are tempted to try purging siblings from samples from real populations.

To better understand these simulation results, we next undertake an analytical treatment of the effect of sibling removal on allele frequency estimation. We consider sibling removal as a special (extreme) case of unequal weighting of the information from each individual in a sample. Use of such weighting schemes to obtain the best linear unbiased estimator (BLUE) for a parameter is well known in statistics, and we review the application by McPeek *et al*. () of the BLUE approach to estimating allele frequency in the presence of related individuals. Using the same mathematical machinery, we obtain an algorithm to find the optimal sibling removal scheme. Finally, we assess performance of the BLUE approach and several strategies for sibling removal using multiple collections of highly related juvenile coho salmon (*Oncorhynchus kisutch*). Our results indicate that, if the true family relationships are known without error, the BLUE approach produces an estimate with lower variance than is obtained either by any variation of sibling removal or by use of the full (unweighted) sample; however, using the full sample is better than BLUE when sibling identification is not reliable. A method of partial sibling removal that leaves intact family groups no larger than two performed nearly as well as the naïve estimator when inferred pedigrees were unreliable and generally better than the naïve estimator when the inferred pedigree was correct.

The simulations had the following features. Modelled populations were closed to immigration and emigration, had discrete generations, a constant number (*N*) of mature individuals and separate sexes with an equal number (*N*/2) of males and females. Selfing was not allowed. After initialization, each population was allowed to reproduce for 10 generations, following which a sample of *S* individuals for genetic analysis was produced by the *N* parents in generation 10 (Table ). Genotypes at 100 diallelic (SNP) loci were tracked in each individual. For each parameter set, 100 replicate samples of *S* offspring were generated as described in more detail in Supporting information.

F1, F3 and F9 are maximum family sizes for each pair of randomly chosen parents. F1 produces a random sample. For F3 and F9, family size at each reproductive event is drawn from a uniform distribution over 2–3 or 2–9, respectively, which produces samples with excesses of siblings.

In the mixed model, samples were randomly produced by the same process used for reproduction in generations 1–10, with the exception that sample size could exceed *N*.

Three mating systems were considered (Table ). In the monogamy and random models, reproduction in generations 1–10 was by Wright–Fisher ideal populations with and without monogamy, respectively (so *N*_{e} ≈ *N*). In the final generation, samples of *S* progeny were produced in two ways: (i) using the same two ideal mating systems (random samples), with the only difference being that *S* could be larger than *N*; or (ii) by allowing some pairs of parents to produce large families (to mimic nonrandom, family‐correlated samples). In the latter case, the effective number of parents that produced the sample (*N*_{b}) was less than *N*_{e}. In the third (mixed) model of reproduction, in generations 1–10, some pairs of parents were allowed to produce >1 offspring per mating episode, so variance in reproductive success was greater than random, leading to *N*_{e} < *N* each generation. Only random samples were taken in the final generation for the mixed model, but these also were drawn from parents for which *N*_{e} < *N*. Full details of these mating models and sampling of juvenile offspring can be found in Supporting information.

In addition to considering the full samples, we evaluated subsets of the samples from which some or all siblings were removed. Removal was based on the recorded pedigree and did not consider potential errors in sibling identification. In fractional removals, individuals from the full sample were evaluated one at a time to determine whether they should be included in the reduced sample. If the individual was a relative of any other individual already in the reduced sample, it was excluded with probability β. Values evaluated were β = 0.25, 0.5, 0.75, 0.9, 0.95 and 1. With β = 1, all but one member of each family was excluded; we refer to this as 100% sibling removal. In the random and mixed mating models, sibship exclusion was considered in two ways: excluding all siblings or excluding only full‐siblings (FS).
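The fractional removal procedure described above can be sketched as follows. This is a minimal illustration rather than the code used for the simulations: the function name `fractional_removal`, the representation of each offspring as a `(mother_id, father_id)` tuple and the `full_sibs_only` flag are assumptions made for the example.

```python
import random

def fractional_removal(sample, beta, full_sibs_only=False, rng=None):
    """Return indices of individuals kept after probabilistic sibling removal.

    sample: list of (mother_id, father_id) tuples, one per offspring
            (hypothetical layout chosen for this sketch).
    beta:   probability of excluding an individual that already has a
            relative in the reduced sample.
    """
    rng = rng or random.Random()
    kept = []
    for i, (mom, dad) in enumerate(sample):
        def related(j):
            m, d = sample[j]
            if full_sibs_only:
                return m == mom and d == dad   # full siblings only
            return m == mom or d == dad        # full or half siblings
        has_relative = any(related(j) for j in kept)
        # rng.random() is in [0, 1), so beta = 1 always excludes relatives.
        if has_relative and rng.random() < beta:
            continue
        kept.append(i)
    return kept
```

With `beta=1` the function keeps one member per family (100% sibling removal); with `full_sibs_only=True` half‐siblings are retained, which corresponds to the FS‐only exclusion option.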

All simulations and data analyses were conducted in

FS, full‐siblings; HS, half‐siblings.

No HS are produced with monogamy.

Maximum family size = 9 siblings.

Maximum family size = 3 siblings.

*N*_{e} < *N* in the mixed mating model because larger families were produced than with random mating.

Parameters: α (probability of each pair producing >1 offspring) = 0.5; maximum family size = 9 siblings.

Sampling was random but the mixed mating model produced large families.

Parameters: α = 0.1; maximum family size = 30 siblings.

The random mating and mixed models produced three classes of offspring based on their one‐generation pedigree: FS, HS and unrelated (U). Only FS and U offspring were produced in the monogamy model. The proportions of siblings produced are expected to be simple functions of *N*_{e} and the mating model. This relationship (eqn 1) is based on a simplification of eqn 10 in Wang () that ignores the (generally small) correction for departures from Hardy–Weinberg equilibrium. In eqn 1, *Q*_{HS} is the fraction of pairs that are HS (maternal and paternal HS combined) and *Q*_{FS} is the fraction that are FS. It can be shown (Ackerman *et al*. ) that eqn 1 yields the same estimate of *N*_{e} as the parentage‐analysis‐without‐parents approach of Waples & Waples (), which calculates inbreeding *N*_{e} based on the vector of numbers of offspring produced by each parent. Simple rearrangement of eqn 1 produces the expected frequencies of sibships (eqn 2).
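As a rough numerical illustration of the link between sibling fractions and *N*_{e}, the sketch below assumes the commonly quoted simplified form 1/*N*_{e} = (*Q*_{HS} + 2*Q*_{FS})/4. That form, and the function names `sibship_ne` and `expected_sib_signal`, are assumptions made for the example and should be checked against Wang's original equation before use.

```python
def sibship_ne(q_hs, q_fs):
    """N_e implied by pairwise sibling fractions.

    Assumed form (not taken from the text): 1/Ne = (Q_HS + 2*Q_FS) / 4.
    Returns infinity when no sibling pairs are present.
    """
    denom = q_hs + 2.0 * q_fs
    return float("inf") if denom == 0 else 4.0 / denom

def expected_sib_signal(ne):
    """Expected value of Q_HS + 2*Q_FS for a given N_e (same assumed form)."""
    return 4.0 / ne
```

Under this assumed form, a sample in which 2% of pairs are HS and 1% are FS implies *N*_{e} of about 100, and a sample with no sibling pairs at all implies an infinite estimate.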

Observed fractions of siblings produced in the simulations were tracked and compared with the expected fractions based on eqn 2, and realized *N*_{e} was calculated from the pedigree using eqn 3 from Waples & Waples (), in which *S* is the number of individuals in the sample and *k*_{i} is the number of sampled offspring produced by the *i*th parent. Realized effective size was calculated separately for each sex (*N*_{eM}, *N*_{eF}), and the overall *N*_{e} was calculated as *N*_{e} = 4*N*_{eM}*N*_{eF}/(*N*_{eM} + *N*_{eF}) (Wright ).
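The pedigree‐based calculation can be sketched as below. The per‐sex formula used here, *S*(*S* − 1)/Σ*k*_{i}(*k*_{i} − 1), is an assumption: it is the reciprocal of the probability that two sampled offspring share a parent of that sex, and may differ in detail from the published equation. The sex‐combining step is the Wright formula quoted in the text, and the placeholder of 99 999 for infinite estimates follows the convention reported in the Results. Function names are hypothetical.

```python
from collections import Counter

INF_PLACEHOLDER = 99999.0  # value recorded in place of an infinite per-sex estimate

def per_sex_ne(parent_ids):
    """Inbreeding N_e for one sex, given the parent of each sampled offspring.

    Assumed form: S*(S-1) / sum(k_i*(k_i-1)), where k_i is the number of
    sampled offspring of parent i and S is the sample size.
    """
    s = len(parent_ids)
    shared = sum(k * (k - 1) for k in Counter(parent_ids).values())
    if shared == 0:          # no two offspring share a parent of this sex
        return INF_PLACEHOLDER
    return s * (s - 1) / shared

def combine_sexes(ne_m, ne_f):
    """Overall N_e = 4*N_eM*N_eF / (N_eM + N_eF)."""
    return 4.0 * ne_m * ne_f / (ne_m + ne_f)
```

For example, four offspring with fathers `a, a, b, c` and mothers `x, x, x, y` give per‐sex values of 6 and 2 under this assumed form, and an overall value of 6.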

Analyses of simulated data focused on three metrics.

For each locus, true allele frequency (*TrueP*) was calculated as the mean frequency in the *N* parents from generation 10. For each replicate sample of offspring, estimated allele frequency (_{P} across all 100 loci. The consequences of sibling removal for allele frequency estimation were quantified as the relative RMSE_{P} (ϕ_{P}), which is the ratio of RMSE_{P} for the subsample and the full sample. Values of ϕ < 1 indicate that sibling removal increased precision of
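The RMSE comparison for allele frequency can be illustrated with a short sketch. The function names and the flat list inputs (one estimated and one true frequency per locus) are assumptions made for the example; how replicates were pooled in the actual analysis is not specified here.

```python
import math

def rmse(estimates, truths):
    """Root mean squared error of estimated versus true allele frequencies."""
    squared = [(e - t) ** 2 for e, t in zip(estimates, truths)]
    return math.sqrt(sum(squared) / len(squared))

def relative_rmse(est_reduced, est_full, truths):
    """Ratio of RMSE for the sibling-reduced sample to RMSE for the full sample.

    Values below 1 mean the reduced sample was more precise;
    values above 1 mean sibling removal degraded precision.
    """
    return rmse(est_reduced, truths) / rmse(est_full, truths)
```

A ratio of 2, for instance, would mean that the error after sibling removal was twice that of the full sample.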

We can also define an effective sample size in a manner analogous to the definition of effective population size. Let an ‘ideal’ sample be one in which individuals are drawn randomly and every individual provides completely independent information. For estimation of parental allele frequency using offspring, an ideal sample would be one containing *S* unrelated individuals, in which case the variance of

In this ideal sample, the effective sample size is the same as the sample size (ESS_{ideal} = *S*). For a nonideal sample, ESS is defined as the size of an ideal sample that would be expected to produce the observed value of _{P} for

Equation 5 was used to calculate effective sample sizes for the original samples (ESS_{full}) and the sibling‐reduced samples (ESS_{reduced}). ESS was calculated for each locus, and the harmonic mean across all 100 loci was used as the overall ESS.
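The effective sample size calculation can be sketched as follows, assuming binomial sampling of 2*S* gene copies in an ideal sample, so that the variance of the estimated frequency is *P*(1 − *P*)/(2*S*) and ESS = *P*(1 − *P*)/(2 × variance). That form is an assumption made for this illustration; the harmonic mean across loci follows the text. Function names are hypothetical.

```python
from statistics import harmonic_mean

def ess_locus(true_p, var_p_hat):
    """Effective sample size at one locus.

    Assumed form: size of an ideal sample of unrelated diploids whose
    binomial sampling variance P(1-P)/(2S) equals the observed variance.
    """
    return true_p * (1.0 - true_p) / (2.0 * var_p_hat)

def overall_ess(true_ps, var_p_hats):
    """Harmonic mean of per-locus ESS values."""
    return harmonic_mean([ess_locus(p, v) for p, v in zip(true_ps, var_p_hats)])
```

For a locus with true frequency 0.5 and an observed variance of 0.0025, this gives an ESS of 50, the size of a sample of 50 unrelated individuals.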

For each parameter set, the 100 replicate daughter populations were divided into 50 pairs, and Nei's () *F*_{ST} was calculated for each locus as*H*_{S} is the average expected heterozygosity within the two populations and *H*_{T} is the total expected heterozygosity based on mean allele frequencies across the populations. For each pair of populations, an overall mean *F*_{ST} (*N* parents, and this was considered the parametric (true) *F*_{ST}. Unbiased estimates of *F*_{ST} (*S*) (Wright ; Chakraborty & Leimar ). Loci monomorphic in both samples were excluded. Across the 50 pairs of populations, and for each sibling‐reduced sample size, RMSE of *F*_{ST} was calculated as*F*_{ST} (_{P} described above.
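For a pair of populations and a diallelic locus, Nei's *F*_{ST} in terms of *H*_{S} and *H*_{T} can be sketched as below, using the standard form *F*_{ST} = 1 − *H*_{S}/*H*_{T}. The sketch omits the sample‐size corrections mentioned above, and the function name and the use of `None` for loci that are monomorphic in both samples are assumptions made for the example.

```python
def nei_fst(p1, p2):
    """Nei's F_ST for one diallelic locus in two populations.

    p1, p2: frequency of the focal allele in each population.
    Returns None when the locus is monomorphic in both samples
    (H_T = 0), so that such loci can be excluded.
    """
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)                            # total expected heterozygosity
    if h_t == 0:
        return None
    return 1.0 - h_s / h_t
```

Two populations with frequencies 0.2 and 0.8 give *H*_{S} = 0.32, *H*_{T} = 0.5 and *F*_{ST} = 0.36; identical frequencies give zero.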

True effective size under the monogamy and random mating models was *N*_{e} = *N* + 0.5 + 1/(2*N*) (Balloux ); these values are shown in Table , but for simplicity in the text only the whole number is used. In the mixed mating model, the possibility that a parental pair would be allowed to produce multiple offspring at each draw led to overdispersed variance in reproductive success and *N*_{e} < *N*. Realized *N*_{e} in the mixed model was calculated using eqn 3, which was also used to track the realized effective number of parents that produced the sibling‐reduced samples for all mating models.

For each sample, *N*_{e} was estimated using the LD method (Waples & Do ), and an overall harmonic mean

Estimating allele frequency by removing siblings can be thought of as giving a weight of ‘1’ to all unrelated individuals in the sample and a weight of ‘0’ to all but one sibling from each family. A more general approach would allow noninteger weights for each individual, based on information about their degree of relatedness to others in the sample. McPeek *et al*. () used this approach to derive a best linear unbiased estimator (BLUE) for allele frequencies when the sample consists of individuals of known relationship. Some programs for parentage analysis (e.g. Colony; Wang & Santure ) also use the inferred pedigree to update estimates of allele frequency. We use the same mathematical analysis framework to compute the effective sample size of any weighted or sibling‐removed sample, conditional on the pedigree connecting the sample members. To allow comparison of weighted and sibling removal methods for estimating allele frequency, we also developed a greedy algorithm to identify the optimal sibling elimination scheme, given the known family relationships among individuals in the sample (See Supporting information for details).
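The idea of treating sibling removal as a special case of weighting can be illustrated numerically. The sketch assumes the usual linear‐model setup in which the variance of a weighted allele frequency estimate is proportional to **w**′**Kw**, where **K** has 1 on the diagonal for noninbred individuals and twice the kinship coefficient off the diagonal (0.5 for full siblings), the BLUE weights are proportional to **K**^{−1}**1**, and ESS = 1/(**w**′**Kw**). These forms and all function names are assumptions made for the example, not a reproduction of the Supporting information.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ess(weights, k):
    """Effective sample size 1 / (w' K w) for weights that sum to 1."""
    n = len(weights)
    quad = sum(weights[i] * k[i][j] * weights[j] for i in range(n) for j in range(n))
    return 1.0 / quad

def blue_weights(k):
    """Weights proportional to K^{-1} 1, normalised to sum to 1."""
    x = solve(k, [1.0] * len(k))
    total = sum(x)
    return [xi / total for xi in x]

# Hypothetical sample: three full siblings plus two unrelated individuals.
K = [[1.0, 0.5, 0.5, 0.0, 0.0],
     [0.5, 1.0, 0.5, 0.0, 0.0],
     [0.5, 0.5, 1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 0.0, 1.0]]
naive = [0.2] * 5                      # full sample, equal weights
one_per_family = [1/3, 0, 0, 1/3, 1/3] # keep one sibling (weight 0 for the rest)
two_per_family = [0.25, 0.25, 0, 0.25, 0.25]
```

Under these assumptions the effective sample sizes are 3.0 with one sibling kept, 3.125 for the full sample, 3.2 with two siblings kept and 3.5 for the BLUE weights, which down‐weight each sibling to half the weight of an unrelated individual.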

We evaluated performance of the estimators using 70 collections of juvenile coho salmon taken from streams in California and Southern Oregon (described in Gilbert‐Horvath *et al*. ). Each collection, which varied in size from 10 to 150 individuals (mean = 59), was genotyped at 95 SNPs and analysed with Colony (Version 2, Wang ; Wang & Santure ) to infer FS and HS. We then used the inferred pedigree to compute the effective sample size under five different weighting schemes: (i) naïve: full sample with no weighting; (ii) BLUE: weighting according to the best linear unbiased estimator; (iii) optimal‐z: the minimum variance sibling removal scheme found using our greedy algorithm; (iv) Yank‐1: HS were ignored, and individuals were removed randomly from full‐sibships of size ≥ 3 until only one member of the family remained; (v) Yank‐2: identical to Yank‐1 except that individuals were removed randomly from FS groups of size ≥ 3 until only two members of the family remained. The Yank‐1 scheme was used in Garza *et al*. () and Gilbert‐Horvath *et al*. () for investigations into juvenile steelhead (anadromous *Oncorhynchus mykiss*) and coho salmon.
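The Yank‐1 and Yank‐2 rules as described above can be sketched in a few lines. The function name `yank`, the use of one full‐sib family label per individual and the random choice of which members to keep are assumptions made for the example.

```python
import random

def yank(family_ids, keep, rng=None):
    """Return sorted indices retained under Yank-1 (keep=1) or Yank-2 (keep=2).

    family_ids: full-sib family label for each individual; half-sib links
                are ignored, as in the description above.
    Families with three or more members are randomly reduced to `keep`
    members; smaller families are left untouched.
    """
    rng = rng or random.Random()
    families = {}
    for i, fam in enumerate(family_ids):
        families.setdefault(fam, []).append(i)
    kept = []
    for members in families.values():
        if len(members) >= 3:
            members = rng.sample(members, keep)
        kept.extend(members)
    return sorted(kept)
```

Because pairs of full siblings never trigger removal, a spuriously inferred FS pair leaves the sample unchanged under either rule.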

In the empirical evaluations under the ‘Sample Related’ scenario, we assumed that the inferred pedigree was correct and used it to compute the effective sample size for all five methods. To evaluate consequences of errors in sibling reconstruction, in the ‘Sample Unrelated’ scenario the gene copies carried by each member of each collection were permuted within loci (holding missing data positions constant), yielding for each collection a permuted sample in which there was no longer any relationship among the sample members. These permuted samples were analysed with Colony 2, weighting schemes were calculated using the Colony‐inferred pedigree, and effective sample sizes for all schemes were calculated using the fact that all scrambled sample members are effectively unrelated. This provides an indication of the performance of sibling elimination under pedigree reconstruction errors.
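The within‐locus permutation used to create the ‘Sample Unrelated’ data can be sketched as follows. The data layout (a list of individuals, each a list of `(allele, allele)` tuples with `None` marking a missing genotype) and the function name are assumptions made for the example.

```python
import random

def permute_within_loci(genotypes, rng=None):
    """Shuffle gene copies among individuals separately at each locus.

    genotypes[i][l] is a (allele, allele) tuple, or None if individual i
    is missing data at locus l. Missing positions stay where they are.
    Returns a new data set; the input is not modified.
    """
    rng = rng or random.Random()
    n_ind = len(genotypes)
    n_loci = len(genotypes[0])
    out = [list(row) for row in genotypes]
    for locus in range(n_loci):
        typed = [i for i in range(n_ind) if genotypes[i][locus] is not None]
        copies = [a for i in typed for a in genotypes[i][locus]]
        rng.shuffle(copies)
        # Deal the shuffled gene copies back out, two per typed individual.
        for slot, i in enumerate(typed):
            out[i][locus] = (copies[2 * slot], copies[2 * slot + 1])
    return out
```

Allele counts at each locus are preserved, so allele frequencies in the permuted collection match the original while multilocus family signal is broken up.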

Results are summarized in Figs and S1–S6; see Tables S1 and S2 (Supporting information) for more details.

Observed frequencies of siblings in the samples agreed closely with those expected from eqn 2 (Fig. S1, Supporting information). Although expected overall frequencies of siblings depend only on realized *N*_{e}, the mix of FS and HS depends on the mating system and the type of samples. Furthermore, the distribution of family sizes in the sample depends strongly on the sample size. Both random and family‐correlated samples showed a strong, positive relationship between maximum family size and the ratio *S/N*_{e} (Fig. ). For the same *S/N*_{e}, random samples had consistently smaller maximum family sizes. However, a random sample for a scenario in which sample size is large compared to effective size can have larger family sizes than are found in some nonrandom samples. Frequencies of FS and HS in the complete samples for the three mating models are shown in Fig. S2 (Supporting information).

Complete sibling exclusion reduced the final sample size by 90% or more when *S* was larger than *N*_{e} and all siblings were excluded (e.g. scenarios B and M; Fig. ). At the other extreme, in some scenarios with very small *S/N*_{e} and/or exclusion only of FS (e.g. Scenario X; Fig. ), the final sample sizes were reduced by <10%. Partial removal of siblings had predictably intermediate consequences for sample size.

Removing siblings from random samples of progeny generally reduced precision, such that RMSE_{P} was higher after sibling removal than it was for the full data set (Fig. , top). This effect was nonlinear, with RMSE_{P} rising faster for high levels of sibling removal. With 100% sibling removal, RMSE_{P} could be over twice as large as for the unpurged data set. In some scenarios, RMSE_{P} increased only very slightly with removal of siblings. This occurred when few siblings were produced in the first place (Scenario J) or most siblings were HS but only FS were removed (Scenario T) (Table ), but also in some mixed mating scenarios (e.g. Scenario N, Fig. ) where sampling was random but many siblings were produced. In one extreme scenario using the mixed mating model (Scenario R), where 10% of the randomly chosen pairs of parents were allowed to produce up to 30 FS, RMSE_{P} of

Surprisingly, very similar patterns were found in most scenarios with nonrandom samples: removing siblings increased RMSE_{P} (by up to ~50%) and reduced precision compared to the full samples (Fig. , top). In one case (Scenario V, random mating with family‐correlated sampling that primarily produced HS), removing 100% of the FS actually reduced RMSE_{P} by 3%.

In every scenario considered, regardless whether sampling was random or not, the ratio of effective size of the reduced sample (ESS_{reduced}) to the sibling‐reduced sample size (*S**) increased with more aggressive purging of siblings (Fig. , top). A positive relationship between ESS_{reduced}/*S** and the fraction of siblings removed indicates that removing siblings did not increase the variance in _{reduced}/ESS_{full} was <1, indicating that effective sample size after sibling removal was less than the original sample size (Fig. , bottom).

Results for estimation of *F*_{ST} largely paralleled those for estimation of allele frequency: in most scenarios with both random and nonrandom samples, removing siblings increased *F*_{ST}: purging all siblings led to almost a 10‐fold increase in

Harmonic mean *N*_{e} i scenarios with random sampling and random mating or monogamy (Fig. S3, Supporting information). In the scenarios with nonrandom sampling and in the mixed mating model (where sampling was random but mating was not), eqn 3 estimated realized *N*_{e} based on the pedigree for the samples, and *N*_{e} (Fig. S4, Supporting information). When sibling removal was less than about 80%, *N*_{e} from eqn 3. This reflects a slight upward bias in the LDNe version of the LD method, which has been documented elsewhere (e.g. Waples & Do ). With 100% sibling removal, the point estimate from eqn 3 is infinity (see eqn 2). Because sibling removal was probabilistic, on occasion all siblings were removed from samples by chance when the probability of removal was high but <1. Infinite estimates (recorded here as 99 999 for each sex) increased harmonic mean realized *N*_{e}, such that it generally exceeded that from LDNe under very aggressive purging of siblings (Fig. S4, Supporting information).

Removing siblings had the expected effect of increasing *N*_{e} = 500 and a small *S* = 40) did not produce many siblings, but removing the few siblings that did occur by chance produced an estimate that was over six times true *N*_{e}.

When sampling was nonrandom, estimates of *N*_{e} based on full samples were all downwardly biased, and *N*_{e} occurs somewhere along the continuum of fractional sibling removal. This means that (in theory at least) an unbiased estimate could be obtained from nonrandom samples by removing exactly the right fraction of siblings. This ‘sweet spot’, however, varied widely among scenarios. In Scenario F, where the sample was only moderately nonrandom, removal of half of the siblings produced an unbiased

Removing siblings from random samples always sharply increased relative *N*_{e} became largely unbiased, after which

Under the assumption that the inferred pedigree was correct (‘Sample Related’ scenario), application of the BLUE maximized ESS for each of the 70 collections of juvenile coho salmon (Fig. , top; Supporting information). The optimal‐z sibling removal strategy never decreased ESS and, in some collections, substantially increased effective sample size. ESS for the Yank‐1 strategy was sometimes smaller than, but more often larger than, effective sample size of the naïve estimator. For almost all collections, the Yank‐2 strategy produced a higher ESS than both the naïve estimator and the Yank‐1 strategy. This indicates that if there is large variance in family size, as appears to be the case in these collections of juvenile coho salmon, informed sibling removal can potentially improve estimates of allele frequency. In agreement with these theoretical results, when we applied the BLUE to simulated data for the scenarios in Table , in every case ESS based on BLUE‐weighted allele frequency was higher than ESS for the full sample (Fig. S6, Supporting information).

When genotypes of the juvenile samples were scrambled to mimic those of unrelated individuals, a very different result was obtained: except in the smallest collections, in which Colony correctly inferred everyone to be unrelated, using the BLUE or the optimal‐z strategy always reduced ESS compared to that of the naïve (and in this case, correct) estimator (Fig. , bottom). Reductions in ESS occurred in samples for which Colony erred by identifying spurious relationships. Colony is much more likely to spuriously identify HS relationships or pairs of FS than large FS groups from data where no family structure exists. Across all 70 permuted coho salmon collections, Colony inferred no FS groups of size four or greater, only one FS group of size three and 193 pairs of FS (Table S2, Supporting information). The Yank‐1 and Yank‐2 strategies are designed so that sibling groups of size < 3 do not trigger any removals and, as a consequence, they are relatively resilient to pairs of spuriously inferred FS. Accordingly, even using the incorrectly inferred pedigree, ESS for these two sibling removal strategies was identical to that of the naïve estimator in almost every collection; in the one exception, removing siblings using both the Yank‐1 and Yank‐2 protocols slightly reduced ESS.

Because the Yank‐2 strategy appeared to perform relatively well with the empirical coho salmon data sets, we evaluated its performance with simulated data for the monogamy and mixed mating models (the random mating model produces few FS so we did not evaluate Yank‐2 for this model). Full results are shown in Table S1 (Supporting information); representative patterns are illustrated in Fig. . In comparison with probabilistic sibling removal schemes that produce the same overall reduction in sample size (indexed by the ratio *S**/*S*), Yank‐2 generally increased precision of

Presence of family structure in a sample does not lead to systematic bias in estimation of parental allele frequency, in the sense that large numbers of siblings do not consistently lead to under‐ or overestimation of *TrueP*. However, for any given sample, the presence of siblings will tend to skew

Our simulation results provide two general insights on this issue. First, in every combination of mating model and sampling strategy that we evaluated, removing some or all siblings increased the ratio of ESS_{reduced} to *S**. If all individuals provided completely independent information about allele frequency, this ratio should not change with reductions in *S**. Therefore, this result confirms the partial redundancy of siblings with respect to estimation of *TrueP*. The second key result is that in most of the simulated scenarios, this partial redundancy was not strong enough to fully offset the loss of precision associated with reducing overall sample size. When sibling removal had any appreciable effect on performance, it always reduced precision of *TrueP*; it can easily make things worse.

The empirical example using collections of juvenile coho salmon provides important context for interpreting results of the simulated populations. The true pedigree of these samples is not known, but based on the biology of the species, the nature of the samples, and the estimated sibships from Colony, the samples contain a great deal of family structure. We showed that, under the assumption that the inferred pedigree is correct, noninteger weighting of individuals using the BLUE approach produces an estimate of parental allele frequency with lower variance than the naïve estimate or any method that removes entire individuals. On the other hand, our results also show that performance of the BLUE method can be worse than the naïve estimator when the inferred pedigree is not correct. Although it might be tempting to apply the BLUE or try to find the optimal removal scheme given the inferred pedigree of sample members, there are clear risks associated with that approach.

Two variations of sibling removal (Yank‐1 and Yank‐2) that have been used in some empirical studies performed well in analysis of the coho salmon samples: in most cases, they produced a higher ESS than the naïve estimator when the inferred pedigree was assumed to be correct, and in all collections but one they did not degrade the performance of the allele frequency estimate when the pedigree included spurious family structure. The Yank‐2 procedure consistently had a higher effective sample size than Yank‐1, suggesting that it might be a generally better option. This empirical coho salmon example provides a somewhat more optimistic picture of the effectiveness of sibling removal for estimating allele frequency than did the simulations. We verified this general pattern by incorporating the Yank‐2 procedure into the simulated data (Fig. , Table S1, Supporting information). We expect this result can be explained by two factors. First, the Yank‐1 and Yank‐2 methods only reduce families having three or more FS, whereas one of a pair of siblings could be removed during proportional sibling removal in the simulations. Second, in many of the coho salmon samples, Colony inferred not only some large FS families but also many unrelated individuals (see Table S2, Supporting information), a feature that also characterized the one scenario in the simulations (Scenario R) for which aggressive purging of siblings did substantially reduce RMSE_{P}. Therefore, results for the empirical example support the premise from the simulations that removal of putative siblings is most likely to improve allele frequency estimation when the samples include a substantial degree of family structure and a large variance in family size. If every family is large, there will be little to gain by downsampling some of them. In the simulations, sibling removal degraded performance most in scenarios (e.g. A, B, S, U; Figs and ) where sample size was large compared to *N*_{e}, which tends to produce many large families. However, if there are only a few large families and many unrelated individuals, then the relative contribution of the large families to the estimated allele frequency can be brought into line by downsampling them (as in Scenario R). Plotting the distribution of inferred family sizes in empirical samples (see, for example, Whiteley *et al*. ) can be useful to researchers in this regard.

A caveat worth noting regarding the empirical example is that only one type of sibling misidentification was modelled (spurious identification of related individuals when all individuals are unrelated). The Yank‐1 and Yank‐2 approaches, although they are immune to spuriously inferred HS and to spuriously inferred FS groups smaller than 3, could perform worse if FS groups of size 3 or larger were spuriously inferred. Given the genotype data that we had (95 SNPs), Colony almost never identifies such large sibling groups among totally unrelated individuals; however, it is possible that it could spuriously identify large FS groups among samples with many HS, or if fewer loci are used in the analysis. Thus, practitioners should be aware that our results do not provide information about how the Yank‐1 and Yank‐2 methods might perform in their own data sets when large family groups are inferred by unreliable pedigree reconstruction methods. These should be treated on a case‐by‐case basis. Tools for evaluating expected performance of sibling removal methods are available at

Our results for *F*_{ST} estimation largely paralleled those for allele frequency, which suggests that they might be generally applicable to other types of analyses that strongly depend on population allele frequencies, such as assignment tests and genetic mixture analysis—but that is only a conjecture that requires empirical evaluation. We did find that a wider range of scenarios showed improved performance of

Because using the BLUE increases precision of *F*_{ST} and related quantities have different sensitivities to effects of rare alleles, the number of populations exchanging genes and corrections for sampling error (Bhatia *et al*. ). Finally, in our simulations, the Yank‐2 method generally reduced precision in

Presence of siblings affects estimation of *N*_{e} in two ways that differ from its effect on estimation of allele frequency. First, whereas family structure only affects precision of *N*_{e}. Second, whether samples are collected randomly or not has a much larger effect on *S* and *N*_{e} and whether all siblings or only FS are removed. Furthermore, the Goldilocks zone (an area around the sweet spot that represents ‘just the right amount’ of bias adjustment to produce a reasonable estimate) can be very narrow, such that the consequences of small errors in identifying the optimal fraction of siblings to remove can be harsh. For example, under Scenario W, an unbiased estimate can be achieved by removing about 95% of all siblings, but removing only 75% leaves *N*_{e} (Fig. ). Researchers interested in pursuing this option are faced with a chicken‐and‐egg conundrum: only if one knows the true effective population size can one determine the precise amount of sibling reduction that will produce an unbiased estimate of *N*_{e}.

We used the LD method to estimate *N*_{e}, but similar results can be expected for Wang's () sibship method and for ONeSAMP (Tallmon *et al*. ); the latter's approximate‐Bayesian‐computation program uses several summary statistics, but the most important signal is from LD. In the standard temporal method for estimating *N*_{e}, temporal *F* is a function of the variance of

We see no realistic way to distinguish random and nonrandom samples based on patterns of relatedness, even if family structure can be reconstructed with 100% accuracy. For any given array of sibling relationships that can be generated with nonrandom sampling, it is possible to imagine a random sample that could produce the same result. For example, consider Scenario R, in which most of the *N* potential parents produced no offspring that appeared in the samples, while one or a few lucky pairs produced large numbers of offspring. This is exactly the kind of result that could arise if offspring of a species with Type III survival are sampled at an early age before they have properly mixed (as could easily occur with many fish, marine invertebrates, amphibians, insects, etc.). On the other hand, this pattern of family structure (realized *N*_{e} was <2% of *N* for Scenario R; Table ) also conforms to predictions of Hedgecock's () hypothesis of sweepstakes reproductive success (reviewed by Hedgecock & Pudovkin ), which has been postulated to be responsible for a number of tiny estimates of the *N*_{e}/*N* ratio in marine species (Hedrick ; Hauser & Carvalho ; Waples ). A sample with this type of family structure could therefore represent an extreme case of nonrandom sampling, or an extreme case of naturally overdispersed variance in reproductive success that characterizes a novel and important evolutionary phenomenon. Without independent information about the nature of the samples, it generally will not be possible to distinguish between these two possibilities.
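To illustrate how a few very large families translate into a tiny *N*_{e}/*N* ratio, the Python sketch below computes an approximate inbreeding effective size from the number of offspring per potential parent. It assumes a standard textbook approximation, *N*_{e} ≈ (*kN* − 2)/(*k* − 1 + *V*_{k}/*k*), where *k* and *V*_{k} are the mean and variance of offspring number among *N* potential parents. This formula is not taken from the analyses reported here, and the parent and offspring counts are invented for the example; they do not reproduce Scenario R.

```python
from statistics import mean, pvariance


def inbreeding_ne(offspring_counts):
    """Approximate inbreeding Ne from offspring number per potential parent.

    Uses Ne ~ (k_bar * N - 2) / (k_bar - 1 + V_k / k_bar), a textbook
    approximation assumed here for illustration only. The population
    variance is used for V_k.
    """
    n_parents = len(offspring_counts)
    k_bar = mean(offspring_counts)
    v_k = pvariance(offspring_counts)
    return (k_bar * n_parents - 2) / (k_bar - 1 + v_k / k_bar)


# Hypothetical case: 1000 potential parents, of which two lucky pairs
# (4 parents) each leave 500 offspring and the other 996 leave none.
n_potential = 1000
sweepstakes = [500] * 4 + [0] * 996
ne_sweep = inbreeding_ne(sweepstakes)
ratio_sweep = ne_sweep / n_potential

# Contrast: every parent leaves exactly 2 offspring (no variance).
equal = [2] * n_potential
ne_equal = inbreeding_ne(equal)

print(f"sweepstakes: Ne = {ne_sweep:.2f}, Ne/N = {ratio_sweep:.4f}")
print(f"equal family sizes: Ne = {ne_equal:.0f}")
```

With these made‐up counts the ratio falls well below 1%. Nothing in the offspring counts themselves indicates whether they arose from sweepstakes reproduction or from sampling unmixed families, which is the ambiguity described above.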

Identifying optimal ways for dealing with siblings in population genetic data sets is a complex problem with no single, one‐size‐fits‐all solution. Our evaluations of simulated and empirical data sets by no means represent a comprehensive evaluation of this topic, nor was that our intent. Nevertheless, several important points can be made.

*N*_{e} estimation that can be biased by excess family structure in nonrandom samples, it will be difficult or impossible to determine the optimal fraction of siblings to remove. Therefore, if one strongly suspects they have nonrandom, family‐correlated samples, a far better strategy is to go back and obtain a random sample. Of course, that is easier said than done in many natural populations.

*P* in the entire population of *N* potential parents. But what if instead one wants to estimate *P* in the same generation that is sampled? Or what if one wants to estimate the *P* that characterizes the effective population that actually produces the next generation? In the latter case, the best strategy would be to take a large, random sample of progeny and weight all individuals equally, regardless of family structure. So different perspectives about what quantity we want to estimate can lead to different conclusions regarding handling of siblings. Researchers should give this some thought before considering various sibling adjustment options.

We appreciate useful comments by and discussions with Mike Ford, Rus Hoelzel, Daniel Ruzzante, Jinliang Wang, Ryan Waples, Andrew Whiteley and two anonymous reviewers. Genetic data for the coho salmon samples were kindly provided by John Carlos Garza and Libby Gilbert‐Horvath from the Southwest Fisheries Science Center.

R.W. conceived the study. R.W. and E.A. conducted the analyses and wrote and edited the manuscript.

Analyses of effective sample size for the coho salmon collections were carried out using an