This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial‐NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.

**Appendix S1.**

The ‘opportunity for selection’ has a long but somewhat controversial history in evolutionary and population ecology. Here, the authors discuss the use/misuse of the concept and provide a clear guide to the issues, which straddle the border between ecology and evolution, for a general audience.

In a 1958 paper about natural selection in humans, James Crow showed that the squared coefficient of variation in absolute fitness $W$, equivalent to the variance in relative fitness $w$ ($w=W/\overline{W}$), places an upper limit on the rate of evolutionary adaptation (Crow, 1958; see also O'Donald, 1970):

$$I=\frac{{\sigma}_{W}^{2}}{{\overline{W}}^{2}}={\sigma}_{w}^{2}\qquad (1)$$

$I$ is a relevant parameter in the context of eco‐evolutionary dynamics and population responses to environmental change, insofar as it constrains the degree to which evolution can shift trait distributions and increase mean fitness $\overline{W}$ (and hence population growth). However, $I>0$ is a necessary but not sufficient condition for selection on traits other than fitness itself. Selection on a given trait only occurs when some of the variation in fitness is caused by variation in the trait (Figure 1). Demographic stochasticity can account for a large and variable fraction of $I$ (van Daalen & Caswell, 2019), so making inferences about the strength or drivers of selection based on the variance in fitness alone is problematic. This wider point has been made before in the context of sexual selection (Jennions et al., 2012), but the arguments and subtleties—which apply to natural selection in general—might not be familiar to ecologists. Our goal in this paper is therefore to guide empiricists on these issues. We first briefly review evolutionary theory on $I$. We then use simulations and empirical data on great tits to illustrate challenges relating to the interpretation of $I$ and related metrics, and finish by offering some take‐home messages.

From an evolutionary perspective, $I$ is best measured via lifetime fitness (e.g. number of new‐borns produced per new‐born across its lifetime), as this will account for life‐history trade‐offs. However, $I$ can be computed for any fitness component (e.g. survival, mating success, fecundity; Equation (1) applies to any fitness variable $W$) and is then interpreted as the scope for selection via that component alone. We concentrate on survival and annual reproductive success in our examples; for more involved treatments that partition variance in lifetime reproductive success in various ways, see Arnold and Wade (1984), Waples (2022a), Waples and Reed (2022) and references therein.

$I$ is an important parameter in evolutionary biology because the selection intensity on any given character cannot exceed $\sqrt{I}$ (Arnold & Wade, 1984), while the rate of evolution of fitness itself cannot exceed $I$ (Crow, 1958; O'Donald, 1970; van Daalen & Caswell, 2019). Traits affecting fitness can then also evolve if they are heritable. The selection differential $S$ on a focal trait $Z$ is defined as the covariance between $w$ and $Z$, or equivalently as the mean of the trait values weighted by $w$, minus the mean unweighted trait value:

$$S=\mathit{cov}\left(w,Z\right)=\frac{1}{N}\sum_{j=1}^{N}{w}_{j}{Z}_{j}-\overline{Z}$$

Dividing $S$ by the phenotypic standard deviation ${\sigma}_{Z}$ gives the selection intensity ${i}_{Z}=S/{\sigma}_{Z}$, which is the quantity bounded by $\sqrt{I}$.
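The identities in this paragraph can be checked numerically. The sketch below uses made-up trait and fitness values (not data from the paper) and takes the selection intensity to be $S$ divided by the phenotypic standard deviation, which is the standard definition but is stated here as an assumption. All variable names are illustrative.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical data (not from the paper): trait Z and absolute fitness W.
N = 1000
Z = [random.gauss(0.0, 1.0) for _ in range(N)]
W = [max(0.0, 2.0 + 0.5 * z + random.gauss(0.0, 1.5)) for z in Z]

mean_W = statistics.fmean(W)
w = [x / mean_W for x in W]  # relative fitness; its mean is 1

# Opportunity for selection: squared CV of W, or variance in w.
I_cv = statistics.pvariance(W) / mean_W ** 2
I_rel = statistics.pvariance(w)

# Selection differential: covariance form and weighted-mean form.
mean_Z = statistics.fmean(Z)
S_cov = sum((wi - 1.0) * (zi - mean_Z) for wi, zi in zip(w, Z)) / N
S_wmean = sum(wi * zi for wi, zi in zip(w, Z)) / N - mean_Z

# Selection intensity: S scaled by the phenotypic standard deviation.
i_Z = S_cov / statistics.pstdev(Z)

print(f"I: {I_cv:.4f} (squared CV) vs {I_rel:.4f} (variance in w)")
print(f"S: {S_cov:.4f} (covariance) vs {S_wmean:.4f} (weighted mean)")
print(f"|i_Z| = {abs(i_Z):.4f}, sqrt(I) = {math.sqrt(I_rel):.4f}")
```

Population (divide-by-$N$) variances and covariances are used throughout so that the two forms of each quantity agree to floating-point precision and the bound $\mid {i}_{Z}\mid \le \sqrt{I}$ holds exactly in the sample.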

When *n* traits affect fitness, a multiple regression equation can be used to describe their independent linear effects; that is, $w=1+{\beta}_{1}{Z}_{1}+{\beta}_{2}{Z}_{2}+\dots +{\beta}_{n}{Z}_{n}+\epsilon $, where the $\beta $ coefficients correspond to selection gradients and $\epsilon $ is a noise term (Lande & Arnold, 1983). Nonlinear terms can be added but are ignored here for simplicity. The selection intensity ${i}_{z}$ for each trait can then be expressed as the sum of a direct component plus one or more indirect components owing to phenotypic correlations with other traits. For example, with two traits we have: ${i}_{z1}={\beta}_{1}^{\prime}+{\beta}_{2}^{\prime}{\rho}_{1,2}$ and ${i}_{z2}={\beta}_{2}^{\prime}+{\beta}_{1}^{\prime}{\rho}_{2,1}$ (where ${\beta}_{1}^{\prime}$ and ${\beta}_{2}^{\prime}$ are variance‐standardised selection gradients; Lande & Arnold, 1983). Note that unlike $\mid {i}_{Z}\mid $, $\mid {\beta}_{Z}^{\prime}\mid $ can exceed $\sqrt{I}$ because the indirect component of selection can be opposite in sign to the direct component. The statistical contribution of each trait to the overall variance in relative fitness is then given by the product ${i}_{z}{\beta}_{Z}^{\prime}$, which equals ${{i}_{z}}^{2}$ when indirect selection is absent (Moorad & Wade, 2013).

Two lines of evidence suggest the lion's share of $I$ might be attributable to stochastic variation in fitness. First, quantitative genetic studies have estimated that the heritability of fitness ${h}^{2}\left(w\right)$ (fraction of $I$ explained by additive genetic effects) is low, typically on the order of a few per cent (Bonnet et al., 2022; Hendry et al., 2018). Second, even if many traits are under selection, they collectively might still explain relatively little of $I$. Assume, for argument's sake, that 10 independent traits are under linear selection and ${i}_{z}=0.16$ for each (the median selection intensity reported by Kingsolver et al., 2001). The total contribution to $I$ is then $10\times {0.16}^{2}=0.256\approx 0.26$; overall $I$ is then this plus some stochastic component. The stochastic component will vary within and across studies, but one way of estimating the typical magnitude of overall $I$ is to make use of the equation $I=\frac{{I}_{A}\left(w\right)}{{h}^{2}\left(w\right)}$, where ${I}_{A}\left(w\right)$ is the additive genetic variance in relative fitness, equivalent to the evolvability of fitness (Hansen & Houle, 2008). Using the median values for ${I}_{A}\left(w\right)$ and ${h}^{2}\left(w\right)$ of 0.10 and 0.03, respectively, reported by Bonnet et al. (2022), we have $I=\frac{{I}_{A}\left(w\right)}{{h}^{2}\left(w\right)}=\frac{0.10}{0.03}=3.33.$ Our 10 traits would thus together explain only 8% ($100\times \frac{0.26}{3.33}$) of $I$, the rest being attributable to stochastic variation in fitness. This toy example of course makes a lot of assumptions, but it illustrates how stochastic effects can dominate $I$ even with 10 traits under moderate selection.
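The arithmetic of this toy example is easy to reproduce and to vary. The short sketch below simply restates the numbers given in the paragraph; the variable names are illustrative.

```python
# Values taken from the paragraph above.
n_traits = 10   # independent traits under linear selection
i_z = 0.16      # selection intensity assumed for each trait
I_A = 0.10      # additive genetic variance in relative fitness
h2 = 0.03       # heritability of fitness

trait_component = n_traits * i_z ** 2        # trait-determined part of I
I_total = I_A / h2                           # overall opportunity for selection
percent_explained = 100 * trait_component / I_total

print(f"trait-determined component of I: {trait_component:.3f}")
print(f"overall I: {I_total:.2f}")
print(f"percentage of I explained by the traits: {percent_explained:.1f}")
```

Without intermediate rounding the percentage comes out at about 7.7, which rounds to the 8% quoted in the text. Changing `n_traits`, `i_z` or `h2` shows how quickly, or slowly, the trait-determined share grows.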

A major limitation when comparing $I$ across different contexts (populations, years, fitness variables) is that it is highly sensitive to mean fitness $\overline{W}$ (Downhower et al., 1987), as ${\overline{W}}^{2}$ features in its denominator (Equation (1); Figure 2). No selection might occur on any trait in any environment, yet $I$ can still vary simply because $\overline{W}$ and/or ${\sigma}_{W}^{2}$ (demographic stochasticity) varies. If selection does occur, it can be stronger in harsher, or human‐disturbed, environments where $\overline{W}$ is lower (Fugère & Hendry, 2018; Hunter et al., 2018; Reiss, 2013), because $\overline{W}$ features in the denominator of the selection differential (which can be reformulated in absolute fitness terms as $S=\mathit{cov}\left(W,Z\right)/\overline{W}$). Stronger selection is not guaranteed at lower $\overline{W}$, however, because $\mathit{cov}\left(W,Z\right)$ might also be lower when $\overline{W}$ is lower (e.g. if fitness reductions are proportional across individuals; Fugère & Hendry, 2018). In general, ecological changes can affect $S$ via changes in the mean or variance of the trait distribution, or via changes in the intercept or slope of the trait‐absolute fitness relationship (Hunter et al., 2018).

Even if selection on a given trait varies in intensity as $\overline{W}$ changes, $I$ and ${i}_{Z}$ might still be only weakly coupled. Multiple traits will typically be under selection, and if selection intensities on each are uncorrelated (or weakly correlated) across environments this effectively adds noise to the expected positive relationship (Figure 3) between $I$ and ${{i}_{Z}}^{2}$ for any given trait. The stochastic component of $I$ will also be highly variable in magnitude. Its expected value will be some function of $\overline{W}$, owing to mean–variance scaling for non‐Gaussian fitness variables such as survival and reproductive success (Figure 2). Its realised magnitude, for a given $\overline{W}$, will also vary randomly around this expectation (Section 4; Figure 2). Systematic and/or random variation in the magnitude of demographic stochasticity thus renders relationships between $I$ and explanatory variables of interest vulnerable to misinterpretation (Jennions et al., 2012).

With binary fitness variables like survival, the variance equals $p\left(1-p\right),$ where $p$ is the survival rate. The opportunity for viability selection (${I}_{M}$, where the *M* stands for mortality) is then given as ${I}_{M}=\frac{p\left(1-p\right)}{{p}^{2}}=\frac{1-p}{p}$ (Crow, 1958). For a given $p$, ${I}_{M}$ will always be the same, regardless of whether there is a nonrandom (e.g. trait‐determined) component to mortality or not, because latent variation in a Bernoulli variable is unobservable. If all one has is data on survival, there is no way to correct ${I}_{M}$ for its dependence on mean survival (Figure 2) to try to get at the nonrandom component.
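A small numerical illustration of this point, under assumptions that are ours rather than the paper's: two hypothetical populations have the same survival rate ($p=0.3$), but in one the survivors are a random draw and in the other only the individuals with the largest trait values survive (truncation selection). ${I}_{M}$ is identical in the two cases, even though the selection differential is not.

```python
import random
import statistics

def viability_opportunity(survived):
    """I_M: variance in survival (0/1) divided by squared mean survival."""
    p = statistics.fmean(survived)
    return statistics.pvariance(survived) / p ** 2

random.seed(2)
N, n_survivors = 10000, 3000          # survival rate p = 0.3 in both cases
Z = [random.gauss(0.0, 1.0) for _ in range(N)]

# Scenario A: survivors are chosen at random with respect to Z.
random_survival = [1] * n_survivors + [0] * (N - n_survivors)
random.shuffle(random_survival)

# Scenario B: truncation selection, only the largest Z values survive.
threshold = sorted(Z, reverse=True)[n_survivors - 1]
trait_survival = [1 if z >= threshold else 0 for z in Z]

def selection_differential(survived):
    """Mean trait of survivors minus mean trait of everyone."""
    survivors = [z for z, s in zip(Z, survived) if s == 1]
    return statistics.fmean(survivors) - statistics.fmean(Z)

I_random = viability_opportunity(random_survival)
I_trait = viability_opportunity(trait_survival)
print(f"I_M, random mortality:     {I_random:.4f}")
print(f"I_M, trait-based mortality: {I_trait:.4f}")
print(f"S, random mortality:        {selection_differential(random_survival):.3f}")
print(f"S, trait-based mortality:   {selection_differential(trait_survival):.3f}")
```

Both ${I}_{M}$ values equal $(1-p)/p=0.7/0.3$, so survival data alone cannot distinguish the two scenarios.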

The opportunity for fecundity selection ${I}_{F}$ can in theory be corrected for its dependence on (population or sample) mean fecundity. Let $k$ be the number of offspring produced per parent in a single reproductive bout, or across a full breeding season, that survive to the age at which juveniles are counted. An adjusted ${I}_{F}$ can then be computed as $\Delta {I}_{F}={I}_{F}-1/\overline{k}$ (Waples, 2020). Note that here we are defining ‘fecundity’ broadly as the product of zygote number and offspring survival to the stage at enumeration. We acknowledge that assigning offspring survival as a component of parental fitness is problematic from a quantitative genetic theory perspective (Thomson & Hadfield, 2017) but argue that computing $\Delta {I}_{F}$ for such ‘mixed fitness’ measures still is useful as an overall index of potential for selection to be acting. Under a Wright–Fisher model, a mainstay of classical population genetics, $k$ is approximately Poisson distributed when reproductive success is purely random (i.e. ${{\sigma}_{k}}^{2}\approx \overline{k}$). The expected null value of ${I}_{F}$ is then $1/\overline{k}$, so subtracting this quantity from raw ${I}_{F}$ not only corrects for the dependence on $\overline{k}$, but also for the magnitude of demographic stochasticity expected under Wright–Fisher reproduction, under which $E\left(\Delta {I}_{F}\right)=0$ (Waples, 2020). Thus positive $\Delta {I}_{F}$ values suggest, but do not prove, that something interesting might be going on. Assuming a Poisson distribution in the null case will not always be warranted: $k$ might be underdispersed (${{\sigma}_{k}}^{2}<\overline{k}$) or overdispersed (${{\sigma}_{k}}^{2}>\overline{k}$) in some species/populations even if reproductive success is purely random (Kendall & Wittmann, 2010; Waples & Reed, 2022). Thus, negative, or positive, $\Delta {I}_{F}$ values might be expected absent any selection, and deviations from a given expectation can also occur by chance (Figure 2; Section 4).
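The Poisson null expectation can be illustrated with a minimal simulation. The sketch below is ours, not the authors' code: it draws purely random (Poisson) offspring numbers at two hypothetical mean fecundities and computes raw and adjusted ${I}_{F}$. The helper names `poisson` and `opportunity` are illustrative, and the Poisson sampler is a simple textbook method suitable only for small means.

```python
import math
import random
import statistics

def poisson(lam, rng):
    """Draw one Poisson variate (Knuth's method; fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def opportunity(k):
    """Return raw I_F and the adjusted value I_F - 1/mean(k)."""
    mean_k = statistics.fmean(k)
    I_F = statistics.pvariance(k) / mean_k ** 2
    return I_F, I_F - 1.0 / mean_k

rng = random.Random(3)
N = 5000  # number of parents (hypothetical)
results = {}
for mean_k in (2.0, 8.0):
    k = [poisson(mean_k, rng) for _ in range(N)]
    results[mean_k] = opportunity(k)
    I_F, dI_F = results[mean_k]
    print(f"mean k = {mean_k}: raw I_F = {I_F:.3f}, adjusted I_F = {dI_F:.3f}")
```

Raw ${I}_{F}$ is close to $1/\overline{k}$ in each case and so differs fourfold between the two settings, whereas the adjusted value stays near zero in both, as expected when reproductive success is purely random.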

Given these limitations, interpreting absolute values of $\Delta {I}_{F}$ is problematic, but comparing relative values across contexts where $\overline{k}$ varies can be informative. Regardless of the magnitude of overdispersion expected under a given null model, raw ${I}_{F}$ is expected to be higher when $\overline{k}$ is lower, even if selection is always absent. The same is not true of $\Delta {I}_{F}$, however. If the overdispersion parameter $\phi $ ($\phi ={{\sigma}_{k}}^{2}/\overline{k}$) varies across environments, as expected if selection intensities vary, $E\left(\Delta {I}_{F}\right)$ will in turn also vary (but note it might be nonzero even in environments where selection is absent). Changes in $\Delta {I}_{F}$ therefore more reliably indicate changes in the true opportunity for selection than do changes in raw ${I}_{F}$ (Table 1). Nevertheless, there remains no free lunch. The only way to show that selection intensities on a given trait of interest do in fact vary is to test for variation in the slope of the relationship between trait and relative fitness (Morrissey & Hadfield, 2012; Wade & Kalisz, 1990), which obviously requires phenotypic information as well as fitness information.

Above, we defined $k$ as the number of offspring per parent surviving up to a particular point. The later in life offspring are counted, the lower $\overline{k}$ will be, and thus the higher raw ${I}_{F}$ will be. Consider a highly fecund species such as cod that produces hundreds of thousands of eggs. For a given level of real fitness differences among parents, ${I}_{F}$ computed via number of eggs will be tiny, whereas ${I}_{F}$ computed via number of juveniles surviving a full year will be much larger. On the one hand, this makes biological sense, in that the more offspring mortality that has accrued, the higher the maximum possible selection intensities on parental traits influencing offspring survival. On the other hand, offspring survival might be purely random with respect to parental phenotype, in which case the true scope for selection does not actually increase just because offspring are enumerated at older ages. Again, $\Delta {I}_{F}$ offers some advantages over raw ${I}_{F}$: comparing $\Delta {I}_{F}$ across consecutive offspring life stages can provide insights as to whether any intervening mortality is random or not (Table 1). An increase in $\Delta {I}_{F}$ above that expected by chance (see Section 5) suggests that selection *might* be occurring in the intervening period, but equally it might have nothing to do with the phenotype of parents (or offspring). For example, offspring from the same parents will co‐occur in space for some period in ontogeny (e.g. siblings sharing a nest or parental territory), so entire families can live or die as units owing to random catastrophes or variation in habitat quality.
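Why purely random mortality inflates raw ${I}_{F}$ but leaves the adjusted value unchanged in expectation can be seen from a short derivation. It rests on an assumption introduced here for illustration: between the two counts every offspring survives independently with the same probability $s$ (a symbol not used elsewhere in the paper). Primes denote quantities at the later stage, so that $k^{\prime}$ given $k$ is binomial with parameters $k$ and $s$:

$$\overline{k}^{\prime}=s\overline{k},\qquad {\sigma}_{k^{\prime}}^{2}=s\left(1-s\right)\overline{k}+{s}^{2}{\sigma}_{k}^{2}$$

$${I}_{F}^{\prime}=\frac{{\sigma}_{k^{\prime}}^{2}}{{\left(s\overline{k}\right)}^{2}}={I}_{F}+\frac{1-s}{s\overline{k}}$$

$$\Delta {I}_{F}^{\prime}={I}_{F}^{\prime}-\frac{1}{s\overline{k}}={I}_{F}-\frac{1}{\overline{k}}=\Delta {I}_{F}$$

Raw ${I}_{F}$ therefore grows without bound as $s$ becomes small, while the adjustment term $1/\overline{k}^{\prime}$ absorbs exactly the variance added by random mortality. Any family-level or trait-dependent variation in survival violates the assumption and would raise the adjusted value at the later stage.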

As well as being sensitive to population (i.e. true) $\overline{k}$, raw ${I}_{F}$ is also sensitive to sample $\overline{k}$. Owing to constraints on study design, it might be possible to sample only a random subset of all offspring alive at a given age/stage. If sampling effort varies, this will produce variation in $\overline{k}$ estimates that has nothing to do with biology. Randomly sampling offspring at an early life stage is statistically equivalent to enumerating all offspring at some later life stage after which random mortality has occurred (as in Table 1, Scenario 1). $E\left(\Delta {I}_{F}\right)$ at this early life stage will be the same regardless of sampling effort, but the same is not true for $E\left({I}_{F}\right)$. Sampling a smaller fraction of offspring will bias ${I}_{F}$ upwards (because sample $\overline{k}$ is lower relative to true $\overline{k}$), giving the false impression of more scope for selection, whereas $\Delta {I}_{F}$ is unbiased in this regard. Note that if trait information is available, phenotypic selection estimates are not biased by lower sampling effort, but their uncertainty would be higher.
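The effect of sampling effort can be sketched numerically. The simulation below is illustrative only and rests on our own assumptions: parents differ in expected offspring number (a gamma distribution with mean 10 is used as a stand-in for real fitness differences), realised offspring numbers are Poisson, and each offspring is then sampled independently with probability `f`. The names `poisson`, `thin` and `opportunity` are hypothetical helpers.

```python
import math
import random
import statistics

def poisson(lam, rng):
    """Draw one Poisson variate (Knuth's method; fine for moderate lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def thin(k, f, rng):
    """Keep each of k offspring independently with probability f."""
    return sum(1 for _ in range(k) if rng.random() < f)

def opportunity(k):
    """Return raw I_F and the adjusted value I_F - 1/mean(k)."""
    mean_k = statistics.fmean(k)
    I_F = statistics.pvariance(k) / mean_k ** 2
    return I_F, I_F - 1.0 / mean_k

rng = random.Random(7)
N = 20000  # number of parents (hypothetical)
# Expected offspring number varies among parents (mean 10, squared CV 0.25).
full = [poisson(10.0 * rng.gammavariate(4.0, 0.25), rng) for _ in range(N)]

results = {}
for f in (1.0, 0.5, 0.2):  # fraction of offspring sampled
    sample = [thin(k, f, rng) for k in full]
    results[f] = opportunity(sample)
    I_F, dI_F = results[f]
    print(f"sampled fraction {f}: raw I_F = {I_F:.3f}, adjusted I_F = {dI_F:.3f}")
```

Under these assumptions raw ${I}_{F}$ rises steadily as the sampled fraction falls, while the adjusted value stays close to the squared coefficient of variation in expected offspring number (0.25 here) at every sampling effort.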

For a given mean fitness, the realised value of $I$ will vary randomly around some expectation owing to sampling effects. Uncertainty in the stochastic component of $I$ is higher when $\overline{W}$ is lower (Figure 2). For a given $\overline{W}$, uncertainty is also higher when population or sample size is lower. When selection occurs on one or more traits, sampling variation around the true selection intensities (i.e. population values of ${i}_{Z}$) will in turn be higher when $\overline{W}$ and/or sample size is lower. In other words, phenotypic selection is itself a stochastic process, and the realised magnitude of the trait‐determined component of $I$ in any given environment will depend on the realised selection pressures.

To illustrate these points, we simulate a simple scenario where a single trait $Z$ is under linear selection via variation in offspring number $k$. One way to model this is to work with expected fitness $W$ on the natural logarithm scale (Morrissey & Goudie, 2016), such that negative values are impossible: $\mathrm{ln}\left(W\right)=a+\mathit{bZ}$, where $a$ is the intercept and $b$ the slope of the individual fitness function. Expected absolute number of offspring is then given by $W=\mathrm{exp}\left(a+\mathit{bZ}\right)$, and this can be converted into a random variable by assuming some stochastic process. Here, for simplicity, we assume a Poisson process ($k$ ~ $\mathit{\text{pois}}\left(W\right)$), but any distribution could be used. This is conceptually equivalent to simulating a generalised Wright–Fisher model with weights given by $W$ (Waples, 2022b). A range of selection intensities was produced by varying $b$ across simulations, holding $a$ constant at the log of the desired $\overline{W}$ such that $E\left(\overline{k}\right)$ was independent of selection strength (as might occur for example with soft selection; Bell et al., 2021).
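A minimal sketch of this kind of simulation is given below. It is not the authors' code. It assumes a standard normal trait and, as an assumption of the sketch, shifts the intercept by ${b}^{2}/2$ so that the expected mean offspring number stays at the target value as $b$ changes; the original simulations may have handled this differently. Function and variable names are illustrative.

```python
import math
import random
import statistics

def poisson(lam, rng):
    """Draw one Poisson variate (Knuth's method; fine for moderate lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate(N, mean_W, b, rng):
    """Simulate k ~ Poisson(exp(a + b*Z)); return I_F, i_Z and mean k."""
    # Sketch assumption: Z is standard normal, so E[exp(b*Z)] = exp(b^2/2);
    # subtracting b^2/2 keeps the expected mean offspring number at mean_W.
    a = math.log(mean_W) - 0.5 * b * b
    Z = [rng.gauss(0.0, 1.0) for _ in range(N)]
    k = [poisson(math.exp(a + b * z), rng) for z in Z]
    mean_k = statistics.fmean(k)
    w = [x / mean_k for x in k]                      # relative fitness
    I_F = statistics.pvariance(w)                    # opportunity for selection
    S = sum(wi * zi for wi, zi in zip(w, Z)) / N - statistics.fmean(Z)
    i_Z = S / statistics.pstdev(Z)                   # selection intensity
    return I_F, i_Z, mean_k

rng = random.Random(4)
results = {}
for b in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    results[b] = simulate(N=5000, mean_W=2.0, b=b, rng=rng)
    I_F, i_Z, mean_k = results[b]
    print(f"b = {b:.1f}: mean k = {mean_k:.2f}, I_F = {I_F:.3f}, i_Z^2 = {i_Z ** 2:.3f}")
```

With no selection ($b=0$) realised ${I}_{F}$ sits near the Poisson expectation $1/\overline{k}$; as $b$ increases, the trait-determined component is added on top of this stochastic baseline. Re-running with smaller `N` or `mean_W` shows how noisy single realisations become.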

As expected, the results showed that realised ${I}_{F}$ increased with ${{i}_{Z}}^{2}$ (Figure 3). This relationship was always linear because $E\left(\overline{k}\right)$ was constant within each scenario. Curved relationships would have instead resulted if $E\left(\overline{k}\right)$ varied within scenarios (e.g. by assuming some relationship between the $a$ and $b$ parameters), because the trait‐determined and stochastic components of ${I}_{F}$ would then no longer be independent. Uncertainty in ${I}_{F}$ across replicate simulations was larger when $N$ was smaller or $\overline{k}$ was lower. The realised correlation between ${I}_{F}$ and ${{i}_{Z}}^{2}$ for a given replicate was thus lower and more variable when $N$ or $\overline{k}$ was lower. When selection was absent (${i}_{Z}=0$), the sampling variance in ${I}_{F}$ (and $\Delta {I}_{F}$) was well approximated by $2/\left(N{\overline{k}}^{2}\right)$ (Appendix S1). With real‐world data on some fitness variable, confidence intervals around estimates of ${I}_{F}$ or $\Delta {I}_{F}$ could be computed via bootstrapping (see Section 5).
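The stated approximation for the sampling variance under the null can be checked with a quick Monte Carlo experiment. The sketch below is ours, with hypothetical values of $N$ and $\overline{k}$: it repeatedly draws Poisson offspring numbers with no selection and compares the variance of the realised values across replicates with $2/\left(N{\overline{k}}^{2}\right)$.

```python
import math
import random
import statistics

def poisson(lam, rng):
    """Draw one Poisson variate (Knuth's method; fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def opportunity(k):
    """Return raw I_F and the adjusted value I_F - 1/mean(k)."""
    mean_k = statistics.fmean(k)
    I_F = statistics.pvariance(k) / mean_k ** 2
    return I_F, I_F - 1.0 / mean_k

rng = random.Random(5)
N, mean_k, n_reps = 200, 5.0, 1000   # hypothetical settings

raw, adjusted = [], []
for _ in range(n_reps):
    I_F, dI_F = opportunity([poisson(mean_k, rng) for _ in range(N)])
    raw.append(I_F)
    adjusted.append(dI_F)

approx = 2.0 / (N * mean_k ** 2)
var_raw = statistics.pvariance(raw)
var_adj = statistics.pvariance(adjusted)
print(f"approximation 2/(N*kbar^2):     {approx:.6f}")
print(f"simulated variance, raw I_F:      {var_raw:.6f}")
print(f"simulated variance, adjusted I_F: {var_adj:.6f}")
```

The simulated variances should fall close to the approximation, with the raw value typically a little above it at low $\overline{k}$; the agreement improves as $\overline{k}$ increases.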

To illustrate key points made in previous sections, we exploit 46 years (1973 to 2018) of individual‐level reproductive success data from a Dutch study population (National Park de Hoge Veluwe, the Netherlands) of great tits *Parus major* (small songbirds common across Eurasia); see Visser et al. (2021) and references therein for details on the study population and data collection methods. Annual reproductive output $k$ per known (ringed) female was measured in three different ways: number of eggs (including second clutches but excluding replacement clutches), number of fledglings (juveniles surviving the nestling phase; all clutches included) and number of recruits (first‐time breeders recorded in the study area the following years; all clutches included). Birds that underwent experimental manipulations were excluded. Average values (across all females and years combined) for the same parameters listed in Table 1 were then calculated. Raw ${I}_{F}$ was an order of magnitude greater for recruits compared to fledglings, which was in turn about twice that for eggs (Table 2). This simply reflects the fact that the biggest drop in $\overline{k}$ was between fledglings and recruits, given the substantial intervening mortality. Overdispersion ($\phi >1$) was apparent for all three fitness variables, particularly fledglings (Table 2). Note, however, that clutch/litter size is often underdispersed in other species (Kendall & Wittmann, 2010) and egg number might be underdispersed in other great tit populations, for example, those where second clutches are uncommon. The scaled overdispersion parameter (${\phi}^{\prime}$) of Crow and Morton (1955) was close to 1 for eggs and fledglings, indicating that random mortality up to the recruit stage would shrink initial overdispersion (or grow initial underdispersion) towards the Poisson expectation of 1. However, unscaled $\phi $ was 1.54 at the recruit stage, implying that intervening mortality was not completely random. Consistent with this, $\Delta {I}_{F}$ was much higher for recruits, compared to fledglings or eggs.

To assess whether the changes in $\Delta {I}_{F}$ between stages were more than expected by chance (i.e. bootstrapped 95% confidence intervals (BCIs) not overlapping zero), a nonparametric bootstrapping procedure was performed in which *n* = 100 ‘female‐years’ (an instance of a particular female breeding in a particular year; some females breed across multiple years) were randomly sampled, with replacement, from all 2486 female‐year records. The difference in $\Delta {I}_{F}$ between each pair of stages (eggs‐fledglings; fledglings‐recruits; eggs‐recruits) was then computed for this random sample, and this was repeated 10,000 times to generate bootstrapped distributions. The observed increase in $\Delta {I}_{F}$ between eggs and fledglings of 0.14 was more than expected by chance (lower BCI = 0.06; upper BCI = 0.23). The same was true for the difference in $\Delta {I}_{F}$ between fledglings and recruits (observed = 1.07; lower BCI = 0.14; upper BCI = 2.20), and for the difference in $\Delta {I}_{F}$ between eggs and recruits (observed = 1.20; lower BCI = 0.28; upper BCI = 2.35).
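The bootstrapping procedure described above can be sketched as follows. This is an illustration, not the authors' analysis code: the records are synthetic stand-ins generated inside the script (they are not the great tit data), percentile intervals are used as one common way to form a bootstrap confidence interval, and only 2000 resamples are drawn to keep the example fast. All names are hypothetical.

```python
import math
import random

def poisson(lam, rng):
    """Draw one Poisson variate (Knuth's method; fine for moderate lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def thin(k, p, rng):
    """Keep each of k offspring independently with probability p."""
    return sum(1 for _ in range(k) if rng.random() < p)

def delta_I_F(k):
    """Adjusted opportunity for selection: var(k)/mean(k)^2 - 1/mean(k)."""
    mean_k = sum(k) / len(k)
    var_k = sum((x - mean_k) ** 2 for x in k) / len(k)
    return var_k / mean_k ** 2 - 1.0 / mean_k

rng = random.Random(6)

# Synthetic female-year records (eggs, fledglings, recruits); NOT real data.
# Survival probabilities vary among families, so mortality is not purely random.
records = []
for _ in range(2486):
    eggs = poisson(9.0, rng)
    fledglings = thin(eggs, rng.betavariate(4.0, 2.0), rng)
    recruits = thin(fledglings, min(1.0, 0.08 * rng.gammavariate(2.0, 0.5)), rng)
    records.append((eggs, fledglings, recruits))

def stage_differences(sample):
    """Differences in adjusted I_F between each pair of stages."""
    d = [delta_I_F([rec[s] for rec in sample]) for s in range(3)]
    return (d[1] - d[0], d[2] - d[1], d[2] - d[0])

observed = stage_differences(records)

n_boot, n_sample = 2000, 100
boot = []
while len(boot) < n_boot:
    sample = [rng.choice(records) for _ in range(n_sample)]
    if min(sum(rec[s] for rec in sample) for s in range(3)) == 0:
        continue  # a stage mean of zero leaves I_F undefined; redraw
    boot.append(stage_differences(sample))

def percentile_ci(values, level=0.95):
    """Percentile bootstrap interval from a list of resampled values."""
    ordered = sorted(values)
    lower = ordered[int((1 - level) / 2 * len(ordered))]
    upper = ordered[int((1 + level) / 2 * len(ordered)) - 1]
    return lower, upper

labels = ("eggs-fledglings", "fledglings-recruits", "eggs-recruits")
cis = [percentile_ci([b[j] for b in boot]) for j in range(3)]
for label, obs, (lower, upper) in zip(labels, observed, cis):
    print(f"{label}: observed = {obs:.2f}, 95% BCI = ({lower:.2f}, {upper:.2f})")
```

With real data, `records` would be replaced by the observed female-year counts; an interval that excludes zero indicates a change in the adjusted value larger than expected from resampling noise alone.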

Supporting evidence for a nonrandom component to reproductive success in this study population comes from the fact that annual number of recruits is repeatable across females (Reed et al., 2016). Laying date and clutch size are two heritable traits that explain some of this individual heterogeneity among females, with selection on each trait varying in intensity and to some extent sign across years (Reed et al., 2016; Sæther et al., 2016; Visser et al., 2021). Additional traits might be under selection, and the big jump in $\Delta {I}_{F}$ between fledglings and recruits (Table 2) suggests that the combined fitness effects of all traits are strongest during the postfledgling period. Environmental factors shared by families can also increase $\Delta {I}_{F}$ above the null expectation, but such ‘common environment effects’ are expected to be stronger during the nestling phase (when siblings are still together) than in the postfledgling phase. Across years, the correlation between ${I}_{F}$ and ${{i}_{z}}^{2}$ for laying date (both estimated via number of recruits) was weak (Spearman's *r* = 0.26) and not statistically significant (*p* = 0.08). Similarly, the correlation between ${I}_{F}$ and ${{i}_{z}}^{2}$ for clutch size (both again estimated via number of recruits) was weak and not significant (Spearman's *r* = −0.08; *p* = 0.61). This suggests that the realised magnitude of demographic stochasticity varies substantially across years, and/or that selection on other unmeasured traits varies in intensity. Either way, the component of ${I}_{F}$ not explained by laying date or clutch size appears to be highly variable across years.

The research was carried out under licence AVD801002017831 of the Centrale Commissie Dierexperimenten (CCD) in the Netherlands. Fieldwork at the National Park de Hoge Veluwe was carried out with permission of the Park.

$I$ remains theoretically relevant in population biology, in that it places a hard ceiling on the rate of evolutionary adaptation. The more salient parameter, though, is ${I}_{A}\left(w\right)$, as this captures the *actual* rate of evolutionary adaptation, that is, the genetic increase in mean fitness due to selection (Bonnet et al., 2022; Hendry et al., 2018). Given that most variation in fitness typically is nonheritable, and that the heritability of fitness (${h}^{2}\left(w\right)={I}_{A}\left(w\right)/I$) might vary unpredictably across environments, $I$ is of limited practical utility for understanding/predicting evolutionary dynamics (Grafen, 1988). At best, it provides an answer to the question: ‘What is the maximum amount by which evolution could increase population growth in this environment?’, which might help guide management/conservation scenario‐planning, for example. $I$ also provides an answer to the question: ‘How strong could selection be on any trait?’. The related OSM metric of Pelletier and Coulson (2012) in turn provides an answer to the more specific question: ‘How strong could selection be on *my particular study trait*?’. Again, however, knowing how strong things *could be* is of less practical utility than knowing how strong they in fact are.

More fundamentally, $I>0$ simply cannot be taken as evidence for selection, as all fitness variation could be random with respect to (multivariate) phenotype. Variation in $I$ across ecological contexts also cannot be taken as evidence that selection is varying, as the only thing that might be varying is the magnitude of demographic stochasticity. If mean fitness varies, the stochastic component of $I$ will then vary systematically, so realised $I$ is then an even less reliable guide to selection. Similarly, if demographic stochasticity varies systematically with some putative environmental driver of selection, the scope for being misled is high (Jennions et al., 2012). $\Delta {I}_{F}$ corrects for dependence on mean offspring number but is also open to misinterpretation (Waples, 2020; Waples & Reed, 2022). Comparing relative values of $\Delta {I}_{F}$ across situations where true or sample $\overline{k}$ varies can nonetheless give a more reliable indication of the true scope for selection than comparing raw ${I}_{F}$. Our empirical example with the great tits, a system in which parental reproductive success is known to be nonrandom with respect to laying date and clutch size, supported this latter contention.

Our simulations showed that the realised correlation between $I$ and the squared selection intensity on a given trait can be weak and highly variable, especially when mean fitness or population/sample size is low. Indeed, with the great tits, ${I}_{F}$ was statistically uncorrelated with ${{i}_{Z}}^{2}$ for both laying date and clutch size, despite 46 years of data. Few other studies seem to have explored this, but a study of water striders *Aquarius remigis* found a weak positive correlation (${r}_{s}=0.34$, *n* = 24) between $I$ and standardised linear selection gradients on body size (Ferguson & Fairbairn, 2001). The data were quite heterogeneous, however, involving three different fitness variables, complicating the comparison with the great tit results. Pelletier and Coulson (2012) documented positive relationships between viability selection differentials on juvenile body size and ${I}_{M}$ in both red deer *Cervus elaphus* and Soay sheep *Ovis aries*, but the ${R}^{2}$ values (0.30 and 0.15, respectively, including outliers) were still relatively low (see also Martin et al., 2015).

In conclusion, we hope our arguments and examples have clarified the dangers of uncritically using $I$ and related metrics to make inferences about selection and its drivers. There is simply no free lunch: measuring selection requires data on phenotypes! Nevertheless, comparing $I$ across life stages or environments can provide hints as to where selection might be acting most strongly, with $\Delta {I}_{F}$ being preferable to raw ${I}_{F}$ when $\overline{k}$ varies. This might in turn prompt further study or inform study design. If one does have phenotypic information from multiple years/locations, on top of paired fitness data, correlating squared selection intensities against $I$ can still be revealing. For example, if mean fitness is high and relatively constant, and population size is large, then variability in the stochastic component of $I$ should be low. A low correlation between $I$ and ${{i}_{Z}}^{2}$ for a focal trait would then imply variable selection intensities on one or more unmeasured traits. Wider reporting of the mean and variance of absolute fitness (by year/location), as well as the mean and variance of any measured traits and their relationship with absolute fitness (Hunter et al., 2018), will allow for improved general understanding of the ecological drivers of selection.

All authors contributed intellectually to the paper. The paper was written by Thomas E. Reed, with inputs from Robin S. Waples and Marcel E. Visser.

Thanks to Jacob Moorad for discussions on the OFS, and to Joel Pick, Michael Morrissey and an anonymous reviewer for thoughtful and detailed reviews. Thanks to Maria Teider and Roy Supratik for help on mathematical aspects. We also thank the many workers who contributed to the great tit data collection and collation over the years and the National Park de Hoge Veluwe for permission to work on their premises. TER was funded by an ERC Starting Grant (ERC‐2014‐StG‐256 639192‐ALH) and an SFI ERC Support Award. Open access funding provided by IReL.

The authors declare no conflict of interest.

Data available from the Dryad Digital Repository