Pseudoreplication in genomic‐scale data sets
Details:

  • Journal Title:
    Molecular Ecology Resources
  • Personal Author:
  • NOAA Program & Office:
  • Description:
    In genomic‐scale data sets, loci are closely packed within chromosomes and hence provide correlated information. Averaging across loci as if they were independent creates pseudoreplication, which reduces the effective degrees of freedom (df′) relative to the nominal degrees of freedom (df). This issue has been known for some time, but its consequences have not been systematically quantified across the entire genome. Here, we measured pseudoreplication (quantified by the ratio df′/df) for a common metric of genetic differentiation (FST) and a common measure of linkage disequilibrium between pairs of loci (r²). Based on data simulated using models (SLiM and msprime) that allow efficient forward‐in‐time and coalescent simulations while precisely controlling population pedigrees, we estimated df′ and df′/df by measuring the rate of decline in the variance of mean FST and mean r² as more loci were used. For both indices, df′ increases with effective population size (Ne) and genome size, as expected. However, even for large Ne and large genomes, df′ for mean r² plateaus after a few thousand loci, and a variance components analysis indicates that the limiting factor is uncertainty associated with sampling individuals rather than genes. Pseudoreplication is less extreme for FST, but df′/df ≤ 0.01 can occur in data sets using tens of thousands of loci. Commonly used block‐jackknife methods consistently overestimated var(FST), producing very conservative confidence intervals. Predicting df′ based on our modelling results as a function of Ne, L, S, and genome size provides a robust way to quantify precision associated with genomic‐scale data sets.
  • Keywords:
  • Source:
    Molecular Ecology Resources, 22(2), 503-518
  • DOI:
  • ISSN:
    1755-098X;1755-0998;
  • Format:
  • Publisher:
  • Document Type:
  • Rights Information:
    Accepted Manuscript
  • Compliance:
    Library
  • Main Document Checksum:
  • Download URL:
  • File Type:
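
The Description says that df′ was estimated from how quickly the variance of a genome-wide mean declines as loci are added. The toy simulation below illustrates that general idea only; it is not the authors' SLiM/msprime pipeline, and it does not use their definition of df′. It assumes a hypothetical AR(1) correlation (`rho`) between adjacent locus-level values with unit variance, and the function names `simulate_locus_values` and `effective_df_ratio` are invented for this sketch. With independent loci the variance of the mean is 1/L, so the ratio of that expectation to the observed variance serves here as a rough stand-in for df′/df.

```python
import random
import statistics


def simulate_locus_values(n_loci, rho, rng):
    """Return one replicate of per-locus values along a chromosome.

    Assumption for illustration: neighbouring loci follow an AR(1)
    process with correlation `rho` and unit marginal variance.
    """
    values = [rng.gauss(0.0, 1.0)]
    innovation_sd = (1.0 - rho ** 2) ** 0.5  # keeps the marginal variance at 1
    for _ in range(n_loci - 1):
        values.append(rho * values[-1] + innovation_sd * rng.gauss(0.0, 1.0))
    return values


def effective_df_ratio(n_loci, rho, n_reps=1000, seed=1):
    """Estimate a df'/df-like ratio from replicate genome-wide means.

    Compares the variance expected under independence (1 / n_loci)
    with the variance of the mean observed across replicates.
    """
    rng = random.Random(seed)
    replicate_means = [
        statistics.fmean(simulate_locus_values(n_loci, rho, rng))
        for _ in range(n_reps)
    ]
    observed_var = statistics.variance(replicate_means)
    expected_var_if_independent = 1.0 / n_loci
    return expected_var_if_independent / observed_var


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        ratio = effective_df_ratio(n_loci=200, rho=rho)
        print(f"rho={rho:.1f}  approximate df'/df = {ratio:.3f}")
```

In this sketch the ratio stays near 1 when `rho` is 0 and falls well below 1 as `rho` grows, which mirrors the qualitative point of the abstract: correlated loci add less information than their count suggests.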
