Dataset complexity impacts both MOTU delimitation and biodiversity estimates in eukaryotic 18S rRNA metabarcoding studies

De Santiago, Alejandro; Pereira, Tiago José; Mincks, Sarah L.; Bik, Holly M.

doi:10.1002/edn3.255

2021
Details

Journal Title:

Environmental DNA
Personal Author:

De Santiago, Alejandro ; Pereira, Tiago José ; Mincks, Sarah L. ; Bik, Holly M.
NOAA Program & Office:

NOAA General Documents
Description:

How does the evolution of bioinformatics tools impact the biological interpretation of high‐throughput sequencing datasets? For eukaryotic metabarcoding studies, in particular, researchers often rely on tools originally developed for the analysis of 16S ribosomal RNA (rRNA) datasets. Such tools do not adequately account for the complexity of eukaryotic genomes, the ubiquity of intragenomic variation in eukaryotic metabarcoding loci, or the differential evolutionary rates observed across eukaryotic genes and taxa. Recently, metabarcoding workflows have shifted away from the use of operational taxonomic units (OTUs) toward delimitation of amplicon sequence variants (ASVs). We assessed how the choice of bioinformatics algorithm impacts the downstream biological conclusions that are drawn from eukaryotic 18S rRNA metabarcoding studies. We focused on four workflows including UCLUST and VSearch algorithms for OTU clustering, and DADA2 and Deblur algorithms for ASV delimitation. We used two 18S rRNA datasets to further evaluate whether dataset complexity had a major impact on the statistical trends and ecological metrics: a “high complexity” (HC) environmental dataset generated from community DNA in Arctic marine sediments, and a “low complexity” (LC) dataset representing individually barcoded nematodes. Our results indicate that ASV algorithms produce more biologically realistic metabarcoding outputs, with DADA2 being the most consistent and accurate pipeline regardless of dataset complexity. In contrast, OTU clustering algorithms inflate the metabarcoding‐derived estimates of biodiversity, consistently returning a high proportion of “rare” molecular operational taxonomic units (MOTUs) that appear to represent computational artifacts and sequencing errors. However, species‐specific MOTUs with high relative abundance are often recovered regardless of the bioinformatics approach. 
We also found high concordance across pipelines for downstream ecological analysis based on beta‐diversity and alpha‐diversity comparisons that utilize taxonomic assignment information. Analyses of LC datasets and rare MOTUs are especially sensitive to the choice of algorithms and better software tools may be needed to address these scenarios.
Keywords:

Ecology; Ecology, Evolution, Behavior and Systematics; Genetics
Source:

Environmental DNA, 4(2), 363-384
DOI:

https://doi.org/10.1002/edn3.255
ISSN:

2637-4943
Format:

PDF
Publisher:

Wiley
Document Type:

Journal Article
License:

https://creativecommons.org/licenses/by-nc/4.0/
Rights Information:

CC BY-NC
Compliance:

Library
Main Document Checksum:

urn:sha256:0f638ff2b8781d649408ab0d408a1611f75c7a75e43fdaeb120c4b703062b4db
Download URL:

https://repository.library.noaa.gov/view/noaa/59198/noaa_59198_DS1.pdf
File Type:

PDF, 3.38 MB
