Highly accurate species identification of Eastern Pacific rockfishes (Sebastes spp.) with high-throughput DNA sequencing

Genetic species identification is often necessary for species flocks, such as rockfishes in the genus Sebastes (Teleostei, Scorpaenidae). Traditional visual identification methods are challenged by the presence of many sympatric rockfish species with morphologically similar juveniles. Here we present a straightforward approach for species identification in rockfishes using 96 nuclear microhaplotype loci that can be efficiently genotyped using high-throughput DNA sequencing. Self-assignment of nearly 1 000 samples from 54 species resulted in > 99% accurate species identification at a 95% confidence threshold. Phylogenetic relationships of Sebastes uncovered with these same loci were highly concordant with relationships previously derived primarily with mitochondrial DNA. We also assessed ascertainment bias and consequent reduced nucleotide diversity and heterozygosity in non-ascertainment species to understand the potential utility of these markers for those species. The data and protocol presented here will be useful for research and management of rockfishes in the Northeastern Pacific Ocean.


Introduction
Species identification is necessary when taxa that are the subject of study have closely related and morphologically similar congeners. Generally, visual identification is the first priority, as it is typically low-cost and rapid. However, it can be inaccurate, particularly for juvenile life stages, which often lack the morphological characteristics [...] especially identification (e.g., Baetscher et al. 2019). However, ascertainment bias impacts the ability to use genetic markers for this type of multi-level identification. Ascertainment bias occurs because the initial assessment of genetic variation within a small number of samples means that more common SNPs are more likely, and rare SNPs are less likely, to be identified (Nielsen et al. 2004; Clark et al. 2005). When the ascertainment samples consist of a small subset of the total number of species analyzed, ascertainment bias suggests that only some of the variation from the initial SNP discovery samples will be shared across species due to different demographic histories and rates of mutation (Li and Kimmel 2013). One manifestation of this bias can be reduced variation in the genetic markers commensurate with the evolutionary genetic distance between the taxa used for marker discovery and other species of interest (Wakeley et al. 2001; Vowles and Amos 2006). Another outcome is that markers may not amplify in phylogenetically distant species because of uncharacterized variation in the primer sites.
In marine fishes, few groups are as speciose as the rockfishes of the genus Sebastes, which includes over 100 species globally, almost all of which are found exclusively in the North Pacific Ocean (Love et al. 2002). Nearshore species are abundant in kelp forests and are prominent in studies of ecology and community structure along the West Coast of North America (Carr 1991). Rockfishes also support important commercial and recreational fisheries throughout their Northeastern Pacific range from Alaska to Mexico, where some regulations do not differentiate among species and others apply to species complexes, both because many species co-occur and also to alleviate the need to identify each fish (e.g., the "Other Rockfish" stock complex in the Gulf of Alaska; Tribuzio et al. 2017). Despite this inconsistent regulatory framework, adult rockfishes can be accurately identified in most cases based on morphometric characteristics; however, juveniles and cryptic species are frequently misidentified (Butler et al. 2012).
Mitochondrial DNA (mtDNA) data suggest that Sebastes arose during the middle Miocene in the Northwest Pacific and quickly diversified and dispersed into habitats produced by high-latitude cooling and upwelling systems throughout the North Pacific (Hyde and Vetter 2007). Originally, phylogenetic relationships among rockfishes were defined by morphologic and meristic characters, with genetic data (specifically mitochondrial DNA) incorporated by the early 2000s (see Kendall 2000 for a comprehensive review). Closely related species have been the subject of recent genetic studies, which have identified cryptic species where adult specimens are morphologically similar and sometimes indistinguishable (Orr and Blackburn 2004; Gharrett et al. 2005; Burford and Bernardi 2008; Orr and Hawkins 2008; Hyde and Vetter 2009; Hess et al. 2013; Frable et al. 2015). These discoveries of cryptic species have coincided with increased genetic monitoring of rockfish populations for commercial and recreational groundfish fisheries and population assessments (Orr and Blackburn 2004; Berntson and Moran 2009). Previous research used mitochondrial and nuclear loci to genetically identify rockfish species (Hyde and Vetter 2007; Pearse et al. 2007).
In this study, we describe a new protocol for genetic species identification of rockfishes, including almost all lineages found in the California Current and Gulf of Alaska Large Marine Ecosystems. We efficiently assay 96 nuclear microhaplotype loci using high-throughput DNA sequencing of amplicons and provide almost perfect species identification in this group of fishes. The genetic markers (described in Baetscher et al. 2018, 2019) were discovered using double-digest restriction-site associated DNA sequencing (ddRADseq; Peterson et al. 2012) of S. atrovirens (kelp rockfish) samples. Initially, these markers were selected for the high heterozygosity necessary for identifying family relationships within S. atrovirens and its sympatric close relatives S. chrysomelas (black-and-yellow rockfish) and S. carnatus (gopher rockfish). Additionally, the markers were designed for multiplexed analysis using next-generation DNA sequencing of amplicons, which allows researchers to generate genotype data for hundreds-to-thousands of fish in a single sequencing reaction. Given that some collection techniques employed to sample juvenile rockfishes can capture hundreds of fish in a single sampling event (Ammann 2004), a high-throughput method is particularly useful.
In the approach we describe here, we conducted species identification using genetic assignment tests. Such assignment tests are employed to determine the likelihood that a sample originates from one or more populations based on allele frequencies derived from reference samples taken from those populations (Paetkau et al. 1995; Rannala and Mountain 1997). Self-assignment provides a metric of how well a particular set of genetic markers can differentiate among taxa when the identity of the true taxon is known. Intuitively, the accuracy of assignment tests is limited by the biology and life history of the organism: species with high gene flow have populations that are more difficult to differentiate, and require high-resolution genetic data, whereas species with almost no gene flow have populations that are typically easily discriminated using a sufficient number of polymorphic loci. Given that our study involved classifying species rather than populations, we anticipated identifying samples to true species with high accuracy, assuming little-to-no ongoing gene flow among species.
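The logic of an assignment test can be illustrated with a minimal sketch. It assumes Hardy-Weinberg proportions within each reference taxon, independent loci, and allele frequencies that have already been smoothed so that no observed allele has frequency zero. The function names, taxon labels, and toy frequencies are hypothetical and are not taken from any software used in this study.

```python
import math

def genotype_log_likelihood(genotype, allele_freqs):
    """Log-probability of a multilocus genotype under Hardy-Weinberg proportions.

    genotype: list of (allele_a, allele_b) tuples, one per locus.
    allele_freqs: list of dicts (allele -> frequency), one per locus,
                  assumed to be smoothed so no observed allele has frequency 0.
    """
    total = 0.0
    for (a, b), freqs in zip(genotype, allele_freqs):
        pa, pb = freqs[a], freqs[b]
        # Homozygote: p^2; heterozygote: 2pq.
        total += math.log(pa * pb if a == b else 2.0 * pa * pb)
    return total

def scaled_likelihoods(genotype, reference):
    """Likelihood of each reference taxon, rescaled to sum to one."""
    logs = {taxon: genotype_log_likelihood(genotype, freqs)
            for taxon, freqs in reference.items()}
    largest = max(logs.values())  # subtract the maximum for numerical stability
    weights = {taxon: math.exp(value - largest) for taxon, value in logs.items()}
    norm = sum(weights.values())
    return {taxon: w / norm for taxon, w in weights.items()}

# Toy reference with two hypothetical taxa and two loci.
reference = {
    "taxon_A": [{"h1": 0.8, "h2": 0.2}, {"h1": 0.7, "h2": 0.3}],
    "taxon_B": [{"h1": 0.1, "h2": 0.9}, {"h1": 0.2, "h2": 0.8}],
}
genotype = [("h1", "h1"), ("h1", "h2")]
scaled = scaled_likelihoods(genotype, reference)
best_taxon = max(scaled, key=scaled.get)
```

In this toy case the sample would be assigned to `taxon_A`; with real data, a threshold on the scaled likelihood (for example 0.95) determines whether an assignment is accepted.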
Genotype data generated for testing species assignment allowed us to estimate phylogenetic relationships of more than 50 rockfish taxa using nuclear DNA markers and compare these results with a previously published phylogeny for Sebastes based on seven mtDNA and two nuclear genes (Hyde and Vetter 2007). Depending on the evolutionary history of the organism, nuclear and mtDNA genes can produce discrepant signals of diversification (Shaw 2002; Chan and Levin 2005) and, thus, comparing the nuclear phylogeny against patterns derived in large part from mtDNA highlights areas where the two marker types depict inconsistent relationships. A recent phylogenetic study of six rockfish species using nuclear markers provides us with a comparison for the subgenus Sebastosomus (Wallace et al. 2022). Furthermore, we describe phylogenetic relationships for a recently described cryptic species relevant to our geographic study region.
Phylogenetic relationships help to contextualize the low levels of heterozygosity and nucleotide diversity for species not included in our marker ascertainment process and allow us to assess this ascertainment bias based on evolutionary genetic distance. Reduced heterozygosity diminishes the utility of these markers for intraspecific genetic analyses, including population structure and pedigree inference, even for species within the same subgenus as the ascertainment species. This work describes a valuable analysis tool for research of rockfishes when confident species identity is required, an examination of phylogenetic relationships across the genus, and insight into how nucleotide diversity rapidly declines in species not included in the marker discovery process.

Samples
Samples from adults of 54 species of rockfishes (Sebastes) and cabezon (Scorpaenichthys marmoratus), the sister species of the genus Sebastes, were obtained by trawl and hook-and-line fishing. Rockfishes were identified morphologically by experts from the NOAA Southwest Fisheries Science Center or researchers at the University of California, Santa Cruz. For the majority of samples, DNA was extracted from fin tissue using DNeasy 96 Blood & Tissue kits on a BioRobot 3000 (Qiagen, Inc.), eluted into 200 µL, with extracts stored at 4 °C. For species with few adult samples available, DNA was extracted from juvenile samples as described. A small number of samples were received as previously extracted DNA and stored at 4 °C prior to sequencing library preparation.

Genotyping and analysis
Samples were genotyped with a set of 96 microhaplotype markers ascertained in S. atrovirens, S. carnatus and S. chrysomelas using the Genotyping-in-Thousands by sequencing (GT-seq; Campbell et al. 2015) protocol, as modified by Baetscher et al. (2018). The amplicon-sequencing library preparation includes an initial multiplex PCR step to amplify target loci and a second PCR to add sequencing adapters and barcodes for identifying samples. Normalized libraries were sequenced using 2 × 75 bp paired-end sequencing on a MiSeq instrument (Illumina, Inc.). Raw sequence reads were sorted by individual barcode using the MiSeq Analysis Software (Illumina), and then paired reads were combined and mapped to a reference using the bioinformatic workflow in Baetscher et al. (2018). Variants were called across samples using FREEBAYES (Garrison and Marth 2012) and the output variant call format (VCF) files were filtered for quality (minQ = 30; minDP = 10) and merged using VCFTOOLS (Danecek et al. 2011). In microhaplotypes, multiple single nucleotide polymorphisms (SNPs) segregate together within a single sequencing read and do not require statistical phasing (Stephens and Donnelly 2003), which makes it relatively straightforward to call individual haplotypes from mapped data files and the combined VCF file using the software program MICROHAPLOT (Ng and Anderson 2016). Resulting genotypes were filtered in R (R Core Team 2016) using a minimum threshold of 20 reads per individual/locus and a minimum read depth ratio of 0.4, which applies to heterozygotes and is a measure of the number of reads of the second most common allele divided by the read depth of the most common allele. Loci with high rates of missing data or deviations from Hardy-Weinberg equilibrium (HWE) were removed and then samples with missing data at more than 25 of the remaining loci were dropped from further analysis. This missing data threshold was intentionally liberal to avoid removing samples of species in which a larger proportion of loci failed to amplify due to uncharacterized variation in the primer sites (Fig. S1). Such variation is more common in genetic markers applied to species that are phylogenetically distant from the ascertainment species due to different demographic histories and rates or directionalities of mutation.
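The read-depth and allelic-ratio thresholds described above can be sketched as a simple genotype-calling rule. This is an illustration only, written in Python rather than the R code used for the analysis. It assumes that the 20-read minimum applies to the total reads at an individual/locus, and that a second haplotype falling below the 0.4 ratio is treated as noise so that the call is homozygous; both are simplifying assumptions, and the function and constant names are hypothetical.

```python
MIN_DEPTH = 20   # minimum reads per individual/locus (from the text)
MIN_RATIO = 0.4  # minimum (second haplotype reads) / (top haplotype reads) for a heterozygote

def call_genotype(read_counts):
    """Return a (haplotype, haplotype) call, or None if the genotype is filtered out.

    read_counts: dict mapping haplotype string -> read count for one
                 individual at one locus.
    """
    if not read_counts:
        return None
    depth = sum(read_counts.values())  # assumption: depth is the total across haplotypes
    if depth < MIN_DEPTH:
        return None
    ranked = sorted(read_counts.items(), key=lambda item: item[1], reverse=True)
    top_hap, top_reads = ranked[0]
    if len(ranked) == 1:
        return (top_hap, top_hap)
    second_hap, second_reads = ranked[1]
    if second_reads / top_reads >= MIN_RATIO:
        return tuple(sorted((top_hap, second_hap)))
    # Assumption: a weakly supported second haplotype is treated as noise.
    return (top_hap, top_hap)

het_call = call_genotype({"ACG": 30, "ATG": 20})
low_depth_call = call_genotype({"ACG": 12, "ATG": 5})
hom_call = call_genotype({"ACG": 40, "ATG": 3})
```

A stricter rule could instead discard genotypes whose ratio falls in an ambiguous intermediate range; the text does not specify how such cases were handled.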
Since juvenile rockfishes are commonly misidentified, only genotypes for adults were included, except for species in which we had fewer than five adult samples and samples from juveniles were available. The veracity of the species identity for these juvenile samples was evaluated by the self-assignment analysis (see below). A maximum of 32 individuals per species was included, when available, to generate a dataset with a representative estimate of assignment accuracy across the genus. The data set was tested for deviations from HWE using the R package PEGAS (Paradis 2010) and pairwise FST was calculated with heterozygosity weighted by group size, also in R using HIERFSTAT (Goudet 2005).
For the Bayesian analysis, FASTA alignments were converted to Nexus format using PGDSpider (v. 2.1.1.5; Lischer and Excoffier 2012), and then used as input for MRBAYES (v. 3.2; Huelsenbeck and Ronquist 2001). Parameters included a GTR substitution model and one million generations, where the number of generations was increased experimentally until the standard deviation of split frequencies dipped below 0.01 and the Potential Scale Reduction Factor (PSRF) converged to 1. This included a uniform Dirichlet prior (1,1,1,1) and 25% burn-in with sampling from the posterior every 5000 generations. Phylogenetic trees generated by this analysis were visualized using FigTree (v. 1.4.3; Rambaut 2016).
Because the marker set was designed using data from S. atrovirens, S. chrysomelas, and S. carnatus based on the variability in those species, the amount of variation in other species was expected to be affected due to ascertainment bias. This bias was quantified as the decrease in mean internal heterozygosity and nucleotide diversity for each species with increasing genetic distance from S. atrovirens. Genetic distance was calculated in MEGA using a variety of model settings to determine the extent to which estimates of genetic distance in these data are sensitive to model choice (Fig. S2). Nucleotide diversity was calculated per variant site for each species in VCFTOOLS and then the sum of all sites within a species was divided by the total number of bases in the 96 loci to account for invariant sites.

Genotyping and data analysis
A total of 997 rockfish samples from 54 species were genotyped and analyzed with a VCF file that had previously been generated from 1 690 rockfish samples and contained 4 322 variant sites from all species (Baetscher 2019; Table S1). Five loci (Sat_914, Sat_934, Sat_1399, Sat_1871, Sat_2513) with large amounts of missing data across > 35% of species and one locus (Sat_1166) with three or more haplotypes per individual in some species, suggestive of a paralogous locus, were removed. Only genotypes that passed filtering thresholds for read depth, allelic ratio, and missing data were retained for analyses. In three species (S. reedi, S. wilsoni, and S. crameri) with fewer than two adult samples available, genotypes from juveniles were included. The number of samples per species ranged from two (S. rufinanus) to 32 (S. atrovirens; Table 1).
The majority of species-by-locus combinations conformed to HWE; however, the six species with the greatest number of deviations (> 10 loci out of HWE) were S. rosaceus (18 loci), S. carnatus (15 loci), S. chrysomelas

Genetic assignment
Genetic self-assignment was conducted in the R package RUBIAS (Moran and Anderson 2018) using the leave-one-out self-assignment function with default allele frequency prior. Leave-one-out procedures remove the gene copies for each sample from the allele counts of its known population/taxon of origin before calculating the likelihood that the sample came from that population, in order to avoid overestimating assignment accuracy. RUBIAS provides a likelihood for each sample assigning to every reference population and a z-statistic for each sample assignment. The z-statistic is the difference (in number of standard deviations) between the observed log-probability of an individual's genotype given it came from a specific population, and the log-probability expected for an individual from that population. The mean and standard deviation of the expected log-probability values are computed by RUBIAS using the locus-specific allele frequencies and the assumption of HWE. When the probability of assignment is high for a given reference population but the z-statistic is outside the expected range (< -3 or > 3), this can be an indication that the sample belongs to a population that is not included in the reference dataset. In an effort to ensure that only samples that were confidently identified to true species were included, any samples that were assigned to a reference taxon with a z-statistic < -3 or > 3 were excluded from the final dataset.
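The z-statistic described above can be illustrated with a small sketch that enumerates every possible genotype at each locus to obtain the expected mean and variance of the log-probability under HWE. It assumes independent loci and at least one polymorphic locus, and it is not a reproduction of the RUBIAS internals, which may differ in detail (for example in how allele frequency priors and the leave-one-out step are handled). All names and the toy frequencies are hypothetical.

```python
import math
from itertools import combinations_with_replacement

def genotype_prob(a, b, freqs):
    """Hardy-Weinberg probability of the unordered genotype (a, b)."""
    return freqs[a] ** 2 if a == b else 2.0 * freqs[a] * freqs[b]

def expected_logprob_moments(freqs):
    """Mean and variance of the log genotype probability at one locus."""
    mean = 0.0
    second_moment = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        p = genotype_prob(a, b, freqs)
        if p > 0.0:
            log_p = math.log(p)
            mean += p * log_p
            second_moment += p * log_p * log_p
    return mean, second_moment - mean ** 2

def z_statistic(genotype, allele_freqs):
    """Standardized difference between observed and expected log-probability."""
    observed = mean = variance = 0.0
    for (a, b), freqs in zip(genotype, allele_freqs):
        observed += math.log(genotype_prob(a, b, freqs))
        locus_mean, locus_var = expected_logprob_moments(freqs)
        mean += locus_mean        # loci assumed independent, so moments add
        variance += locus_var
    return (observed - mean) / math.sqrt(variance)

# Toy example: 20 loci with identical (hypothetical) haplotype frequencies.
allele_freqs = [{"h1": 0.9, "h2": 0.1}] * 20
typical = [("h1", "h1")] * 16 + [("h1", "h2")] * 4
outlier = [("h2", "h2")] * 20   # homozygous for the rare haplotype everywhere
z_typical = z_statistic(typical, allele_freqs)
z_outlier = z_statistic(outlier, allele_freqs)
```

In this toy case the typical genotype falls well inside the -3 to 3 range, while the genotype made up entirely of rare haplotypes falls far below -3, which is the pattern expected for a sample from a taxon missing from the reference.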

Phylogenetic analyses
Samples verified by self-assignment were used to construct phylogenetic trees. To generate consensus sequence data for building trees, species-specific VCF files were produced by FREEBAYES and then a consensus FASTA file for each species was created using VCFTOOLS (Danecek et al. 2011). A member of the sister genus to Sebastes, Scorpaenichthys marmoratus (cabezon), was used to root the phylogenetic trees. Loci in each species-consensus FASTA file were concatenated with the GENEIOUS software program (v. 7.1.7; Kearse et al. 2012) before export to MUSCLE (Edgar 2004) with alignments output in ClustalW format. These were then used as input for MEGA (v. 7.0.26; Kumar et al. 2016) to generate maximum-likelihood trees using the General Time Reversible (GTR) model (Nei and Kumar 2000) with 1 000 bootstrap replicates, which was consistent with the model used by Hyde and Vetter (2007) for their Sebastes phylogeny. A similar analysis was performed to generate an unrooted maximum-likelihood tree, without cabezon, also using the GTR model and 1 000 bootstrap replicates.

Ascertainment bias
Observed heterozygosity in most species declined substantially when compared to S. atrovirens, with a smaller decrease in S. chrysomelas and S. carnatus (mean for S. atrovirens, chrysomelas, carnatus = 0.423, overall mean = 0.130; range = 0.012-0.458; Fig. 2). Nucleotide diversity sharply declined with genetic distance from S. atrovirens (Fig. 3), with low levels of diversity even in species in the same subgenus as S. atrovirens. Genetic distance was calculated as pairwise differences since a comparison indicated that nucleotide substitution model does not substantially alter distance estimates for this dataset (Fig. S2).
Based on these results, over 80% of species analyzed in this study contained less than half of the nucleotide diversity of S. atrovirens over a genetic distance of fewer than 0.04 base differences per site for 10 695 total sites, excluding gaps and missing data.
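The per-species nucleotide diversity compared here is described in the methods as the sum of per-site values over variant sites, divided by the total number of bases so that invariant sites are counted. A minimal sketch of that arithmetic follows. The per-site estimator shown is the standard unbiased heterozygosity estimator and is assumed, not confirmed, to correspond to the per-site output used in the analysis; the allele counts and the total of 100 bases are invented for illustration.

```python
def site_pi(allele_counts):
    """Per-site nucleotide diversity from allele counts at one variant site.

    Uses 1 - sum(c * (c - 1)) / (n * (n - 1)), the probability that two
    gene copies drawn without replacement carry different alleles.
    """
    n = sum(allele_counts)
    if n < 2:
        return 0.0
    same = sum(c * (c - 1) for c in allele_counts)
    return 1.0 - same / (n * (n - 1))

def species_nucleotide_diversity(variant_site_counts, total_bases):
    """Sum per-site diversity over variant sites and divide by all bases.

    Dividing by total_bases (variant plus invariant) treats every
    invariant site as contributing zero diversity.
    """
    return sum(site_pi(counts) for counts in variant_site_counts) / total_bases

# Hypothetical species with two variant sites across 100 sequenced bases.
toy_sites = [[5, 5], [9, 1]]
toy_pi = species_nucleotide_diversity(toy_sites, total_bases=100)
```

Because the denominator is fixed across species, a species with few variable sites at these loci receives a proportionally low value, which is the quantity plotted against genetic distance in Fig. 3.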

(13 loci), S. ensifer (13 loci), S. diaconus (11 loci), and S. mystinus (10 loci). 30% of the loci were out of HWE in four of the 54 species, and three loci, Sat_770, Sat_875, and Sat_2178, were out of HWE in 8-13 species. Pairwise FST ranged from 0.015 between S. carnatus and S. chrysomelas to 0.746 between S. levis and S. entomelas (mean FST = 0.45, s.d. = 0.13).

Self-assignment
Self-assignment resulted in 98.3% accuracy at a scaled-likelihood value of 0.95, and all mis-assigned individuals at > 50% likelihood were either S. carnatus assigning to S. chrysomelas, or vice versa. These assignment results indicated that this set of genetic markers cannot consistently distinguish between S. carnatus and S. chrysomelas and that a single genetic reporting group would be appropriate for assignment.
The self-assignment analysis was performed again after creating a single S. carnatus/chrysomelas reporting group and 100% of samples were correctly assigned at a 50% scaled-likelihood threshold. At the 95% confidence level, assignment accuracy was 99.2% and all lower confidence assignments were S. carnatus or S. chrysomelas samples that assigned to the joint reporting group, but at a scaled-likelihood below 95%.

Phylogenetic trees
Species relationships were elucidated with maximum-likelihood and Bayesian trees. Both rooted trees (Fig. 1, Fig. S3) and an unrooted tree (Fig. S4) recovered very similar phylogenetic relationships. Branch support on the Bayesian tree was generally higher than for the maximum-likelihood trees, which had consistent bootstrap values, but with slight differences at some of the deeper nodes. Some of the most confident relationships in the Bayesian tree included the position of S. atrovirens clustered with members of the Pteropodus subgenus, as well as that S. saxicola and S. semicinctus appeared proximate to Pteropodus and distant from other members of the subgenus Allosebastes (Fig. 1). Monophyletic relationships among taxa within the subgenus Sebastomus garnered strong support with the exception of S. rufus, which groups with the subgenus Acutomentum (Fig. 1). While the branch support for these phylogenetic positions varied between the maximum-likelihood and Bayesian analyses, the overall pattern among these subgenera appeared consistent.

Discussion
Here we demonstrate the high accuracy (> 99% correct assignment) of a set of short haplotypic markers for identifying 54 species of the genus Sebastes, including all of the species commonly found in the California Current Large Marine Ecosystem along the Pacific coast of North America. Using these loci, we distinguish between closely related and recently described cryptic species, describe phylogenetic relationships, and quantify a decrease in the heterozygosity and nucleotide diversity of these genetic markers in species with increasing evolutionary genetic distance from the ascertainment species.
Ecological studies and management of fisheries require efficient methods to conclusively identify sympatric marine species, particularly at the larval and juvenile stages. In rockfishes, planktonic larvae from many species coexist during their pelagic phase and remain challenging to identify morphologically as they recruit to settlement habitats (Butler et al. 2012). Even as adults, the number of species present in overlapping habitats, the presence of cryptic species (e.g., Frable et al. 2015), and subtle differences in coloration or morphology (Ingram and Kai 2014) underscore the need for genetic species identification. Previous marker types have been used for this task; one such study included 33 species with 97.4% assignment accuracy (Pearse et al. 2007), and the other, a much more complete survey of the genus, genotyped 103 individuals from 101 species at seven mitochondrial and two nuclear genes, but did not test these data for genetic assignment accuracy (Hyde and Vetter 2007). Our method of genotyping fewer than 100 multiplexed microhaplotype loci with high-throughput DNA sequencing is highly accurate, efficient for large sample sizes and can be coupled with a reproducible analysis workflow based on the reference database for species assignment generated by this study.
Self-assignment using genotype data from 90 retained microhaplotype markers accurately identified the true species identity of every sample for all 54 species, with the exception of two extremely closely related species. At a stringent likelihood threshold (> 95%), eight samples of S. carnatus and S. chrysomelas assigned to the combined S. carnatus/chrysomelas group at a lower level of confidence, but still above a 50% scaled-likelihood. Notably, these sister species have been the subject of ongoing research (Narum et al. 2004; Buonaccorsi et al. 2011) and our results from the self-assignment demonstrate the challenge of separating the two groups with existing genetic markers and call into question their taxonomic status as two distinct species.
Coincidentally, S. carnatus/chrysomelas are also the most phylogenetically proximate to the primary ascertainment species (S. atrovirens; Fig. 1; Fig. S3, Fig. S4), and with nearly as much variation in these loci (Figs. 2 and 3). And while these genetic markers easily differentiate juvenile-stage cryptic species (e.g., S. mystinus/diaconus, S. aleutianus/melanostictus) and those commonly misidentified even as adults (e.g., S. flavidus/serranoides), they underperform for S. carnatus/chrysomelas. This indicates that these taxa are more genetically similar than every other pair of sister species included in our dataset, at least in the portion of the genome surveyed with these loci, consistent with the lowest pairwise FST value (0.015) in the study. Previous work on S. carnatus and S. chrysomelas identified a single, highly diverged locus and concluded that the pair is likely in the final stages of speciation, but with ongoing gene flow (Narum et al. 2004; Buonaccorsi et al. 2011). A more recent investigation using reduced-representation and whole genome resequencing found three distinct genomic regions with elevated divergence and variation in genes pointing to the importance of coloration and vision (Behrens et al. 2021). Results from these studies are consistent with the general idea that speciation mechanisms in rockfishes likely involve both allopatric and sympatric processes, including habitat differentiation associated with depth gradients (Ingram 2011) and mate choice reinforced by internal fertilization (Buonaccorsi et al. 2011).
Previously described rockfish species relationships relied heavily on mitochondrial DNA data (Kai et al. 2003; Li et al. 2006, 2007; Hyde and Vetter 2007), providing an opportunity to apply the nuclear markers from this study to estimate phylogenetic relationships for comparison (Fig. 1, Fig. S3, Fig. S4). Rooted and unrooted maximum-likelihood trees produced consistent topologies with very similar branch support, although some deeper nodes in the unrooted tree garnered higher support, while other nodes were better supported in the rooted tree (Fig. 1, Fig. S4). High confidence nodes in the Bayesian tree were generally well supported in the maximum-likelihood tree, with most differences occurring at nodes with lower support, such as the position of either S. alutus or S. borealis in a clade with S. melanostictus and S. aleutianus (Fig. 1, Fig. S3). Few instances of well-supported Bayesian relationships deviate from the maximum-likelihood tree, although S. polyspinis presents one such case. The Bayesian tree topology from our data is the most appropriate for comparison with the phylogeny in Hyde and Vetter (2007) since the analyses are equivalent and, although Bayesian methods can overestimate node support, bootstrapped maximum-likelihood values may be overly conservative (Douady et al. 2003).
Most relationships remain consistent between the microhaplotype tree topologies and the more complete Sebastes tree from Hyde and Vetter (2007). Although Hyde and Vetter analyze species that are absent from our dataset, primarily from the Northwest Pacific and North Atlantic, we analyze representatives from each major clade with the exception of the subgenera Sebastocles and Mebarus, whose constituents are exclusively in the Northwest Pacific, with the exception of S. atrovirens, which should clearly be included in the Pteropodus subgenus. Generally, we find very high concordance with Hyde and Vetter (2007) at the subgeneric level. Areas in which the microhaplotype tree (Fig. 1) deviates from their tree include clade "D" nesting within Pteropodus, and members of Eosebastes, S. aurora and S. diploproa, nesting within Sebastichthys. At the species level, more variation exists. For example, both trees depict close phylogenetic relationships among S. atrovirens, S. carnatus, and S. chrysomelas, with the microhaplotype tree placing S. maliger as a closer relative of the three species than S. caurinus, as in the mitochondrial tree. Other small differences in the topologies include strong support that S. melanops is more closely related to S. flavidus than S. serranoides (a relationship also identified by Wallace et al. 2022); and that S. goodei is more closely related to S. paucispinis than to S. jordani. We also show that S. diaconus and S. mystinus are easily distinguished and nearest neighbors in the phylogeny, which is unsurprising since these species were only recently described as separate taxa (Frable et al. 2015; Wallace et al. 2022).
Taxonomy of rockfishes, particularly of subgenera, has been and continues to be dynamic, as highlighted by multiple revisions of subgeneric classifications (Love et al. 2002). For example, S. diploproa is part of the subgenus Sebastichthys in Kendall (2000), who cites Eigenmann and Beeson (1894), but Li et al. (2006) designate S. diploproa as a member of Allosebastes, attributed to Gilbert (1890). Phylogenetic relationships described by the microhaplotype data are generally consistent with mitochondrial data and support polyphyly of generally accepted subgenera, including Acutomentum, Allosebastes, and Sebastosomus (Hyde and Vetter 2007; Li et al. 2007). A formal re-description of these subgenera would alleviate some of the taxonomic confusion but comprehensive taxonomic revision would require data from more species in the genus than are included in this study.
The set of nearly 100 microhaplotype loci targets substantial variation in the ascertainment species, S. atrovirens, S. carnatus, and S. chrysomelas (Baetscher et al. 2018, 2019) and contains a similar amount of variation in a closely related taxon (S. maliger). However, variation declines rapidly with increasing genetic distance (Fig. 3), even for members of the Pteropodus subgenus. Such reduced variation has been documented in studies of ascertainment bias in microsatellite loci across multiple genera (Vowles and Amos 2006). Even so, the ascertainment bias we observe here is even more significant than previously observed, with dramatically decreased nucleotide diversity over relatively small evolutionary genetic distances, with only the most closely related species to those included in the marker discovery process found to have substantial variation (Fig. 3). The surprising amount of variation in S. rosaceus and S. ensifer, despite their evolutionary distance from Pteropodus, might be explained by cryptic structure in those species, as indicated by the relatively high number of loci that deviated from HWE. However, selectively removing loci for individual species would be challenging with the > 50 species included in this analysis.
Although the relatively low observed heterozygosity found in this set of markers for the majority of species analyzed here suggests limited utility for purposes other than species identification (e.g., pedigree reconstruction), the amplicon library preparation protocol is highly flexible and enables researchers to add loci or swap out markers to increase power for species of particular interest. Such an effort could bolster this set of markers for population genetic structure or pedigree analyses in additional species, and previous research has shown that genotyping samples with a single set of genetic markers to both identify species and analyze pedigree relationships is an economical approach (Baetscher et al. 2019).
Here, we describe an efficient method for genotyping and analyzing genetic data to identify species of rockfishes, particularly for taxa commonly captured together as juveniles. The genetic markers we employ, and our subsequent analytical workflow, provide highly accurate species identification and estimates of phylogenetic relationships largely consistent with previous genetic data. In addition, we describe a flexible protocol for modifying the set of target loci and accounting for ascertainment bias to suit the specific needs of a variety of ecological studies and fisheries management objectives.

Fig. 1 Consensus tree estimated using the General Time Reversible (GTR) model and Bayesian posterior analysis for 54 Sebastes species and one member of the sister genus, Scorpaenichthys. Genetic data

Fig. 2 Heterozygosity for 54 Sebastes species genotyped with 96 nuclear genetic markers. Mean internal heterozygosity per species is indicated as the dark bar inside the box. Boxes represent the 25th and 75th percentiles (first and third quartiles) and whiskers extend to the

Fig. 3 Genetic distance from Sebastes atrovirens and nucleotide diversity for 54 Sebastes species classified to subgenus. Genetic data includes 96 nuclear markers. Genetic distance is measured as pair-

Table 1 Number of samples per Sebastes species included in the self-assignment and phylogenetic analyses. Mean nucleotide diversity, mean internal heterozygosity and nominal subgenera classification are included