How Should Genes and Taxa be Sampled for Phylogenomic Analyses with Missing Data? An Empirical Study in Iguanian Lizards.

Abstract:

Targeted sequence capture is becoming a widespread tool for generating large phylogenomic data sets to address difficult phylogenetic problems. However, this methodology often generates data sets in which increasing the number of taxa and loci increases amounts of missing data. Thus, a fundamental (but still unresolved) question is whether sampling should be designed to maximize sampling of taxa or genes, or to minimize the inclusion of missing data cells. Here, we explore this question for an ancient, rapid radiation of lizards, the pleurodont iguanians. Pleurodonts include many well-known clades (e.g., anoles, basilisks, iguanas, and spiny lizards) but relationships among families have proven difficult to resolve strongly and consistently using traditional sequencing approaches. We generated up to 4921 ultraconserved elements with sampling strategies including 16, 29, and 44 taxa, from 1179 to approximately 2.4 million characters per matrix and approximately 30% to 60% total missing data. We then compared mean branch support for interfamilial relationships under these 15 different sampling strategies for both concatenated (maximum likelihood) and species tree (NJst) approaches (after showing that mean branch support appears to be related to accuracy). We found that both approaches had the highest support when including loci with up to 50% missing taxa (matrices with ~40-55% missing data overall). Thus, our results show that simply excluding all missing data may be highly problematic as the primary guiding principle for the inclusion or exclusion of taxa and genes. The optimal strategy was somewhat different for each approach, a pattern that has not been shown previously. For concatenated analyses, branch support was maximized when including many taxa (44) but fewer characters (1.1 million). For species-tree analyses, branch support was maximized with minimal taxon sampling (16) but many loci (4789 of 4921). We also show that the choice of these sampling strategies can be critically important for phylogenomic analyses, since some strategies lead to demonstrably incorrect inferences (using the same method) that have strong statistical support. Our preferred estimate provides strong support for most interfamilial relationships in this important but phylogenetically challenging group.
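
The abstract's inclusion rule (keep loci with up to 50% missing taxa, then report the overall share of missing cells) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy loci, and the set of characters treated as missing are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical representation: one locus = mapping of taxon name -> aligned sequence.
Alignment = Dict[str, str]

# Assumption: these symbols count as missing cells within a sequence.
MISSING_CHARS = set("-?Nn")


def missing_taxa_fraction(locus: Alignment, taxa: List[str]) -> float:
    """Fraction of the sampled taxa that have no sequence for this locus."""
    absent = sum(1 for t in taxa if t not in locus)
    return absent / len(taxa)


def filter_loci(loci: Dict[str, Alignment], taxa: List[str],
                max_missing_taxa: float = 0.5) -> Dict[str, Alignment]:
    """Keep loci whose fraction of missing taxa is at most the threshold."""
    return {name: aln for name, aln in loci.items()
            if missing_taxa_fraction(aln, taxa) <= max_missing_taxa}


def matrix_missing_fraction(loci: Dict[str, Alignment], taxa: List[str]) -> float:
    """Share of missing cells in the concatenated taxon-by-site matrix."""
    total = 0
    missing = 0
    for aln in loci.values():
        if not aln:
            continue  # a locus with no sequences contributes no columns
        length = len(next(iter(aln.values())))
        for t in taxa:
            total += length
            seq = aln.get(t)
            if seq is None:
                missing += length  # absent taxon: every site is missing
            else:
                missing += sum(1 for c in seq if c in MISSING_CHARS)
    return missing / total if total else 0.0


# Toy data (invented): four taxa, three loci of four sites each.
taxa = ["A", "B", "C", "D"]
loci = {
    "uce1": {"A": "ACGT", "B": "ACGT", "C": "AC-T", "D": "ACGT"},
    "uce2": {"A": "GGCC", "B": "GGCC"},   # 2 of 4 taxa missing: kept at 50%
    "uce3": {"A": "TTAA"},                # 3 of 4 taxa missing: dropped
}
kept = filter_loci(loci, taxa, max_missing_taxa=0.5)
overall = matrix_missing_fraction(kept, taxa)
```

The per-locus threshold and the matrix-wide missing-data percentage are different quantities, which is why the abstract reports both (50% missing taxa per locus versus roughly 40-55% missing data overall).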

journal_name

Syst Biol

journal_title

Systematic biology

authors

Streicher JW,Schulte JA 2nd,Wiens JJ

doi

10.1093/sysbio/syv058

subject

Has Abstract

pub_date

2016-01-01 00:00:00

pages

128-45

issue

1

eissn

1076-836X

issn

1063-5157

pii

syv058

journal_volume

65

pub_type

Journal Article
  • Toward an integrated system of clade names.

    abstract::Although the proposition that higher taxa should correspond to clades is widely accepted, current nomenclature does not distinguish clearly between different clades in nested series. In particular, the same name is often applied to a total clade, its crown clade, and clades originating with various nodes, branches, an...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150701656378

    authors: de Queiroz K

    update_date:2007-12-01 00:00:00

  • Predicting total global species richness using rates of species description and estimates of taxonomic effort.

    abstract::We found that trends in the rate of description of 580,000 marine and terrestrial species, in the taxonomically authoritative World Register of Marine Species and Catalogue of Life databases, were similar until the 1950s. Since then, the relative number of marine to terrestrial species described per year has increased...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syr080

    authors: Costello MJ,Wilson S,Houlding B

    update_date:2012-10-01 00:00:00

  • Untangling complex histories of genome mergings in high polyploids.

    abstract::Polyploidy, the duplication of entire genomes, plays a major role in plant evolution. In allopolyploids, genome duplication is associated with hybridization between two or more divergent genomes. Successive hybridization and polyploidization events can build up species complexes of allopolyploids with complicated netw...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150701424553

    authors: Brysting AK,Oxelman B,Huber KT,Moulton V,Brochmann C

    update_date:2007-06-01 00:00:00

  • Tracing the temporal and spatial origins of island endemics in the Mediterranean region: a case study from the citrus family (Ruta L., Rutaceae).

    abstract::Understanding the origin of island endemics is a central task of historical biogeography. Recent methodological advances provide a rigorous framework to determine the relative contribution of different biogeographic processes (e.g., vicariance, land migration, long-distance dispersal) to the origin of island endemics....

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syq046

    authors: Salvo G,Ho SY,Rosenbaum G,Ree R,Conti E

    update_date:2010-12-01 00:00:00

  • Multiple cophylogenetic analyses reveal frequent cospeciation between pelecaniform birds and Pectinopygus lice.

    abstract::Lice in the genus Pectinopygus parasitize a single order of birds (Pelecaniformes). To examine the degree of congruence between the phylogenies of 17 Pectinopygus species and their pelecaniform hosts, sequences from mitochondrial 12S rRNA, 16S rRNA, COI, and nuclear wingless and EF1-alpha genes (2290 nucleotides) and ...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150701311370

    authors: Hughes J,Kennedy M,Johnson KP,Palma RL,Page RD

    update_date:2007-04-01 00:00:00

  • Mitochondrial DNA rates and biogeography in European newts (genus Euproctus).

    abstract::Sequence divergence for segments of three mitochondrial DNA (mtDNA) genes encoding the 12S and 16S ribosomal RNA and cytochrome b was examined in newts belonging to the genus Euproctus (E. asper, E. montanus, E. platycephalus) and in three other species belonging to the same family (Salamandridae), Triturus carnifex, ...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/46.1.126

    authors: Caccone A,Milinkovitch MC,Sbordoni V,Powell JR

    update_date:1997-03-01 00:00:00

  • The effect of phylogeny on interspecific body shape variation in darters (Pisces: Percidae).

    abstract::We conducted a geometric morphometric analysis of interspecific body shape variation among representatives of 31 species of darters (Pisces: Percidae) to determine whether there is evidence of a phylogenetic effect in body shape variation. Cartesian transformation grids representing relative shape differences of indiv...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150390197019

    authors: Guill JM,Heins DC,Hood CS

    update_date:2003-08-01 00:00:00

  • Phylogeny imbalance: taxonomic level matters.

    abstract::Two lines of evidence indicate that the degree of symmetry in phylogenetic topologies differs at different hierarchical levels. First, in a set of 61 phylogenies with superspecific taxa as their terminals, trees were on average more unbalanced (asymmetric) when the species richness of terminals was considered than whe...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150290102546

    authors: Purvis A,Agapow PM

    update_date:2002-12-01 00:00:00

  • Empirical and hierarchical Bayesian estimation of ancestral states.

    abstract::Several methods have been proposed to infer the states at the ancestral nodes on a phylogeny. These methods assume a specific tree and set of branch lengths when estimating the ancestral character state. Inferences of the ancestral states, then, are conditioned on the tree and branch lengths being true. We develop a h...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:

    authors: Huelsenbeck JP,Bollback JP

    update_date:2001-06-01 00:00:00

  • Simultaneously mapping and superimposing landmark configurations with parsimony as optimality criterion.

    abstract::All methods proposed to date for mapping landmark configurations on a phylogenetic tree start from an alignment generated by methods that make no use of phylogenetic information, usually by superimposing all configurations against a consensus configuration. In order to properly interpret differences between landmark c...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syr119

    authors: Catalano SA,Goloboff PA

    update_date:2012-05-01 00:00:00

  • Biogeographic interpretation of splits graphs: least squares optimization of branch lengths.

    abstract::Although most often used to represent phylogenetic uncertainty, network methods are also potentially useful for describing the phylogenetic complexity expected to characterize recent species radiations. One network method with particular advantages in this context is split decomposition. However, in its standard imple...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150590906046

    authors: Winkworth R,Bryant D,Lockhart P,Havell D,Moulton V

    update_date:2005-02-01 00:00:00

  • Optimal selection of gene and ingroup taxon sampling for resolving phylogenetic relationships.

    abstract::A controversial topic that underlies much of phylogenetic experimental design is the relative utility of increased taxonomic versus character sampling. Conclusions about the relative utility of adding characters or taxa to a current phylogenetic study have subtly hinged upon the appropriateness of the rate of evolutio...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syq025

    authors: Townsend JP,Lopez-Giraldez F

    update_date:2010-07-01 00:00:00

  • Repeated evolution of dioecy from monoecy in Siparunaceae (Laurales).

    abstract::Siparunaceae comprise Glossocalyx with one species in West Africa and Siparuna with 65 species in the neotropics; all have unisexual flowers, and 15 species are monoecious, 50 dioecious. Parsimony and maximum likelihood analyses of combined nuclear ribosomal ITS and chloroplast trnL-trnF intergenic spacer sequences yi...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/106351501753328820

    authors: Renner SS,Won H

    update_date:2001-09-01 00:00:00

  • Incomplete Lineage Sorting in Mammalian Phylogenomics.

    abstract::The impact of incomplete lineage sorting (ILS) on phylogenetic conflicts among genes, and the related issue of whether to account for ILS in species tree reconstruction, are matters of intense controversy. Here, focusing on full-genome data in placental mammals, we empirically test two assumptions underlying current u...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syw082

    authors: Scornavacca C,Galtier N

    update_date:2017-01-01 00:00:00

  • Using supermatrices for phylogenetic inquiry: an example using the sedges.

    abstract::In this article, we use supermatrix data-mining methods to reconstruct a large, highly inclusive phylogeny of Cyperaceae from nucleotide data available on GenBank. We explore the properties of these trees and their utility for phylogenetic inference, and show that even the highly incomplete alignments characteristic o...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/sys088

    authors: Hinchliff CE,Roalson EH

    update_date:2013-03-01 00:00:00

  • Evolutionary history of vegetative reproduction in Porpidia s.L. (Lichen-forming ascomycota).

    abstract::The evolutionary history of gains and losses of vegetative reproductive propagules (soredia) in Porpidia s.l., a group of lichen-forming ascomycetes, was clarified using Bayesian Markov chain Monte Carlo (MCMC) approaches to monophyly tests and a combined MCMC and maximum likelihood approach to ancestral character sta...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150600697465

    authors: Buschbom J,Barker D

    update_date:2006-06-01 00:00:00

  • Quantification of homoplasy for nucleotide transitions and transversions and a reexamination of assumptions in weighted phylogenetic analysis.

    abstract::Nucleotide transitions are frequently down-weighted relative to transversions in phylogenetic analysis. This is based on the assumption that transitions, by virtue of their greater evolutionary rate, exhibit relatively more homoplasy and are therefore less reliable phylogenetic characters. Relative amounts of homoplas...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/106351500750049734

    authors: Broughton RE,Stanley SE,Durrett RT

    update_date:2000-12-01 00:00:00

  • Phylogeny of Eunicida (Annelida) and exploring data congruence using a partition addition bootstrap alteration (PABA) approach.

    abstract::Even though relationships within Annelida are poorly understood, Eunicida is one of only a few major annelid lineages well supported by morphology. The seven recognized eunicid families possess sclerotized jaws that include mandibles and a maxillary apparatus. The maxillary apparatuses vary in shape and number of elem...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150500354910

    authors: Struck TH,Purschke G,Halanych KM

    update_date:2006-02-01 00:00:00

  • Phylodynamic Model Adequacy Using Posterior Predictive Simulations.

    abstract::Rapidly evolving pathogens, such as viruses and bacteria, accumulate genetic change at a similar timescale over which their epidemiological processes occur, such that, it is possible to make inferences about their infectious spread using phylogenetic time-trees. For this purpose it is necessary to choose a phylodynami...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syy048

    authors: Duchene S,Bouckaert R,Duchene DA,Stadler T,Drummond AJ

    update_date:2019-03-01 00:00:00

  • Distribution and phylogeny of Penelope-like elements in eukaryotes.

    abstract::Penelope-like elements (PLEs) are a relatively little studied class of eukaryotic retroelements, distinguished by the presence of the GIY-YIG endonuclease domain, the ability of some representatives to retain introns, and the similarity of PLE-encoded reverse transcriptases to telomerases. Although these retrotranspos...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150601077683

    authors: Arkhipova IR

    update_date:2006-12-01 00:00:00

  • Efficient exploration of the space of reconciled gene trees.

    abstract::Gene trees record the combination of gene-level events, such as duplication, transfer and loss (DTL), and species-level events, such as speciation and extinction. Gene tree-species tree reconciliation methods model these processes by drawing gene trees into the species tree using a series of gene and species-level eve...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syt054

    authors: Szöllõsi GJ,Rosikiewicz W,Boussau B,Tannier E,Daubin V

    update_date:2013-11-01 00:00:00

  • Phylogeny and biogeography of dolichoderine ants: effects of data partitioning and relict taxa on historical inference.

    abstract::Ants (Hymenoptera: Formicidae) are conspicuous organisms in most terrestrial ecosystems, often attaining high levels of abundance and diversity. In this study, we investigate the evolutionary history of a major clade of ants, the subfamily Dolichoderinae, whose species frequently achieve ecological dominance in ant co...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syq012

    authors: Ward PS,Brady SG,Fisher BL,Schultz TR

    update_date:2010-05-01 00:00:00

  • Multilocus Phylogeny of the Afrotropical Freshwater Crab Fauna Reveals Historical Drainage Connectivity and Transoceanic Dispersal Since the Eocene.

    abstract::Phylogenetic reconstruction, divergence time estimations and ancestral range estimation were undertaken for 66% of the Afrotropical freshwater crab fauna (Potamonautidae) based on four partial DNA loci (12S rRNA, 16S rRNA, cytochrome oxidase one [COI], and histone 3). The present study represents the most comprehensiv...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syv011

    authors: Daniels SR,Phiri EE,Klaus S,Albrecht C,Cumberlidge N

    update_date:2015-07-01 00:00:00

  • Multiple colonizations, in situ speciation, and volcanism-associated stepping-stone dispersals shaped the phylogeography of the Macaronesian red fescues (Festuca L., Gramineae).

    abstract::Whereas examples of insular speciation within the endemic-rich Macaronesian hotspot flora have been documented, the phylogeography of recently evolved plants in the region has received little attention. The Macaronesian red fescues constitute a narrow and recent radiation of four closely related diploid species distri...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150802302450

    authors: Díaz-Pérez A,Sequeira M,Santos-Guerra A,Catalán P

    update_date:2008-10-01 00:00:00

  • Exploration of Plastid Phylogenomic Conflict Yields New Insights into the Deep Relationships of Leguminosae.

    abstract::Phylogenomic analyses have helped resolve many recalcitrant relationships in the angiosperm tree of life, yet phylogenetic resolution of the backbone of the Leguminosae, one of the largest and most economically and ecologically important families, remains poor due to generally limited molecular data and incomplete tax...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syaa013

    authors: Zhang R,Wang YH,Jin JJ,Stull GW,Bruneau A,Cardoso D,De Queiroz LP,Moore MJ,Zhang SD,Chen SY,Wang J,Li DZ,Yi TS

    update_date:2020-07-01 00:00:00

  • Assessing progress in systematics with continuous jackknife function analysis.

    abstract::Systematists expect their hypotheses to be asymptotically precise. As the number of phylogenetically informative characters for a set of taxa increases, the relationships implied should stabilize on some topology. If true, this increasing stability should clearly manifest itself if an index of congruence is plotted ag...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150390132731

    authors: Miller JA

    update_date:2003-02-01 00:00:00

  • The Ascomycota tree of life: a phylum-wide phylogeny clarifies the origin and evolution of fundamental reproductive and ecological traits.

    abstract::We present a 6-gene, 420-species maximum-likelihood phylogeny of Ascomycota, the largest phylum of Fungi. This analysis is the most taxonomically complete to date with species sampled from all 15 currently circumscribed classes. A number of superclass-level nodes that have previously evaded resolution and were unnamed...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syp020

    authors: Schoch CL,Sung GH,López-Giráldez F,Townsend JP,Miadlikowska J,Hofstetter V,Robbertse B,Matheny PB,Kauff F,Wang Z,Gueidan C,Andrie RM,Trippe K,Ciufetti LM,Wynns A,Fraker E,Hodkinson BP,Bonito G,Groenewald JZ,Arzanlou

    update_date:2009-04-01 00:00:00

  • Biogeography explains cophylogenetic patterns in toucan chewing lice.

    abstract::Historically, comparisons of host and parasite phylogenies have concentrated on cospeciation. However, many of these comparisons have demonstrated that the phylogenies of hosts and parasites are seldom completely congruent, suggesting that phenomena other than cospeciation play an important role in the evolution of ho...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150490265085

    authors: Weckstein JD

    update_date:2004-02-01 00:00:00

  • The PhyLoTA Browser: processing GenBank for molecular phylogenetics research.

    abstract::As an archive of sequence data for over 165,000 species, GenBank is an indispensable resource for phylogenetic inference. Here we describe an informatics processing pipeline and online database, the PhyLoTA Browser (http://loco.biosci.arizona.edu/pb), which offers a view of GenBank tailored for molecular phylogenetics...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150802158688

    authors: Sanderson MJ,Boss D,Chen D,Cranston KA,Wehe A

    update_date:2008-06-01 00:00:00

  • Evolution of a RNA polymerase gene family in Silene (Caryophyllaceae)-incomplete concerted evolution and topological congruence among paralogues.

    abstract::Four low-copy nuclear DNA intron regions from the second largest subunits of the RNA polymerase gene family (RPA2, RPB2, RPD2a, and RPD2b), the internal transcribed spacers (ITSs) from the nuclear ribosomal regions, and the rps16 intron from the chloroplast were sequenced and used in a phylogenetic analysis of 29 spec...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150490888840

    authors: Popp M,Oxelman B

    update_date:2004-12-01 00:00:00