Toward Synthesizing Our Knowledge of Morphology: Using Ontologies and Machine Reasoning to Extract Presence/Absence Evolutionary Phenotypes across Studies.

Abstract:

The reality of larger and larger molecular databases and the need to integrate data scalably have presented a major challenge for the use of phenotypic data. Morphology is currently primarily described in discrete publications, entrenched in non-computer-readable text, and requires enormous investments of time and resources to integrate across large numbers of taxa and studies. Here we present a new methodology, using ontology-based reasoning systems working with the Phenoscape Knowledgebase (KB; kb.phenoscape.org), to automatically integrate large amounts of evolutionary character state descriptions into a synthetic character matrix of neomorphic (presence/absence) data. Using the KB, which includes more than 55 studies of sarcopterygian taxa, we generated a synthetic supermatrix of 639 variable characters scored for 1051 taxa, resulting in over 145,000 populated cells. Of these characters, over 76% were made variable through the addition of inferred presence/absence states derived by machine reasoning over the formal semantics of the source ontologies. Inferred data reduced the missing data in the variable character-subset from 98.5% to 78.2%. Machine reasoning also enables the isolation of conflicts in the data, that is, cells where both presence and absence are indicated; reports regarding conflicting data provenance can be generated automatically. Further, reasoning enables quantification and new visualizations of the data, here for example, allowing identification of character space that has been undersampled across the fin-to-limb transition. The approach and methods demonstrated here to compute synthetic presence/absence supermatrices are applicable to any taxonomic and phenotypic slice across the tree of life, providing the data are semantically annotated. Because such data can also be linked to model organism genetics through computational scoring of phenotypic similarity, they open a rich set of future research questions into phenotype-to-genome relationships.
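
The inference and conflict detection described in the abstract can be illustrated with a minimal sketch. It is not the Phenoscape implementation, which reasons over formal ontologies. It assumes a hypothetical toy "part of" hierarchy and two simple rules: presence of a part implies presence of every structure containing it, and absence of a structure implies absence of all of its parts. The taxon names, the hierarchy, and all function names are placeholders invented for illustration.

```python
from itertools import product

# Hypothetical toy "part of" hierarchy (child -> parent); not taken from the KB.
PART_OF = {
    "digit": "autopod",
    "autopod": "limb",
    "humerus": "limb",
}

def ancestors(entity):
    """Yield every structure that `entity` is (transitively) part of."""
    while entity in PART_OF:
        entity = PART_OF[entity]
        yield entity

def descendants(entity):
    """Return every structure that is (transitively) part of `entity`."""
    return [e for e in PART_OF if entity in ancestors(e)]

def infer(asserted):
    """Expand asserted states into a matrix of asserted plus inferred states.

    `asserted` maps (taxon, entity) to "present" or "absent". The result maps
    (taxon, entity) to the set of all states asserted or implied for that cell.
    """
    cells = {}
    for (taxon, entity), state in asserted.items():
        cells.setdefault((taxon, entity), set()).add(state)
        # Presence propagates up to containing structures,
        # absence propagates down to contained parts.
        implied = ancestors(entity) if state == "present" else descendants(entity)
        for other in implied:
            cells.setdefault((taxon, other), set()).add(state)
    return cells

def conflicts(cells):
    """Cells where both presence and absence are indicated."""
    return sorted(key for key, states in cells.items() if len(states) > 1)

def missing_fraction(cells, taxa, entities):
    """Fraction of the taxon-by-entity matrix with no state at all."""
    filled = sum((t, e) in cells for t, e in product(taxa, entities))
    return 1 - filled / (len(taxa) * len(entities))

# Placeholder assertions from two imaginary source studies.
asserted = {
    ("taxon_A", "digit"): "present",
    ("taxon_B", "limb"): "absent",
    ("taxon_B", "humerus"): "present",
}
taxa = ["taxon_A", "taxon_B"]
entities = ["digit", "autopod", "humerus", "limb"]

cells = infer(asserted)
before = missing_fraction({k: {v} for k, v in asserted.items()}, taxa, entities)
after = missing_fraction(cells, taxa, entities)
print(f"missing before inference: {before:.3f}, after: {after:.3f}")
print("conflicting cells:", conflicts(cells))
```

In this toy case three asserted states fill seven of eight cells after inference, and the two contradictory statements about taxon_B surface as conflicts. For scale, the matrix reported in the abstract has 639 × 1051 = 671,589 cells, so 78.2% missing data corresponds to roughly 146,000 populated cells, consistent with the stated "over 145,000".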

journal_name

Syst Biol

journal_title

Systematic biology

authors

Dececchi TA,Balhoff JP,Lapp H,Mabee PM

doi

10.1093/sysbio/syv031

subject

Has Abstract

pub_date

2015-11-01 00:00:00

pages

936-52

issue

6

eissn

1076-836X

issn

1063-5157

pii

syv031

journal_volume

64

pub_type

Journal Article
  • Whole Genome Shotgun Phylogenomics Resolves the Pattern and Timing of Swallowtail Butterfly Evolution.

    abstract::Evolutionary relationships have remained unresolved in many well-studied groups, even though advances in next-generation sequencing and analysis, using approaches such as transcriptomics, anchored hybrid enrichment, or ultraconserved elements, have brought systematics to the brink of whole genome phylogenomics. Recent...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syz030

    authors: Allio R,Scornavacca C,Nabholz B,Clamens AL,Sperling FA,Condamine FL

    update_date:2020-01-01 00:00:00

  • Evolution of a RNA polymerase gene family in Silene (Caryophyllaceae)-incomplete concerted evolution and topological congruence among paralogues.

    abstract::Four low-copy nuclear DNA intron regions from the second largest subunits of the RNA polymerase gene family (RPA2, RPB2, RPD2a, and RPD2b), the internal transcribed spacers (ITSs) from the nuclear ribosomal regions, and the rps16 intron from the chloroplast were sequenced and used in a phylogenetic analysis of 29 spec...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150490888840

    authors: Popp M,Oxelman B

    update_date:2004-12-01 00:00:00

  • Nomenclature for the Nameless: A Proposal for an Integrative Molecular Taxonomy of Cryptic Diversity Exemplified by Planktonic Foraminifera.

    abstract::Investigations of biodiversity, biogeography, and ecological processes rely on the identification of "species" as biologically significant, natural units of evolution. In this context, morphotaxonomy only provides an adequate level of resolution if reproductive isolation matches morphological divergence. In many group...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syw031

    authors: Morard R,Escarguel G,Weiner AK,André A,Douady CJ,Wade CM,Darling KF,Ujiié Y,Seears HA,Quillévéré F,de Garidel-Thoron T,de Vargas C,Kucera M

    update_date:2016-09-01 00:00:00

  • Testing hybridization hypotheses based on incongruent gene trees.

    abstract::Hybridization is an important evolutionary mechanism in plants and has been increasingly documented in animals. Difficulty in reconstruction of reticulate evolution, however, has been a long-standing problem in phylogenetics. Consequently, hybrid speciation may play a major role in causing topological incongruence bet...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635159950127321

    authors: Sang T,Zhong Y

    update_date:2000-09-01 00:00:00

  • Tracing the temporal and spatial origins of island endemics in the Mediterranean region: a case study from the citrus family (Ruta L., Rutaceae).

    abstract::Understanding the origin of island endemics is a central task of historical biogeography. Recent methodological advances provide a rigorous framework to determine the relative contribution of different biogeographic processes (e.g., vicariance, land migration, long-distance dispersal) to the origin of island endemics....

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syq046

    authors: Salvo G,Ho SY,Rosenbaum G,Ree R,Conti E

    update_date:2010-12-01 00:00:00

  • Phylogenomics, Origin and Diversification of Anthozoans (Phylum Cnidaria).

    abstract::Anthozoan cnidarians (corals and sea anemones) include some of the world's most important foundation species, capable of building massive reef complexes that support entire ecosystems. Although previous molecular phylogenetic analyses have revealed widespread homoplasy of the morphological characters traditionally use...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syaa103

    authors: McFadden CS,Quattrini AM,Brugler MR,Cowman PF,Dueñas LF,Kitahara MV,Paz-García DA,Reimer JD,Rodríguez E

    update_date:2021-01-28 00:00:00

  • Is congruence between data partitions a reliable predictor of phylogenetic accuracy? Empirically testing an iterative procedure for choosing among phylogenetic methods.

    abstract::The relationship between phylogenetic accuracy and congruence between data partitions collected from the same taxa was explored for mitochondrial DNA sequences from two well-supported vertebrate phylogenies. An iterative procedure was adopted whereby accuracy, phylogenetic signal, and congruence were measured before a...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/46.3.464

    authors: Cunningham CW

    update_date:1997-09-01 00:00:00

  • Phylogenetics of flowering plants based on combined analysis of plastid atpB and rbcL gene sequences.

    abstract::Following (1) the large-scale molecular phylogeny of seed plants based on plastid rbcL gene sequences (published in 1993 by Chase et al., Ann. Missouri Bot. Gard. 80:528-580) and (2) the 18S nuclear phylogeny of flowering plants (published in 1997 by Soltis et al., Ann. Missouri Bot. Gard. 84:1-49), we present a phylo...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/49.2.306

    authors: Savolainen V,Chase MW,Hoot SB,Morton CM,Soltis DE,Bayer C,Fay MF,de Bruijn AY,Sullivan S,Qiu YL

    update_date:2000-06-01 00:00:00

  • Estimating Phylogenies from Shape and Similar Multidimensional Data: Why It Is Not Reliable.

    abstract::In recent years, there has been controversy whether multidimensional data such as geometric morphometric data or information on gene expression can be used for estimating phylogenies. This study uses simulations of evolution in multidimensional phenotype spaces to address this question and to identify specific factors...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syaa003

    authors: Varón-González C,Whelan S,Klingenberg CP

    update_date:2020-09-01 00:00:00

  • Multiple cophylogenetic analyses reveal frequent cospeciation between pelecaniform birds and Pectinopygus lice.

    abstract::Lice in the genus Pectinopygus parasitize a single order of birds (Pelecaniformes). To examine the degree of congruence between the phylogenies of 17 Pectinopygus species and their pelecaniform hosts, sequences from mitochondrial 12S rRNA, 16S rRNA, COI, and nuclear wingless and EF1-alpha genes (2290 nucleotides) and ...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150701311370

    authors: Hughes J,Kennedy M,Johnson KP,Palma RL,Page RD

    update_date:2007-04-01 00:00:00

  • The comparative method is not macroevolution: across-species evidence for within-species process.

    abstract::It is common for studies that employ the comparative method for the study of adaptation, i.e. documentation of potentially adaptive across-species patterns of trait-environment or trait-trait correlation, to be designated as "macroevolutionary." Authors are justified in using "macroevolution" in this way by appeal to ...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syaa086

    authors: Olson ME

    update_date:2021-01-07 00:00:00

  • Testing for Independence between Evolutionary Processes.

    abstract::Evolutionary events co-occurring along phylogenetic trees usually point to complex adaptive phenomena, sometimes implicating epistasis. While a number of methods have been developed to account for co-occurrence of events on the same internal or external branch of an evolutionary tree, there is a need to account for th...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syw004

    authors: Behdenna A,Pothier J,Abby SS,Lambert A,Achaz G

    update_date:2016-09-01 00:00:00

  • Biogeography explains cophylogenetic patterns in toucan chewing lice.

    abstract::Historically, comparisons of host and parasite phylogenies have concentrated on cospeciation. However, many of these comparisons have demonstrated that the phylogenies of hosts and parasites are seldom completely congruent, suggesting that phenomena other than cospeciation play an important role in the evolution of ho...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150490265085

    authors: Weckstein JD

    update_date:2004-02-01 00:00:00

  • Phylogenetic signal variation in the genomes of Medicago (Fabaceae).

    abstract::Genome-scale data offer the opportunity to clarify phylogenetic relationships that are difficult to resolve with few loci, but they can also identify genomic regions with evolutionary history distinct from that of the species history. We collected whole-genome sequence data from 29 taxa in the legume genus Medicago, t...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syt009

    authors: Yoder JB,Briskine R,Mudge J,Farmer A,Paape T,Steele K,Weiblen GD,Bharti AK,Zhou P,May GD,Young ND,Tiffin P

    update_date:2013-05-01 00:00:00

  • Monophyly, Taxon Sampling, and the Nature of Ranks in the Classification of Orb-Weaving Spiders (Araneae: Araneoidea).

    abstract::We address some of the taxonomic and classification changes proposed by Kuntner et al. (2019) in a comparative study on the evolution of sexual size dimorphism in nephiline spiders. Their proposal to recircumscribe araneids and to rank the subfamily Nephilinae as a family is fundamentally flawed as it renders the fami...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syz043

    authors: Kallal RJ,Dimitrov D,Arnedo MA,Giribet G,Hormiga G

    update_date:2020-03-01 00:00:00

  • Gene Tree Discordance Causes Apparent Substitution Rate Variation.

    abstract::Substitution rates are known to be variable among genes, chromosomes, species, and lineages due to multifarious biological processes. Here, we consider another source of substitution rate variation due to a technical bias associated with gene tree discordance. Discordance has been found to be rampant in genome-wide da...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syw018

    authors: Mendes FK,Hahn MW

    update_date:2016-07-01 00:00:00

  • Biogeographic interpretation of splits graphs: least squares optimization of branch lengths.

    abstract::Although most often used to represent phylogenetic uncertainty, network methods are also potentially useful for describing the phylogenetic complexity expected to characterize recent species radiations. One network method with particular advantages in this context is split decomposition. However, in its standard imple...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150590906046

    authors: Winkworth R,Bryant D,Lockhart P,Havell D,Moulton V

    update_date:2005-02-01 00:00:00

  • Testing congruence in phylogenomic analysis.

    abstract::Phylogenomic analyses of large sets of genes or proteins have the potential to revolutionize our understanding of the tree of life. However, problems arise because estimated phylogenies from individual loci often differ because of different histories, systematic bias, or stochastic error. We have developed Concaterpil...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150801910436

    authors: Leigh JW,Susko E,Baumgartner M,Roger AJ

    update_date:2008-02-01 00:00:00

  • Restriction-Site-Associated DNA Sequencing Reveals a Cryptic Viburnum Species on the North American Coastal Plain.

    abstract::Species are the starting point for most studies of ecology and evolution, but the proper circumscription of species can be extremely difficult in morphologically variable lineages, and there are still few convincing examples of molecularly informed species delimitation in plants. Here, we focus on the Viburnum nudum c...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syy084

    authors: Spriggs EL,Eaton DAR,Sweeney PW,Schlutius C,Edwards EJ,Donoghue MJ

    update_date:2019-03-01 00:00:00

  • Toward an integrated system of clade names.

    abstract::Although the proposition that higher taxa should correspond to clades is widely accepted, current nomenclature does not distinguish clearly between different clades in nested series. In particular, the same name is often applied to a total clade, its crown clade, and clades originating with various nodes, branches, an...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150701656378

    authors: de Queiroz K

    update_date:2007-12-01 00:00:00

  • Large-scale phylogenies and measuring the performance of phylogenetic estimators.

    abstract::Performance measures of phylogenetic estimation methods such as accuracy, consistency, and power are an attempt at summarizing an ensemble of a given estimator's behavior. These summaries characterize an ensemble behavior with a single number, leading to a variety of definitions. In particular, the relationships betwe...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/106351598261021

    authors: Kim J

    update_date:1998-03-01 00:00:00

  • Computing Bayes factors using thermodynamic integration.

    abstract::In the Bayesian paradigm, a common method for comparing two models is to compute the Bayes factor, defined as the ratio of their respective marginal likelihoods. In recent phylogenetic works, the numerical evaluation of marginal likelihoods has often been performed using the harmonic mean estimation procedure. In the ...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150500433722

    authors: Lartillot N,Philippe H

    update_date:2006-04-01 00:00:00

  • Predicting total global species richness using rates of species description and estimates of taxonomic effort.

    abstract::We found that trends in the rate of description of 580,000 marine and terrestrial species, in the taxonomically authoritative World Register of Marine Species and Catalogue of Life databases, were similar until the 1950s. Since then, the relative number of marine to terrestrial species described per year has increased...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syr080

    authors: Costello MJ,Wilson S,Houlding B

    update_date:2012-10-01 00:00:00

  • More characters or more taxa for a robust phylogeny--case study from the coffee family (Rubiaceae).

    abstract::Using different data sets mainly from the plant family Rubiaceae, but in parts also from the Apocynaceae, Asteraceae, Lardizabalaceae, Saxifragaceae, and Solanaceae, we have investigated the effect of number of characters, number of taxa, and kind of data on bootstrap values within phylogenetic trees. The percentage o...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/106351599260085

    authors: Bremer B,Jansen RK,Oxelman B,Backlund M,Lantz H,Kim KJ

    update_date:1999-09-01 00:00:00

  • Bacterial species and speciation.

    abstract::Bacteria are profoundly different from eukaryotes in their patterns of genetic exchange. Nevertheless, ecological diversity is organized in the same way across all of life: individual organisms fall into more or less discrete clusters on the basis of their phenotypic, ecological, and DNA sequence characteristics. Each se...

    journal_title:Systematic biology

    pub_type: Journal Article, Review

    doi:10.1080/10635150118398

    authors: Cohan FM

    update_date:2001-08-01 00:00:00

  • Independent contrasts and PGLS regression estimators are equivalent.

    abstract::We prove that the slope parameter of the ordinary least squares regression of phylogenetically independent contrasts (PICs) conducted through the origin is identical to the slope parameter of the method of generalized least squares (GLSs) regression under a Brownian motion model of evolution. This equivalence has seve...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syr118

    authors: Blomberg SP,Lefevre JG,Wells JA,Waterhouse M

    update_date:2012-05-01 00:00:00

  • Untangling complex histories of genome mergings in high polyploids.

    abstract::Polyploidy, the duplication of entire genomes, plays a major role in plant evolution. In allopolyploids, genome duplication is associated with hybridization between two or more divergent genomes. Successive hybridization and polyploidization events can build up species complexes of allopolyploids with complicated netw...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150701424553

    authors: Brysting AK,Oxelman B,Huber KT,Moulton V,Brochmann C

    update_date:2007-06-01 00:00:00

  • Multilocus phylogenetics of a rapid radiation in the genus Thomomys (Rodentia: Geomyidae).

    abstract::Species complexes undergoing rapid radiation present a challenge in molecular systematics because of the possibility that ancestral polymorphism is retained in component gene trees. Coalescent theory has demonstrated that gene trees often fail to match lineage trees when taxon divergence times are less than the ancest...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150802044011

    authors: Belfiore NM,Liu L,Moritz C

    update_date:2008-04-01 00:00:00

  • Phylogenetic Trees and Networks Can Serve as Powerful and Complementary Approaches for Analysis of Genomic Data.

    abstract::Genomic data have had a profound impact on nearly every biological discipline. In systematics and phylogenetics, the thousands of loci that are now being sequenced can be analyzed under the multispecies coalescent model (MSC) to explicitly account for gene tree discordance due to incomplete lineage sorting (ILS). Howe...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1093/sysbio/syz056

    authors: Blair C,Ané C

    update_date:2020-05-01 00:00:00

  • Arrival and diversification of caviomorph rodents and platyrrhine primates in South America.

    abstract::Platyrrhine primates and caviomorph rodents are clades of mammals that colonized South America during its period of isolation from the other continents, between 100 and 3 million years ago (Mya). Until now, no molecular study investigated the timing of the South American colonization by these two lineages with the sam...

    journal_title:Systematic biology

    pub_type: Journal Article

    doi:10.1080/10635150500481390

    authors: Poux C,Chevret P,Huchon D,de Jong WW,Douzery EJ

    update_date:2006-04-01 00:00:00