A universal genomic coordinate translator for comparative genomics.

Abstract:

BACKGROUND: Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. The orthology relationship between copies across multiple genomes can be unambiguously resolved by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, whose number grows as N² with the number of available genomes, N.

RESULTS: Here, we introduce Kraken, software that avoids the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows large numbers of genomes to be included in analyses. We first evaluated the method on the set of 12 Drosophila genomes and found that orthologous correspondence computed indirectly through a graph of multiple synteny maps comes at minimal cost in sensitivity but reduces overall computational runtime by an order of magnitude. We then applied the method to three well-annotated mammalian genomes (human, mouse and rat) and showed that up to 93% of protein-coding transcripts have unambiguous pairwise orthologous relationships across the genomes. At the nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% match exactly on at least one junction. Finally, we applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes. Kraken is not limited to protein-coding genes: it also identifies thousands of novel transcribed loci, likely non-coding RNAs, that are consistently transcribed in human, chimpanzee and gorilla and maintain significant correlation of expression levels across species.

CONCLUSIONS: Kraken is a computational genome coordinate translator that facilitates cross-species comparisons, distinguishes orthologs from paralogs, and does not require costly all-to-all whole-genome mappings. Kraken is freely available under the LGPL from http://github.com/nedaz/kraken.
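The abstract does not describe Kraken's internal data structures, so the following Python code is only a rough, hypothetical sketch of the general idea. It translates a coordinate indirectly through a graph of pairwise synteny maps instead of requiring a direct map between every pair of genomes. The block tuple format, the function names (`invert`, `lift`, `build_graph`, `translate`, `pairwise_alignments_needed`) and all coordinates are invented for illustration and are not Kraken's API. It also assumes the pairwise maps form a connected graph, so that N-1 maps (a spanning tree) can stand in for the N(N-1)/2 maps of an all-to-all design.

```python
from collections import deque

# Hypothetical toy model (not Kraken's data format): a pairwise synteny map is a
# list of collinear blocks (start_a, end_a, start_b, end_b, strand) on a single
# chromosome per genome, using half-open intervals of equal length.


def invert(blocks):
    """Swap the roles of the two genomes so a map can be walked in reverse."""
    return [(sb, eb, sa, ea, strand) for (sa, ea, sb, eb, strand) in blocks]


def lift(pos, blocks):
    """Translate one coordinate through one pairwise map; None if unaligned."""
    for sa, ea, sb, eb, strand in blocks:
        if sa <= pos < ea:
            offset = pos - sa
            # Forward blocks keep the offset; reverse blocks count from the end.
            return sb + offset if strand > 0 else eb - 1 - offset
    return None


def build_graph(maps):
    """Turn {(genome_a, genome_b): blocks} into an undirected adjacency dict."""
    graph = {}
    for (a, b), blocks in maps.items():
        graph.setdefault(a, {})[b] = blocks
        graph.setdefault(b, {})[a] = invert(blocks)
    return graph


def translate(pos, source, target, graph):
    """Breadth-first search over genomes, lifting the coordinate along each
    pairwise map, so no direct source-to-target map is needed."""
    queue = deque([(source, pos)])
    seen = {source}
    while queue:
        genome, p = queue.popleft()
        if genome == target:
            return p
        for neighbour, blocks in graph.get(genome, {}).items():
            if neighbour in seen:
                continue
            lifted = lift(p, blocks)
            if lifted is not None:
                seen.add(neighbour)
                queue.append((neighbour, lifted))
    return None


def pairwise_alignments_needed(n, all_to_all):
    """All-to-all needs N(N-1)/2 maps; a connected tree of maps needs N-1."""
    return n * (n - 1) // 2 if all_to_all else n - 1


# Made-up coordinates: human and rat are linked only through mouse.
maps = {
    ("human", "mouse"): [(100, 200, 1000, 1100, 1)],
    ("mouse", "rat"): [(1000, 1100, 5000, 5100, -1)],
}
graph = build_graph(maps)
print(translate(150, "human", "rat", graph))   # 5049
print(translate(5049, "rat", "human", graph))  # 150
print(pairwise_alignments_needed(12, all_to_all=True))   # 66
print(pairwise_alignments_needed(12, all_to_all=False))  # 11
```

In this toy model, a coordinate that falls outside the aligned blocks of any intermediate genome is lost, which is one way indirect translation can cost some sensitivity. The sketch does not attempt to model Kraken's orthology re-computation or paralog resolution.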

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Zamani N,Sundström G,Meadows JR,Höppner MP,Dainat J,Lantz H,Haas BJ,Grabherr MG

doi

10.1186/1471-2105-15-227

subject

Has Abstract

pub_date

2014-06-30 00:00:00

pages

227

issn

1471-2105

pii

1471-2105-15-227

journal_volume

15

pub_type

Journal Article
  • On reliable discovery of molecular signatures.

    abstract:BACKGROUND:Molecular signatures are sets of genes, proteins, genetic variants or other variables that can be used as markers for a particular phenotype. Reliable signature discovery methods could yield valuable insight into cell biology and mechanisms of human disease. However, it is currently not clear how to control ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-38

    authors: Nilsson R,Björkegren J,Tegnér J

    update_date:2009-01-29 00:00:00

  • COPASAAR--a database for proteomic analysis of single amino acid repeats.

    abstract:BACKGROUND:Single amino acid repeats make up a significant proportion in all of the proteomes that have currently been determined. They have been shown to be functionally and medically significant, and are associated with cancers and neuro-degenerative diseases such as Huntington's Chorea, where a poly-glutamine repeat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-196

    authors: Depledge DP,Dalby AR

    update_date:2005-08-03 00:00:00

  • Species-specific analysis of protein sequence motifs using mutual information.

    abstract:BACKGROUND:Protein sequence motifs are by definition short fragments of conserved amino acids, often associated with a specific function. Accordingly protein sequence profiles derived from multiple sequence alignments provide an alternative description of functional motifs characterizing families of related sequences. ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-164

    authors: Hummel J,Keshvari N,Weckwerth W,Selbig J

    update_date:2005-06-29 00:00:00

  • Evidence for intron length conservation in a set of mammalian genes associated with embryonic development.

    abstract:BACKGROUND:We carried out an analysis of intron length conservation across a diverse group of nineteen mammalian species. Motivated by recent research suggesting a role for time delays associated with intron transcription in gene expression oscillations required for early embryonic patterning, we searched for examples ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-S9-S16

    authors: Seoighe C,Korir PK

    update_date:2011-10-05 00:00:00

  • Automatic detection of anchor points for multiple sequence alignment.

    abstract:BACKGROUND:determining beforehand specific positions to align (anchor points) has proved valuable for the accuracy of automated multiple sequence alignment (MSA) software. This feature can be used manually to include biological expertise, or automatically, usually by pairwise similarity searches. Multiple local similar...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-445

    authors: Pitschi F,Devauchelle C,Corel E

    update_date:2010-09-02 00:00:00

  • Relation extraction between bacteria and biotopes from biomedical texts with attention mechanisms and domain-specific contextual representations.

    abstract:BACKGROUND:The Bacteria Biotope (BB) task is a biomedical relation extraction (RE) that aims to study the interaction between bacteria and their locations. This task is considered to pertain to fundamental knowledge in applied microbiology. Some previous investigations conducted the study by applying feature-based mode...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3217-3

    authors: Jettakul A,Wichadakul D,Vateekul P

    update_date:2019-12-03 00:00:00

  • Correlation analysis reveals the emergence of coherence in the gene expression dynamics following system perturbation.

    abstract:Time course gene expression experiments are a popular means to infer co-expression. Many methods have been proposed to cluster genes or to build networks based on similarity measures of their expression dynamics. In this paper we apply a correlation based approach to network reconstruction to three datasets of time se...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-S1-S16

    authors: Neretti N,Remondini D,Tatar M,Sedivy JM,Pierini M,Mazzatti D,Powell J,Franceschi C,Castellani GC

    update_date:2007-03-08 00:00:00

  • Image-based classification of plant genus and family for trained and untrained plant species.

    abstract:BACKGROUND:Modern plant taxonomy reflects phylogenetic relationships among taxa based on proposed morphological and genetic similarities. However, taxonomical relation is not necessarily reflected by close overall resemblance, but rather by commonality of very specific morphological characters or similarity on the mole...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2474-x

    authors: Seeland M,Rzanny M,Boho D,Wäldchen J,Mäder P

    update_date:2019-01-03 00:00:00

  • BiPOm: a rule-based ontology to represent and infer molecule knowledge from a biological process-centered viewpoint.

    abstract:BACKGROUND:Managing and organizing biological knowledge remains a major challenge, due to the complexity of living systems. Recently, systemic representations have been promising in tackling such a challenge at the whole-cell scale. In such representations, the cell is considered as a system composed of interlocked sub...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03637-9

    authors: Henry V,Saïs F,Inizan O,Marchadier E,Dibie J,Goelzer A,Fromion V

    update_date:2020-07-23 00:00:00

  • CorrelaGenes: a new tool for the interpretation of the human transcriptome.

    abstract:BACKGROUND:The amount of gene expression data available in public repositories has grown exponentially in the last years, now requiring new data mining tools to transform them in information easily accessible to biologists. RESULTS:By exploiting expression data publicly available in the Gene Expression Omnibus (GEO) d...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-S1-S6

    authors: Cremaschi P,Rovida S,Sacchi L,Lisa A,Calvi F,Montecucco A,Biamonti G,Bione S,Sacchi G

    update_date:2014-01-01 00:00:00

  • Generating confidence intervals on biological networks.

    abstract:BACKGROUND:In the analysis of networks we frequently require the statistical significance of some network statistic, such as measures of similarity for the properties of interacting nodes. The structure of the network may introduce dependencies among the nodes and it will in general be necessary to account for these de...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-467

    authors: Thorne T,Stumpf MP

    update_date:2007-11-30 00:00:00

  • Construction of phylogenetic trees by kernel-based comparative analysis of metabolic networks.

    abstract:BACKGROUND:To infer the tree of life requires knowledge of the common characteristics of each species descended from a common ancestor as the measuring criteria and a method to calculate the distance between the resulting values of each measure. Conventional phylogenetic analysis based on genomic sequences provides inf...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-284

    authors: Oh SJ,Joung JG,Chang JH,Zhang BT

    update_date:2006-06-06 00:00:00

  • Building blocks for automated elucidation of metabolites: natural product-likeness for candidate ranking.

    abstract:BACKGROUND:In metabolomics experiments, spectral fingerprints of metabolites with no known structural identity are detected routinely. Computer-assisted structure elucidation (CASE) has been used to determine the structural identities of unknown compounds. It is generally accepted that a single 1D NMR spectrum or mass ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-234

    authors: Jayaseelan KV,Steinbeck C

    update_date:2014-07-05 00:00:00

  • STAble: a novel approach to de novo assembly of RNA-seq data and its application in a metabolic model network based metatranscriptomic workflow.

    abstract:BACKGROUND:De novo assembly of RNA-seq data allows the study of transcriptome in absence of a reference genome either if data is obtained from a single organism or from a mixed sample as in metatranscriptomics studies. Given the high number of sequences obtained from NGS approaches, a critical step in any analysis work...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2174-6

    authors: Saggese I,Bona E,Conway M,Favero F,Ladetto M,Liò P,Manzini G,Mignone F

    update_date:2018-07-09 00:00:00

  • Discrimination of cell cycle phases in PCNA-immunolabeled cells.

    abstract:BACKGROUND:Protein function in eukaryotic cells is often controlled in a cell cycle-dependent manner. Therefore, the correct assignment of cellular phenotypes to cell cycle phases is a crucial task in cell biology research. Nuclear proteins whose localization varies during the cell cycle are valuable and frequently use...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0618-9

    authors: Schönenberger F,Deutzmann A,Ferrando-May E,Merhof D

    update_date:2015-05-29 00:00:00

  • Restricted DCJ-indel model: sorting linear genomes with DCJ and indels.

    abstract:BACKGROUND:The double-cut-and-join (DCJ) is a model that is able to efficiently sort a genome into another, generalizing the typical mutations (inversions, fusions, fissions, translocations) to which genomes are subject, but allowing the existence of circular chromosomes at the intermediate steps. In the general model ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-S19-S14

    authors: da Silva PH,Machado R,Dantas S,Braga MD

    update_date:2012-01-01 00:00:00

  • A rapid and accurate approach for prediction of interactomes from co-elution data (PrInCE).

    abstract:BACKGROUND:An organism's protein interactome, or complete network of protein-protein interactions, defines the protein complexes that drive cellular processes. Techniques for studying protein complexes have traditionally applied targeted strategies such as yeast two-hybrid or affinity purification-mass spectrometry to ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1865-8

    authors: Stacey RG,Skinnider MA,Scott NE,Foster LJ

    update_date:2017-10-23 00:00:00

  • Graph regularized L2,1-nonnegative matrix factorization for miRNA-disease association prediction.

    abstract:BACKGROUND:The aberrant expression of microRNAs is closely connected to the occurrence and development of a great deal of human diseases. To study human diseases, numerous effective computational models that are valuable and meaningful have been presented by researchers. RESULTS:Here, we present a computational framew...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3409-x

    authors: Gao Z,Wang YT,Wu QW,Ni JC,Zheng CH

    update_date:2020-02-18 00:00:00

  • Correction to: Effective machine-learning assembly for next-generation amplicon sequencing with very low coverage.

    abstract:Following publication of the original article [1], the author reported that there are several errors in the original article. ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article, Published Erratum

    doi:10.1186/s12859-019-3318-z

    authors: Ranjard L,Wong TKF,Rodrigo AG

    update_date:2020-01-22 00:00:00

  • EGenBio: a data management system for evolutionary genomics and biodiversity.

    abstract:BACKGROUND:Evolutionary genomics requires management and filtering of large numbers of diverse genomic sequences for accurate analysis and inference on evolutionary processes of genomic and functional change. We developed Evolutionary Genomics and Biodiversity (EGenBio; http://egenbio.lsu.edu) to begin to address this....

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-S2-S7

    authors: Nahum LA,Reynolds MT,Wang ZO,Faith JJ,Jonna R,Jiang ZJ,Meyer TJ,Pollock DD

    update_date:2006-09-06 00:00:00

  • Metabolite coupling in genome-scale metabolic networks.

    abstract:BACKGROUND:Biochemically detailed stoichiometric matrices have now been reconstructed for various bacteria, yeast, and for the human cardiac mitochondrion based on genomic and proteomic data. These networks have been manually curated based on legacy data and elementally and charge balanced. Comparative analysis of thes...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-111

    authors: Becker SA,Price ND,Palsson BØ

    update_date:2006-03-06 00:00:00

  • DisCons: a novel tool to quantify and classify evolutionary conservation of intrinsic protein disorder.

    abstract:BACKGROUND:Analyzing the amino acid sequence of an intrinsically disordered protein (IDP) in an evolutionary context can yield novel insights on the functional role of disordered regions and sequence element(s). However, in the case of many IDPs, the lack of evolutionary conservation of the primary sequence can hamper ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0592-2

    authors: Varadi M,Guharoy M,Zsolyomi F,Tompa P

    update_date:2015-05-13 00:00:00

  • Bacterial protein meta-interactomes predict cross-species interactions and protein function.

    abstract:BACKGROUND:Protein-protein interactions (PPIs) can offer compelling evidence for protein function, especially when viewed in the context of proteome-wide interactomes. Bacteria have been popular subjects of interactome studies: more than six different bacterial species have been the subjects of comprehensive interactom...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1585-0

    authors: Caufield JH,Wimble C,Shary S,Wuchty S,Uetz P

    update_date:2017-03-16 00:00:00

  • 2D electrophoresis image brightness correction based on gradient interval histogram.

    abstract:BACKGROUND:Two-dimensional electrophoresis (2DE) is one of the most widely applied techniques in comparative proteomics. The basic task of 2DE is to identify differential protein expression by quantitative analysis of 2DE images. To reduce the errors of spot quantification in 2DE images, a novel brightness correction m...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3432-y

    authors: Ou Q,Xiao J,Yu L,Wu K,Xiong B

    update_date:2020-03-19 00:00:00

  • Anatomy of enzyme channels.

    abstract:BACKGROUND:Enzyme active sites can be connected to the exterior environment by one or more channels passing through the protein. Despite our current knowledge of enzyme structure and function, surprisingly little is known about how often channels are present or about any structural features such channels may have in co...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-014-0379-x

    authors: Pravda L,Berka K,Svobodová Vařeková R,Sehnal D,Banáš P,Laskowski RA,Koča J,Otyepka M

    update_date:2014-11-18 00:00:00

  • Predicting blood pressure from physiological index data using the SVR algorithm.

    abstract:BACKGROUND:Blood pressure diseases have increasingly been identified as among the main factors threatening human health. How to accurately and conveniently measure blood pressure is the key to the implementation of effective prevention and control measures for blood pressure diseases. Traditional blood pressure measure...

    journal_title:BMC bioinformatics

    pub_type: Clinical Trial, Journal Article

    doi:10.1186/s12859-019-2667-y

    authors: Zhang B,Ren H,Huang G,Cheng Y,Hu C

    update_date:2019-02-28 00:00:00

  • SplicerAV: a tool for mining microarray expression data for changes in RNA processing.

    abstract:BACKGROUND:Over the past two decades more than fifty thousand unique clinical and biological samples have been assayed using the Affymetrix HG-U133 and HG-U95 GeneChip microarray platforms. This substantial repository has been used extensively to characterize changes in gene expression between biological samples, but h...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-108

    authors: Robinson TJ,Dinan MA,Dewhirst M,Garcia-Blanco MA,Pearson JL

    update_date:2010-02-25 00:00:00

  • Hotspot Hunter: a computational system for large-scale screening and selection of candidate immunological hotspots in pathogen proteomes.

    abstract:BACKGROUND:T-cell epitopes that promiscuously bind to multiple alleles of a human leukocyte antigen (HLA) supertype are prime targets for development of vaccines and immunotherapies because they are relevant to a large proportion of the human population. The presence of clusters of promiscuous T-cell epitopes, immunolo...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-S1-S19

    authors: Zhang GL,Khan AM,Srinivasan KN,Heiny A,Lee K,Kwoh CK,August JT,Brusic V

    update_date:2008-01-01 00:00:00

  • Identification of germ cell-specific genes in mammalian meiotic prophase.

    abstract:BACKGROUND:Mammalian germ cells undergo meiosis to produce sperm or eggs, haploid cells that are primed to meet and propagate life. Meiosis is initiated by retinoic acid and meiotic prophase is the first and most complex stage of meiosis when homologous chromosomes pair to exchange genetic information. Errors in meiosi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-72

    authors: Li Y,Ray D,Ye P

    update_date:2013-02-27 00:00:00

  • Efficient inference of homologs in large eukaryotic pan-proteomes.

    abstract:BACKGROUND:Identification of homologous genes is fundamental to comparative genomics, functional genomics and phylogenomics. Extensive public homology databases are of great value for investigating homology but need to be continually updated to incorporate new sequences. As new sequences are rapidly being generated, th...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2362-4

    authors: Sheikhizadeh Anari S,de Ridder D,Schranz ME,Smit S

    update_date:2018-09-26 00:00:00