Abstract:
BACKGROUND:Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. Unambiguously identifying the orthology relationship between copies across multiple genomes can be resolved by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, which increases at N2 with the number of available genomes, N. RESULTS:Here, we introduce Kraken, software that omits the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows for including large numbers of genomes in analyses. We first evaluated the method on the set of 12 Drosophila genomes, finding that orthologous correspondence computed indirectly through a graph of multiple synteny maps comes at minimal cost in terms of sensitivity, but reduces overall computational runtime by an order of magnitude. We then used the method on three well-annotated mammalian genomes, human, mouse, and rat, and show that up to 93% of protein coding transcripts have unambiguous pairwise orthologous relationships across the genomes. On a nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% on at least one junction. We last applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes. Not limited to protein coding genes, Kraken also identifies thousands of newly identified transcribed loci, likely non-coding RNAs that are consistently transcribed in human, chimpanzee and gorilla, and maintain significant correlation of expression levels across species. CONCLUSIONS:Kraken is a computational genome coordinate translator that facilitates cross-species comparisons, distinguishes orthologs from paralogs, and does not require costly all-to-all whole genome mappings. Kraken is freely available under LPGL from http://github.com/nedaz/kraken.
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Zamani N,Sundström G,Meadows JR,Höppner MP,Dainat J,Lantz H,Haas BJ,Grabherr MGdoi
10.1186/1471-2105-15-227subject
Has Abstractpub_date
2014-06-30 00:00:00pages
227issn
1471-2105pii
1471-2105-15-227journal_volume
15pub_type
杂志文章abstract:BACKGROUND:Molecular signatures are sets of genes, proteins, genetic variants or other variables that can be used as markers for a particular phenotype. Reliable signature discovery methods could yield valuable insight into cell biology and mechanisms of human disease. However, it is currently not clear how to control ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-38
更新日期:2009-01-29 00:00:00
abstract:BACKGROUND:Single amino acid repeats make up a significant proportion in all of the proteomes that have currently been determined. They have been shown to be functionally and medically significant, and are associated with cancers and neuro-degenerative diseases such as Huntington's Chorea, where a poly-glutamine repeat...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-6-196
更新日期:2005-08-03 00:00:00
abstract:BACKGROUND:Protein sequence motifs are by definition short fragments of conserved amino acids, often associated with a specific function. Accordingly protein sequence profiles derived from multiple sequence alignments provide an alternative description of functional motifs characterizing families of related sequences. ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-6-164
更新日期:2005-06-29 00:00:00
abstract:BACKGROUND:We carried out an analysis of intron length conservation across a diverse group of nineteen mammalian species. Motivated by recent research suggesting a role for time delays associated with intron transcription in gene expression oscillations required for early embryonic patterning, we searched for examples ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-S9-S16
更新日期:2011-10-05 00:00:00
abstract:BACKGROUND:determining beforehand specific positions to align (anchor points) has proved valuable for the accuracy of automated multiple sequence alignment (MSA) software. This feature can be used manually to include biological expertise, or automatically, usually by pairwise similarity searches. Multiple local similar...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-445
更新日期:2010-09-02 00:00:00
abstract:BACKGROUND:The Bacteria Biotope (BB) task is a biomedical relation extraction (RE) that aims to study the interaction between bacteria and their locations. This task is considered to pertain to fundamental knowledge in applied microbiology. Some previous investigations conducted the study by applying feature-based mode...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3217-3
更新日期:2019-12-03 00:00:00
abstract::Time course gene expression experiments are a popular means to infer co-expression. Many methods have been proposed to cluster genes or to build networks based on similarity measures of their expression dynamics. In this paper we apply a correlation based approach to network reconstruction to three datasets of time se...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-S1-S16
更新日期:2007-03-08 00:00:00
abstract:BACKGROUND:Modern plant taxonomy reflects phylogenetic relationships among taxa based on proposed morphological and genetic similarities. However, taxonomical relation is not necessarily reflected by close overall resemblance, but rather by commonality of very specific morphological characters or similarity on the mole...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2474-x
更新日期:2019-01-03 00:00:00
abstract:BACKGROUND:Managing and organizing biological knowledge remains a major challenge, due to the complexity of living systems. Recently, systemic representations have been promising in tackling such a challenge at the whole-cell scale. In such representations, the cell is considered as a system composed of interlocked sub...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-03637-9
更新日期:2020-07-23 00:00:00
abstract:BACKGROUND:The amount of gene expression data available in public repositories has grown exponentially in the last years, now requiring new data mining tools to transform them in information easily accessible to biologists. RESULTS:By exploiting expression data publicly available in the Gene Expression Omnibus (GEO) d...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-S1-S6
更新日期:2014-01-01 00:00:00
abstract:BACKGROUND:In the analysis of networks we frequently require the statistical significance of some network statistic, such as measures of similarity for the properties of interacting nodes. The structure of the network may introduce dependencies among the nodes and it will in general be necessary to account for these de...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-467
更新日期:2007-11-30 00:00:00
abstract:BACKGROUND:To infer the tree of life requires knowledge of the common characteristics of each species descended from a common ancestor as the measuring criteria and a method to calculate the distance between the resulting values of each measure. Conventional phylogenetic analysis based on genomic sequences provides inf...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-284
更新日期:2006-06-06 00:00:00
abstract:BACKGROUND:In metabolomics experiments, spectral fingerprints of metabolites with no known structural identity are detected routinely. Computer-assisted structure elucidation (CASE) has been used to determine the structural identities of unknown compounds. It is generally accepted that a single 1D NMR spectrum or mass ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-234
更新日期:2014-07-05 00:00:00
abstract:BACKGROUND:De novo assembly of RNA-seq data allows the study of transcriptome in absence of a reference genome either if data is obtained from a single organism or from a mixed sample as in metatranscriptomics studies. Given the high number of sequences obtained from NGS approaches, a critical step in any analysis work...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2174-6
更新日期:2018-07-09 00:00:00
abstract:BACKGROUND:Protein function in eukaryotic cells is often controlled in a cell cycle-dependent manner. Therefore, the correct assignment of cellular phenotypes to cell cycle phases is a crucial task in cell biology research. Nuclear proteins whose localization varies during the cell cycle are valuable and frequently use...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0618-9
更新日期:2015-05-29 00:00:00
abstract:BACKGROUND:The double-cut-and-join (DCJ) is a model that is able to efficiently sort a genome into another, generalizing the typical mutations (inversions, fusions, fissions, translocations) to which genomes are subject, but allowing the existence of circular chromosomes at the intermediate steps. In the general model ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-S19-S14
更新日期:2012-01-01 00:00:00
abstract:BACKGROUND:An organism's protein interactome, or complete network of protein-protein interactions, defines the protein complexes that drive cellular processes. Techniques for studying protein complexes have traditionally applied targeted strategies such as yeast two-hybrid or affinity purification-mass spectrometry to ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1865-8
更新日期:2017-10-23 00:00:00
abstract:BACKGROUND:The aberrant expression of microRNAs is closely connected to the occurrence and development of a great deal of human diseases. To study human diseases, numerous effective computational models that are valuable and meaningful have been presented by researchers. RESULTS:Here, we present a computational framew...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-3409-x
更新日期:2020-02-18 00:00:00
abstract::Following publication of the original article [1], the author reported that there are several errors in the original article. ...
journal_title:BMC bioinformatics
pub_type: 杂志文章,已发布勘误
doi:10.1186/s12859-019-3318-z
更新日期:2020-01-22 00:00:00
abstract:BACKGROUND:Evolutionary genomics requires management and filtering of large numbers of diverse genomic sequences for accurate analysis and inference on evolutionary processes of genomic and functional change. We developed Evolutionary Genomics and Biodiversity (EGenBio; http://egenbio.lsu.edu) to begin to address this....
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-S2-S7
更新日期:2006-09-06 00:00:00
abstract:BACKGROUND:Biochemically detailed stoichiometric matrices have now been reconstructed for various bacteria, yeast, and for the human cardiac mitochondrion based on genomic and proteomic data. These networks have been manually curated based on legacy data and elementally and charge balanced. Comparative analysis of thes...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-111
更新日期:2006-03-06 00:00:00
abstract:BACKGROUND:Analyzing the amino acid sequence of an intrinsically disordered protein (IDP) in an evolutionary context can yield novel insights on the functional role of disordered regions and sequence element(s). However, in the case of many IDPs, the lack of evolutionary conservation of the primary sequence can hamper ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0592-2
更新日期:2015-05-13 00:00:00
abstract:BACKGROUND:Protein-protein interactions (PPIs) can offer compelling evidence for protein function, especially when viewed in the context of proteome-wide interactomes. Bacteria have been popular subjects of interactome studies: more than six different bacterial species have been the subjects of comprehensive interactom...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1585-0
更新日期:2017-03-16 00:00:00
abstract:BACKGROUND:Two-dimensional electrophoresis (2DE) is one of the most widely applied techniques in comparative proteomics. The basic task of 2DE is to identify differential protein expression by quantitative analysis of 2DE images. To reduce the errors of spot quantification in 2DE images, a novel brightness correction m...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-3432-y
更新日期:2020-03-19 00:00:00
abstract:BACKGROUND:Enzyme active sites can be connected to the exterior environment by one or more channels passing through the protein. Despite our current knowledge of enzyme structure and function, surprisingly little is known about how often channels are present or about any structural features such channels may have in co...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-014-0379-x
更新日期:2014-11-18 00:00:00
abstract:BACKGROUND:Blood pressure diseases have increasingly been identified as among the main factors threatening human health. How to accurately and conveniently measure blood pressure is the key to the implementation of effective prevention and control measures for blood pressure diseases. Traditional blood pressure measure...
journal_title:BMC bioinformatics
pub_type: 临床试验,杂志文章
doi:10.1186/s12859-019-2667-y
更新日期:2019-02-28 00:00:00
abstract:BACKGROUND:Over the past two decades more than fifty thousand unique clinical and biological samples have been assayed using the Affymetrix HG-U133 and HG-U95 GeneChip microarray platforms. This substantial repository has been used extensively to characterize changes in gene expression between biological samples, but h...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-108
更新日期:2010-02-25 00:00:00
abstract:BACKGROUND:T-cell epitopes that promiscuously bind to multiple alleles of a human leukocyte antigen (HLA) supertype are prime targets for development of vaccines and immunotherapies because they are relevant to a large proportion of the human population. The presence of clusters of promiscuous T-cell epitopes, immunolo...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-S1-S19
更新日期:2008-01-01 00:00:00
abstract:BACKGROUND:Mammalian germ cells undergo meiosis to produce sperm or eggs, haploid cells that are primed to meet and propagate life. Meiosis is initiated by retinoic acid and meiotic prophase is the first and most complex stage of meiosis when homologous chromosomes pair to exchange genetic information. Errors in meiosi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-72
更新日期:2013-02-27 00:00:00
abstract:BACKGROUND:Identification of homologous genes is fundamental to comparative genomics, functional genomics and phylogenomics. Extensive public homology databases are of great value for investigating homology but need to be continually updated to incorporate new sequences. As new sequences are rapidly being generated, th...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2362-4
更新日期:2018-09-26 00:00:00