A database of phylogenetically atypical genes in archaeal and bacterial genomes, identified using the DarkHorse algorithm.

Abstract:

BACKGROUND: The process of horizontal gene transfer (HGT) is believed to be widespread in Bacteria and Archaea, but little comparative data is available addressing its occurrence in complete microbial genomes. Collection of high-quality, automated HGT prediction data based on phylogenetic evidence has previously been impractical for large numbers of genomes at once, due to prohibitive computational demands. DarkHorse, a recently described statistical method for discovering phylogenetically atypical genes on a genome-wide basis, provides a means to solve this problem through lineage probability index (LPI) ranking scores. LPI scores inversely reflect phylogenetic distance between a test amino acid sequence and its closest available database matches. Proteins with low LPI scores are good horizontal gene transfer candidates; those with high scores are not.
DESCRIPTION: The DarkHorse algorithm has been applied to 955 microbial genome sequences, and the results organized into a web-searchable relational database, called the DarkHorse HGT Candidate Resource (http://darkhorse.ucsd.edu). Users can select individual genomes or groups of genomes to screen by LPI score, search for protein functions by descriptive annotation or amino acid sequence similarity, or select proteins with unusual G+C composition in their underlying coding sequences. The search engine reports LPI scores for match partners as well as query sequences, providing the opportunity to explore whether potential HGT donor sequences are phylogenetically typical or atypical within their own genomes. This information can be used to predict whether or not sufficient information is available to build a well-supported phylogenetic tree using the potential donor sequence.
CONCLUSION:The DarkHorse HGT Candidate database provides a powerful, flexible set of tools for identifying phylogenetically atypical proteins, allowing researchers to explore both individual HGT events in single genomes, and large-scale HGT patterns among protein families and genome groups. Although the DarkHorse algorithm cannot, by itself, provide definitive proof of horizontal gene transfer, it is a flexible, powerful tool that can be combined with slower, more rigorous methods in situations where these other methods could not otherwise be applied.

journal_name

BMC Bioinformatics

authors

Podell S,Gaasterland T,Allen EE

doi

10.1186/1471-2105-9-419

pub_date

2008-10-07 00:00:00

pages

419

issn

1471-2105

pii

1471-2105-9-419

journal_volume

9

pub_type

Journal Article
  • metaBEETL: high-throughput analysis of heterogeneous microbial populations from shotgun DNA sequences.

    abstract:Environmental shotgun sequencing (ESS) has potential to give greater insight into microbial communities than targeted sequencing of 16S regions, but requires much higher sequence coverage. The advent of next-generation sequencing has made it feasible for the Human Microbiome Project and other initiatives to generate E...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-S5-S2

    authors: Ander C,Schulz-Trieglaff OB,Stoye J,Cox AJ

    update_date: 2013-01-01 00:00:00

  • Stochastic models for the in silico simulation of synaptic processes.

    abstract:BACKGROUND:Research in life sciences is benefiting from a large availability of formal description techniques and analysis methodologies. These allow both the phenomena investigated to be precisely modeled and virtual experiments to be performed in silico. Such experiments may result in easier, faster, and satisfying a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-S4-S7

    authors: Bracciali A,Brunelli M,Cataldo E,Degano P

    update_date: 2008-04-25 00:00:00

  • Multi-view feature selection for identifying gene markers: a diversified biological data driven approach.

    abstract:BACKGROUND:In recent years, to investigate challenging bioinformatics problems, the utilization of multiple genomic and proteomic sources has become immensely popular among researchers. One such issue is feature or gene selection and identifying relevant and non-redundant marker genes from high dimensional gene express...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03810-0

    authors: Acharya S,Cui L,Pan Y

    update_date: 2020-12-30 00:00:00

  • DeepQA: improving the estimation of single protein model quality with deep belief networks.

    abstract:BACKGROUND:Protein quality assessment (QA) useful for ranking and selecting protein models has long been viewed as one of the major challenges for protein tertiary structure prediction. Especially, estimating the quality of a single protein model, which is important for selecting a few good models out of a large model ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1405-y

    authors: Cao R,Bhattacharya D,Hou J,Cheng J

    update_date: 2016-12-05 00:00:00

  • MPD: multiplex primer design for next-generation targeted sequencing.

    abstract:BACKGROUND:Targeted resequencing offers a cost-effective alternative to whole-genome and whole-exome sequencing when investigating regions known to be associated with a trait or disease. There are a number of approaches to targeted resequencing, including microfluidic PCR amplification, which may be enhanced by multipl...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1453-3

    authors: Wingo TS,Kotlar A,Cutler DJ

    update_date: 2017-01-05 00:00:00

  • Towards mainstreaming of biodiversity data publishing: recommendations of the GBIF Data Publishing Framework Task Group.

    abstract:BACKGROUND:Data are the evidentiary basis for scientific hypotheses, analyses and publication, for policy formation and for decision-making. They are essential to the evaluation and testing of results by peer scientists both present and future. There is broad consensus in the scientific and conservation communities tha...

    journal_title:BMC bioinformatics

    pub_type: Guideline, Journal Article

    doi:10.1186/1471-2105-12-S15-S1

    authors: Moritz T,Krishnan S,Roberts D,Ingwersen P,Agosti D,Penev L,Cockerill M,Chavan V,Data Publishing Framework Task Group.

    update_date: 2011-01-01 00:00:00

  • Island method for estimating the statistical significance of profile-profile alignment scores.

    abstract:BACKGROUND:In the last decade, a significant improvement in detecting remote similarity between protein sequences has been made by utilizing alignment profiles in place of amino-acid strings. Unfortunately, no analytical theory is available for estimating the significance of a gapped alignment of two profiles. Many exp...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-112

    authors: Poleksic A

    update_date: 2009-04-20 00:00:00

  • Sequencing error correction without a reference genome.

    abstract:BACKGROUND:Next (second) generation sequencing is an increasingly important tool for many areas of molecular biology, however, care must be taken when interpreting its output. Even a low error rate can cause a large number of errors due to the high number of nucleotides being sequenced. Identifying sequencing errors fr...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-367

    authors: Sleep JA,Schreiber AW,Baumann U

    update_date: 2013-12-18 00:00:00

  • An automated method for rapid identification of putative gene family members in plants.

    abstract:BACKGROUND:Gene duplication events have played a significant role in genome evolution, particularly in plants. Exhaustive searches for all members of a known gene family as well as the identification of new gene families has become increasingly important. Subfunctionalization via changes in regulatory sequences followi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-S2-S19

    authors: Frank RL,Mane A,Ercal F

    update_date: 2006-09-06 00:00:00

  • On the detection of functionally coherent groups of protein domains with an extension to protein annotation.

    abstract:BACKGROUND:Protein domains coordinate to perform multifaceted cellular functions, and domain combinations serve as the functional building blocks of the cell. The available methods to identify functional domain combinations are limited in their scope, e.g. to the identification of combinations falling within individual...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-390

    authors: McLaughlin WA,Chen K,Hou T,Wang W

    update_date: 2007-10-16 00:00:00

  • The GMOseek matrix: a decision support tool for optimizing the detection of genetically modified plants.

    abstract:BACKGROUND:Since their first commercialization, the diversity of taxa and the genetic composition of transgene sequences in genetically modified plants (GMOs) are constantly increasing. To date, the detection of GMOs and derived products is commonly performed by PCR-based methods targeting specific DNA sequences introd...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-256

    authors: Block A,Debode F,Grohmann L,Hulin J,Taverniers I,Kluga L,Barbau-Piednoir E,Broeders S,Huber I,Van den Bulcke M,Heinze P,Berben G,Busch U,Roosens N,Janssen E,Žel J,Gruden K,Morisset D

    update_date: 2013-08-22 00:00:00

  • Integrated olfactory receptor and microarray gene expression databases.

    abstract:BACKGROUND:Gene expression patterns of olfactory receptors (ORs) are an important component of the signal encoding mechanism in the olfactory system since they determine the interactions between odorant ligands and sensory neurons. We have developed the Olfactory Receptor Microarray Database (ORMD) to house OR gene exp...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-231

    authors: Liu N,Crasto CJ,Ma M

    update_date: 2007-06-30 00:00:00

  • Cell subset prediction for blood genomic studies.

    abstract:BACKGROUND:Genome-wide transcriptional profiling of patient blood samples offers a powerful tool to investigate underlying disease mechanisms and personalized treatment decisions. Most studies are based on analysis of total peripheral blood mononuclear cells (PBMCs), a mixed population. In this case, accuracy is inhere...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-258

    authors: Bolen CR,Uduman M,Kleinstein SH

    update_date: 2011-06-24 00:00:00

  • iMEGES: integrated mental-disorder GEnome score by deep neural network for prioritizing the susceptibility genes for mental disorders in personal genomes.

    abstract:BACKGROUND:A range of rare and common genetic variants have been discovered to be potentially associated with mental diseases, but many more have not been uncovered. Powerful integrative methods are needed to systematically prioritize both variants and genes that confer susceptibility to mental diseases in personal gen...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2469-7

    authors: Khan A,Liu Q,Wang K

    update_date: 2018-12-28 00:00:00

  • An automatic method to calculate heart rate from zebrafish larval cardiac videos.

    abstract:BACKGROUND:Zebrafish is a widely used model organism for studying heart development and cardiac-related pathogenesis. With the ability of surviving without a functional circulation at larval stages, strong genetic similarity between zebrafish and mammals, prolific reproduction and optically transparent embryos, zebrafi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2166-6

    authors: Kang CP,Tu HC,Fu TF,Wu JM,Chu PH,Chang DT

    update_date: 2018-05-09 00:00:00

  • EGNAS: an exhaustive DNA sequence design algorithm.

    abstract:BACKGROUND:The molecular recognition based on the complementary base pairing of deoxyribonucleic acid (DNA) is the fundamental principle in the fields of genetics, DNA nanotechnology and DNA computing. We present an exhaustive DNA sequence design algorithm that allows to generate sets containing a maximum number of seq...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-138

    authors: Kick A,Bönsch M,Mertig M

    update_date: 2012-06-20 00:00:00

  • Probe-level linear model fitting and mixture modeling results in high accuracy detection of differential gene expression.

    abstract:BACKGROUND:The identification of differentially expressed genes (DEGs) from Affymetrix GeneChips arrays is currently done by first computing expression levels from the low-level probe intensities, then deriving significance by comparing these expression levels between conditions. The proposed PL-LM (Probe-Level Linear ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-391

    authors: Lemieux S

    update_date: 2006-08-25 00:00:00

  • Inferring gene expression dynamics via functional regression analysis.

    abstract:BACKGROUND:Temporal gene expression profiles characterize the time-dynamics of expression of specific genes and are increasingly collected in current gene expression experiments. In the analysis of experiments where gene expression is obtained over the life cycle, it is of interest to relate temporal patterns of gene e...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-60

    authors: Müller HG,Chiou JM,Leng X

    update_date: 2008-01-28 00:00:00

  • Evidence for intron length conservation in a set of mammalian genes associated with embryonic development.

    abstract:BACKGROUND:We carried out an analysis of intron length conservation across a diverse group of nineteen mammalian species. Motivated by recent research suggesting a role for time delays associated with intron transcription in gene expression oscillations required for early embryonic patterning, we searched for examples ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-S9-S16

    authors: Seoighe C,Korir PK

    update_date: 2011-10-05 00:00:00

  • Automating dChip: toward reproducible sharing of microarray data analysis.

    abstract:BACKGROUND:During the past decade, many software packages have been developed for analysis and visualization of various types of microarrays. We have developed and maintained the widely used dChip as a microarray analysis software package accessible to both biologist and data analysts. However, challenges arise when dC...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-231

    authors: Li C

    update_date: 2008-05-08 00:00:00

  • Cloning, analysis and functional annotation of expressed sequence tags from the Earthworm Eisenia fetida.

    abstract:BACKGROUND:Eisenia fetida, commonly known as red wiggler or compost worm, belongs to the Lumbricidae family of the Annelida phylum. Little is known about its genome sequence although it has been extensively used as a test organism in terrestrial ecotoxicology. In order to understand its gene expression response to envi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-S7-S7

    authors: Pirooznia M,Gong P,Guan X,Inouye LS,Yang K,Perkins EJ,Deng Y

    update_date: 2007-11-01 00:00:00

  • LS-NMF: a modified non-negative matrix factorization algorithm utilizing uncertainty estimates.

    abstract:BACKGROUND:Non-negative matrix factorisation (NMF), a machine learning algorithm, has been applied to the analysis of microarray data. A key feature of NMF is the ability to identify patterns that together explain the data as a linear combination of expression signatures. Microarray data generally includes individual e...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-175

    authors: Wang G,Kossenkov AV,Ochs MF

    update_date: 2006-03-28 00:00:00

  • μHEM for identification of differentially expressed miRNAs using hypercuboid equivalence partition matrix.

    abstract:BACKGROUND:The miRNAs, a class of short approximately 22-nucleotide non-coding RNAs, often act post-transcriptionally to inhibit mRNA expression. In effect, they control gene expression by targeting mRNA. They also help in carrying out normal functioning of a cell as they play an important role in various cellular proc...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-266

    authors: Paul S,Maji P

    update_date: 2013-09-04 00:00:00

  • Influence of microarrays experiments missing values on the stability of gene groups by hierarchical clustering.

    abstract:BACKGROUND:Microarray technologies produced large amount of data. The hierarchical clustering is commonly used to identify clusters of co-expressed genes. However, microarray datasets often contain missing values (MVs) representing a major drawback for the use of the clustering methods. Usually the MVs are not treated,...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-5-114

    authors: de Brevern AG,Hazout S,Malpertuy A

    update_date: 2004-08-23 00:00:00

  • Notos - a galaxy tool to analyze CpN observed expected ratios for inferring DNA methylation types.

    abstract:BACKGROUND:DNA methylation patterns store epigenetic information in the vast majority of eukaryotic species. The relatively high costs and technical challenges associated with the detection of DNA methylation however have created a bias in the number of methylation studies towards model organisms. Consequently, it rema...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2115-4

    authors: Bulla I,Aliaga B,Lacal V,Bulla J,Grunau C,Chaparro C

    update_date: 2018-03-27 00:00:00

  • Homology induction: the use of machine learning to improve sequence similarity searches.

    abstract:BACKGROUND:The inference of homology between proteins is a key problem in molecular biology. The current best approaches only identify approximately 50% of homologies (with a false positive rate set at 1/1000). RESULTS:We present Homology Induction (HI), a new approach to inferring homology. HI uses machine learning to...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-3-11

    authors: Karwath A,King RD

    update_date: 2002-04-23 00:00:00

  • Colony size measurement of the yeast gene deletion strains for functional genomics.

    abstract:BACKGROUND:Numerous functional genomics approaches have been developed to study the model organism yeast, Saccharomyces cerevisiae, with the aim of systematically understanding the biology of the cell. Some of these techniques are based on yeast growth differences under different conditions, such as those generated by ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-117

    authors: Memarian N,Jessulat M,Alirezaie J,Mir-Rashed N,Xu J,Zareie M,Smith M,Golshani A

    update_date: 2007-04-04 00:00:00

  • SNP and gene networks construction and analysis from classification of copy number variations data.

    abstract:BACKGROUND:Detection of genomic DNA copy number variations (CNVs) can provide a complete and more comprehensive view of human disease. It is interesting to identify and represent relevant CNVs from a genome-wide data due to high data volume and the complexity of interactions. RESULTS:In this paper, we incorporate the ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-S5-S4

    authors: Liu Y,Lee YF,Ng MK

    update_date: 2011-01-01 00:00:00

  • Efficient inference of homologs in large eukaryotic pan-proteomes.

    abstract:BACKGROUND:Identification of homologous genes is fundamental to comparative genomics, functional genomics and phylogenomics. Extensive public homology databases are of great value for investigating homology but need to be continually updated to incorporate new sequences. As new sequences are rapidly being generated, th...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2362-4

    authors: Sheikhizadeh Anari S,de Ridder D,Schranz ME,Smit S

    update_date: 2018-09-26 00:00:00

  • Selection of optimal reference genes for normalization in quantitative RT-PCR.

    abstract:BACKGROUND:Normalization in real-time qRT-PCR is necessary to compensate for experimental variation. A popular normalization strategy employs reference gene(s), which may introduce additional variability into normalized expression levels due to innate variation (between tissues, individuals, etc). To minimize this inna...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-253

    authors: Chervoneva I,Li Y,Schulz S,Croker S,Wilson C,Waldman SA,Hyslop T

    update_date: 2010-05-14 00:00:00