Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix.

Abstract:

BACKGROUND: Selective pressures at the DNA level shape genes into profiles consisting of patterns of rapidly evolving sites and sites withstanding change. These profiles remain detectable even when protein sequences become extensively diverged. A common task in molecular biology is to infer functional, structural or evolutionary relationships by querying a database using an algorithm. However, problems arise when sequence similarity is low. This study presents an algorithm that uses the evolutionary rate at codon sites, the dN/dS (ω) parameter, coupled to a substitution matrix as an alignment metric for detecting distantly related proteins. The algorithm, called BLOSUM-FIRE, couples an improved version of the original FIRE (Functional Inference using Rates of Evolution) algorithm with an amino acid substitution matrix in a dynamic scoring function. The enigmatic hepatitis B virus X protein was used as a test case for BLOSUM-FIRE and its associated database, EvoDB.

RESULTS: The evolutionary rate-based approach was coupled with a conventional BLOSUM substitution matrix. The two approaches are combined in a dynamic scoring function, which uses the selective pressure to score aligned residues. The dynamic scoring function is based on a coupled additive approach that scores aligned sites according to the level of conservation inferred from the ω values. Evaluation of this new implementation, BLOSUM-FIRE, using MAFFT alignments as the reference showed that it is more accurate than its predecessor, FIRE. Comparison of alignment quality with widely used algorithms (MUSCLE, T-COFFEE, and CLUSTAL Omega) revealed that BLOSUM-FIRE performs as well as conventional algorithms. Its main strength is that it offers greater potential for aligning divergent sequences and addresses the problem of low specificity inherent in the original FIRE algorithm. The utility of the algorithm is demonstrated using the hepatitis B virus X (HBx) protein, a protein of unknown function, as a test case.

CONCLUSION: This study describes the utility of an evolutionary rate-based approach coupled to the BLOSUM62 amino acid substitution matrix in inferring protein domain function. We demonstrate that such an approach is robust and performs as well as an array of conventional algorithms.
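The "coupled additive" scoring idea in the abstract can be illustrated with a minimal sketch. This is not the published BLOSUM-FIRE scoring function, whose exact form the abstract does not give. The rate term, its `max_bonus` and `weight` parameters, and all function names below are assumptions made for illustration only; the substitution values are a small excerpt of the standard BLOSUM62 matrix.

```python
# Illustrative sketch only: an additive site score combining a BLOSUM62
# substitution score with a term derived from per-site dN/dS (omega) values.
# The form of the rate term is a hypothetical choice, not the authors' method.

# Small symmetric excerpt of BLOSUM62; a real run would load the full matrix.
BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("L", "L"): 4, ("I", "I"): 4,
    ("I", "L"): 2, ("A", "L"): -1, ("A", "I"): -1,
}


def substitution_score(a: str, b: str) -> int:
    """Look up the substitution score for residues a and b (order-independent)."""
    key = (a, b) if (a, b) in BLOSUM62_EXCERPT else (b, a)
    return BLOSUM62_EXCERPT[key]


def rate_score(omega_a: float, omega_b: float, max_bonus: float = 4.0) -> float:
    """Hypothetical rate term: largest when the two sites evolve at the same rate."""
    return max_bonus / (1.0 + abs(omega_a - omega_b))


def site_score(a: str, omega_a: float, b: str, omega_b: float,
               weight: float = 1.0) -> float:
    """Additive coupling of the substitution score and the weighted rate term."""
    return substitution_score(a, b) + weight * rate_score(omega_a, omega_b)


if __name__ == "__main__":
    # Similar residues under similar selective pressure score highest.
    print(site_score("I", 0.1, "L", 0.1))
    # Dissimilar residues with different omega values score lower.
    print(site_score("A", 0.5, "L", 1.5))
```

In this sketch, two sites with matching ω values can still receive a positive score when their residues differ, which is one way an additive scheme could help align divergent sequences; how strongly the rate term should count relative to the substitution term is left to the assumed `weight` parameter.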

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Ndhlovu A,Hazelhurst S,Durand PM

doi

10.1186/s12859-015-0688-8

pub_date

2015-08-14 00:00:00

pages

255

issn

1471-2105

pii

10.1186/s12859-015-0688-8

journal_volume

16

pub_type

Journal Article
  • Fast batch searching for protein homology based on compression and clustering.

    abstract:BACKGROUND:In the bioinformatics community, many tasks involve matching a set of protein query sequences against large sequence datasets. To conduct multiple queries in the database, a commonly used method is to run BLAST on each original query or on the concatenated queries. It is inefficient since it doesn't exploit the...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1938-8

    authors: Ge H,Sun L,Yu J

    update_date:2017-11-21 00:00:00

  • Integration of open access literature into the RCSB Protein Data Bank using BioLit.

    abstract:BACKGROUND:Biological data have traditionally been stored and made publicly available through a variety of on-line databases, whereas biological knowledge has traditionally been found in the printed literature. With journals now on-line and providing an increasing amount of open access content, often free of copyright ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-220

    authors: Prlić A,Martinez MA,Dimitropoulos D,Beran B,Yukich BT,Rose PW,Bourne PE,Fink JL

    update_date:2010-04-29 00:00:00

  • AT excursion: a new approach to predict replication origins in viral genomes by locating AT-rich regions.

    abstract:BACKGROUND:Replication origins are considered important sites for understanding the molecular mechanisms involved in DNA replication. Many computational methods have been developed for predicting their locations in archaeal, bacterial and eukaryotic genomes. However, a prediction method designed for a particular kind o...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-163

    authors: Chew DS,Leung MY,Choi KP

    update_date:2007-05-21 00:00:00

  • Supervised segmentation of phenotype descriptions for the human skeletal phenome using hybrid methods.

    abstract:BACKGROUND:Over the course of the last few years there has been a significant amount of research performed on ontology-based formalization of phenotype descriptions. In order to fully capture the intrinsic value and knowledge expressed within them, we need to take advantage of their inner structure, which implicitly co...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-265

    authors: Groza T,Hunter J,Zankl A

    update_date:2012-10-15 00:00:00

  • A simple method for assessing sample sizes in microarray experiments.

    abstract:BACKGROUND:In this short article, we discuss a simple method for assessing sample size requirements in microarray experiments. RESULTS:Our method starts with the output from a permutation-based analysis for a set of pilot data, e.g. from the SAM package. Then for a given hypothesized mean difference and various sample...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-106

    authors: Tibshirani R

    update_date:2006-03-02 00:00:00

  • Missing genes in the annotation of prokaryotic genomes.

    abstract:BACKGROUND:Protein-coding gene detection in prokaryotic genomes is considered a much simpler problem than in intron-containing eukaryotic genomes. However there have been reports that prokaryotic gene finder programs have problems with small genes (either over-predicting or under-predicting). Therefore the question ari...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-131

    authors: Warren AS,Archuleta J,Feng WC,Setubal JC

    update_date:2010-03-15 00:00:00

  • BINDER: computationally inferring a gene regulatory network for Mycobacterium abscessus.

    abstract:BACKGROUND:Although many of the genic features in Mycobacterium abscessus have been fully validated, a comprehensive understanding of the regulatory elements remains lacking. Moreover, there is little understanding of how the organism regulates its transcriptomic profile, enabling cells to survive in hostile environmen...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3042-8

    authors: Staunton PM,Miranda-CasoLuengo AA,Loftus BJ,Gormley IC

    update_date:2019-09-10 00:00:00

  • A new pooling strategy for high-throughput screening: the Shifted Transversal Design.

    abstract:BACKGROUND:In binary high-throughput screening projects where the goal is the identification of low-frequency events, beyond the obvious issue of efficiency, false positives and false negatives are a major concern. Pooling constitutes a natural solution: it reduces the number of tests, while providing critical duplicat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-28

    authors: Thierry-Mieg N

    update_date:2006-01-19 00:00:00

  • Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms.

    abstract:BACKGROUND:Predicting protein function has become increasingly demanding in the era of next generation sequencing technology. The task to assign a curator-reviewed function to every single sequence is impracticable. Bioinformatics tools, easy to use and able to provide automatic and reliable annotations at a genomic sc...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-S4-S14

    authors: Falda M,Toppo S,Pescarolo A,Lavezzo E,Di Camillo B,Facchinetti A,Cilia E,Velasco R,Fontana P

    update_date:2012-03-28 00:00:00

  • ModuleOrganizer: detecting modules in families of transposable elements.

    abstract:BACKGROUND:Most known eukaryotic genomes contain mobile copied elements called transposable elements. In some species, these elements account for the majority of the genome sequence. They have been subject to many mutations and other genomic events (copies, deletions, captures) during transposition. The identification ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-474

    authors: Tempel S,Rousseau C,Tahi F,Nicolas J

    update_date:2010-09-22 00:00:00

  • Fast and robust group-wise eQTL mapping using sparse graphical models.

    abstract:BACKGROUND:Genome-wide expression quantitative trait loci (eQTL) studies have emerged as a powerful tool to understand the genetic basis of gene expression and complex traits. The traditional eQTL methods focus on testing the associations between individual single-nucleotide polymorphisms (SNPs) and gene expression tra...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-014-0421-z

    authors: Cheng W,Shi Y,Zhang X,Wang W

    update_date:2015-01-16 00:00:00

  • Quality determination and the repair of poor quality spots in array experiments.

    abstract:BACKGROUND:A common feature of microarray experiments is the occurrence of missing gene expression data. These missing values occur for a variety of reasons, in particular, because of the filtering of poor quality spots and the removal of undefined values when a logarithmic transformation is applied to negative backgro...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-234

    authors: Tom BD,Gilks WR,Brooke-Powell ET,Ajioka JW

    update_date:2005-09-26 00:00:00

  • Functional relevance of dynamic properties of Dimeric NADP-dependent Isocitrate Dehydrogenases.

    abstract:BACKGROUND:Isocitrate Dehydrogenases (IDHs) are important enzymes present in all living cells. Three subfamilies of functionally dimeric IDHs (subfamilies I, II, III) are known. Subfamily I are well-studied bacterial IDHs, like that of Escherichia coli. Subfamily II has predominantly eukaryotic members, but it also ha...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-S17-S2

    authors: Vinekar R,Verma C,Ghosh I

    update_date:2012-01-01 00:00:00

  • Correlation analysis reveals the emergence of coherence in the gene expression dynamics following system perturbation.

    abstract:Time course gene expression experiments are a popular means to infer co-expression. Many methods have been proposed to cluster genes or to build networks based on similarity measures of their expression dynamics. In this paper we apply a correlation based approach to network reconstruction to three datasets of time se...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-S1-S16

    authors: Neretti N,Remondini D,Tatar M,Sedivy JM,Pierini M,Mazzatti D,Powell J,Franceschi C,Castellani GC

    update_date:2007-03-08 00:00:00

  • DART: Denoising Algorithm based on Relevance network Topology improves molecular pathway activity inference.

    abstract:BACKGROUND:Inferring molecular pathway activity is an important step towards reducing the complexity of genomic data, understanding the heterogeneity in clinical outcome, and obtaining molecular correlates of cancer imaging traits. Increasingly, approaches towards pathway activity inference combine molecular profiles (...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-403

    authors: Jiao Y,Lawler K,Patel GS,Purushotham A,Jones AF,Grigoriadis A,Tutt A,Ng T,Teschendorff AE

    update_date:2011-10-19 00:00:00

  • Enhancing HMM-based protein profile-profile alignment with structural features and evolutionary coupling information.

    abstract:BACKGROUND:Protein sequence profile-profile alignment is an important approach to recognizing remote homologs and generating accurate pairwise alignments. It plays an important role in protein sequence database search, protein structure prediction, protein function prediction, and phylogenetic analysis. RESULTS:In thi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-252

    authors: Deng X,Cheng J

    update_date:2014-07-25 00:00:00

  • Texture based skin lesion abruptness quantification to detect malignancy.

    abstract:BACKGROUND:Abruptness of pigment patterns at the periphery of a skin lesion is one of the most important dermoscopic features for detection of malignancy. In the current clinical setting, abrupt cutoff of a skin lesion is determined by a dermatologist's examination. This process is subjective, nonquantitative, and error-p...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1892-5

    authors: Erol R,Bayraktar M,Kockara S,Kaya S,Halic T

    update_date:2017-12-28 00:00:00

  • The IronChip evaluation package: a package of perl modules for robust analysis of custom microarrays.

    abstract:BACKGROUND:Gene expression studies greatly contribute to our understanding of complex relationships in gene regulatory networks. However, the complexity of array design, production and manipulations are limiting factors, affecting data quality. The use of customized DNA microarrays improves overall data quality in many...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-112

    authors: Vainshtein Y,Sanchez M,Brazma A,Hentze MW,Dandekar T,Muckenthaler MU

    update_date:2010-03-01 00:00:00

  • Handling missing rows in multi-omics data integration: multiple imputation in multiple factor analysis framework.

    abstract:BACKGROUND:In omics data integration studies, it is common, for a variety of reasons, for some individuals to not be present in all data tables. Missing row values are challenging to deal with because most statistical methods cannot be directly applied to incomplete datasets. To overcome this issue, we propose a multip...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1273-5

    authors: Voillet V,Besse P,Liaubet L,San Cristobal M,González I

    update_date:2016-10-03 00:00:00

  • Novel domain expansion methods to improve the computational efficiency of the Chemical Master Equation solution for large biological networks.

    abstract:BACKGROUND:Numerical solutions of the chemical master equation (CME) are important for understanding the stochasticity of biochemical systems. However, solving CMEs is a formidable task. This task is complicated due to the nonlinear nature of the reactions and the size of the networks which result in different realizat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03668-2

    authors: Kosarwal R,Kulasiri D,Samarasinghe S

    update_date:2020-11-11 00:00:00

  • A context-blocks model for identifying clinical relationships in patient records.

    abstract:BACKGROUND:Patient records contain valuable information regarding explanation of diagnosis, progression of disease, prescription and/or effectiveness of treatment, and more. Automatic recognition of clinically important concepts and the identification of relationships between those concepts in patient records are preli...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-S3-S3

    authors: Islamaj Doğan R,Névéol A,Lu Z

    update_date:2011-06-09 00:00:00

  • imputeqc: an R package for assessing imputation quality of genotypes and optimizing imputation parameters.

    abstract:BACKGROUND:The imputation of genotypes increases the power of genome-wide association studies. However, the imputation quality should be assessed in each particular case. Nevertheless, not all imputation software controls the error of output, e.g., the last release of the fastPHASE program (1.4.8) lacks such an option. In ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03589-0

    authors: Khvorykh GV,Khrunin AV

    update_date:2020-07-24 00:00:00

  • Odyssey: a semi-automated pipeline for phasing, imputation, and analysis of genome-wide genetic data.

    abstract:BACKGROUND:Genome imputation, admixture resolution and genome-wide association analyses are timely and computationally intensive processes with many composite and requisite steps. Analysis time increases further when building and installing the run programs required for these analyses. For scientists that may not be as...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2964-5

    authors: Eller RJ,Janga SC,Walsh S

    update_date:2019-06-28 00:00:00

  • Frnakenstein: multiple target inverse RNA folding.

    abstract:BACKGROUND:RNA secondary structure prediction, or folding, is a classic problem in bioinformatics: given a sequence of nucleotides, the aim is to predict the base pairs formed in its three dimensional conformation. The inverse problem of designing a sequence folding into a particular target structure has only more rece...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-260

    authors: Lyngsø RB,Anderson JW,Sizikova E,Badugu A,Hyland T,Hein J

    update_date:2012-10-09 00:00:00

  • HAT: hypergeometric analysis of tiling-arrays with application to promoter-GeneChip data.

    abstract:BACKGROUND:Tiling-arrays are applicable to multiple types of biological research questions. Due to its advantages (high sensitivity, resolution, unbiased), the technology is often employed in genome-wide investigations. A major challenge in the analysis of tiling-array data is to define regions-of-interest, i.e., conti...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-275

    authors: Taskesen E,Beekman R,de Ridder J,Wouters BJ,Peeters JK,Touw IP,Reinders MJ,Delwel R

    update_date:2010-05-21 00:00:00

  • Reranking candidate gene models with cross-species comparison for improved gene prediction.

    abstract:BACKGROUND:Most gene finders score candidate gene models with state-based methods, typically HMMs, by combining local properties (coding potential, splice donor and acceptor patterns, etc). Competing models with similar state-based scores may be distinguishable with additional information. In particular, functional and...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-433

    authors: Liu Q,Crammer K,Pereira FC,Roos DS

    update_date:2008-10-14 00:00:00

  • Multi-omic analysis of signalling factors in inflammatory comorbidities.

    abstract:BACKGROUND:Inflammation is a core element of many different, systemic and chronic diseases that usually involve an important autoimmune component. The clinical phase of inflammatory diseases is often the culmination of a long series of pathologic events that started years before. The systemic characteristics and relate...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2413-x

    authors: Xiao H,Bartoszek K,Lio' P

    update_date:2018-11-30 00:00:00

  • Augmented annotation and orthologue analysis for Oryctolagus cuniculus: Better Bunny.

    abstract:BACKGROUND:The rabbit is an important model organism used in a wide range of biomedical research. However, the rabbit genome is still sparsely annotated, thus prohibiting extensive functional analysis of gene sets derived from whole-genome experiments. We developed a web-based application that provides augmented annota...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-84

    authors: Craig DB,Kannan S,Dombkowski AA

    update_date:2012-05-08 00:00:00

  • FITBAR: a web tool for the robust prediction of prokaryotic regulons.

    abstract:BACKGROUND:The binding of regulatory proteins to their specific DNA targets determines the accurate expression of the neighboring genes. The in silico prediction of new binding sites in completely sequenced genomes is a key aspect in the deeper understanding of gene regulatory networks. Several algorithms have been des...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-554

    authors: Oberto J

    update_date:2010-11-11 00:00:00

  • In silico docking of urokinase plasminogen activator and integrins.

    abstract:BACKGROUND:Urokinase, its receptor and the integrins are functionally associated and involved in regulation of cell signaling, migration, adhesion and proliferation. No structural information is available on this potential multimolecular complex. However, the tri-dimensional structure of urokinase, urokinase receptor a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-S2-S8

    authors: Degryse B,Fernandez-Recio J,Citro V,Blasi F,Cubellis MV

    update_date:2008-03-26 00:00:00