HMMvar-func: a new method for predicting the functional outcome of genetic variants.

Abstract:

BACKGROUND:Numerous tools have been developed to predict the fitness effects (i.e., neutral, deleterious, or beneficial) of genetic variants on corresponding proteins. However, prediction in terms of whether a variant causes the variant bearing protein to lose the original function or gain new function is also needed for better understanding of how the variant contributes to disease/cancer. To address this problem, the present work introduces and computationally defines four types of functional outcome of a variant: gain, loss, switch, and conservation of function. The deployment of multiple hidden Markov models is proposed to computationally classify mutations by the four functional impact types. RESULTS:The functional outcome is predicted for over a hundred thyroid stimulating hormone receptor (TSHR) mutations, as well as cancer related mutations in oncogenes or tumor suppressor genes. The results show that the proposed computational method is effective in fine grained prediction of the functional outcome of a mutation, and can be used to help elucidate the molecular mechanism of disease/cancer causing mutations. The program is freely available at http://bioinformatics.cs.vt.edu/zhanglab/HMMvar/download.php. CONCLUSION:This work is the first to computationally define and predict functional impact of mutations, loss, switch, gain, or conservation of function. These fine grained predictions can be especially useful for identifying mutations that cause or are linked to cancer.

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Liu M,Watson LT,Zhang L

doi

10.1186/s12859-015-0781-z

subject

Has Abstract

pub_date

2015-10-30 00:00:00

pages

351

issn

1471-2105

pii

10.1186/s12859-015-0781-z

journal_volume

16

pub_type

杂志文章
  • Unsupervised fuzzy pattern discovery in gene expression data.

    abstract:BACKGROUND:Discovering patterns from gene expression levels is regarded as a classification problem when tissue classes of the samples are given and solved as a discrete-data problem by discretizing the expression levels of each gene into intervals maximizing the interdependence between that gene and the class labels. ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-S5-S5

    authors: Wu GP,Chan KC,Wong AK

    更新日期:2011-01-01 00:00:00

  • Large scale analysis of protein conformational transitions from aqueous to non-aqueous media.

    abstract:BACKGROUND:Biocatalysis in organic solvents is nowadays a common practice with a large potential in Biotechnology. Several studies report that proteins which are co-crystallized or soaked in organic solvents preserve their fold integrity showing almost identical arrangements when compared to their aqueous forms. Howeve...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2044-2

    authors: Rueda AJV,Monzon AM,Ardanaz SM,Iglesias LE,Parisi G

    更新日期:2018-01-30 00:00:00

  • antaRNA--Multi-objective inverse folding of pseudoknot RNA using ant-colony optimization.

    abstract:BACKGROUND:Many functional RNA molecules fold into pseudoknot structures, which are often essential for the formation of an RNA's 3D structure. Currently the design of RNA molecules, which fold into a specific structure (known as RNA inverse folding) within biotechnological applications, is lacking the feature of incor...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-015-0815-6

    authors: Kleinkauf R,Houwaart T,Backofen R,Mann M

    更新日期:2015-11-18 00:00:00

  • Improving interoperability between microbial information and sequence databases.

    abstract:BACKGROUND:Biological resources are essential tools for biomedical research. Their availability is promoted through on-line catalogues. Common Access to Biological Resources and Information (CABRI) is a service for distribution of biological resources and related data collected by 28 European culture collections. Linki...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-S4-S23

    authors: Romano P,Dawyndt P,Piersigilli F,Swings J

    更新日期:2005-12-01 00:00:00

  • Evaluation of gene-expression clustering via mutual information distance measure.

    abstract:BACKGROUND:The definition of a distance measure plays a key role in the evaluation of different clustering solutions of gene expression profiles. In this empirical study we compare different clustering solutions when using the Mutual Information (MI) measure versus the use of the well known Euclidean distance and Pears...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-111

    authors: Priness I,Maimon O,Ben-Gal I

    更新日期:2007-03-30 00:00:00

  • Fpocket: an open source platform for ligand pocket detection.

    abstract:BACKGROUND:Virtual screening methods start to be well established as effective approaches to identify hits, candidates and leads for drug discovery research. Among those, structure based virtual screening (SBVS) approaches aim at docking collections of small compounds in the target structure to identify potent compound...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-168

    authors: Le Guilloux V,Schmidtke P,Tuffery P

    更新日期:2009-06-02 00:00:00

  • Integrated prediction of one-dimensional structural features and their relationships with conformational flexibility in helical membrane proteins.

    abstract:BACKGROUND:Many structural properties such as solvent accessibility, dihedral angles and helix-helix contacts can be assigned to each residue in a membrane protein. Independent studies exist on the analysis and sequence-based prediction of some of these so-called one-dimensional features. However, there is little expla...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-533

    authors: Ahmad S,Singh YH,Paudel Y,Mori T,Sugita Y,Mizuguchi K

    更新日期:2010-10-27 00:00:00

  • A weighted string kernel for protein fold recognition.

    abstract:BACKGROUND:Alignment-free methods for comparing protein sequences have proved to be viable alternatives to approaches that first rely on an alignment of the sequences to be compared. Much work however need to be done before those methods provide reliable fold recognition for proteins whose sequences share little simila...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1795-5

    authors: Nojoomi S,Koehl P

    更新日期:2017-08-25 00:00:00

  • SIS: a program to generate draft genome sequence scaffolds for prokaryotes.

    abstract:BACKGROUND:Decreasing costs of DNA sequencing have made prokaryotic draft genome sequences increasingly common. A contig scaffold is an ordering of contigs in the correct orientation. A scaffold can help genome comparisons and guide gap closure efforts. One popular technique for obtaining contig scaffolds is to map con...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-96

    authors: Dias Z,Dias U,Setubal JC

    更新日期:2012-05-14 00:00:00

  • Simultaneous fitting of real-time PCR data with efficiency of amplification modeled as Gaussian function of target fluorescence.

    abstract:BACKGROUND:In real-time PCR, it is necessary to consider the efficiency of amplification (EA) of amplicons in order to determine initial target levels properly. EAs can be deduced from standard curves, but these involve extra effort and cost and may yield invalid EAs. Alternatively, EA can be extracted from individual ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-95

    authors: Batsch A,Noetel A,Fork C,Urban A,Lazic D,Lucas T,Pietsch J,Lazar A,Schömig E,Gründemann D

    更新日期:2008-02-12 00:00:00

  • Knowledge discovery of drug data on the example of adverse reaction prediction.

    abstract:BACKGROUND:Antibiotics are the widely prescribed drugs for children and most likely to be related with adverse reactions. Record on adverse reactions and allergies from antibiotics considerably affect the prescription choices. We consider this a biomedical decision-making problem and explore hidden knowledge in survey ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-15-S6-S7

    authors: Yildirim P,Majnarić L,Ekmekci O,Holzinger A

    更新日期:2014-01-01 00:00:00

  • Fast batch searching for protein homology based on compression and clustering.

    abstract:BACKGROUND:In bioinformatics community, many tasks associate with matching a set of protein query sequences in large sequence datasets. To conduct multiple queries in the database, a common used method is to run BLAST on each original querey or on the concatenated queries. It is inefficient since it doesn't exploit the...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1938-8

    authors: Ge H,Sun L,Yu J

    更新日期:2017-11-21 00:00:00

  • Combining sequence and network information to enhance protein-protein interaction prediction.

    abstract:BACKGROUND:Protein-protein interactions (PPIs) are of great importance in cellular systems of organisms, since they are the basis of cellular structure and function and many essential cellular processes are related to that. Most proteins perform their functions by interacting with other proteins, so predicting PPIs acc...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03896-6

    authors: Liu L,Zhu X,Ma Y,Piao H,Yang Y,Hao X,Fu Y,Wang L,Peng J

    更新日期:2020-12-16 00:00:00

  • Performance of a genetic algorithm for mass spectrometry proteomics.

    abstract:BACKGROUND:Recently, mass spectrometry data have been mined using a genetic algorithm to produce discriminatory models that distinguish healthy individuals from those with cancer. This algorithm is the basis for claims of 100% sensitivity and specificity in two related publicly available datasets. To date, no detailed ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-5-180

    authors: Jeffries NO

    更新日期:2004-11-19 00:00:00

  • Prediction of novel long non-coding RNAs based on RNA-Seq data of mouse Klf1 knockout study.

    abstract:BACKGROUND:Study on long non-coding RNAs (lncRNAs) has been promoted by high-throughput RNA sequencing (RNA-Seq). However, it is still not trivial to identify lncRNAs from the RNA-Seq data and it remains a challenge to uncover their functions. RESULTS:We present a computational pipeline for detecting novel lncRNAs fro...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-331

    authors: Sun L,Zhang Z,Bailey TL,Perkins AC,Tallack MR,Xu Z,Liu H

    更新日期:2012-12-13 00:00:00

  • GOAL: a software tool for assessing biological significance of genes groups.

    abstract:BACKGROUND:Modern high throughput experimental techniques such as DNA microarrays often result in large lists of genes. Computational biology tools such as clustering are then used to group together genes based on their similarity in expression profiles. Genes in each group are probably functionally related. The functi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-229

    authors: Tchagang AB,Gawronski A,Bérubé H,Phan S,Famili F,Pan Y

    更新日期:2010-05-06 00:00:00

  • Finding sRNA generative locales from high-throughput sequencing data with NiBLS.

    abstract:BACKGROUND:Next-generation sequencing technologies allow researchers to obtain millions of sequence reads in a single experiment. One important use of the technology is the sequencing of small non-coding regulatory RNAs and the identification of the genomic locales from which they originate. Currently, there is a pauci...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-93

    authors: MacLean D,Moulton V,Studholme DJ

    更新日期:2010-02-18 00:00:00

  • Probe-level linear model fitting and mixture modeling results in high accuracy detection of differential gene expression.

    abstract:BACKGROUND:The identification of differentially expressed genes (DEGs) from Affymetrix GeneChips arrays is currently done by first computing expression levels from the low-level probe intensities, then deriving significance by comparing these expression levels between conditions. The proposed PL-LM (Probe-Level Linear ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-391

    authors: Lemieux S

    更新日期:2006-08-25 00:00:00

  • ICoVax 2013: the 3rd ISV Pre-conference Computational Vaccinology Workshop.

    abstract::Following last year's computational vaccinology workshop in Shanghai, China, the third ISV Pre-conference Computational Vaccinology Workshop (ICoVax 2013) was held in Barcelona, Spain. ICoVax 2013 provided an international platform for the attendees to showcase their research and discuss problems and solutions in the ...

    journal_title:BMC bioinformatics

    pub_type:

    doi:10.1186/1471-2105-15-S4-I1

    authors: De Groot AS,De Groot P,He Y

    更新日期:2014-01-01 00:00:00

  • ImmunoGlobe: enabling systems immunology with a manually curated intercellular immune interaction network.

    abstract:BACKGROUND:While technological advances have made it possible to profile the immune system at high resolution, translating high-throughput data into knowledge of immune mechanisms has been challenged by the complexity of the interactions underlying immune processes. Tools to explore the immune network are critical for ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03702-3

    authors: Atallah MB,Tandon V,Hiam KJ,Boyce H,Hori M,Atallah W,Spitzer MH,Engleman E,Mallick P

    更新日期:2020-08-10 00:00:00

  • CDKAM: a taxonomic classification tool using discriminative k-mers and approximate matching strategies.

    abstract:BACKGROUND:Current taxonomic classification tools use exact string matching algorithms that are effective to tackle the data from the next generation sequencing technology. However, the unique error patterns in the third generation sequencing (TGS) technologies could reduce the accuracy of these programs. RESULTS:We d...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03777-y

    authors: Bui VK,Wei C

    更新日期:2020-10-20 00:00:00

  • Shared data science infrastructure for genomics data.

    abstract:BACKGROUND:Creating a scalable computational infrastructure to analyze the wealth of information contained in data repositories is difficult due to significant barriers in organizing, extracting and analyzing relevant data. Shared data science infrastructures like Boag is needed to efficiently process and parse data co...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-2967-2

    authors: Bagheri H,Muppirala U,Masonbrink RE,Severin AJ,Rajan H

    更新日期:2019-08-22 00:00:00

  • FQStat: a parallel architecture for very high-speed assessment of sequencing quality metrics.

    abstract:BACKGROUND:High throughput DNA/RNA sequencing has revolutionized biological and clinical research. Sequencing is widely used, and generates very large amounts of data, mainly due to reduced cost and advanced technologies. Quickly assessing the quality of giga-to-tera base levels of sequencing data has become a routine ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3015-y

    authors: Chanumolu SK,Albahrani M,Otu HH

    更新日期:2019-08-15 00:00:00

  • Mining published lists of cancer related microarray experiments: identification of a gene expression signature having a critical role in cell-cycle control.

    abstract:BACKGROUND:Routine application of gene expression microarray technology is rapidly producing large amounts of data that necessitate new approaches of analysis. The analysis of a specific microarray experiment profits enormously from cross-comparing to other experiments. This process is generally performed by numerical ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-S4-S14

    authors: Finocchiaro G,Mancuso F,Muller H

    更新日期:2005-12-01 00:00:00

  • An evidence-based approach to identify aging-related genes in Caenorhabditis elegans.

    abstract:BACKGROUND:Extensive studies have been carried out on Caenorhabditis elegans as a model organism to elucidate mechanisms of aging and the effects of perturbing known aging-related genes on lifespan and behavior. This research has generated large amounts of experimental data that is increasingly difficult to integrate a...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-015-0469-4

    authors: Callahan A,Cifuentes JJ,Dumontier M

    更新日期:2015-02-07 00:00:00

  • Primary orthologs from local sequence context.

    abstract:BACKGROUND:The evolutionary history of genes serves as a cornerstone of contemporary biology. Most conserved sequences in mammalian genomes don't code for proteins, yielding a need to infer evolutionary history of sequences irrespective of what kind of functional element they may encode. Thus, sequence-, as opposed to ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-3384-2

    authors: Gao K,Miller J

    更新日期:2020-02-06 00:00:00

  • iMEGES: integrated mental-disorder GEnome score by deep neural network for prioritizing the susceptibility genes for mental disorders in personal genomes.

    abstract:BACKGROUND:A range of rare and common genetic variants have been discovered to be potentially associated with mental diseases, but many more have not been uncovered. Powerful integrative methods are needed to systematically prioritize both variants and genes that confer susceptibility to mental diseases in personal gen...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2469-7

    authors: Khan A,Liu Q,Wang K

    更新日期:2018-12-28 00:00:00

  • Improved functional prediction of proteins by learning kernel combinations in multilabel settings.

    abstract:BACKGROUND:We develop a probabilistic model for combining kernel matrices to predict the function of proteins. It extends previous approaches in that it can handle multiple labels which naturally appear in the context of protein function. RESULTS:Explicit modeling of multilabels significantly improves the capability o...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-S2-S12

    authors: Roth V,Fischer B

    更新日期:2007-05-03 00:00:00

  • Spectral estimation in unevenly sampled space of periodically expressed microarray time series data.

    abstract:BACKGROUND:Periodogram analysis of time-series is widespread in biology. A new challenge for analyzing the microarray time series data is to identify genes that are periodically expressed. Such challenge occurs due to the fact that the observed time series usually exhibit non-idealities, such as noise, short length, an...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-137

    authors: Liew AW,Xian J,Wu S,Smith D,Yan H

    更新日期:2007-04-24 00:00:00

  • Critique of the pairwise method for estimating qPCR amplification efficiency: beware of correlated data!

    abstract:BACKGROUND:A recently proposed method for estimating qPCR amplification efficiency E analyzes fluorescence intensity ratios from pairs of points deemed to lie in the exponential growth region on the amplification curves for all reactions in a dilution series. This method suffers from a serious problem: The resulting ra...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03604-4

    authors: Tellinghuisen J

    更新日期:2020-07-08 00:00:00