Disease candidate gene identification and prioritization using protein interaction networks.

Abstract:

BACKGROUND: Although most of the current disease candidate gene identification and prioritization methods depend on functional annotations, the coverage of the gene functional annotations is a limiting factor. In the current study, we describe a candidate gene prioritization method that is entirely based on protein-protein interaction network (PPIN) analyses.

RESULTS: For the first time, extended versions of the PageRank and HITS algorithms, and the K-Step Markov method are applied to prioritize disease candidate genes in a training-test schema. Using a list of known disease-related genes from our earlier study as a training set ("seeds"), and the rest of the known genes as a test list, we perform large-scale cross validation to rank the candidate genes and also evaluate and compare the performance of our approach. Under appropriate settings (for example, a back probability of 0.3 for PageRank with Priors and HITS with Priors, and step size 6 for the K-Step Markov method) the three methods achieved a comparable AUC value, suggesting a similar performance.

CONCLUSION: Even though network-based methods are generally not as effective as integrated functional annotation-based methods for disease candidate gene prioritization, in a one-to-one comparison, PPIN-based candidate gene prioritization performs better than all other gene features or annotations. Additionally, we demonstrate that methods used for studying both social and Web networks can be successfully used for disease candidate gene prioritization.
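
The seed-based ranking idea in the abstract can be illustrated with a minimal sketch of PageRank with Priors. This is not the authors' implementation: the function name `pagerank_with_priors`, the toy network, the uniform prior over seeds, and the fixed iteration count are all assumptions made for illustration. Only the back probability of 0.3 comes from the abstract. The network is treated as undirected and unweighted, and every seed is assumed to appear in the edge list.

```python
def pagerank_with_priors(edges, seeds, back_prob=0.3, n_iter=100):
    """Rank nodes by a random walk that jumps back to the seed set.

    edges:     iterable of (node, node) pairs, treated as undirected
    seeds:     set of known disease genes (the training set)
    back_prob: probability of jumping back to a seed at each step
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    nodes = sorted(neighbors)

    # Assumption: the prior is uniform over the seeds, zero elsewhere.
    prior = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}

    # Start from the prior and iterate the update a fixed number of times.
    score = dict(prior)
    for _ in range(n_iter):
        updated = {}
        for v in nodes:
            # Score flowing in from each neighbor, split evenly over its edges.
            walk = sum(score[u] / len(neighbors[u]) for u in neighbors[v])
            updated[v] = (1.0 - back_prob) * walk + back_prob * prior[v]
        score = updated
    return score


# Hypothetical toy network: a chain of five proteins with one seed gene.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
toy_seeds = {"A"}
scores = pagerank_with_priors(toy_edges, toy_seeds, back_prob=0.3)

# Candidates are the non-seed nodes, ranked by descending score.
candidates = [v for v in scores if v not in toy_seeds]
ranked = sorted(candidates, key=scores.get, reverse=True)
print(ranked)
```

In this toy chain, candidates closer to the seed receive higher scores, which is the intuition behind ranking test genes by their network proximity to the training set. A cross-validation loop would hold out one known gene at a time, rank it among the candidates, and summarize the ranks as an AUC value.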

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Chen J,Aronow BJ,Jegga AG

doi

10.1186/1471-2105-10-73

pub_date

2009-02-27 00:00:00

pages

73

issn

1471-2105

pii

1471-2105-10-73

journal_volume

10

pub_type

Journal Article
  • The acquisition of novel N-glycosylation sites in conserved proteins during human evolution.

    abstract:BACKGROUND:N-linked protein glycosylation plays an important role in various biological processes, including protein folding and trafficking, and cell adhesion and signaling. The acquisition of a novel N-glycosylation site may have significant effect on protein structure and function, and therefore, on the phenotype. ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0468-5

    authors: Kim DS,Hahn Y

    update_date:2015-01-28 00:00:00

  • Blazing Signature Filter: a library for fast pairwise similarity comparisons.

    abstract:BACKGROUND:Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts which would elicit a similar phe...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2210-6

    authors: Lee JY,Fujimoto GM,Wilson R,Wiley HS,Payne SH

    update_date:2018-06-11 00:00:00

  • An algorithm for automated closure during assembly.

    abstract:BACKGROUND:Finishing is the process of improving the quality and utility of draft genome sequences generated by shotgun sequencing and computational assembly. Finishing can involve targeted sequencing. Finishing reads may be incorporated by manual or automated means. One automated method uses targeted addition by local...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-457

    authors: Koren S,Miller JR,Walenz BP,Sutton G

    update_date:2010-09-10 00:00:00

  • Homology induction: the use of machine learning to improve sequence similarity searches.

    abstract:BACKGROUND:The inference of homology between proteins is a key problem in molecular biology. The current best approaches only identify approximately 50% of homologies (with a false positive rate set at 1/1000). RESULTS:We present Homology Induction (HI), a new approach to inferring homology. HI uses machine learning to...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-3-11

    authors: Karwath A,King RD

    update_date:2002-04-23 00:00:00

  • Determining significance of pairwise co-occurrences of events in bursty sequences.

    abstract:BACKGROUND:Event sequences where different types of events often occur close together arise, e.g., when studying potential transcription factor binding sites (TFBS, events) of certain transcription factors (TF, types) in a DNA sequence. These events tend to occur in bursts: in some genomic regions there are more genes ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-336

    authors: Haiminen N,Mannila H,Terzi E

    update_date:2008-08-08 00:00:00

  • Random generalized linear model: a highly accurate and interpretable ensemble predictor.

    abstract:BACKGROUND:Ensemble predictors such as the random forest are known to have superior accuracy but their black-box predictions are difficult to interpret. In contrast, a generalized linear model (GLM) is very interpretable especially when forward feature selection is used to construct the model. However, forward feature ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-5

    authors: Song L,Langfelder P,Horvath S

    update_date:2013-01-16 00:00:00

  • Jaccard/Tanimoto similarity test and estimation methods for biological presence-absence data.

    abstract:BACKGROUND:A survey of presences and absences of specific species across multiple biogeographic units (or bioregions) are used in a broad area of biological studies from ecology to microbiology. Using binary presence-absence data, we evaluate species co-occurrences that help elucidate relationships among organisms and ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3118-5

    authors: Chung NC,Miasojedow B,Startek M,Gambin A

    update_date:2019-12-24 00:00:00

  • Knowledge discovery of drug data on the example of adverse reaction prediction.

    abstract:BACKGROUND:Antibiotics are the widely prescribed drugs for children and most likely to be related with adverse reactions. Record on adverse reactions and allergies from antibiotics considerably affect the prescription choices. We consider this a biomedical decision-making problem and explore hidden knowledge in survey ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-S6-S7

    authors: Yildirim P,Majnarić L,Ekmekci O,Holzinger A

    update_date:2014-01-01 00:00:00

  • Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms.

    abstract:BACKGROUND:Predicting protein function has become increasingly demanding in the era of next generation sequencing technology. The task to assign a curator-reviewed function to every single sequence is impracticable. Bioinformatics tools, easy to use and able to provide automatic and reliable annotations at a genomic sc...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-S4-S14

    authors: Falda M,Toppo S,Pescarolo A,Lavezzo E,Di Camillo B,Facchinetti A,Cilia E,Velasco R,Fontana P

    update_date:2012-03-28 00:00:00

  • Calibration and assessment of channel-specific biases in microarray data with extended dynamical range.

    abstract:BACKGROUND:Non-linearities in observed log-ratios of gene expressions, also known as intensity dependent log-ratios, can often be accounted for by global biases in the two channels being compared. Any step in a microarray process may introduce such offsets and in this article we study the biases introduced by the micro...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-5-177

    authors: Bengtsson H,Jönsson G,Vallon-Christersson J

    update_date:2004-11-12 00:00:00

  • Mapping transcription mechanisms from multimodal genomic data.

    abstract:BACKGROUND:Identification of expression quantitative trait loci (eQTLs) is an emerging area in genomic study. The task requires an integrated analysis of genome-wide single nucleotide polymorphism (SNP) data and gene expression data, raising a new computational challenge due to the tremendous size of data. RESULTS:We ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-S9-S2

    authors: Chang HH,McGeachie M,Alterovitz G,Ramoni MF

    update_date:2010-10-28 00:00:00

  • TooT-T: discrimination of transport proteins from non-transport proteins.

    abstract:BACKGROUND:Membrane transport proteins (transporters) play an essential role in every living cell by transporting hydrophilic molecules across the hydrophobic membranes. While the sequences of many membrane proteins are known, their structure and function is still not well characterized and understood, owing to the imm...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3311-6

    authors: Alballa M,Butler G

    update_date:2020-04-23 00:00:00

  • R/BHC: fast Bayesian hierarchical clustering for microarray data.

    abstract:BACKGROUND:Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data analysis, little attention has been paid to uncertainty in the results obtained. RESULTS:We present an R/Bioconductor port of a fast novel algorithm for...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-242

    authors: Savage RS,Heller K,Xu Y,Ghahramani Z,Truman WM,Grant M,Denby KJ,Wild DL

    update_date:2009-08-06 00:00:00

  • Quantitative prediction of the effect of genetic variation using hidden Markov models.

    abstract:BACKGROUND:With the development of sequencing technologies, more and more sequence variants are available for investigation. Different classes of variants in the human genome have been identified, including single nucleotide substitutions, insertion and deletion, and large structural variations such as duplications and...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-5

    authors: Liu M,Watson LT,Zhang L

    update_date:2014-01-09 00:00:00

  • Predicting the interactome of Xanthomonas oryzae pathovar oryzae for target selection and DB service.

    abstract:BACKGROUND:Protein-protein interactions (PPIs) play key roles in various cellular functions. In addition, some critical inter-species interactions such as host-pathogen interactions and pathogenicity occur through PPIs. Phytopathogenic bacteria infect hosts through attachment to host tissue, enzyme secretion, exopolysa...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-41

    authors: Kim JG,Park D,Kim BC,Cho SW,Kim YT,Park YJ,Cho HJ,Park H,Kim KB,Yoon KO,Park SJ,Lee BM,Bhak J

    update_date:2008-01-24 00:00:00

  • Molecular evolution of dihydrouridine synthases.

    abstract:BACKGROUND:Dihydrouridine (D) is a modified base found in conserved positions in the D-loop of tRNA in Bacteria, Eukaryota, and some Archaea. Despite the abundant occurrence of D, little is known about its biochemical roles in mediating tRNA function. It is assumed that D may destabilize the structure of tRNA and thus ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-153

    authors: Kasprzak JM,Czerwoniec A,Bujnicki JM

    update_date:2012-06-28 00:00:00

  • Indicators for the Data Usage Index (DUI): an incentive for publishing primary biodiversity data through global information infrastructure.

    abstract:BACKGROUND:A professional recognition mechanism is required to encourage expedited publishing of an adequate volume of 'fit-for-use' biodiversity data. As a component of such a recognition mechanism, we propose the development of the Data Usage Index (DUI) to demonstrate to data publishers that their efforts of creatin...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-S15-S3

    authors: Ingwersen P,Chavan V

    update_date:2011-01-01 00:00:00

  • InPrePPI: an integrated evaluation method based on genomic context for predicting protein-protein interactions in prokaryotic genomes.

    abstract:BACKGROUND:Although many genomic features have been used in the prediction of protein-protein interactions (PPIs), frequently only one is used in a computational method. After realizing the limited power in the prediction using only one genomic feature, investigators are now moving toward integration. So far, there hav...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-414

    authors: Sun J,Sun Y,Ding G,Liu Q,Wang C,He Y,Shi T,Li Y,Zhao Z

    update_date:2007-10-26 00:00:00

  • An improved distance measure between the expression profiles linking co-expression and co-regulation in mouse.

    abstract:BACKGROUND:Many statistical algorithms combine microarray expression data and genome sequence data to identify transcription factor binding motifs in the low eukaryotic genomes. Finding cis-regulatory elements in higher eukaryote genomes, however, remains a challenge, as searching in the promoter regions of genes with ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-44

    authors: Kim RS,Ji H,Wong WH

    update_date:2006-01-26 00:00:00

  • A method of predicting changes in human gene splicing induced by genetic variants in context of cis-acting elements.

    abstract:BACKGROUND:Polymorphic variants and mutations disrupting canonical splicing isoforms are among the leading causes of human hereditary disorders. While there is a substantial evidence of aberrant splicing causing Mendelian diseases, the implication of such events in multi-genic disorders is yet to be well understood. We...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-22

    authors: Churbanov A,Vorechovský I,Hicks C

    update_date:2010-01-12 00:00:00

  • Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns.

    abstract:BACKGROUND:It is well known that the search for homologous RNAs is more effective if both sequence and structure information is incorporated into the search. However, current tools for searching with RNA sequence-structure patterns cannot fully handle mutations occurring on both these levels or are simply not fast enou...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-226

    authors: Meyer F,Kurtz S,Beckstette M

    update_date:2013-07-17 00:00:00

  • Time-course analysis of genome-wide gene expression data from hormone-responsive human breast cancer cells.

    abstract:BACKGROUND:Microarray experiments enable simultaneous measurement of the expression levels of virtually all transcripts present in cells, thereby providing a 'molecular picture' of the cell state. On the other hand, the genomic responses to a pharmacological or hormonal stimulus are dynamic molecular processes, where t...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-S2-S12

    authors: Mutarelli M,Cicatiello L,Ferraro L,Grober OM,Ravo M,Facchiano AM,Angelini C,Weisz A

    update_date:2008-03-26 00:00:00

  • TCGA2BED: extracting, extending, integrating, and querying The Cancer Genome Atlas.

    abstract:BACKGROUND:Data extraction and integration methods are becoming essential to effectively access and take advantage of the huge amounts of heterogeneous genomics and clinical data increasingly available. In this work, we focus on The Cancer Genome Atlas, a comprehensive archive of tumoral data containing the results of ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1419-5

    authors: Cumbo F,Fiscon G,Ceri S,Masseroli M,Weitschek E

    update_date:2017-01-03 00:00:00

  • Process attributes in bio-ontologies.

    abstract:BACKGROUND:Biomedical processes can provide essential information about the (mal-) functioning of an organism and are thus frequently represented in biomedical terminologies and ontologies, including the GO Biological Process branch. These processes often need to be described and categorised in terms of their attribute...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-217

    authors: Andrade AQ,Blondé W,Hastings J,Schulz S

    update_date:2012-08-28 00:00:00

  • BISR-RNAseq: an efficient and scalable RNAseq analysis workflow with interactive report generation.

    abstract:BACKGROUND:RNA sequencing has become an increasingly affordable way to profile gene expression patterns. Here we introduce a workflow implementing several open-source softwares that can be run on a high performance computing environment. RESULTS:Developed as a tool by the Bioinformatics Shared Resource Group (BISR) at...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3251-1

    authors: Gadepalli VS,Ozer HG,Yilmaz AS,Pietrzak M,Webb A

    update_date:2019-12-20 00:00:00

  • Hierarchical structure and modules in the Escherichia coli transcriptional regulatory network revealed by a new top-down approach.

    abstract:BACKGROUND:Cellular functions are coordinately carried out by groups of genes forming functional modules. Identifying such modules in the transcriptional regulatory network (TRN) of organisms is important for understanding the structure and function of these fundamental cellular networks and essential for the emerging ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-5-199

    authors: Ma HW,Buer J,Zeng AP

    update_date:2004-12-16 00:00:00

  • A study on multi-omic oscillations in Escherichia coli metabolic networks.

    abstract:BACKGROUND:Two important challenges in the analysis of molecular biology information are data (multi-omic information) integration and the detection of patterns across large scale molecular networks and sequences. They are actually coupled because the integration of omic information may provide better means to detec...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2175-5

    authors: Bardozzo F,Lió P,Tagliaferri R

    update_date:2018-07-09 00:00:00

  • Swellix: a computational tool to explore RNA conformational space.

    abstract:BACKGROUND:The sequence of nucleotides in an RNA determines the possible base pairs for an RNA fold and thus also determines the overall shape and function of an RNA. The Swellix program presented here combines a helix abstraction with a combinatorial approach to the RNA folding problem in order to compute all possible...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1910-7

    authors: Sloat N,Liu JW,Schroeder SJ

    update_date:2017-11-21 00:00:00

  • Bias detection and correction in RNA-Sequencing data.

    abstract:BACKGROUND:High throughput sequencing technology provides us unprecedented opportunities to study transcriptome dynamics. Compared to microarray-based gene expression profiling, RNA-Seq has many advantages, such as high resolution, low background, and ability to identify novel transcripts. Moreover, for genes with mult...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-290

    authors: Zheng W,Chung LM,Zhao H

    update_date:2011-07-19 00:00:00

  • A platform for processing expression of short time series (PESTS).

    abstract:BACKGROUND:Time course microarray profiles examine the expression of genes over a time domain. They are necessary in order to determine the complete set of genes that are dynamically expressed under given conditions, and to determine the interaction between these genes. Because of cost and resource issues, most time se...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-13

    authors: Sinha A,Markatou M

    update_date:2011-01-11 00:00:00