Abstract:
BACKGROUND:Genetic interaction profiles are highly informative and helpful for understanding the functional linkages between genes, and therefore have been extensively exploited for annotating gene functions and dissecting specific pathway structures. However, our understanding is rather limited to the relationship between double concurrent perturbation and various higher level phenotypic changes, e.g. those in cells, tissues or organs. Modifier screens, such as synthetic genetic arrays (SGA) can help us to understand the phenotype caused by combined gene mutations. Unfortunately, exhaustive tests on all possible combined mutations in any genome are vulnerable to combinatorial explosion and are infeasible either technically or financially. Therefore, an accurate computational approach to predict genetic interaction is highly desirable, and such methods have the potential of alleviating the bottleneck on experiment design. RESULTS:In this work, we introduce a computational systems biology approach for the accurate prediction of pairwise synthetic genetic interactions (SGI). First, a high-coverage and high-precision functional gene network (FGN) is constructed by integrating protein-protein interaction (PPI), protein complex and gene expression data; then, a graph-based semi-supervised learning (SSL) classifier is utilized to identify SGI, where the topological properties of protein pairs in weighted FGN is used as input features of the classifier. We compare the proposed SSL method with the state-of-the-art supervised classifier, the support vector machines (SVM), on a benchmark dataset in S. cerevisiae to validate our method's ability to distinguish synthetic genetic interactions from non-interaction gene pairs. Experimental results show that the proposed method can accurately predict genetic interactions in S. cerevisiae (with a sensitivity of 92% and specificity of 91%). Noticeably, the SSL method is more efficient than SVM, especially for very small training sets and large test sets. CONCLUSIONS:We developed a graph-based SSL classifier for predicting the SGI. The classifier employs topological properties of weighted FGN as input features and simultaneously employs information induced from labelled and unlabelled data. Our analysis indicates that the topological properties of weighted FGN can be employed to accurately predict SGI. Also, the graph-based SSL method outperforms the traditional standard supervised approach, especially when used with small training sets. The proposed method can alleviate experimental burden of exhaustive test and provide a useful guide for the biologist in narrowing down the candidate gene pairs with SGI. The data and source code implementing the method are available from the website: http://home.ustc.edu.cn/~yzh33108/GeneticInterPred.htm.
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
You ZH,Yin Z,Han K,Huang DS,Zhou Xdoi
10.1186/1471-2105-11-343subject
Has Abstractpub_date
2010-06-24 00:00:00pages
343issn
1471-2105pii
1471-2105-11-343journal_volume
11pub_type
杂志文章abstract:BACKGROUND:Comparative genomics has become an essential approach for identifying homologous gene candidates and their functions, and for studying genome evolution. There are many tools available for genome comparisons. Unfortunately, most of them are not applicable for the identification of unique genes and the inferen...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-S4-S18
更新日期:2006-12-12 00:00:00
abstract:BACKGROUND:In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-318
更新日期:2006-06-23 00:00:00
abstract:BACKGROUND:DNA methylation patterns store epigenetic information in the vast majority of eukaryotic species. The relatively high costs and technical challenges associated with the detection of DNA methylation however have created a bias in the number of methylation studies towards model organisms. Consequently, it rema...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2115-4
更新日期:2018-03-27 00:00:00
abstract:BACKGROUND:Novel sequence motifs detection is becoming increasingly essential in computational biology. However, the high computational cost greatly constrains the efficiency of most motif discovery algorithms. RESULTS:In this paper, we accelerate MEME algorithm targeted on Intel Many Integrated Core (MIC) Architectur...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2276-1
更新日期:2018-08-13 00:00:00
abstract:BACKGROUND:Targeted resequencing offers a cost-effective alternative to whole-genome and whole-exome sequencing when investigating regions known to be associated with a trait or disease. There are a number of approaches to targeted resequencing, including microfluidic PCR amplification, which may be enhanced by multipl...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-1453-3
更新日期:2017-01-05 00:00:00
abstract:BACKGROUND:Protein sequence profile-profile alignment is an important approach to recognizing remote homologs and generating accurate pairwise alignments. It plays an important role in protein sequence database search, protein structure prediction, protein function prediction, and phylogenetic analysis. RESULTS:In thi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-252
更新日期:2014-07-25 00:00:00
abstract:BACKGROUND:Logic Learning Machine (LLM) is an innovative method of supervised analysis capable of constructing models based on simple and intelligible rules. In this investigation the performance of LLM in classifying patients with cancer was evaluated using a set of eight publicly available gene expression databases f...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-2953-8
更新日期:2019-11-22 00:00:00
abstract:BACKGROUND:In microarray experiments the numbers of replicates are often limited due to factors such as cost, availability of sample or poor hybridization. There are currently few choices for the analysis of a pair of microarrays where N = 1 in each condition. In this paper, we demonstrate the effectiveness of a new al...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-489
更新日期:2008-11-21 00:00:00
abstract:BACKGROUND:Microarray technology provides an efficient means for globally exploring physiological processes governed by the coordinated expression of multiple genes. However, identification of genes differentially expressed in microarray experiments is challenging because of their potentially high type I error rate. Me...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-142
更新日期:2008-03-06 00:00:00
abstract::Following publication of the original article [1], the author reported that there are several errors in the original article. ...
journal_title:BMC bioinformatics
pub_type: 杂志文章,已发布勘误
doi:10.1186/s12859-019-3318-z
更新日期:2020-01-22 00:00:00
abstract:BACKGROUND:Human breast cancer resistance protein (BCRP) is an ATP-binding cassette (ABC) efflux transporter that confers multidrug resistance in cancers and also plays an important role in the absorption, distribution and elimination of drugs. Prediction as to if drugs or new molecular entities are BCRP substrates sho...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-130
更新日期:2013-04-15 00:00:00
abstract:BACKGROUND:DNA microarrays offer motivation and hope for the simultaneous study of variations in multiple genes. Gene expression is a temporal process that allows variations in expression levels with a characterized gene function over a period of time. Temporal gene expression curves can be treated as functional data s...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1860-0
更新日期:2017-10-12 00:00:00
abstract:BACKGROUND:The statistical modeling of biomedical corpora could yield integrated, coarse-to-fine views of biological phenomena that complement discoveries made from analysis of molecular sequence and profiling data. Here, the potential of such modeling is demonstrated by examining the 5,225 free-text items in the Caeno...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-250
更新日期:2006-05-08 00:00:00
abstract:BACKGROUND:Accurate annotation of translation initiation sites (TISs) is essential for understanding the translation initiation mechanism. However, the reliability of TIS annotation in widely used databases such as RefSeq is uncertain due to the lack of experimental benchmarks. RESULTS:Based on a homogeneity assumptio...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-160
更新日期:2008-03-25 00:00:00
abstract:BACKGROUND:This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulatio...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-310
更新日期:2012-11-21 00:00:00
abstract:BACKGROUND:As part of its broad and ambitious mission, the MicroArray Quality Control (MAQC) project reported the results of experiments using External RNA Controls (ERCs) on five microarray platforms. For most platforms, several different methods of data processing were considered. However, there was no similar consid...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-371
更新日期:2007-10-03 00:00:00
abstract:BACKGROUND:Extant genomes share regions where genes have the same order and orientation, which are thought to arise from the conservation of an ancestral order of genes during evolution. Such regions of so-called conserved synteny, or synteny blocks, must be precisely identified and quantified, as a prerequisite to bet...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-268
更新日期:2014-08-08 00:00:00
abstract:BACKGROUND:Enhancers are stretches of DNA (100-1000 bp) that play a major role in development gene expression, evolution and disease. It has been recently shown that in high-level eukaryotes enhancers rarely work alone, instead they collaborate by forming clusters of cis-regulatory modules (CRMs). Although the binding ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-0980-2
更新日期:2016-03-18 00:00:00
abstract:BACKGROUND:Protein-coding gene detection in prokaryotic genomes is considered a much simpler problem than in intron-containing eukaryotic genomes. However there have been reports that prokaryotic gene finder programs have problems with small genes (either over-predicting or under-predicting). Therefore the question ari...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-131
更新日期:2010-03-15 00:00:00
abstract:BACKGROUND:Creating a scalable computational infrastructure to analyze the wealth of information contained in data repositories is difficult due to significant barriers in organizing, extracting and analyzing relevant data. Shared data science infrastructures like Boag is needed to efficiently process and parse data co...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-2967-2
更新日期:2019-08-22 00:00:00
abstract:BACKGROUND:The main goal of the whole transcriptome analysis is to correctly identify all expressed transcripts within a specific cell/tissue--at a particular stage and condition--to determine their structures and to measure their abundances. RNA-seq data promise to allow identification and quantification of transcript...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-135
更新日期:2014-05-09 00:00:00
abstract:BACKGROUND:The efficiency of lymph nodes depends on tissue structure and organization, which allow the coordination of lymphocyte traffic. Despite their essential role, our understanding of lymph node specific mechanisms is still incomplete and currently a topic of intense research. RESULTS:In this paper, we present a...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-387
更新日期:2009-11-25 00:00:00
abstract:BACKGROUND:Amplified fragment length polymorphism (AFLP) is a PCR-based technique that involves restriction of genomic DNA followed by ligation of adaptors to the fragments generated and selective PCR amplification of a subset of these fragments. The amplified fragments are separated on a sequencing gel and visualized ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-4-7
更新日期:2003-02-25 00:00:00
abstract:BACKGROUND:Genomes are subjected to rearrangements that change the orientation and ordering of genes during evolution. The most common rearrangements that occur in uni-chromosomal genomes are inversions (or reversals) to adapt to the changing environment. Since genome rearrangements are rarer than point mutations, gene...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3293-4
更新日期:2019-12-27 00:00:00
abstract:BACKGROUND:High-throughput screens comparing growth rates of arrays of distinct micro-organism cultures on solid agar are useful, rapid methods of quantifying genetic interactions. Growth rate is an informative phenotype which can be estimated by measuring cell densities at one or more times after inoculation. Precise ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-287
更新日期:2010-05-28 00:00:00
abstract:BACKGROUND:Over the last few years transcriptome sequencing (RNA-Seq) has almost completely taken over microarrays for high-throughput studies of gene expression. Currently, the most popular use of RNA-Seq is to identify genes which are differentially expressed between two or more conditions. Despite the importance of ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-014-0397-8
更新日期:2014-12-05 00:00:00
abstract:BACKGROUND:Biological data have traditionally been stored and made publicly available through a variety of on-line databases, whereas biological knowledge has traditionally been found in the printed literature. With journals now on-line and providing an increasing amount of open access content, often free of copyright ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-220
更新日期:2010-04-29 00:00:00
abstract:BACKGROUND:Researchers involved in the annotation of large numbers of gene, clone or protein identifiers are usually required to perform a one-by-one conversion for each identifier. When the field of research is one such as microarray experiments, this number may be around 30,000. RESULTS:To help researchers map acces...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-9
更新日期:2007-01-10 00:00:00
abstract:BACKGROUND:Abruptness of pigment patterns at the periphery of a skin lesion is one of the most important dermoscopic features for detection of malignancy. In current clinical setting, abrupt cutoff of a skin lesion determined by an examination of a dermatologist. This process is subjective, nonquantitative, and error-p...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1892-5
更新日期:2017-12-28 00:00:00
abstract:BACKGROUND:Large-scale genomic studies often identify large gene lists, for example, the genes sharing the same expression patterns. The interpretation of these gene lists is generally achieved by extracting concepts overrepresented in the gene lists. This analysis often depends on manual annotation of genes based on c...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-272
更新日期:2010-05-20 00:00:00