Abstract:
BACKGROUND:In microarray data analysis, factors such as data quality, biological variation, and the increasingly multi-layered nature of more complex biological systems complicates the modelling of regulatory networks that can represent and capture the interactions among genes. We believe that the use of multiple datasets derived from related biological systems leads to more robust models. Therefore, we developed a novel framework for modelling regulatory networks that involves training and evaluation on independent datasets. Our approach includes the following steps: (1) ordering the datasets based on their level of noise and informativeness; (2) selection of a Bayesian classifier with an appropriate level of complexity by evaluation of predictive performance on independent data sets; (3) comparing the different gene selections and the influence of increasing the model complexity; (4) functional analysis of the informative genes. RESULTS:In this paper, we identify the most appropriate model complexity using cross-validation and independent test set validation for predicting gene expression in three published datasets related to myogenesis and muscle differentiation. Furthermore, we demonstrate that models trained on simpler datasets can be used to identify interactions among genes and select the most informative. We also show that these models can explain the myogenesis-related genes (genes of interest) significantly better than others (P < 0.004) since the improvement in their rankings is much more pronounced. Finally, after further evaluating our results on synthetic datasets, we show that our approach outperforms a concordance method by Lai et al. in identifying informative genes from multiple datasets with increasing complexity whilst additionally modelling the interaction between genes. CONCLUSIONS:We show that Bayesian networks derived from simpler controlled systems have better performance than those trained on datasets from more complex biological systems. Further, we present that highly predictive and consistent genes, from the pool of differentially expressed genes, across independent datasets are more likely to be fundamentally involved in the biological process under study. We conclude that networks trained on simpler controlled systems, such as in vitro experiments, can be used to model and capture interactions among genes in more complex datasets, such as in vivo experiments, where these interactions would otherwise be concealed by a multitude of other ongoing events.
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Anvar SY,'t Hoen PA,Tucker Adoi
10.1186/1471-2105-11-32subject
Has Abstractpub_date
2010-01-15 00:00:00pages
32issn
1471-2105pii
1471-2105-11-32journal_volume
11pub_type
杂志文章abstract:BACKGROUND:Over the last decade, next generation sequencing (NGS) has become widely available, and is now the sequencing technology of choice for most researchers. Nonetheless, NGS presents a challenge for the evolutionary biologists who wish to estimate evolutionary genetic parameters from a mixed sample of unlabelled...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0810-y
更新日期:2015-11-04 00:00:00
abstract:BACKGROUND:Genome imputation, admixture resolution and genome-wide association analyses are timely and computationally intensive processes with many composite and requisite steps. Analysis time increases further when building and installing the run programs required for these analyses. For scientists that may not be as...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-2964-5
更新日期:2019-06-28 00:00:00
abstract:BACKGROUND:Great strides have been made in the effective treatment of HIV-1 with the development of second-generation protease inhibitors (PIs) that are effective against historically multi-PI-resistant HIV-1 variants. Nevertheless, mutation patterns that confer decreasing susceptibility to available PIs continue to ar...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-477
更新日期:2011-12-15 00:00:00
abstract:BACKGROUND:Bacterial pan-genomes, comprised of conserved and variable genes across multiple sequenced bacterial genomes, allow for identification of genomic regions that are phylogenetically discriminating or functionally important. Pan-genomes consist of large amounts of data, which can restrict researchers ability to...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2250-y
更新日期:2018-06-27 00:00:00
abstract:BACKGROUND:Bioinformatics tools for automatic processing of biomedical literature are invaluable for both the design and interpretation of large-scale experiments. Many information extraction (IE) systems that incorporate natural language processing (NLP) techniques have thus been developed for use in the biomedical fi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-325
更新日期:2007-09-01 00:00:00
abstract:BACKGROUND:This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulatio...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-310
更新日期:2012-11-21 00:00:00
abstract:BACKGROUND:Protein-protein docking, which aims to predict the structure of a protein-protein complex from its unbound components, remains an unresolved challenge in structural bioinformatics. An important step is the ranking of docked poses using a scoring function, for which many methods have been developed. There is ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-286
更新日期:2013-10-01 00:00:00
abstract:BACKGROUND:The development of next-generation sequencing has made it possible to sequence whole genomes at a relatively low cost. However, de novo genome assemblies remain challenging due to short read length, missing data, repetitive regions, polymorphisms and sequencing errors. As more and more genomes are sequenced,...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1911-6
更新日期:2017-11-10 00:00:00
abstract:BACKGROUND:Systematic mutagenesis studies have shown that only a few interface residues termed hot spots contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spots prediction becomes increasingly important for well understanding the essence of proteins interactions and hel...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-311
更新日期:2011-07-29 00:00:00
abstract:BACKGROUND:It is well known that the search for homologous RNAs is more effective if both sequence and structure information is incorporated into the search. However, current tools for searching with RNA sequence-structure patterns cannot fully handle mutations occurring on both these levels or are simply not fast enou...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-226
更新日期:2013-07-17 00:00:00
abstract:BACKGROUND:A massive amount of proteomic data is generated on a daily basis, nonetheless annotating all sequences is costly and often unfeasible. As a countermeasure, machine learning methods have been used to automatically annotate new protein functions. More specifically, many studies have investigated hierarchical m...
journal_title:BMC bioinformatics
pub_type: 杂志文章,评审
doi:10.1186/s12859-019-3060-6
更新日期:2019-09-23 00:00:00
abstract:BACKGROUND:The biomedical literature continues to grow at a rapid pace, making the challenge of knowledge retrieval and extraction ever greater. Tools that provide a means to search and mine the full text of literature thus represent an important way by which the efficiency of these processes can be improved. RESULTS:...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2103-8
更新日期:2018-03-09 00:00:00
abstract:BACKGROUND:Infections are often associated to comorbidity that increases the risk of medical conditions which can lead to further morbidity and mortality. SARS is a threat which is similar to MERS virus, but the comorbidity is the key aspect to underline their different impacts. One UK doctor says "I'd rather have HIV ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-333
更新日期:2014-10-24 00:00:00
abstract:BACKGROUND:The miRNAs, a class of short approximately 22-nucleotide non-coding RNAs, often act post-transcriptionally to inhibit mRNA expression. In effect, they control gene expression by targeting mRNA. They also help in carrying out normal functioning of a cell as they play an important role in various cellular proc...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-266
更新日期:2013-09-04 00:00:00
abstract:BACKGROUND:The detection of bias due to cryptic population structure is an important step in the evaluation of findings of genetic association studies. The standard method of measuring this bias in a genetic association study is to compare the observed median association test statistic to the expected median test stati...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0496-1
更新日期:2015-02-20 00:00:00
abstract:BACKGROUND:Large-scale genomic studies often identify large gene lists, for example, the genes sharing the same expression patterns. The interpretation of these gene lists is generally achieved by extracting concepts overrepresented in the gene lists. This analysis often depends on manual annotation of genes based on c...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-272
更新日期:2010-05-20 00:00:00
abstract:BACKGROUND:The search for enriched features has become widely used to characterize a set of genes or proteins. A key aspect of this technique is its ability to identify correlations amongst heterogeneous data such as Gene Ontology annotations, gene expression data and genome location of genes. Despite the rapid growth ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-332
更新日期:2007-09-11 00:00:00
abstract:BACKGROUND:Replication origins are considered important sites for understanding the molecular mechanisms involved in DNA replication. Many computational methods have been developed for predicting their locations in archaeal, bacterial and eukaryotic genomes. However, a prediction method designed for a particular kind o...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-163
更新日期:2007-05-21 00:00:00
abstract:BACKGROUND:Intracellular signal transduction is achieved by networks of proteins and small molecules that transmit information from the cell surface to the nucleus, where they ultimately effect transcriptional changes. Understanding the mechanisms cells use to accomplish this important process requires a detailed molec...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-3-34
更新日期:2002-11-01 00:00:00
abstract:BACKGROUND:Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. Unambiguously identifying the orthology relationship between copies across multiple genomes can be resolved by synteny, i.e. the conserved order of genomic seq...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-227
更新日期:2014-06-30 00:00:00
abstract:BACKGROUND:The sequencing of many genomes and tiling arrays consisting of millions of DNA segments spanning entire genomes have made high-resolution copy number analysis possible. Microarray-based comparative genomic hybridization (array CGH) has enabled the high-resolution detection of DNA copy number aberrations. Whi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-203
更新日期:2007-06-14 00:00:00
abstract:BACKGROUND:Common existing phylogenetic tree visualisation tools are not able to display readable trees with more than a few thousand nodes. These existing methodologies are based in two dimensional space. RESULTS:We introduce the idea of visualising phylogenetic trees in three dimensional hyperbolic space with the Wa...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-5-48
更新日期:2004-04-29 00:00:00
abstract:BACKGROUND:Protein-protein interactions (PPIs) play crucial roles in virtually every aspect of cellular function within an organism. Over the last decade, the development of novel high-throughput techniques has resulted in enormous amounts of data and provided valuable resources for studying protein interactions. Howev...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-S7-S3
更新日期:2012-05-08 00:00:00
abstract:BACKGROUND:MHC/HLA class II molecules are important components of the immune system and play a critical role in processes such as phagocytosis. Understanding peptide recognition properties of the hundreds of MHC class II alleles is essential to appreciate determinants of antigenicity and ultimately to predict epitopes....
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-S1-S55
更新日期:2010-01-18 00:00:00
abstract:BACKGROUND:Hepatocellular carcinoma (HCC) is an aggressive epithelial tumor which shows very poor prognosis and high rate of recurrence, representing an urgent problem for public healthcare. MicroRNAs (miRNAs/miRs) are a class of small, non-coding RNAs that attract great attention because of their role in regulation of...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0836-1
更新日期:2015-12-10 00:00:00
abstract:BACKGROUND:This paper summarises the lessons and experiences gained from a case study of the application of semantic web technologies to the integration of data from the bacterial species Francisella tularensis novicida (Fn). Fn data sources are disparate and heterogeneous, as multiple laboratories across the world, us...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-S10-S3
更新日期:2009-10-01 00:00:00
abstract:BACKGROUND:In protein design, correct use of topology is among the initial and most critical feature. Meticulous selection of backbone topology aids in drastically reducing the structure search space. With ProLego, we present a server application to explore the component aspect of protein structures and provide an intu...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2171-9
更新日期:2018-05-04 00:00:00
abstract:BACKGROUND:Alignment-free methods for comparing protein sequences have proved to be viable alternatives to approaches that first rely on an alignment of the sequences to be compared. Much work however need to be done before those methods provide reliable fold recognition for proteins whose sequences share little simila...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1795-5
更新日期:2017-08-25 00:00:00
abstract:BACKGROUND:Cancer is caused through a multistep process, in which a succession of genetic changes, each conferring a competitive advantage for growth and proliferation, leads to the progressive conversion of normal human cells into malignant cancer cells. Interrogation of cancer genomes holds the promise of understandi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-189
更新日期:2010-04-14 00:00:00
abstract:BACKGROUND:Biomedical processes can provide essential information about the (mal-) functioning of an organism and are thus frequently represented in biomedical terminologies and ontologies, including the GO Biological Process branch. These processes often need to be described and categorised in terms of their attribute...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-217
更新日期:2012-08-28 00:00:00