A Bayesian data fusion based approach for learning genome-wide transcriptional regulatory networks.

Abstract:

BACKGROUND:Reverse engineering of transcriptional regulatory networks (TRN) from genomics data has always represented a computational challenge in System Biology. The major issue is modeling the complex crosstalk among transcription factors (TFs) and their target genes, with a method able to handle both the high number of interacting variables and the noise in the available heterogeneous experimental sources of information. RESULTS:In this work, we propose a data fusion approach that exploits the integration of complementary omics-data as prior knowledge within a Bayesian framework, in order to learn and model large-scale transcriptional networks. We develop a hybrid structure-learning algorithm able to jointly combine TFs ChIP-Sequencing data and gene expression compendia to reconstruct TRNs in a genome-wide perspective. Applying our method to high-throughput data, we verified its ability to deal with the complexity of a genomic TRN, providing a snapshot of the synergistic TFs regulatory activity. Given the noisy nature of data-driven prior knowledge, which potentially contains incorrect information, we also tested the method's robustness to false priors on a benchmark dataset, comparing the proposed approach to other regulatory network reconstruction algorithms. We demonstrated the effectiveness of our framework by evaluating structural commonalities of our learned genomic network with other existing networks inferred by different DNA binding information-based methods. CONCLUSIONS:This Bayesian omics-data fusion based methodology allows to gain a genome-wide picture of the transcriptional interplay, helping to unravel key hierarchical transcriptional interactions, which could be subsequently investigated, and it represents a promising learning approach suitable for multi-layered genomic data integration, given its robustness to noisy sources and its tailored framework for handling high dimensional data.

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Sauta E,Demartini A,Vitali F,Riva A,Bellazzi R

doi

10.1186/s12859-020-3510-1

subject

Has Abstract

pub_date

2020-05-29 00:00:00

pages

219

issue

1

issn

1471-2105

pii

10.1186/s12859-020-3510-1

journal_volume

21

pub_type

杂志文章
  • Application of the common base method to regression and analysis of covariance (ANCOVA) in qPCR experiments and subsequent relative expression calculation.

    abstract:BACKGROUND:Quantitative polymerase chain reaction (qPCR) is the technique of choice for quantifying gene expression. While the technique itself is well established, approaches for the analysis of qPCR data continue to improve. RESULTS:Here we expand on the common base method to develop procedures for testing linear re...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03696-y

    authors: Ganger MT,Dietz GD,Headley P,Ewing SJ

    更新日期:2020-09-29 00:00:00

  • Spectral estimation in unevenly sampled space of periodically expressed microarray time series data.

    abstract:BACKGROUND:Periodogram analysis of time-series is widespread in biology. A new challenge for analyzing the microarray time series data is to identify genes that are periodically expressed. Such challenge occurs due to the fact that the observed time series usually exhibit non-idealities, such as noise, short length, an...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-137

    authors: Liew AW,Xian J,Wu S,Smith D,Yan H

    更新日期:2007-04-24 00:00:00

  • Comparison of public peak detection algorithms for MALDI mass spectrometry data analysis.

    abstract:BACKGROUND:In mass spectrometry (MS) based proteomic data analysis, peak detection is an essential step for subsequent analysis. Recently, there has been significant progress in the development of various peak detection algorithms. However, neither a comprehensive survey nor an experimental comparison of these algorith...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-4

    authors: Yang C,He Z,Yu W

    更新日期:2009-01-06 00:00:00

  • MetAssimulo: simulation of realistic NMR metabolic profiles.

    abstract:BACKGROUND:Probing the complex fusion of genetic and environmental interactions, metabolic profiling (or metabolomics/metabonomics), the study of small molecules involved in metabolic reactions, is a rapidly expanding 'omics' field. A major technique for capturing metabolite data is 1H-NMR spectroscopy and this yields ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-496

    authors: Muncey HJ,Jones R,De Iorio M,Ebbels TM

    更新日期:2010-10-06 00:00:00

  • Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method.

    abstract:BACKGROUND:Many processes in molecular biology involve the recognition of short sequences of nucleic-or amino acids, such as the binding of immunogenic peptides to major histocompatibility complex (MHC) molecules. From experimental data, a model of the sequence specificity of these processes can be constructed, such as...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-132

    authors: Peters B,Sette A

    更新日期:2005-05-31 00:00:00

  • A multifaceted analysis of HIV-1 protease multidrug resistance phenotypes.

    abstract:BACKGROUND:Great strides have been made in the effective treatment of HIV-1 with the development of second-generation protease inhibitors (PIs) that are effective against historically multi-PI-resistant HIV-1 variants. Nevertheless, mutation patterns that confer decreasing susceptibility to available PIs continue to ar...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-477

    authors: Doherty KM,Nakka P,King BM,Rhee SY,Holmes SP,Shafer RW,Radhakrishnan ML

    更新日期:2011-12-15 00:00:00

  • A format for databasing and comparison of AFLP fingerprint profiles.

    abstract:BACKGROUND:Amplified fragment length polymorphism (AFLP) is a PCR-based technique that involves restriction of genomic DNA followed by ligation of adaptors to the fragments generated and selective PCR amplification of a subset of these fragments. The amplified fragments are separated on a sequencing gel and visualized ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-4-7

    authors: Hong Y,Chuah A

    更新日期:2003-02-25 00:00:00

  • A framework for space-efficient read clustering in metagenomic samples.

    abstract:BACKGROUND:A metagenomic sample is a set of DNA fragments, randomly extracted from multiple cells in an environment, belonging to distinct, often unknown species. Unsupervised metagenomic clustering aims at partitioning a metagenomic sample into sets that approximate taxonomic units, without using reference genomes. Si...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1466-6

    authors: Alanko J,Cunial F,Belazzougui D,Mäkinen V

    更新日期:2017-03-14 00:00:00

  • SEQprocess: a modularized and customizable pipeline framework for NGS processing in R package.

    abstract:BACKGROUNDS:Next-Generation Sequencing (NGS) is now widely used in biomedical research for various applications. Processing of NGS data requires multiple programs and customization of the processing pipelines according to the data platforms. However, rapid progress of the NGS applications and processing methods urgentl...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-2676-x

    authors: Joo T,Choi JH,Lee JH,Park SE,Jeon Y,Jung SH,Woo HG

    更新日期:2019-02-20 00:00:00

  • SplicerAV: a tool for mining microarray expression data for changes in RNA processing.

    abstract:BACKGROUND:Over the past two decades more than fifty thousand unique clinical and biological samples have been assayed using the Affymetrix HG-U133 and HG-U95 GeneChip microarray platforms. This substantial repository has been used extensively to characterize changes in gene expression between biological samples, but h...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-108

    authors: Robinson TJ,Dinan MA,Dewhirst M,Garcia-Blanco MA,Pearson JL

    更新日期:2010-02-25 00:00:00

  • FIGENIX: intelligent automation of genomic annotation: expertise integration in a new software platform.

    abstract:BACKGROUND:Two of the main objectives of the genomic and post-genomic era are to structurally and functionally annotate genomes which consists of detecting genes' position and structure, and inferring their function (as well as of other features of genomes). Structural and functional annotation both require the complex...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-198

    authors: Gouret P,Vitiello V,Balandraud N,Gilles A,Pontarotti P,Danchin EG

    更新日期:2005-08-05 00:00:00

  • REGULATOR: a database of metazoan transcription factors and maternal factors for developmental studies.

    abstract:BACKGROUND:Genes encoding transcription factors that constitute gene-regulatory networks and maternal factors accumulating in egg cytoplasm are two classes of essential genes that play crucial roles in developmental processes. Transcription factors control the expression of their downstream target genes by interacting ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-015-0552-x

    authors: Wang K,Nishida H

    更新日期:2015-04-10 00:00:00

  • COPASAAR--a database for proteomic analysis of single amino acid repeats.

    abstract:BACKGROUND:Single amino acid repeats make up a significant proportion in all of the proteomes that have currently been determined. They have been shown to be functionally and medically significant, and are associated with cancers and neuro-degenerative diseases such as Huntington's Chorea, where a poly-glutamine repeat...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-196

    authors: Depledge DP,Dalby AR

    更新日期:2005-08-03 00:00:00

  • PredRSA: a gradient boosted regression trees approach for predicting protein solvent accessibility.

    abstract:BACKGROUND:Protein solvent accessibility prediction is a pivotal intermediate step towards modeling protein tertiary structures directly from one-dimensional sequences. It also plays an important part in identifying protein folds and domains. Although some methods have been presented to the protein solvent accessibilit...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-015-0851-2

    authors: Fan C,Liu D,Huang R,Chen Z,Deng L

    更新日期:2016-01-11 00:00:00

  • Mutation status coupled with RNA-sequencing data can efficiently identify important non-significantly mutated genes serving as diagnostic biomarkers of endometrial cancer.

    abstract:BACKGROUND:Endometrial cancers (ECs) are one of the most common types of malignant tumor in females. Substantial efforts had been made to identify significantly mutated genes (SMGs) in ECs and use them as biomarkers for the classification of histological subtypes and the prediction of clinical outcomes. However, the im...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1891-6

    authors: Liu K,He L,Liu Z,Xu J,Liu Y,Kuang Q,Wen Z,Li M

    更新日期:2017-12-28 00:00:00

  • Structural characterization of genomes by large scale sequence-structure threading: application of reliability analysis in structural genomics.

    abstract:BACKGROUND:We establish that the occurrence of protein folds among genomes can be accurately described with a Weibull function. Systems which exhibit Weibull character can be interpreted with reliability theory commonly used in engineering analysis. For instance, Weibull distributions are widely used in reliability, ma...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-5-101

    authors: Cherkasov A,Ho Sui SJ,Brunham RC,Jones SJ

    更新日期:2004-07-26 00:00:00

  • An unsupervised classification scheme for improving predictions of prokaryotic TIS.

    abstract:BACKGROUND:Although it is not difficult for state-of-the-art gene finders to identify coding regions in prokaryotic genomes, exact prediction of the corresponding translation initiation sites (TIS) is still a challenging problem. Recently a number of post-processing tools have been proposed for improving the annotation...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-121

    authors: Tech M,Meinicke P

    更新日期:2006-03-09 00:00:00

  • The ontology of biological sequences.

    abstract:BACKGROUND:Biological sequences play a major role in molecular and computational biology. They are studied as information-bearing entities that make up DNA, RNA or proteins. The Sequence Ontology, which is part of the OBO Foundry, contains descriptions and definitions of sequences and their properties. Yet the most bas...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-377

    authors: Hoehndorf R,Kelso J,Herre H

    更新日期:2009-11-18 00:00:00

  • Constructing a meaningful evolutionary average at the phylogenetic center of mass.

    abstract:BACKGROUND:As a consequence of the evolutionary process, data collected from related species tend to be similar. This similarity by descent can obscure subtler signals in the data such as the evidence of constraint on variation due to shared selective pressures. In comparative sequence analysis, for example, sequence s...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-222

    authors: Stone EA,Sidow A

    更新日期:2007-06-26 00:00:00

  • A novel algorithm for simultaneous SNP selection in high-dimensional genome-wide association studies.

    abstract:BACKGROUND:Identification of causal SNPs in most genome wide association studies relies on approaches that consider each SNP individually. However, there is a strong correlation structure among SNPs that needs to be taken into account. Hence, increasingly modern computationally expensive regression methods are employed...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-284

    authors: Zuber V,Duarte Silva AP,Strimmer K

    更新日期:2012-10-31 00:00:00

  • Using iterative cluster merging with improved gap statistics to perform online phenotype discovery in the context of high-throughput RNAi screens.

    abstract:BACKGROUND:The recent emergence of high-throughput automated image acquisition technologies has forever changed how cell biologists collect and analyze data. Historically, the interpretation of cellular phenotypes in different experimental conditions has been dependent upon the expert opinions of well-trained biologist...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-264

    authors: Yin Z,Zhou X,Bakal C,Li F,Sun Y,Perrimon N,Wong ST

    更新日期:2008-06-05 00:00:00

  • Method to represent the distribution of QTL additive and dominance effects associated with quantitative traits in computer simulation.

    abstract:BACKGROUND:Computer simulation is a resource which can be employed to identify optimal breeding strategies to effectively and efficiently achieve specific goals in developing improved cultivars. In some instances, it is crucial to assess in silico the options as well as the impact of various crossing schemes and breedi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-016-0906-z

    authors: Sun X,Mumm RH

    更新日期:2016-02-06 00:00:00

  • Quality determination and the repair of poor quality spots in array experiments.

    abstract:BACKGROUND:A common feature of microarray experiments is the occurrence of missing gene expression data. These missing values occur for a variety of reasons, in particular, because of the filtering of poor quality spots and the removal of undefined values when a logarithmic transformation is applied to negative backgro...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-234

    authors: Tom BD,Gilks WR,Brooke-Powell ET,Ajioka JW

    更新日期:2005-09-26 00:00:00

  • Multi-scale structural community organisation of the human genome.

    abstract:BACKGROUND:Structural interaction frequency matrices between all genome loci are now experimentally achievable thanks to high-throughput chromosome conformation capture technologies. This ensues a new methodological challenge for computational biology which consists in objectively extracting from these data the structu...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1616-x

    authors: Boulos RE,Tremblay N,Arneodo A,Borgnat P,Audit B

    更新日期:2017-04-11 00:00:00

  • Googling DNA sequences on the World Wide Web.

    abstract:BACKGROUND:New web-based technologies provide an excellent opportunity for sharing and accessing information and using web as a platform for interaction and collaboration. Although several specialized tools are available for analyzing DNA sequence information, conventional web-based tools have not been utilized for bio...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-S14-S4

    authors: Hajibabaei M,Singer GA

    更新日期:2009-11-10 00:00:00

  • Comparative study of discretization methods of microarray data for inferring transcriptional regulatory networks.

    abstract:BACKGROUND:Microarray data discretization is a basic preprocess for many algorithms of gene regulatory network inference. Some common discretization methods in informatics are used to discretize microarray data. Selection of the discretization method is often arbitrary and no systematic comparison of different discreti...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-520

    authors: Li Y,Liu L,Bai X,Cai H,Ji W,Guo D,Zhu Y

    更新日期:2010-10-19 00:00:00

  • A semi-parametric statistical model for integrating gene expression profiles across different platforms.

    abstract:BACKGROUND:Determining differentially expressed genes (DEGs) between biological samples is the key to understand how genotype gives rise to phenotype. RNA-seq and microarray are two main technologies for profiling gene expression levels. However, considerable discrepancy has been found between DEGs detected using the t...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-015-0847-y

    authors: Lyu Y,Li Q

    更新日期:2016-01-11 00:00:00

  • CMASA: an accurate algorithm for detecting local protein structural similarity and its application to enzyme catalytic site annotation.

    abstract:BACKGROUND:The rapid development of structural genomics has resulted in many "unknown function" proteins being deposited in Protein Data Bank (PDB), thus, the functional prediction of these proteins has become a challenge for structural bioinformatics. Several sequence-based and structure-based methods have been develo...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-439

    authors: Li GH,Huang JF

    更新日期:2010-08-27 00:00:00

  • A database of phylogenetically atypical genes in archaeal and bacterial genomes, identified using the DarkHorse algorithm.

    abstract:BACKGROUND:The process of horizontal gene transfer (HGT) is believed to be widespread in Bacteria and Archaea, but little comparative data is available addressing its occurrence in complete microbial genomes. Collection of high-quality, automated HGT prediction data based on phylogenetic evidence has previously been im...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-419

    authors: Podell S,Gaasterland T,Allen EE

    更新日期:2008-10-07 00:00:00

  • Inferring gene expression dynamics via functional regression analysis.

    abstract:BACKGROUND:Temporal gene expression profiles characterize the time-dynamics of expression of specific genes and are increasingly collected in current gene expression experiments. In the analysis of experiments where gene expression is obtained over the life cycle, it is of interest to relate temporal patterns of gene e...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-60

    authors: Müller HG,Chiou JM,Leng X

    更新日期:2008-01-28 00:00:00