Integrating multiple molecular sources into a clinical risk prediction signature by extracting complementary information.

Abstract:

BACKGROUND: High-throughput technology allows genome-wide measurements at different molecular levels for the same patient, e.g. single nucleotide polymorphisms (SNPs) and gene expression. Correspondingly, it might be beneficial to integrate complementary information from different molecular levels when building multivariable risk prediction models for a clinical endpoint, such as treatment response or survival. Unfortunately, such a high-dimensional modeling task is often complicated by limited overlap of molecular measurements between patients, i.e. measurements from all molecular levels are available only for a small proportion of patients.

RESULTS: We propose a sequential strategy for building clinical risk prediction models that integrate genome-wide measurements from two molecular levels in a complementary way. To deal with partial overlap, we develop an imputation approach that allows us to use all available data. This approach is investigated in two acute myeloid leukemia applications combining gene expression with either SNP or DNA methylation data. First, a sparse risk prediction signature is obtained from one molecular level by componentwise likelihood-based boosting, e.g. an automatically selected set of prognostic SNPs from SNP data. The corresponding linear predictor is then imputed by a linking model that incorporates measurements from the other level, e.g. gene expression. The imputed linear predictor is then used for adjustment when building a prognostic signature from the gene expression data. For evaluation, we consider stability, as quantified by inclusion frequencies across resampling data sets. Despite an extremely small overlap in the application example with gene expression and SNPs, several genes are more stably identified when the (imputed) linear predictor from the SNP data is taken into account. In the application with gene expression and DNA methylation, prediction performance with respect to survival also indicates that the proposed approach might work well.

CONCLUSIONS: We consider imputation of linear predictor values to be a feasible and sensible approach for dealing with partial overlap in complementary integrative analysis of molecular measurements at different levels. More generally, these results indicate that a complementary strategy for integrating different molecular levels can result in more stable risk prediction signatures, potentially providing more reliable insight into the underlying biology.
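
The sequential strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and several simplifications are assumptions made only for illustration: a centered continuous outcome and componentwise least-squares (L2) boosting stand in for the survival endpoint and componentwise likelihood-based boosting, a ridge regression stands in for the linking model, observed linear predictor values are kept for patients who have SNP data, and inclusion frequencies are computed over subsamples drawn without replacement. All function and parameter names (`componentwise_boost`, `impute_linear_predictor`, `sequential_signature`, `inclusion_frequencies`, `ridge`, `nu`) are hypothetical.

```python
import numpy as np


def componentwise_boost(X, y, offset=None, steps=20, nu=0.1):
    """Componentwise L2 boosting: each step updates only the single
    covariate that most reduces the residual sum of squares, which
    yields a sparse coefficient vector. `offset` is a fixed adjustment."""
    n, p = X.shape
    beta = np.zeros(p)
    start = np.zeros(n) if offset is None else np.asarray(offset, dtype=float)
    resid = np.asarray(y, dtype=float) - start
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(steps):
        coef = X.T @ resid / col_ss             # univariate least-squares fits
        j = int(np.argmax(coef ** 2 * col_ss))  # best single covariate
        beta[j] += nu * coef[j]                 # shrunken update
        resid = resid - nu * coef[j] * X[:, j]
    return beta


def impute_linear_predictor(lp_obs, G_obs, G_new, ridge=10.0):
    """Linking model (assumed here to be ridge regression): regress the
    observed linear predictor on expression, then predict it for G_new."""
    mu = lp_obs.mean()
    A = G_obs.T @ G_obs + ridge * np.eye(G_obs.shape[1])
    w = np.linalg.solve(A, G_obs.T @ (lp_obs - mu))
    return mu + G_new @ w


def sequential_signature(S, G, y, has_snp, has_expr, ridge=10.0):
    """Step 1: sparse SNP signature. Step 2: impute its linear predictor
    where SNP data are missing. Step 3: expression signature adjusted
    for the (imputed) linear predictor. Requires a non-empty overlap."""
    beta_snp = componentwise_boost(S[has_snp], y[has_snp])
    lp = np.full(len(y), np.nan)
    lp[has_snp] = S[has_snp] @ beta_snp
    overlap = has_snp & has_expr
    missing = has_expr & ~has_snp
    lp[missing] = impute_linear_predictor(lp[overlap], G[overlap], G[missing], ridge)
    beta_expr = componentwise_boost(G[has_expr], y[has_expr], offset=lp[has_expr])
    return beta_snp, beta_expr


def inclusion_frequencies(S, G, y, has_snp, has_expr, n_resamples=20, seed=0):
    """Fraction of resampling data sets in which each gene is selected."""
    rng = np.random.default_rng(seed)
    n = len(y)
    counts = np.zeros(G.shape[1])
    for _ in range(n_resamples):
        idx = rng.choice(n, size=int(0.632 * n), replace=False)
        _, b = sequential_signature(S[idx], G[idx], y[idx], has_snp[idx], has_expr[idx])
        counts += b != 0
    return counts / n_resamples
```

Here `S` and `G` are patient-by-feature arrays for the two molecular levels, and `has_snp` and `has_expr` are boolean masks marking which patients were measured at each level; rows of `S` or `G` for unmeasured patients are never read. Comparing the inclusion frequencies with and without the offset is one way to examine the stability question raised in the abstract.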

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Hieke S,Benner A,Schlenk RF,Schumacher M,Bullinger L,Binder H

doi

10.1186/s12859-016-1183-6

subject

Has Abstract

pub_date

2016-08-30 00:00:00

pages

327

issue

1

issn

1471-2105

pii

10.1186/s12859-016-1183-6

journal_volume

17

pub_type

Journal Article
  • DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark.

    abstract:BACKGROUND:XHMM is a widely used tool for copy-number variant (CNV) discovery from whole exome sequencing data but can require hours to days to run for large cohorts. A more scalable implementation would reduce the need for specialized computational resources and enable increased exploration of the configuration parame...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3108-7

    authors: Linderman MD,Chia D,Wallace F,Nothaft FA

    updated:2019-10-11 00:00:00

  • Hit integration for identifying optimal spaced seeds.

    abstract:BACKGROUND:Introduction of spaced seeds opened a way to improve sensitivity in homology search without loss of search speed. Since then, efforts to find the optimal seed that maximizes sensitivity have continued to this day. The sensitivity of a seed is generally computed by its hit probability. However, th...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-S1-S37

    authors: Chung WH,Park SB

    updated:2010-01-18 00:00:00

  • Qxpak.5: old mixed model solutions for new genomics problems.

    abstract:BACKGROUND:Mixed models have a long and fruitful history in statistics. They are pertinent to genomics problems because they are highly versatile, accommodating a wide variety of situations within the same theoretical and algorithmic framework. RESULTS:Qxpak is a package for versatile statistical genomics, specificall...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-202

    authors: Pérez-Enciso M,Misztal I

    updated:2011-05-25 00:00:00

  • Protein subcellular localization prediction based on compartment-specific features and structure conservation.

    abstract:BACKGROUND:Protein subcellular localization is crucial for genome annotation, protein function prediction, and drug discovery. Determination of subcellular localization using experimental approaches is time-consuming; thus, computational approaches become highly desirable. Extensive studies of localization prediction h...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-330

    authors: Su EC,Chiu HS,Lo A,Hwang JK,Sung TY,Hsu WL

    updated:2007-09-08 00:00:00

  • QPath: a method for querying pathways in a protein-protein interaction network.

    abstract:BACKGROUND:Sequence comparison is one of the most prominent tools in biological research, and is instrumental in studying gene function and evolution. The rapid development of high-throughput technologies for measuring protein interactions calls for extending this fundamental operation to the level of pathways in prote...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-199

    authors: Shlomi T,Segal D,Ruppin E,Sharan R

    updated:2006-04-10 00:00:00

  • Combining Pareto-optimal clusters using supervised learning for identifying co-expressed genes.

    abstract:BACKGROUND:The landscape of biological and biomedical research is changing rapidly with the invention of microarrays, which enable a simultaneous view of the transcription levels of a huge number of genes across different experimental conditions or time points. Using microarray data sets, clustering algorithms have ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-27

    authors: Maulik U,Mukhopadhyay A,Bandyopadhyay S

    updated:2009-01-20 00:00:00

  • Finite mixture clustering of human tissues with different levels of IGF-1 splice variants mRNA transcripts.

    abstract:BACKGROUND:This study addresses a recurrent biological problem, that is to define a formal clustering structure for a set of tissues on the basis of the relative abundance of multiple alternatively spliced isoforms mRNAs generated by the same gene. To this aim, we have used a model-based clustering approach, based on a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0689-7

    authors: Pelosi M,Alfò M,Martella F,Pappalardo E,Musarò A

    updated:2015-09-15 00:00:00

  • Inclusion of the fitness sharing technique in an evolutionary algorithm to analyze the fitness landscape of the genetic code adaptability.

    abstract:BACKGROUND:The canonical code, although prevailing in complex genomes, is not universal. The canonical genetic code has been shown to be more robust than random codes, but it is not clear how it evolved towards its current form. The error minimization theory considers the minimization of point mutat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1608-x

    authors: Santos J,Monteagudo Á

    updated:2017-03-27 00:00:00

  • Shared data science infrastructure for genomics data.

    abstract:BACKGROUND:Creating a scalable computational infrastructure to analyze the wealth of information contained in data repositories is difficult due to significant barriers in organizing, extracting and analyzing relevant data. Shared data science infrastructures like Boag are needed to efficiently process and parse data co...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2967-2

    authors: Bagheri H,Muppirala U,Masonbrink RE,Severin AJ,Rajan H

    updated:2019-08-22 00:00:00

  • Connectivity independent protein-structure alignment: a hierarchical approach.

    abstract:BACKGROUND:Protein-structure alignment is a fundamental tool to study protein function, evolution and model building. In the last decade several methods for structure alignment were introduced, but most of them ignore that structurally similar proteins can share the same spatial arrangement of secondary structure eleme...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-510

    authors: Kolbeck B,May P,Schmidt-Goenner T,Steinke T,Knapp EW

    updated:2006-11-21 00:00:00

  • OscoNet: inferring oscillatory gene networks.

    abstract:BACKGROUND:Oscillatory genes, with periodic expression at the mRNA and/or protein level, have been shown to play a pivotal role in many biological contexts. However, with the exception of the circadian clock and cell cycle, only a few such genes are known. Detecting oscillatory genes from snapshot single-cell experimen...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03561-y

    authors: Cutillo L,Boukouvalas A,Marinopoulou E,Papalopulu N,Rattray M

    updated:2020-08-21 00:00:00

  • 3off2: A network reconstruction algorithm based on 2-point and 3-point information statistics.

    abstract:BACKGROUND:The reconstruction of reliable graphical models from observational data is important in bioinformatics and other computational fields applying network reconstruction methods to large, yet finite datasets. The main network reconstruction approaches are either based on Bayesian scores, which enable the ranking...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0856-x

    authors: Affeldt S,Verny L,Isambert H

    updated:2016-01-20 00:00:00

  • Uncovering packaging features of co-regulated modules based on human protein interaction and transcriptional regulatory networks.

    abstract:BACKGROUND:Network co-regulated modules are believed to have the functionality of packaging multiple biological entities, and can thus be assumed to coordinate many biological functions in their network neighbouring regions. RESULTS:Here, we weighted edges of a human protein interaction network and a transcriptional r...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-392

    authors: Chen L,Wang H,Zhang L,Li W,Wang Q,Shang Y,He Y,He W,Li X,Tai J,Li X

    updated:2010-07-22 00:00:00

  • Fast batch searching for protein homology based on compression and clustering.

    abstract:BACKGROUND:In the bioinformatics community, many tasks involve matching a set of protein query sequences against large sequence datasets. To conduct multiple queries in the database, a commonly used method is to run BLAST on each original query or on the concatenated queries. It is inefficient since it doesn't exploit the...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1938-8

    authors: Ge H,Sun L,Yu J

    updated:2017-11-21 00:00:00

  • Sample size calculation while controlling false discovery rate for differential expression analysis with RNA-sequencing experiments.

    abstract:BACKGROUND:RNA-Sequencing (RNA-seq) experiments have been popularly applied to transcriptome studies in recent years. Such experiments are still relatively costly. As a result, RNA-seq experiments often employ a small number of replicates. Power analysis and sample size calculation are challenging in the context of dif...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-0994-9

    authors: Bi R,Liu P

    updated:2016-03-31 00:00:00

  • The Korean Bird Information System (KBIS) through open and public participation.

    abstract:BACKGROUND:The importance of biodiversity conservation has been increasing steadily due to its benefits to human beings. Recently, producing and managing biodiversity databases have become much easier because of the information technology (IT) advancement. This made the general public's participation in biodiversity co...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S15-S11

    authors: Paik IH,Lim J,Chun BS,Jin SD,Yu JP,Lee JW,Bhak J,Paek WK

    updated:2009-12-03 00:00:00

  • Predicting MoRFs in protein sequences using HMM profiles.

    abstract:BACKGROUND:Intrinsically Disordered Proteins (IDPs) lack an ordered three-dimensional structure and are enriched in various biological processes. The Molecular Recognition Features (MoRFs) are functional regions within IDPs that undergo a disorder-to-order transition on binding to a partner protein. Identifying MoRFs i...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1375-0

    authors: Sharma R,Kumar S,Tsunoda T,Patil A,Sharma A

    updated:2016-12-22 00:00:00

  • Identification of discriminative characteristics for clusters from biologic data with InforBIO software.

    abstract:BACKGROUND:There are a number of different methods for generation of trees and algorithms for phylogenetic analysis in the study of bacterial taxonomy. Genotypic information, such as SSU rRNA gene sequences, now plays a more prominent role in microbial systematics than does phenotypic information. However, the integrat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-281

    authors: Tanaka N,Uchino M,Miyazaki S,Sugawara H

    updated:2007-08-02 00:00:00

  • A novel computational strategy for DNA methylation imputation using mixture regression model (MRM).

    abstract:BACKGROUND:DNA methylation is an important heritable epigenetic mark that plays a crucial role in transcriptional regulation and the pathogenesis of various human disorders. The commonly used DNA methylation measurement approaches, e.g., Illumina Infinium HumanMethylation-27 and -450 BeadChip arrays (27 K and 450 K arr...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03865-z

    authors: Yu F,Xu C,Deng HW,Shen H

    updated:2020-12-01 00:00:00

  • Improved functional prediction of proteins by learning kernel combinations in multilabel settings.

    abstract:BACKGROUND:We develop a probabilistic model for combining kernel matrices to predict the function of proteins. It extends previous approaches in that it can handle multiple labels which naturally appear in the context of protein function. RESULTS:Explicit modeling of multilabels significantly improves the capability o...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-S2-S12

    authors: Roth V,Fischer B

    updated:2007-05-03 00:00:00

  • Detection of copy number variation from array intensity and sequencing read depth using a stepwise Bayesian model.

    abstract:BACKGROUND:Copy number variants (CNVs) have been demonstrated to occur at a high frequency and are now widely believed to make a significant contribution to the phenotypic variation in human populations. Array-based comparative genomic hybridization (array-CGH) and newly developed read-depth approach through ultrahigh ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-539

    authors: Zhang ZD,Gerstein MB

    updated:2010-10-31 00:00:00

  • OpWise: operons aid the identification of differentially expressed genes in bacterial microarray experiments.

    abstract:BACKGROUND:Differentially expressed genes are typically identified by analyzing the variation between replicate measurements. These procedures implicitly assume that there are no systematic errors in the data even though several sources of systematic error are known. RESULTS:OpWise estimates the amount of systematic e...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-19

    authors: Price MN,Arkin AP,Alm EJ

    updated:2006-01-13 00:00:00

  • Generating confidence intervals on biological networks.

    abstract:BACKGROUND:In the analysis of networks we frequently require the statistical significance of some network statistic, such as measures of similarity for the properties of interacting nodes. The structure of the network may introduce dependencies among the nodes and it will in general be necessary to account for these de...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-467

    authors: Thorne T,Stumpf MP

    updated:2007-11-30 00:00:00

  • Bios2mds: an R package for comparing orthologous protein families by metric multidimensional scaling.

    abstract:BACKGROUND:The distance matrix computed from multiple alignments of homologous sequences is widely used by distance-based phylogenetic methods to provide information on the evolution of protein families. This matrix can also be visualized in a low dimensional space by metric multidimensional scaling (MDS). Applied to p...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-133

    authors: Pelé J,Bécu JM,Abdi H,Chabbert M

    updated:2012-06-15 00:00:00

  • Evaluating eukaryotic secreted protein prediction.

    abstract:BACKGROUND:Improvements in protein sequence annotation and an increase in the number of annotated protein databases has fueled development of an increasing number of software tools to predict secreted proteins. Six software programs capable of high throughput and employing a wide range of prediction methods, SignalP 3....

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-256

    authors: Klee EW,Ellis LB

    updated:2005-10-14 00:00:00

  • Structator: fast index-based search for RNA sequence-structure patterns.

    abstract:BACKGROUND:The secondary structure of RNA molecules is intimately related to their function and often more conserved than the sequence. Hence, the important task of searching databases for RNAs requires matching sequence-structure patterns. Unfortunately, current tools for this task have, in the best case, a running ti...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-214

    authors: Meyer F,Kurtz S,Backofen R,Will S,Beckstette M

    updated:2011-05-27 00:00:00

  • Coordinates and intervals in graph-based reference genomes.

    abstract:BACKGROUND:It has been proposed that future reference genomes should be graph structures in order to better represent the sequence diversity present in a species. However, there is currently no standard method to represent genomic intervals, such as the positions of genes or transcription factor binding sites, on graph...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1678-9

    authors: Rand KD,Grytten I,Nederbragt AJ,Storvik GO,Glad IK,Sandve GK

    updated:2017-05-18 00:00:00

  • CONSTAX: a tool for improved taxonomic resolution of environmental fungal ITS sequences.

    abstract:BACKGROUND:One of the most crucial steps in high-throughput sequence-based microbiome studies is the taxonomic assignment of sequences belonging to operational taxonomic units (OTUs). Without taxonomic classification, functional and biological information of microbial communities cannot be inferred or interpreted. The ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1952-x

    authors: Gdanetz K,Benucci GMN,Vande Pol N,Bonito G

    updated:2017-12-06 00:00:00

  • Informative gene selection and the direct classification of tumors based on relative simplicity.

    abstract:BACKGROUND:Selecting a parsimonious set of informative genes to build a classifier that generalizes well is the most important task in the analysis of tumor microarray expression data. Many existing gene pair evaluation methods cannot highlight diverse patterns of gene pairs because they use only one strategy of vertical c...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-0893-0

    authors: Chen Y,Wang L,Li L,Zhang H,Yuan Z

    updated:2016-01-20 00:00:00

  • Genomic prediction of tuberculosis drug-resistance: benchmarking existing databases and prediction algorithms.

    abstract:BACKGROUND:It is possible to predict whether a tuberculosis (TB) patient will fail to respond to specific antibiotics by sequencing the genome of the infecting Mycobacterium tuberculosis (Mtb) and observing whether the pathogen carries specific mutations at drug-resistance sites. This advancement has led to the collati...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2658-z

    authors: Ngo TM,Teo YY

    updated:2019-02-08 00:00:00