Robust joint score tests in the application of DNA methylation data analysis.

Abstract:

BACKGROUND: Differential variability has recently been shown to be valuable in evaluating the association of DNA methylation with the risk of complex human diseases. Statistical tests based on both differential methylation level and differential variability can be more powerful than those based on differential methylation level alone. Anh and Wang (2013) proposed a joint score test (AW) to detect differential methylation and differential variability simultaneously. However, the AW test appears to be quite conservative and has not been fully compared with existing joint tests.

RESULTS: We proposed three improved joint score tests, namely iAW.Lev, iAW.BF, and iAW.TM, and compared them extensively with the joint likelihood ratio test (jointLRT), the Kolmogorov-Smirnov (KS) test, and the AW test. Systematic simulation studies showed that: 1) for data with outliers and with different variances between cases and controls, the three improved tests performed better than the other three tests (i.e., they had larger power while keeping nominal Type I error rates); 2) for data from normal distributions, the three improved tests had slightly lower power than jointLRT and AW. Analyses of two Illumina HumanMethylation27 data sets (GSE37020 and GSE20080) and one Illumina Infinium MethylationEPIC data set (GSE107080) demonstrated that the three improved tests had higher true validation rates than jointLRT, KS, and AW.

CONCLUSIONS: Compared with the three existing tests, the three proposed joint score tests are robust against violation of the normality assumption and the presence of outlying observations. Among the three proposed tests, iAW.BF appears to be the most robust and effective across all simulated scenarios and in the real data analyses.
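To make the idea of a joint test for mean and variability differences concrete, the following is a minimal sketch of a two-degree-of-freedom score test from a logistic regression of case status on the methylation value and on a Brown-Forsythe-style absolute deviation. It is not the authors' implementation of iAW.BF: the function name `joint_score_test`, the choice of group-specific medians as centers, and the exact form of the statistic are assumptions made for illustration, since the abstract does not give the formulas.

```python
import math
import statistics


def joint_score_test(x, y):
    """Joint score test sketch (assumed form, not the published iAW.BF code).

    x: methylation values for one CpG site; y: 0/1 case status.
    Tests H0: neither x nor z = |x - group median| is associated with y.
    Returns (statistic, p_value), with the statistic referred to chi-square(2).
    """
    n = len(x)
    if n != len(y) or set(y) != {0, 1}:
        raise ValueError("x and y need equal length; y must contain both 0 and 1")

    # Brown-Forsythe-style variability covariate: absolute deviation from the
    # median of the sample's own group (an assumption of this sketch).
    med = {g: statistics.median([xi for xi, yi in zip(x, y) if yi == g])
           for g in (0, 1)}
    z = [abs(xi - med[yi]) for xi, yi in zip(x, y)]

    ybar = sum(y) / n
    xbar = sum(x) / n
    zbar = sum(z) / n
    resid = [yi - ybar for yi in y]   # residuals under the intercept-only null
    xc = [xi - xbar for xi in x]      # centered covariates
    zc = [zi - zbar for zi in z]

    # Score vector U for the two slope parameters.
    u1 = sum(a * b for a, b in zip(xc, resid))
    u2 = sum(a * b for a, b in zip(zc, resid))

    # Null covariance matrix V of U (2 x 2, symmetric).
    s = ybar * (1 - ybar)
    v11 = s * sum(a * a for a in xc)
    v22 = s * sum(a * a for a in zc)
    v12 = s * sum(a * b for a, b in zip(xc, zc))
    det = v11 * v22 - v12 * v12
    if det <= 0:
        raise ValueError("covariates are degenerate or collinear")

    # Quadratic form U' V^{-1} U, written out for the 2 x 2 case.
    stat = (v22 * u1 * u1 - 2 * v12 * u1 * u2 + v11 * u2 * u2) / det
    # The chi-square(2) survival function is exactly exp(-t / 2).
    return stat, math.exp(-stat / 2)


# Illustrative data: equal means, larger spread in cases (y = 1).
controls = [0.48, 0.49, 0.50, 0.51, 0.52] * 3
cases = [0.30, 0.40, 0.50, 0.60, 0.70] * 3
stat, p = joint_score_test(controls + cases, [0] * 15 + [1] * 15)
print(f"statistic = {stat:.2f}, p-value = {p:.4g}")
```

Swapping the median for the group mean or a trimmed mean would give analogues of the Levene and trimmed-mean variants named in the abstract, again only as an assumption about how those variants are defined.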

journal_name

BMC Bioinformatics

authors

Li X,Fu Y,Wang X,Qiu W

doi

10.1186/s12859-018-2185-3

pub_date

2018-05-18 00:00:00

pages

174

issue

1

issn

1471-2105

pii

10.1186/s12859-018-2185-3

journal_volume

19

pub_type

Journal Article
  • Impact of polymorphic transposable elements on transcription in lymphoblastoid cell lines from public data.

    abstract:BACKGROUND:Transposable elements (TEs) are DNA sequences able to mobilize themselves and to increase their copy-number in the host genome. In the past, they have been considered mainly selfish DNA without evident functions. Nevertheless, currently they are believed to have been extensively involved in the evolution of ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3113-x

    authors: Spirito G,Mangoni D,Sanges R,Gustincich S

    update_date:2019-11-22 00:00:00

  • Supervised segmentation of phenotype descriptions for the human skeletal phenome using hybrid methods.

    abstract:BACKGROUND:Over the course of the last few years there has been a significant amount of research performed on ontology-based formalization of phenotype descriptions. In order to fully capture the intrinsic value and knowledge expressed within them, we need to take advantage of their inner structure, which implicitly co...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-265

    authors: Groza T,Hunter J,Zankl A

    update_date:2012-10-15 00:00:00

  • TableButler - a Windows based tool for processing large data tables generated with high-throughput methods.

    abstract:BACKGROUND:High-throughput "omics" based data analysis play emerging roles in life sciences and molecular diagnostics. This emphasizes the urgent need for user-friendly windows-based software interfaces that could process the diversity of large tab-delimited raw data files generated by these methods. Depending on the s...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-235

    authors: Schwager C,Wirkner U,Abdollahi A,Huber PE

    update_date:2009-07-29 00:00:00

  • Identification and utilization of inter-species conserved (ISC) probesets on Affymetrix human GeneChip platforms for the optimization of the assessment of expression patterns in non human primate (NHP) samples.

    abstract:BACKGROUND:While researchers have utilized versions of the Affymetrix human GeneChip for the assessment of expression patterns in non human primate (NHP) samples, there has been no comprehensive sequence analysis study undertaken to demonstrate that the probe sequences designed to detect human transcripts are reliably ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-5-165

    authors: Wang Z,Lewis MG,Nau ME,Arnold A,Vahey MT

    update_date:2004-10-26 00:00:00

  • Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data.

    abstract:BACKGROUND:A fundamental fact in biology states that genes do not operate in isolation, and yet, methods that infer regulatory networks for single cell gene expression data have been slow to emerge. With single cell sequencing methods now becoming accessible, general network inference algorithms that were initially dev...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2217-z

    authors: Chen S,Mar JC

    update_date:2018-06-19 00:00:00

  • A novel substitution matrix fitted to the compositional bias in Mollicutes improves the prediction of homologous relationships.

    abstract:BACKGROUND:Substitution matrices are key parameters for the alignment of two protein sequences, and consequently for most comparative genomics studies. The composition of biological sequences can vary importantly between species and groups of species, and classical matrices such as those in the BLOSUM series fail to ac...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-457

    authors: Lemaitre C,Barré A,Citti C,Tardy F,Thiaucourt F,Sirand-Pugnet P,Thébault P

    update_date:2011-11-24 00:00:00

  • Accelerating a cross-correlation score function to search modifications using a single GPU.

    abstract:BACKGROUND:A cross-correlation (XCorr) score function is one of the most popular score functions utilized to search peptide identifications in databases, and many computer programs, such as SEQUEST, Comet, and Tide, currently use this score function. Recently, the HiXCorr algorithm was developed to speed up this score ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2559-6

    authors: Kim H,Han S,Um JH,Park K

    update_date:2018-12-12 00:00:00

  • NASQAR: a web-based platform for high-throughput sequencing data analysis and visualization.

    abstract:BACKGROUND:As high-throughput sequencing applications continue to evolve, the rapid growth in quantity and variety of sequence-based data calls for the development of new software libraries and tools for data analysis and visualization. Often, effective use of these tools requires computational skills beyond those of m...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03577-4

    authors: Yousif A,Drou N,Rowe J,Khalfan M,Gunsalus KC

    update_date:2020-06-29 00:00:00

  • Detection of copy number variation from array intensity and sequencing read depth using a stepwise Bayesian model.

    abstract:BACKGROUND:Copy number variants (CNVs) have been demonstrated to occur at a high frequency and are now widely believed to make a significant contribution to the phenotypic variation in human populations. Array-based comparative genomic hybridization (array-CGH) and newly developed read-depth approach through ultrahigh ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-539

    authors: Zhang ZD,Gerstein MB

    update_date:2010-10-31 00:00:00

  • Systematic integration of experimental data and models in systems biology.

    abstract:BACKGROUND:The behaviour of biological systems can be deduced from their mathematical models. However, multiple sources of data in diverse forms are required in the construction of a model in order to define its components and their biochemical reactions, and corresponding parameters. Automating the assembly and use of...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-582

    authors: Li P,Dada JO,Jameson D,Spasic I,Swainston N,Carroll K,Dunn W,Khan F,Malys N,Messiha HL,Simeonidis E,Weichart D,Winder C,Wishart J,Broomhead DS,Goble CA,Gaskell SJ,Kell DB,Westerhoff HV,Mendes P,Paton NW

    update_date:2010-11-29 00:00:00

  • ILP-based maximum likelihood genome scaffolding.

    abstract:BACKGROUND:Interest in de novo genome assembly has been renewed in the past decade due to rapid advances in high-throughput sequencing (HTS) technologies which generate relatively short reads resulting in highly fragmented assemblies consisting of contigs. Additional long-range linkage information is typically used to ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-S9-S9

    authors: Lindsay J,Salooti H,Măndoiu I,Zelikovsky A

    update_date:2014-01-01 00:00:00

  • Sequence-based identification of recombination spots using pseudo nucleic acid representation and recursive feature extraction by linear kernel SVM.

    abstract:BACKGROUND:Identification of the recombination hot/cold spots is critical for understanding the mechanism of recombination as well as the genome evolution process. However, experimental identification of recombination spots is both time-consuming and costly. Developing an accurate and automated method for reliably and ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-340

    authors: Li L,Yu S,Xiao W,Li Y,Huang L,Zheng X,Zhou S,Yang H

    update_date:2014-11-20 00:00:00

  • Conceptual-level workflow modeling of scientific experiments using NMR as a case study.

    abstract:BACKGROUND:Scientific workflows improve the process of scientific experiments by making computations explicit, underscoring data flow, and emphasizing the participation of humans in the process when intuition and human reasoning are required. Workflows for experiments also highlight transitions among experimental phase...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-31

    authors: Verdi KK,Ellis HJ,Gryk MR

    update_date:2007-01-30 00:00:00

  • LS-NMF: a modified non-negative matrix factorization algorithm utilizing uncertainty estimates.

    abstract:BACKGROUND:Non-negative matrix factorisation (NMF), a machine learning algorithm, has been applied to the analysis of microarray data. A key feature of NMF is the ability to identify patterns that together explain the data as a linear combination of expression signatures. Microarray data generally includes individual e...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-175

    authors: Wang G,Kossenkov AV,Ochs MF

    update_date:2006-03-28 00:00:00

  • SILVA tree viewer: interactive web browsing of the SILVA phylogenetic guide trees.

    abstract:BACKGROUND:Phylogenetic trees are an important tool to study the evolutionary relationships among organisms. The huge amount of available taxa poses difficulties in their interactive visualization. This hampers the interaction with the users to provide feedback for the further improvement of the taxonomic framework. R...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1841-3

    authors: Beccati A,Gerken J,Quast C,Yilmaz P,Glöckner FO

    update_date:2017-09-30 00:00:00

  • Pairwise protein expression classifier for candidate biomarker discovery for early detection of human disease prognosis.

    abstract:BACKGROUND:An approach to molecular classification based on the comparative expression of protein pairs is presented. The method overcomes some of the present limitations in using peptide intensity data for class prediction for problems such as the detection of a disease, disease prognosis, or for predicting treatment ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-191

    authors: Kaur P,Schlatzer D,Cooke K,Chance MR

    update_date:2012-08-07 00:00:00

  • Ontology driven integration platform for clinical and translational research.

    abstract:Semantic Web technologies offer a promising framework for integration of disparate biomedical data. In this paper we present the semantic information integration platform under development at the Center for Clinical and Translational Sciences (CCTS) at the University of Texas Health Science Center at Houston (UTHSC-H)...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S2-S2

    authors: Mirhaji P,Zhu M,Vagnoni M,Bernstam EV,Zhang J,Smith JW

    update_date:2009-02-05 00:00:00

  • IILLS: predicting virus-receptor interactions based on similarity and semi-supervised learning.

    abstract:BACKGROUND:Viral infectious diseases are the serious threat for human health. The receptor-binding is the first step for the viral infection of hosts. To more effectively treat human viral infectious diseases, the hidden virus-receptor interactions must be discovered. However, current computational methods for predicti...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3278-3

    authors: Yan C,Duan G,Wu FX,Wang J

    update_date:2019-12-27 00:00:00

  • A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies.

    abstract:BACKGROUND:Comparative analysis of whole genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While there are number of excellent tools developed for performing this task, ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1142-2

    authors: Thakur S,Guttman DS

    update_date:2016-06-30 00:00:00

  • Directed acyclic graph kernels for structural RNA analysis.

    abstract:BACKGROUND:Recent discoveries of a large variety of important roles for non-coding RNAs (ncRNAs) have been reported by numerous researchers. In order to analyze ncRNAs by kernel methods including support vector machines, we propose stem kernels as an extension of string kernels for measuring the similarities between tw...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-318

    authors: Sato K,Mituyama T,Asai K,Sakakibara Y

    update_date:2008-07-22 00:00:00

  • An automatic method to calculate heart rate from zebrafish larval cardiac videos.

    abstract:BACKGROUND:Zebrafish is a widely used model organism for studying heart development and cardiac-related pathogenesis. With the ability of surviving without a functional circulation at larval stages, strong genetic similarity between zebrafish and mammals, prolific reproduction and optically transparent embryos, zebrafi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2166-6

    authors: Kang CP,Tu HC,Fu TF,Wu JM,Chu PH,Chang DT

    update_date:2018-05-09 00:00:00

  • DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark.

    abstract:BACKGROUND:XHMM is a widely used tool for copy-number variant (CNV) discovery from whole exome sequencing data but can require hours to days to run for large cohorts. A more scalable implementation would reduce the need for specialized computational resources and enable increased exploration of the configuration parame...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3108-7

    authors: Linderman MD,Chia D,Wallace F,Nothaft FA

    update_date:2019-10-11 00:00:00

  • Discovering biological connections between experimental conditions based on common patterns of differential gene expression.

    abstract:BACKGROUND:Identifying similarities between patterns of differential gene expression provides an opportunity to identify similarities between the experimental and biological conditions that give rise to these gene expression alterations. The growing volume of gene expression data in open data repositories such as the N...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-381

    authors: Gower AC,Spira A,Lenburg ME

    update_date:2011-09-27 00:00:00

  • Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data.

    abstract:BACKGROUND:This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulatio...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-310

    authors: Barros RC,Winck AT,Machado KS,Basgalupp MP,de Carvalho AC,Ruiz DD,de Souza ON

    update_date:2012-11-21 00:00:00

  • SIS: a program to generate draft genome sequence scaffolds for prokaryotes.

    abstract:BACKGROUND:Decreasing costs of DNA sequencing have made prokaryotic draft genome sequences increasingly common. A contig scaffold is an ordering of contigs in the correct orientation. A scaffold can help genome comparisons and guide gap closure efforts. One popular technique for obtaining contig scaffolds is to map con...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-96

    authors: Dias Z,Dias U,Setubal JC

    update_date:2012-05-14 00:00:00

  • XenofilteR: computational deconvolution of mouse and human reads in tumor xenograft sequence data.

    abstract:BACKGROUND:Mouse xenografts from (patient-derived) tumors (PDX) or tumor cell lines are widely used as models to study various biological and preclinical aspects of cancer. However, analyses of their RNA and DNA profiles are challenging, because they comprise reads not only from the grafted human cancer but also from t...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2353-5

    authors: Kluin RJC,Kemper K,Kuilman T,de Ruiter JR,Iyer V,Forment JV,Cornelissen-Steijger P,de Rink I,Ter Brugge P,Song JY,Klarenbeek S,McDermott U,Jonkers J,Velds A,Adams DJ,Peeper DS,Krijgsman O

    update_date:2018-10-04 00:00:00

  • NIFTI: an evolutionary approach for finding number of clusters in microarray data.

    abstract:BACKGROUND:Clustering techniques are routinely used in gene expression data analysis to organize the massive data. Clustering techniques arrange a large number of genes or assays into a few clusters while maximizing the intra-cluster similarity and inter-cluster separation. While clustering of genes facilitates learnin...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-40

    authors: Jonnalagadda S,Srinivasan R

    update_date:2009-01-30 00:00:00

  • Species-specific analysis of protein sequence motifs using mutual information.

    abstract:BACKGROUND:Protein sequence motifs are by definition short fragments of conserved amino acids, often associated with a specific function. Accordingly protein sequence profiles derived from multiple sequence alignments provide an alternative description of functional motifs characterizing families of related sequences. ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-164

    authors: Hummel J,Keshvari N,Weckwerth W,Selbig J

    update_date:2005-06-29 00:00:00

  • Stochastic models for the in silico simulation of synaptic processes.

    abstract:BACKGROUND:Research in life sciences is benefiting from a large availability of formal description techniques and analysis methodologies. These allow both the phenomena investigated to be precisely modeled and virtual experiments to be performed in silico. Such experiments may result in easier, faster, and satisfying a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-S4-S7

    authors: Bracciali A,Brunelli M,Cataldo E,Degano P

    update_date:2008-04-25 00:00:00

  • MPAgenomics: an R package for multi-patient analysis of genomic markers.

    abstract:BACKGROUND:Last generations of Single Nucleotide Polymorphism (SNP) arrays allow to study copy-number variations in addition to genotyping measures. RESULTS:MPAgenomics, standing for multi-patient analysis (MPA) of genomic markers, is an R-package devoted to: (i) efficient segmentation and (ii) selection of genomic ma...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-014-0394-y

    authors: Grimonprez Q,Celisse A,Blanck S,Cheok M,Figeac M,Marot G

    update_date:2014-12-14 00:00:00