Global rank-invariant set normalization (GRSN) to reduce systematic distortions in microarray data.

Abstract:

BACKGROUND: Microarray technology has become very popular for globally evaluating gene expression in biological samples. However, non-linear variation associated with the technology can make data interpretation unreliable. Therefore, methods to correct this kind of technical variation are critical. Here we consider a method to reduce this type of variation, applied after three common procedures for processing microarray data: MAS 5.0, RMA, and dChip.

RESULTS: We commonly observe intensity-dependent technical variation between samples in a single microarray experiment. This is most common when MAS 5.0 is used to process probe-level data, but we also see this type of technical variation with RMA- and dChip-processed data. Datasets with unbalanced numbers of up- and down-regulated genes seem to be particularly susceptible to this type of intensity-dependent technical variation. Unbalanced gene regulation is common when studying cancer samples or genetically manipulated animal models, and preserving this biologically relevant information while removing technical variation has not been well addressed in the literature. We propose a method based on using rank-invariant, endogenous transcripts as reference points for normalization (GRSN). While the use of rank-invariant transcripts has been described previously, we have added to this concept by creating a global rank-invariant set of transcripts, which is used to generate a robust average reference against which all samples within a dataset are normalized. The global rank-invariant set is selected in an iterative manner so as to preserve unbalanced gene expression. Moreover, our method works well as an overlay that can be applied to data already processed with other probe set summary methods. We demonstrate that this additional normalization step at the "probe set level" effectively corrects a specific type of technical variation that often distorts samples in datasets.

CONCLUSION: We have developed a simple post-processing tool to help detect and correct non-linear technical variation in microarray data, and we demonstrate how it can reduce technical variation and improve the results of downstream statistical gene selection and pathway identification methods.
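
The abstract outlines GRSN only at a high level: iteratively select a global rank-invariant set, build a robust average reference, then normalize every sample against it. The following is a minimal Python sketch of that outline, not the authors' implementation. It assumes a log-scale probe-set-level matrix (rows are probe sets, columns are samples). The function names, the parameters `target`, `drop_fraction`, `trim`, and `degree`, and the use of a polynomial fit as the intensity-dependent calibration curve are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def select_rank_invariant(log_expr, target=200, drop_fraction=0.25):
    """Iteratively keep the probe sets whose ranks vary least across samples.

    All parameter values here are illustrative assumptions.
    """
    keep = np.arange(log_expr.shape[0])
    while keep.size > target:
        sub = log_expr[keep]
        # Rank each sample's values among the probe sets still in play.
        ranks = sub.argsort(axis=0).argsort(axis=0)
        rank_var = ranks.var(axis=1)
        # Drop the most rank-variable fraction, but never go below target.
        n_keep = max(target, int(keep.size * (1.0 - drop_fraction)))
        order = np.argsort(rank_var, kind="stable")
        keep = np.sort(keep[order[:n_keep]])
    return keep


def trimmed_mean(values, trim=0.25):
    """Row-wise trimmed mean across samples, used as a robust reference."""
    ordered = np.sort(values, axis=1)
    k = int(ordered.shape[1] * trim)
    return ordered[:, k:ordered.shape[1] - k].mean(axis=1)


def grsn_like_normalize(log_expr, target=200, degree=3):
    """Calibrate each sample against the reference using the invariant set.

    A polynomial in A (average intensity) stands in for the smooth
    calibration curve; this choice is an assumption of the sketch.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    invariant = select_rank_invariant(log_expr, target)
    reference = trimmed_mean(log_expr)
    out = np.empty_like(log_expr)
    for j in range(log_expr.shape[1]):
        m = log_expr[:, j] - reference          # log ratio to reference
        a = 0.5 * (log_expr[:, j] + reference)  # average log intensity
        # Fit the curve on the invariant set only, then apply it to all rows.
        coeffs = np.polyfit(a[invariant], m[invariant], degree)
        out[:, j] = log_expr[:, j] - np.polyval(coeffs, a)
    return out, invariant
```

Fitting the curve only on the rank-invariant rows is what lets such a scheme leave genuinely regulated probe sets alone: they do not influence the calibration, so an unbalanced set of up- or down-regulated genes is not pulled toward the reference.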

journal_name

BMC Bioinformatics

authors

Pelz CR,Kulesz-Martin M,Bagby G,Sears RC

doi

10.1186/1471-2105-9-520

pub_date

2008-12-04 00:00:00

pages

520

issn

1471-2105

pii

1471-2105-9-520

journal_volume

9

pub_type

Journal Article
  • Detecting disease-associated genotype patterns.

    abstract:BACKGROUND:In addition to single-locus (main) effects of disease variants, there is a growing consensus that gene-gene and gene-environment interactions may play important roles in disease etiology. However, for the very large numbers of genetic markers currently in use, it has proven difficult to develop suitable and ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S1-S75

    authors: Long Q,Zhang Q,Ott J

    update_date:2009-01-30 00:00:00

  • Exploring matrix factorization techniques for significant genes identification of Alzheimer's disease microarray gene expression data.

    abstract:BACKGROUND:The wide use of high-throughput DNA microarray technology provides an increasingly detailed view of the human transcriptome from hundreds to thousands of genes. Although biomedical researchers typically design microarray experiments to explore specific biological contexts, the relationships between genes are hard...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-S5-S7

    authors: Kong W,Mou X,Hu X

    update_date:2011-01-01 00:00:00

  • TGF-beta signaling proteins and the Protein Ontology.

    abstract:BACKGROUND:The Protein Ontology (PRO) is designed as a formal and principled Open Biomedical Ontologies (OBO) Foundry ontology for proteins. The components of PRO extend from a classification of proteins on the basis of evolutionary relationships at the homeomorphic level to the representation of the multiple protein f...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S5-S3

    authors: Arighi CN,Liu H,Natale DA,Barker WC,Drabkin H,Blake JA,Smith B,Wu CH

    update_date:2009-05-06 00:00:00

  • Estimation of evolutionary parameters using short, random and partial sequences from mixed samples of anonymous individuals.

    abstract:BACKGROUND:Over the last decade, next generation sequencing (NGS) has become widely available, and is now the sequencing technology of choice for most researchers. Nonetheless, NGS presents a challenge for the evolutionary biologists who wish to estimate evolutionary genetic parameters from a mixed sample of unlabelled...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0810-y

    authors: Wu SH,Rodrigo AG

    update_date:2015-11-04 00:00:00

  • Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient.

    abstract:BACKGROUND:Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great "Tree of Life" (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-158

    authors: Stoltzfus A,Lapp H,Matasci N,Deus H,Sidlauskas B,Zmasek CM,Vaidya G,Pontelli E,Cranston K,Vos R,Webb CO,Harmon LJ,Pirrung M,O'Meara B,Pennell MW,Mirarab S,Rosenberg MS,Balhoff JP,Bik HM,Heath TA,Midford PE,Brown

    update_date:2013-05-13 00:00:00

  • An extensible six-step methodology to automatically generate fuzzy DSSs for diagnostic applications.

    abstract:BACKGROUND:The diagnosis of many diseases can be often formulated as a decision problem; uncertainty affects these problems so that many computerized Diagnostic Decision Support Systems (in the following, DDSSs) have been developed to aid the physician in interpreting clinical data and thus to improve the quality of th...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-S1-S4

    authors: d'Acierno A,Esposito M,De Pietro G

    update_date:2013-01-01 00:00:00

  • LNDriver: identifying driver genes by integrating mutation and expression data based on gene-gene interaction network.

    abstract:BACKGROUND:Cancer is a complex disease which is characterized by the accumulation of genetic alterations during the patient's lifetime. With the development of the next-generation sequencing technology, multiple omics data, such as cancer genomic, epigenomic and transcriptomic data etc., can be measured from each indiv...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1332-y

    authors: Wei PJ,Zhang D,Xia J,Zheng CH

    update_date:2016-12-23 00:00:00

  • Model based analysis of real-time PCR data from DNA binding dye protocols.

    abstract:BACKGROUND:Reverse transcription followed by real-time PCR is widely used for quantification of specific mRNA, and with the use of double-stranded DNA binding dyes it is becoming a standard for microarray data validation. Despite the kinetic information generated by real-time PCR, most popular analysis methods assume c...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-85

    authors: Alvarez MJ,Vila-Ortiz GJ,Salibe MC,Podhajcer OL,Pitossi FJ

    update_date:2007-03-09 00:00:00

  • XLPM: efficient algorithm for the analysis of protein-protein contacts using chemical cross-linking mass spectrometry.

    abstract:BACKGROUND:Chemical cross-linking is used for protein-protein contacts mapping and for structural analysis. One of the difficulties in cross-linking studies is the analysis of mass-spectrometry data and the assignment of the site of cross-link incorporation. The difficulties are due to higher charges of fragment ions, ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-S11-S16

    authors: Jaiswal M,Crabtree N,Bauer MA,Hall R,Raney KD,Zybailov BL

    update_date:2014-01-01 00:00:00

  • Using affinity propagation for identifying subspecies among clonal organisms: lessons from M. tuberculosis.

    abstract:BACKGROUND:Classification and naming is a key step in the analysis, understanding and adequate management of living organisms. However, where to set limits between groups can be puzzling especially in clonal organisms. Within the Mycobacterium tuberculosis complex (MTC), the etiological agent of tuberculosis (TB), expe...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-224

    authors: Borile C,Labarre M,Franz S,Sola C,Refrégier G

    update_date:2011-06-02 00:00:00

  • Functionally specified protein signatures distinctive for each of the different blue copper proteins.

    abstract:BACKGROUND:Proteins having similar functions from different sources can be identified by the occurrence in their sequences, a conserved cluster of amino acids referred to as pattern, motif, signature or fingerprint. The wide usage of protein sequence analysis in par with the growth of databases signifies the importance...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-5-127

    authors: Giri AV,Anishetty S,Gautam P

    update_date:2004-09-09 00:00:00

  • Towards an automatic classification of protein structural domains based on structural similarity.

    abstract:BACKGROUND:Formal classification of a large collection of protein structures aids the understanding of evolutionary relationships among them. Classifications involving manual steps, such as SCOP and CATH, face the challenge of increasing volume of available structures. Automatic methods such as FSSP or Dali Domain Dict...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-74

    authors: Sam V,Tai CH,Garnier J,Gibrat JF,Lee B,Munson PJ

    update_date:2008-01-31 00:00:00

  • Ab-origin: an enhanced tool to identify the sourcing gene segments in germline for rearranged antibodies.

    abstract:BACKGROUND:In the adaptive immune system, variable regions of immunoglobulin (IG) are encoded by random recombination of variable (V), diversity (D), and joining (J) gene segments in the germline. Partitioning the functional antibody sequences to their sourcing germline gene segments is vital not only for understanding...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-S12-S20

    authors: Wang X,Wu D,Zheng S,Sun J,Tao L,Li Y,Cao Z

    update_date:2008-12-12 00:00:00

  • Efficient use of unlabeled data for protein sequence classification: a comparative study.

    abstract:BACKGROUND:Recent studies in computational primary protein sequence analysis have leveraged the power of unlabeled data. For example, predictive models based on string kernels trained on sequences known to belong to particular folds or superfamilies, the so-called labeled data set, can attain significantly improved acc...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S4-S2

    authors: Kuksa P,Huang PH,Pavlovic V

    update_date:2009-04-29 00:00:00

  • Linear predictive coding representation of correlated mutation for protein sequence alignment.

    abstract:BACKGROUND:Although both conservation and correlated mutation (CM) are important information reflecting the different sorts of context in multiple sequence alignment, most of alignment methods use sequence profiles that only represent conservation. There is no general way to represent correlated mutation and incorporat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-S2-S2

    authors: Jeong CS,Kim D

    update_date:2010-04-16 00:00:00

  • Identification of markers associated with global changes in DNA methylation regulation in cancers.

    abstract:DNA methylation exhibits different patterns in different cancers. DNA methylation rates at different genomic loci appear to be highly correlated in some samples but not in others. We call such phenomena conditional concordant relationships (CCRs). In this study, we explored DNA methylation patterns in 12 common cancer...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-S13-S7

    authors: Qiu P,Zhang L

    update_date:2012-01-01 00:00:00

  • Statistical modeling of biomedical corpora: mining the Caenorhabditis Genetic Center Bibliography for genes related to life span.

    abstract:BACKGROUND:The statistical modeling of biomedical corpora could yield integrated, coarse-to-fine views of biological phenomena that complement discoveries made from analysis of molecular sequence and profiling data. Here, the potential of such modeling is demonstrated by examining the 5,225 free-text items in the Caeno...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-250

    authors: Blei DM,Franks K,Jordan MI,Mian IS

    update_date:2006-05-08 00:00:00

  • Bayesian detection of periodic mRNA time profiles without use of training examples.

    abstract:BACKGROUND:Detection of periodically expressed genes from microarray data without use of known periodic and non-periodic training examples is an important problem, e.g. for identifying genes regulated by the cell-cycle in poorly characterised organisms. Commonly the investigator is only interested in genes expressed at...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-63

    authors: Andersson CR,Isaksson A,Gustafsson MG

    update_date:2006-02-09 00:00:00

  • Protein network prediction and topological analysis in Leishmania major as a tool for drug target selection.

    abstract:BACKGROUND:Leishmaniasis is a virulent parasitic infection that causes a worldwide disease burden. Most treatments have toxic side-effects and efficacy has decreased due to the emergence of resistant strains. The outlook is worsened by the absence of promising drug targets for this disease. We have taken a computationa...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-484

    authors: Flórez AF,Park D,Bhak J,Kim BC,Kuchinsky A,Morris JH,Espinosa J,Muskus C

    update_date:2010-09-27 00:00:00

  • Effective automated pipeline for 3D reconstruction of synapses based on deep learning.

    abstract:BACKGROUND:The locations and shapes of synapses are important in reconstructing connectomes and analyzing synaptic plasticity. However, current synapse detection and segmentation methods are still not adequate for accurately acquiring the synaptic connectivity, and they cannot effectively alleviate the burden of synaps...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2232-0

    authors: Xiao C,Li W,Deng H,Chen X,Yang Y,Xie Q,Han H

    update_date:2018-07-13 00:00:00

  • Disease candidate gene identification and prioritization using protein interaction networks.

    abstract:BACKGROUND:Although most of the current disease candidate gene identification and prioritization methods depend on functional annotations, the coverage of the gene functional annotations is a limiting factor. In the current study, we describe a candidate gene prioritization method that is entirely based on protein-prot...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-73

    authors: Chen J,Aronow BJ,Jegga AG

    update_date:2009-02-27 00:00:00

  • Nanopore-based kinetics analysis of individual antibody-channel and antibody-antigen interactions.

    abstract:BACKGROUND:The UNO/RIC Nanopore Detector provides a new way to study the binding and conformational changes of individual antibodies. Many critical questions regarding antibody function are still unresolved, questions that can be approached in a new way with the nanopore detector. RESULTS:We present evidence that diff...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-S7-S20

    authors: Winters-Hilt S,Morales E,Amin I,Stoyanov A

    update_date:2007-11-01 00:00:00

  • SEQprocess: a modularized and customizable pipeline framework for NGS processing in R package.

    abstract:BACKGROUND:Next-Generation Sequencing (NGS) is now widely used in biomedical research for various applications. Processing of NGS data requires multiple programs and customization of the processing pipelines according to the data platforms. However, rapid progress of the NGS applications and processing methods urgentl...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2676-x

    authors: Joo T,Choi JH,Lee JH,Park SE,Jeon Y,Jung SH,Woo HG

    update_date:2019-02-20 00:00:00

  • DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark.

    abstract:BACKGROUND:XHMM is a widely used tool for copy-number variant (CNV) discovery from whole exome sequencing data but can require hours to days to run for large cohorts. A more scalable implementation would reduce the need for specialized computational resources and enable increased exploration of the configuration parame...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3108-7

    authors: Linderman MD,Chia D,Wallace F,Nothaft FA

    update_date:2019-10-11 00:00:00

  • Prediction of novel long non-coding RNAs based on RNA-Seq data of mouse Klf1 knockout study.

    abstract:BACKGROUND:Study on long non-coding RNAs (lncRNAs) has been promoted by high-throughput RNA sequencing (RNA-Seq). However, it is still not trivial to identify lncRNAs from the RNA-Seq data and it remains a challenge to uncover their functions. RESULTS:We present a computational pipeline for detecting novel lncRNAs fro...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-331

    authors: Sun L,Zhang Z,Bailey TL,Perkins AC,Tallack MR,Xu Z,Liu H

    update_date:2012-12-13 00:00:00

  • Recursive model for dose-time responses in pharmacological studies.

    abstract:BACKGROUND:Clinical studies often track dose-response curves of subjects over time. One can easily model the dose-response curve at each time point with Hill equation, but such a model fails to capture the temporal evolution of the curves. On the other hand, one can use Gompertz equation to model the temporal behaviors...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2831-4

    authors: Dhruba SR,Rahman A,Rahman R,Ghosh S,Pal R

    update_date:2019-06-20 00:00:00

  • PhylDiag: identifying complex synteny blocks that include tandem duplications using phylogenetic gene trees.

    abstract:BACKGROUND:Extant genomes share regions where genes have the same order and orientation, which are thought to arise from the conservation of an ancestral order of genes during evolution. Such regions of so-called conserved synteny, or synteny blocks, must be precisely identified and quantified, as a prerequisite to bet...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-268

    authors: Lucas JM,Muffato M,Roest Crollius H

    update_date:2014-08-08 00:00:00

  • Prioritization, clustering and functional annotation of MicroRNAs using latent semantic indexing of MEDLINE abstracts.

    abstract:BACKGROUND:The amount of scientific information about MicroRNAs (miRNAs) is growing exponentially, making it difficult for researchers to interpret experimental results. In this study, we present an automated text mining approach using Latent Semantic Indexing (LSI) for prioritization, clustering and functional annotat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1223-2

    authors: Roy S,Curry BC,Madahian B,Homayouni R

    update_date:2016-10-06 00:00:00

  • Mining locus tags in PubMed Central to improve microbial gene annotation.

    abstract:BACKGROUND:The scientific literature contains millions of microbial gene identifiers within the full text and tables, but these annotations rarely get incorporated into public sequence databases. We propose to utilize the Open Access (OA) subset of PubMed Central (PMC) as a gene annotation database and have developed a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-43

    authors: Stubben CJ,Challacombe JF

    update_date:2014-02-05 00:00:00

  • Mining differential top-k co-expression patterns from time course comparative gene expression datasets.

    abstract:BACKGROUND:Frequent pattern mining analysis applied on microarray dataset appears to be a promising strategy for identifying relationships between gene expression levels. Unfortunately, too many itemsets (co-expressed genes) are identified by this analysis method since it does not consider the importance of each gene w...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-230

    authors: Liu YC,Cheng CP,Tseng VS

    update_date:2013-07-21 00:00:00