Error, reproducibility and sensitivity: a pipeline for data processing of Agilent oligonucleotide expression arrays.

Abstract:

BACKGROUND:Expression microarrays are increasingly used to obtain large scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, to design experiments and analyse the output. Furthermore, many of the more sophisticated mathematical approaches to data analysis in the literature remain inaccessible to much of the biological research community. In this study we examine ways of extracting and analysing a large data set obtained using the Agilent long oligonucleotide transcriptomics platform, applied to a set of human macrophage and dendritic cell samples. RESULTS:We describe and validate a series of data extraction, transformation and normalisation steps which are implemented via a new R function. Analysis of replicate normalised reference data demonstrate that interarray variability is small (only around 2% of the mean log signal), while interarray variability from replicate array measurements has a standard deviation (SD) of around 0.5 log(2) units ( 6% of mean). The common practise of working with ratios of Cy5/Cy3 signal offers little further improvement in terms of reducing error. Comparison to expression data obtained using Arabidopsis samples demonstrates that the large number of genes in each sample showing a low level of transcription reflect the real complexity of the cellular transcriptome. Multidimensional scaling is used to show that the processed data identifies an underlying structure which reflect some of the key biological variables which define the data set. This structure is robust, allowing reliable comparison of samples collected over a number of years and collected by a variety of operators. CONCLUSIONS:This study outlines a robust and easily implemented pipeline for extracting, transforming normalising and visualising transcriptomic array data from Agilent expression platform. The analysis is used to obtain quantitative estimates of the SD arising from experimental (non biological) intra- and interarray variability, and for a lower threshold for determining whether an individual gene is expressed. The study provides a reliable basis for further more extensive studies of the systems biology of eukaryotic cells.

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Chain B,Bowen H,Hammond J,Posch W,Rasaiyaah J,Tsang J,Noursadeghi M

doi

10.1186/1471-2105-11-344

subject

Has Abstract

pub_date

2010-06-24 00:00:00

pages

344

issn

1471-2105

pii

1471-2105-11-344

journal_volume

11

pub_type

杂志文章
  • The EnzymeTracker: an open-source laboratory information management system for sample tracking.

    abstract:BACKGROUND:In many laboratories, researchers store experimental data on their own workstation using spreadsheets. However, this approach poses a number of problems, ranging from sharing issues to inefficient data-mining. Standard spreadsheets are also error-prone, as data do not undergo any validation process. To overc...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-15

    authors: Triplet T,Butler G

    更新日期:2012-01-26 00:00:00

  • ImiRP: a computational approach to microRNA target site mutation.

    abstract:BACKGROUND:MicroRNAs (miRNAs) are small ~22 nucleotide non-coding RNAs that function as post-transcriptional regulators of messenger RNA (mRNA) through base-pairing to 6-8 nucleotide long target sites, usually located within the mRNA 3' untranslated region. A common approach to validate and probe microRNA-mRNA interact...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-016-1057-y

    authors: Ryan BC,Werner TS,Howard PL,Chow RL

    更新日期:2016-04-27 00:00:00

  • ETHNOPRED: a novel machine learning method for accurate continental and sub-continental ancestry identification and population stratification correction.

    abstract:BACKGROUND:Population stratification is a systematic difference in allele frequencies between subpopulations. This can lead to spurious association findings in the case-control genome wide association studies (GWASs) used to identify single nucleotide polymorphisms (SNPs) associated with disease-linked phenotypes. Meth...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-14-61

    authors: Hajiloo M,Sapkota Y,Mackey JR,Robson P,Greiner R,Damaraju S

    更新日期:2013-02-22 00:00:00

  • Structural alignment of protein descriptors - a combinatorial model.

    abstract:BACKGROUND:Structural alignment of proteins is one of the most challenging problems in molecular biology. The tertiary structure of a protein strictly correlates with its function and computationally predicted structures are nowadays a main premise for understanding the latter. However, computationally derived 3D model...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-016-1237-9

    authors: Antczak M,Kasprzak M,Lukasiak P,Blazewicz J

    更新日期:2016-09-17 00:00:00

  • Machine learning for discovering missing or wrong protein function annotations : A comparison using updated benchmark datasets.

    abstract:BACKGROUND:A massive amount of proteomic data is generated on a daily basis, nonetheless annotating all sequences is costly and often unfeasible. As a countermeasure, machine learning methods have been used to automatically annotate new protein functions. More specifically, many studies have investigated hierarchical m...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章,评审

    doi:10.1186/s12859-019-3060-6

    authors: Nakano FK,Lietaert M,Vens C

    更新日期:2019-09-23 00:00:00

  • SSWAP: A Simple Semantic Web Architecture and Protocol for semantic web services.

    abstract:BACKGROUND:SSWAP (Simple Semantic Web Architecture and Protocol; pronounced "swap") is an architecture, protocol, and platform for using reasoning to semantically integrate heterogeneous disparate data and services on the web. SSWAP was developed as a hybrid semantic web services technology to overcome limitations foun...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-309

    authors: Gessler DD,Schiltz GS,May GD,Avraham S,Town CD,Grant D,Nelson RT

    更新日期:2009-09-23 00:00:00

  • Meta-aligner: long-read alignment based on genome statistics.

    abstract:BACKGROUND:Current development of sequencing technologies is towards generating longer and noisier reads. Evidently, accurate alignment of these reads play an important role in any downstream analysis. Similarly, reducing the overall cost of sequencing is related to the time consumption of the aligner. The tradeoff bet...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1518-y

    authors: Nashta-Ali D,Aliyari A,Ahmadian Moghadam A,Edrisi MA,Motahari SA,Hossein Khalaj B

    更新日期:2017-02-23 00:00:00

  • The development of a comparison approach for Illumina bead chips unravels unexpected challenges applying newest generation microarrays.

    abstract:BACKGROUND:The MAQC project demonstrated that microarrays with comparable content show inter- and intra-platform reproducibility. However, since the content of gene databases still increases, the development of new generations of microarrays covering new content is mandatory. To better understand the potential challeng...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-186

    authors: Eggle D,Debey-Pascher S,Beyer M,Schultze JL

    更新日期:2009-06-18 00:00:00

  • Oligo kernels for datamining on biological sequences: a case study on prokaryotic translation initiation sites.

    abstract:BACKGROUND:Kernel-based learning algorithms are among the most advanced machine learning methods and have been successfully applied to a variety of sequence classification tasks within the field of bioinformatics. Conventional kernels utilized so far do not provide an easy interpretation of the learnt representations i...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-5-169

    authors: Meinicke P,Tech M,Morgenstern B,Merkl R

    更新日期:2004-10-28 00:00:00

  • SPdb--a signal peptide database.

    abstract:BACKGROUND:The signal peptide plays an important role in protein targeting and protein translocation in both prokaryotic and eukaryotic cells. This transient, short peptide sequence functions like a postal address on an envelope by targeting proteins for secretion or for transfer to specific organelles for further proc...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-249

    authors: Choo KH,Tan TW,Ranganathan S

    更新日期:2005-10-13 00:00:00

  • Venn-diaNet : venn diagram based network propagation analysis framework for comparing multiple biological experiments.

    abstract:BACKGROUND:The main research topic in this paper is how to compare multiple biological experiments using transcriptome data, where each experiment is measured and designed to compare control and treated samples. Comparison of multiple biological experiments is usually performed in terms of the number of DEGs in an arbi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3302-7

    authors: Hur B,Kang D,Lee S,Moon JH,Lee G,Kim S

    更新日期:2019-12-27 00:00:00

  • Prediction of bioluminescent proteins by using sequence-derived features and lineage-specific scheme.

    abstract:BACKGROUND:Bioluminescent proteins (BLPs) widely exist in many living organisms. As BLPs are featured by the capability of emitting lights, they can be served as biomarkers and easily detected in biomedical research, such as gene expression analysis and signal transduction pathways. Therefore, accurate identification o...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1709-6

    authors: Zhang J,Chai H,Yang G,Ma Z

    更新日期:2017-06-05 00:00:00

  • HH-suite3 for fast remote homology detection and deep protein annotation.

    abstract:BACKGROUND:HH-suite is a widely used open source software suite for sensitive sequence similarity searches and protein fold recognition. It is based on pairwise alignment of profile Hidden Markov models (HMMs), which represent multiple sequence alignments of homologous proteins. RESULTS:We developed a single-instructi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3019-7

    authors: Steinegger M,Meier M,Mirdita M,Vöhringer H,Haunsberger SJ,Söding J

    更新日期:2019-09-14 00:00:00

  • An iterative block-shifting approach to retention time alignment that preserves the shape and area of gas chromatography-mass spectrometry peaks.

    abstract:BACKGROUND:Metabolomics, petroleum and biodiesel chemistry, biomarker discovery, and other fields which rely on high-resolution profiling of complex chemical mixtures generate datasets which contain millions of detector intensity readings, each uniquely addressed along dimensions of time (e.g., retention time of chemic...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-S9-S15

    authors: Chae M,Shmookler Reis RJ,Thaden JJ

    更新日期:2008-08-12 00:00:00

  • CDKAM: a taxonomic classification tool using discriminative k-mers and approximate matching strategies.

    abstract:BACKGROUND:Current taxonomic classification tools use exact string matching algorithms that are effective to tackle the data from the next generation sequencing technology. However, the unique error patterns in the third generation sequencing (TGS) technologies could reduce the accuracy of these programs. RESULTS:We d...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03777-y

    authors: Bui VK,Wei C

    更新日期:2020-10-20 00:00:00

  • An efficient visualization tool for the analysis of protein mutation matrices.

    abstract:BACKGROUND:It is useful to develop a tool that would effectively describe protein mutation matrices specifically geared towards the identification of mutations that produce either wanted or unwanted effects, such as an increase or decrease in affinity, or a predisposition towards misfolding. Here, we describe a tool wh...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-218

    authors: David MP,Lapid CM,Daria VR

    更新日期:2008-04-28 00:00:00

  • A web services choreography scenario for interoperating bioinformatics applications.

    abstract:BACKGROUND:Very often genome-wide data analysis requires the interoperation of multiple databases and analytic tools. A large number of genome databases and bioinformatics applications are available through the web, but it is difficult to automate interoperation because: 1) the platforms on which the applications run a...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-5-25

    authors: de Knikker R,Guo Y,Li JL,Kwan AK,Yip KY,Cheung DW,Cheung KH

    更新日期:2004-03-10 00:00:00

  • The identification of informative genes from multiple datasets with increasing complexity.

    abstract:BACKGROUND:In microarray data analysis, factors such as data quality, biological variation, and the increasingly multi-layered nature of more complex biological systems complicates the modelling of regulatory networks that can represent and capture the interactions among genes. We believe that the use of multiple datas...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-32

    authors: Anvar SY,'t Hoen PA,Tucker A

    更新日期:2010-01-15 00:00:00

  • Determining gene expression on a single pair of microarrays.

    abstract:BACKGROUND:In microarray experiments the numbers of replicates are often limited due to factors such as cost, availability of sample or poor hybridization. There are currently few choices for the analysis of a pair of microarrays where N = 1 in each condition. In this paper, we demonstrate the effectiveness of a new al...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-489

    authors: Reid RW,Fodor AA

    更新日期:2008-11-21 00:00:00

  • BINDER: computationally inferring a gene regulatory network for Mycobacterium abscessus.

    abstract:BACKGROUND:Although many of the genic features in Mycobacterium abscessus have been fully validated, a comprehensive understanding of the regulatory elements remains lacking. Moreover, there is little understanding of how the organism regulates its transcriptomic profile, enabling cells to survive in hostile environmen...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3042-8

    authors: Staunton PM,Miranda-CasoLuengo AA,Loftus BJ,Gormley IC

    更新日期:2019-09-10 00:00:00

  • Detecting intergene correlation changes in microarray analysis: a new approach to gene selection.

    abstract:BACKGROUND:Microarray technology is commonly used as a simple screening tool with a focus on selecting genes that exhibit extremely large differential expressions between different phenotypes. It lacks the ability to select genes that change their relationships with other genes in different biological conditions (diffe...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-20

    authors: Hu R,Qiu X,Glazko G,Klebanov L,Yakovlev A

    更新日期:2009-01-15 00:00:00

  • Detecting disease-associated genotype patterns.

    abstract:BACKGROUND:In addition to single-locus (main) effects of disease variants, there is a growing consensus that gene-gene and gene-environment interactions may play important roles in disease etiology. However, for the very large numbers of genetic markers currently in use, it has proven difficult to develop suitable and ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-S1-S75

    authors: Long Q,Zhang Q,Ott J

    更新日期:2009-01-30 00:00:00

  • Partition-based optimization model for generative anatomy modeling language (POM-GAML).

    abstract:BACKGROUND:This paper presents a novel approach for Generative Anatomy Modeling Language (GAML). This approach automatically detects the geometric partitions in 3D anatomy that in turn speeds up integrated non-linear optimization model in GAML for 3D anatomy modeling with constraints (e.g. joints). This integrated non-...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-2626-7

    authors: Demirel D,Cetinsaya B,Halic T,Kockara S,Ahmadi S

    更新日期:2019-03-14 00:00:00

  • Linear predictive coding representation of correlated mutation for protein sequence alignment.

    abstract:BACKGROUND:Although both conservation and correlated mutation (CM) are important information reflecting the different sorts of context in multiple sequence alignment, most of alignment methods use sequence profiles that only represent conservation. There is no general way to represent correlated mutation and incorporat...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-S2-S2

    authors: Jeong CS,Kim D

    更新日期:2010-04-16 00:00:00

  • DNLC: differential network local consistency analysis.

    abstract:BACKGROUND:The biological network is highly dynamic. Functional relations between genes can be activated or deactivated depending on the biological conditions. On the genome-scale network, subnetworks that gain or lose local expression consistency may shed light on the regulatory mechanisms related to the changing biol...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3046-4

    authors: Lu J,Lu Y,Ding Y,Xiao Q,Liu L,Cai Q,Kong Y,Bai Y,Yu T

    更新日期:2019-12-24 00:00:00

  • Influence of microarrays experiments missing values on the stability of gene groups by hierarchical clustering.

    abstract:BACKGROUND:Microarray technologies produced large amount of data. The hierarchical clustering is commonly used to identify clusters of co-expressed genes. However, microarray datasets often contain missing values (MVs) representing a major drawback for the use of the clustering methods. Usually the MVs are not treated,...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-5-114

    authors: de Brevern AG,Hazout S,Malpertuy A

    更新日期:2004-08-23 00:00:00

  • Bioinformatics research in the Asia Pacific: a 2007 update.

    abstract::We provide a 2007 update on the bioinformatics research in the Asia-Pacific from the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998. From 2002, APBioNet has organized the first International Conference on Bioinformatics (InCoB) bringing together scientists work...

    journal_title:BMC bioinformatics

    pub_type:

    doi:10.1186/1471-2105-9-S1-S1

    authors: Ranganathan S,Gribskov M,Tan TW

    更新日期:2008-01-01 00:00:00

  • A study on multi-omic oscillations in Escherichia coli metabolic networks.

    abstract:BACKGROUND:Two important challenges in the analysis of molecular biology information are data (multi-omic information) integration and the detection of patterns across large scale molecular networks and sequences. They are are actually coupled beause the integration of omic information may provide better means to detec...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2175-5

    authors: Bardozzo F,Lió P,Tagliaferri R

    更新日期:2018-07-09 00:00:00

  • Using affinity propagation for identifying subspecies among clonal organisms: lessons from M. tuberculosis.

    abstract:BACKGROUND:Classification and naming is a key step in the analysis, understanding and adequate management of living organisms. However, where to set limits between groups can be puzzling especially in clonal organisms. Within the Mycobacterium tuberculosis complex (MTC), the etiological agent of tuberculosis (TB), expe...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-224

    authors: Borile C,Labarre M,Franz S,Sola C,Refrégier G

    更新日期:2011-06-02 00:00:00

  • From SNPs to pathways: integration of functional effect of sequence variations on models of cell signalling pathways.

    abstract:BACKGROUND:Single nucleotide polymorphisms (SNPs) are the most frequent type of sequence variation between individuals, and represent a promising tool for finding genetic determinants of complex diseases and understanding the differences in drug response. In this regard, it is of particular interest to study the effect...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-S8-S6

    authors: Bauer-Mehren A,Furlong LI,Rautschka M,Sanz F

    更新日期:2009-08-27 00:00:00