A simple method for assessing sample sizes in microarray experiments.

Abstract:

BACKGROUND: In this short article, we discuss a simple method for assessing sample size requirements in microarray experiments. RESULTS: Our method starts with the output from a permutation-based analysis of a set of pilot data, e.g. from the SAM package. Then, for a given hypothesized mean difference and various sample sizes, we estimate the false discovery rate and false negative rate of a list of genes; these are also interpretable as per-gene power and type I error. We also discuss application of our method to other kinds of response variables, for example survival outcomes. CONCLUSION: Our method seems to be useful for sample size assessment in microarray experiments.
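The procedure outlined in the abstract can be illustrated with a rough sketch. The code below is one simplified reading of it, not the published algorithm and not the SAM package: it permutes the group labels of a pilot data set to obtain null scores, shifts the scores of k randomly chosen genes by the hypothesized mean difference scaled for a candidate sample size, calls the k top-ranked genes, and averages the resulting false discovery and false negative proportions. The function names (`t_score`, `estimate_fdr_fnr`), the pooled-variance score, the synthetic pilot data and all parameter values are assumptions made for illustration.

```python
import math
import random
import statistics


def t_score(values, labels):
    """Two-sample t-like score and pooled SD for one gene (illustrative choice of statistic)."""
    a = [v for v, g in zip(values, labels) if g == 0]
    b = [v for v, g in zip(values, labels) if g == 1]
    pooled_var = ((len(a) - 1) * statistics.variance(a)
                  + (len(b) - 1) * statistics.variance(b)) / (len(a) + len(b) - 2)
    sd = math.sqrt(pooled_var)
    score = (statistics.mean(b) - statistics.mean(a)) / (sd * math.sqrt(1 / len(a) + 1 / len(b)))
    return score, sd


def estimate_fdr_fnr(pilot, labels, delta, n_per_group, k, n_perm=10, seed=0):
    """Average FDR and FNR of a top-k gene list for a hypothesized difference and sample size."""
    rng = random.Random(seed)
    n_genes = len(pilot)
    fdrs, fnrs = [], []
    for _ in range(n_perm):
        # Permuting the labels gives scores that behave like null (no-difference) scores.
        perm = labels[:]
        rng.shuffle(perm)
        stats = [t_score(gene, perm) for gene in pilot]
        # Assumption: k genes chosen at random are treated as truly changed.
        truly_changed = set(rng.sample(range(n_genes), k))
        shifted = []
        for i, (score, sd) in enumerate(stats):
            if i in truly_changed:
                # Shift by the standardized difference expected with n_per_group samples per group.
                score += delta / (sd * math.sqrt(2 / n_per_group))
            shifted.append(score)
        ranked = sorted(range(n_genes), key=lambda i: abs(shifted[i]), reverse=True)
        called = set(ranked[:k])
        fdrs.append(len(called - truly_changed) / k)
        fnrs.append(len(truly_changed - called) / (n_genes - k))
    return statistics.mean(fdrs), statistics.mean(fnrs)


# Synthetic pilot data (an assumption): 200 genes, 4 samples in each of two groups.
data_rng = random.Random(1)
pilot = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

for n in (5, 10, 20, 40):
    fdr, fnr = estimate_fdr_fnr(pilot, labels, delta=1.0, n_per_group=n, k=20)
    print(f"n per group = {n:2d}  FDR ~ {fdr:.2f}  FNR ~ {fnr:.2f}")
```

Scanning the printed values across candidate sample sizes gives a rough sense of how large a study would need to be before the called gene list reaches an acceptable error rate under these assumptions.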

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Tibshirani R

doi

10.1186/1471-2105-7-106

keywords:

subject

Has Abstract

pub_date

2006-03-02 00:00:00

pages

106

issn

1471-2105

pii

1471-2105-7-106

journal_volume

7

pub_type

Journal Article
  • MeDEStrand: an improved method to infer genome-wide absolute methylation levels from DNA enrichment data.

    abstract:BACKGROUND:DNA methylation of CpG dinucleotides is an essential epigenetic modification that plays a key role in transcription. Widely used DNA enrichment-based methods offer high coverage for measuring methylated CpG dinucleotides, with the lowest cost per CpG covered genome-wide. However, these methods measure the DN...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2574-7

    authors: Xu J,Liu S,Yin P,Bulun S,Dai Y

    update_date: 2018-12-22 00:00:00

  • Identifying overrepresented concepts in gene lists from literature: a statistical approach based on Poisson mixture model.

    abstract:BACKGROUND:Large-scale genomic studies often identify large gene lists, for example, the genes sharing the same expression patterns. The interpretation of these gene lists is generally achieved by extracting concepts overrepresented in the gene lists. This analysis often depends on manual annotation of genes based on c...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-272

    authors: He X,Sarma MS,Ling X,Chee B,Zhai C,Schatz B

    update_date: 2010-05-20 00:00:00

  • Large scale statistical inference of signaling pathways from RNAi and microarray data.

    abstract:BACKGROUND:The advent of RNA interference techniques enables the selective silencing of biologically interesting genes in an efficient way. In combination with DNA microarray technology this enables researchers to gain insights into signaling pathways by observing downstream effects of individual knock-downs on gene ex...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-386

    authors: Froehlich H,Fellmann M,Sueltmann H,Poustka A,Beissbarth T

    update_date: 2007-10-15 00:00:00

  • Discovering motifs that induce sequencing errors.

    abstract:BACKGROUND:Elevated sequencing error rates are the most predominant obstacle in single-nucleotide polymorphism (SNP) detection, which is a major goal in the bulk of current studies using next-generation sequencing (NGS). Beyond routinely handled generic sources of errors, certain base calling errors relate to specific ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-S5-S1

    authors: Allhoff M,Schönhuth A,Martin M,Costa IG,Rahmann S,Marschall T

    update_date: 2013-01-01 00:00:00

  • A weighted string kernel for protein fold recognition.

    abstract:BACKGROUND:Alignment-free methods for comparing protein sequences have proved to be viable alternatives to approaches that first rely on an alignment of the sequences to be compared. Much work however needs to be done before those methods provide reliable fold recognition for proteins whose sequences share little simila...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1795-5

    authors: Nojoomi S,Koehl P

    update_date: 2017-08-25 00:00:00

  • MGOGP: a gene module-based heuristic algorithm for cancer-related gene prioritization.

    abstract:BACKGROUND:Prioritizing genes according to their associations with a cancer allows researchers to explore genes in more informed ways. So far, gene-centric or network-centric gene prioritization methods have predominated. Genes and their protein products carry out cellular processes in the context of functional modules....

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2216-0

    authors: Su L,Liu G,Bai T,Meng X,Ma Q

    update_date: 2018-06-05 00:00:00

  • A novel similarity-measure for the analysis of genetic data in complex phenotypes.

    abstract:BACKGROUND:Recent technological advances in DNA sequencing and genotyping have led to the accumulation of a remarkable quantity of data on genetic polymorphisms. However, the development of new statistical and computational tools for effective processing of these data has not been equally as fast. In particular, Machin...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S6-S24

    authors: Lagani V,Montesanto A,Di Cianni F,Moreno V,Landi S,Conforti D,Rose G,Passarino G

    update_date: 2009-06-16 00:00:00

  • Protein-DNA docking with a coarse-grained force field.

    abstract:BACKGROUND:Protein-DNA interactions are important for many cellular processes, however structural knowledge for a large fraction of known and putative complexes is still lacking. Computational docking methods aim at the prediction of complex architecture given detailed structures of its constituents. They are becoming ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-228

    authors: Setny P,Bahadur RP,Zacharias M

    update_date: 2012-09-11 00:00:00

  • Algal Functional Annotation Tool: a web-based analysis suite to functionally interpret large gene lists using integrated annotation and expression data.

    abstract:BACKGROUND:Progress in genome sequencing is proceeding at an exponential pace, and several new algal genomes are becoming available every year. One of the challenges facing the community is the association of protein sequences encoded in the genomes with biological function. While most genome assembly projects generate...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-282

    authors: Lopez D,Casero D,Cokus SJ,Merchant SS,Pellegrini M

    update_date: 2011-07-12 00:00:00

  • Decoding HMMs using the k best paths: algorithms and applications.

    abstract:BACKGROUND:Traditional algorithms for hidden Markov model decoding seek to maximize either the probability of a state path or the number of positions of a sequence assigned to the correct state. These algorithms provide only a single answer and in practice do not produce good results. RESULTS:We explore an alternative...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-S1-S28

    authors: Brown DG,Golod D

    update_date: 2010-01-18 00:00:00

  • Hotspot Hunter: a computational system for large-scale screening and selection of candidate immunological hotspots in pathogen proteomes.

    abstract:BACKGROUND:T-cell epitopes that promiscuously bind to multiple alleles of a human leukocyte antigen (HLA) supertype are prime targets for development of vaccines and immunotherapies because they are relevant to a large proportion of the human population. The presence of clusters of promiscuous T-cell epitopes, immunolo...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-S1-S19

    authors: Zhang GL,Khan AM,Srinivasan KN,Heiny A,Lee K,Kwoh CK,August JT,Brusic V

    update_date: 2008-01-01 00:00:00

  • Snpdat: easy and rapid annotation of results from de novo snp discovery projects for model and non-model organisms.

    abstract:BACKGROUND:Single nucleotide polymorphisms (SNPs) are the most abundant genetic variant found in vertebrates and invertebrates. SNP discovery has become a highly automated, robust and relatively inexpensive process allowing the identification of many thousands of mutations for model and non-model organisms. Annotating ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-45

    authors: Doran AG,Creevey CJ

    update_date: 2013-02-08 00:00:00

  • An evidence-based approach to identify aging-related genes in Caenorhabditis elegans.

    abstract:BACKGROUND:Extensive studies have been carried out on Caenorhabditis elegans as a model organism to elucidate mechanisms of aging and the effects of perturbing known aging-related genes on lifespan and behavior. This research has generated large amounts of experimental data that is increasingly difficult to integrate a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0469-4

    authors: Callahan A,Cifuentes JJ,Dumontier M

    update_date: 2015-02-07 00:00:00

  • Evaluation of high-throughput functional categorization of human disease genes.

    abstract:BACKGROUND:Biological data that are well-organized by an ontology, such as Gene Ontology, enable high-throughput availability of the semantic web. It can also be used to facilitate high throughput classification of biomedical information. However, to our knowledge, no evaluation has been published on automating classi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-S3-S7

    authors: Chen JL,Liu Y,Sam LT,Li J,Lussier YA

    update_date: 2007-05-09 00:00:00

  • Optimal sequencing depth design for whole genome re-sequencing in pigs.

    abstract:BACKGROUND:As whole-genome sequencing is becoming a routine technique, it is important to identify a cost-effective depth of sequencing for such studies. However, the relationship between sequencing depth and biological results from the aspects of whole-genome coverage, variant discovery power and the quality of varian...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3164-z

    authors: Jiang Y,Jiang Y,Wang S,Zhang Q,Ding X

    update_date: 2019-11-08 00:00:00

  • A MATLAB tool for pathway enrichment using a topology-based pathway regulation score.

    abstract:BACKGROUND:Handling the vast amount of gene expression data generated by genome-wide transcriptional profiling techniques is a challenging task, demanding an informed combination of pre-processing, filtering and analysis methods if meaningful biological conclusions are to be drawn. For example, a range of traditional s...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-014-0358-2

    authors: Ibrahim M,Jassim S,Cawthorne MA,Langlands K

    update_date: 2014-11-04 00:00:00

  • Methodology capture: discriminating between the "best" and the rest of community practice.

    abstract:BACKGROUND:The methodologies we use both enable and help define our research. However, as experimental complexity has increased the choice of appropriate methodologies has become an increasingly difficult task. This makes it difficult to keep track of available bioinformatics software, let alone the most suitable proto...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-359

    authors: Eales JM,Pinney JW,Stevens RD,Robertson DL

    update_date: 2008-09-01 00:00:00

  • Uncovering packaging features of co-regulated modules based on human protein interaction and transcriptional regulatory networks.

    abstract:BACKGROUND:Network co-regulated modules are believed to have the functionality of packaging multiple biological entities, and can thus be assumed to coordinate many biological functions in their network neighbouring regions. RESULTS:Here, we weighted edges of a human protein interaction network and a transcriptional r...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-392

    authors: Chen L,Wang H,Zhang L,Li W,Wang Q,Shang Y,He Y,He W,Li X,Tai J,Li X

    update_date: 2010-07-22 00:00:00

  • Extended analysis of benchmark datasets for Agilent two-color microarrays.

    abstract:BACKGROUND:As part of its broad and ambitious mission, the MicroArray Quality Control (MAQC) project reported the results of experiments using External RNA Controls (ERCs) on five microarray platforms. For most platforms, several different methods of data processing were considered. However, there was no similar consid...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-371

    authors: Kerr KF

    update_date: 2007-10-03 00:00:00

  • Comparison of scores for bimodality of gene expression distributions and genome-wide evaluation of the prognostic relevance of high-scoring genes.

    abstract:BACKGROUND:A major goal of the analysis of high-dimensional RNA expression data from tumor tissue is to identify prognostic signatures for discriminating patient subgroups. For this purpose genome-wide identification of bimodally expressed genes from gene array data is relevant because distinguishability of high and lo...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-276

    authors: Hellwig B,Hengstler JG,Schmidt M,Gehrmann MC,Schormann W,Rahnenführer J

    update_date: 2010-05-25 00:00:00

  • Normalized N50 assembly metric using gap-restricted co-linear chaining.

    abstract:BACKGROUND:For the development of genome assembly tools, some comprehensive and efficiently computable validation measures are required to assess the quality of the assembly. The mostly used N50 measure summarizes the assembly results by the length of the scaffold (or contig) overlapping the midpoint of the length-orde...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-255

    authors: Mäkinen V,Salmela L,Ylinen J

    update_date: 2012-10-03 00:00:00

  • CellProfiler Tracer: exploring and validating high-throughput, time-lapse microscopy image data.

    abstract:BACKGROUND:Time-lapse analysis of cellular images is an important and growing need in biology. Algorithms for cell tracking are widely available; what researchers have been missing is a single open-source software package to visualize standard tracking output (from software like CellProfiler) in a way that allows conve...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0759-x

    authors: Bray MA,Carpenter AE

    update_date: 2015-11-04 00:00:00

  • Performance of a genetic algorithm for mass spectrometry proteomics.

    abstract:BACKGROUND:Recently, mass spectrometry data have been mined using a genetic algorithm to produce discriminatory models that distinguish healthy individuals from those with cancer. This algorithm is the basis for claims of 100% sensitivity and specificity in two related publicly available datasets. To date, no detailed ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-5-180

    authors: Jeffries NO

    update_date: 2004-11-19 00:00:00

  • Missing genes in the annotation of prokaryotic genomes.

    abstract:BACKGROUND:Protein-coding gene detection in prokaryotic genomes is considered a much simpler problem than in intron-containing eukaryotic genomes. However there have been reports that prokaryotic gene finder programs have problems with small genes (either over-predicting or under-predicting). Therefore the question ari...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-131

    authors: Warren AS,Archuleta J,Feng WC,Setubal JC

    update_date: 2010-03-15 00:00:00

  • Machine-learning scoring functions for identifying native poses of ligands docked to known and novel proteins.

    abstract:BACKGROUND:Molecular docking is a widely-employed method in structure-based drug design. An essential component of molecular docking programs is a scoring function (SF) that can be used to identify the most stable binding pose of a ligand, when bound to a receptor protein, from among a large set of candidate poses. Des...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-16-S6-S3

    authors: Ashtawy HM,Mahapatra NR

    update_date: 2015-01-01 00:00:00

  • Use of a multi-way method to analyze the amino acid composition of a conserved group of orthologous proteins in prokaryotes.

    abstract:BACKGROUND:Amino acids in proteins are not used equally. Some of the differences in the amino acid composition of proteins are between species (mainly due to nucleotide composition and lifestyle) and some are between proteins from the same species (related to protein function, expression or subcellular localization, fo...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-257

    authors: Pasamontes A,Garcia-Vallve S

    update_date: 2006-05-18 00:00:00

  • Finite mixture clustering of human tissues with different levels of IGF-1 splice variants mRNA transcripts.

    abstract:BACKGROUND:This study addresses a recurrent biological problem, that is to define a formal clustering structure for a set of tissues on the basis of the relative abundance of multiple alternatively spliced isoforms mRNAs generated by the same gene. To this aim, we have used a model-based clustering approach, based on a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0689-7

    authors: Pelosi M,Alfò M,Martella F,Pappalardo E,Musarò A

    update_date: 2015-09-15 00:00:00

  • GraphDNA: a Java program for graphical display of DNA composition analyses.

    abstract:BACKGROUND:Under conditions of no strand bias the number of Gs is equal to that of Cs for each DNA strand; similarly, the total number of Ts is equal to that of As. However, within each strand there are considerable local deviations from the A = T and G = C equality. These asymmetries in nucleotide composition have bee...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-21

    authors: Thomas JM,Horspool D,Brown G,Tcherepanov V,Upton C

    update_date: 2007-01-23 00:00:00

  • SAlign-a structure aware method for global PPI network alignment.

    abstract:BACKGROUND:High throughput experiments have generated a significantly large amount of protein interaction data, which is being used to study protein networks. Studying complete protein networks can reveal more insight about healthy/disease states than studying proteins in isolation. Similarly, a comparative study of pr...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03827-5

    authors: Ayub U,Haider I,Naveed H

    update_date: 2020-11-04 00:00:00

  • Network motif-based identification of transcription factor-target gene relationships by integrating multi-source biological data.

    abstract:BACKGROUND:Integrating data from multiple global assays and curated databases is essential to understand the spatio-temporal interactions within cells. Different experiments measure cellular processes at various widths and depths, while databases contain biological information based on established facts or published da...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-203

    authors: Zhang Y,Xuan J,de los Reyes BG,Clarke R,Ressom HW

    update_date: 2008-04-21 00:00:00