A prediction-based resampling method for estimating the number of clusters in a dataset.

Abstract:

BACKGROUND: Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems, such as the classification of tumors. An important statistical problem associated with tumor classification is the identification of new tumor classes using gene-expression profiles. Two essential aspects of this clustering problem are estimating the number of clusters, if any, in a dataset, and allocating tumor samples to these clusters while assessing the confidence of cluster assignments for individual samples. Here we address the first of these problems.

RESULTS: We have developed a new prediction-based resampling method, Clest, to estimate the number of clusters in a dataset. The performance of the new and existing methods was compared using simulated data and gene-expression data from four recently published cancer microarray studies. Clest was generally found to be more accurate and robust than the six existing methods considered in the study.

CONCLUSIONS: Focusing on prediction accuracy in conjunction with resampling produces accurate and robust estimates of the number of clusters.
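The abstract describes Clest only at a high level. The Python sketch below illustrates the general prediction-based resampling idea. It is not the authors' implementation.

For each candidate k, the data are repeatedly split into a learning set and a test set. The learning set is clustered, and its centroids are used as a classifier for the test set. Those predicted labels are then compared with a direct clustering of the test set using the Fowlkes-Mallows index. The median agreement is compared with the same score computed on reference datasets drawn uniformly over each feature's range.

Several choices here are assumptions made for this sketch: k-means as the clustering procedure, nearest-centroid as the classifier, the uniform null, the `p_max`/`d_min` thresholds, and all function names (`estimate_k`, `prediction_score`, `uniform_reference`, and so on).

```python
import math
import random
import statistics
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(centroids, p):
    """Index of the closest centroid; used here as a nearest-centroid classifier."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def kmeans(points, k, rng, n_init=3, iters=100):
    """Lloyd's k-means with k-means++ seeding; returns the lowest-SSE restart."""
    best = None
    for _ in range(n_init):
        cents = [rng.choice(points)]
        while len(cents) < k:
            d2 = [min(sq_dist(p, c) for c in cents) for p in points]
            if sum(d2) == 0:  # all points coincide with chosen seeds
                cents.append(rng.choice(points))
            else:
                cents.append(rng.choices(points, weights=d2)[0])
        labels = None
        for _ in range(iters):
            new = [nearest(cents, p) for p in points]
            if new == labels:  # assignments stable: converged
                break
            labels = new
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:  # an empty cluster keeps its previous centroid
                    cents[j] = tuple(sum(x) / len(members) for x in zip(*members))
        sse = sum(sq_dist(p, cents[lab]) for p, lab in zip(points, labels))
        if best is None or sse < best[0]:
            best = (sse, labels, cents)
    return best[1], best[2]


def fowlkes_mallows(a, b):
    """Pair-counting Fowlkes-Mallows index between two labelings of the same items."""
    def same_pairs(counts):
        return sum(c * (c - 1) // 2 for c in counts.values())
    tp = same_pairs(Counter(zip(a, b)))
    denom = math.sqrt(same_pairs(Counter(a)) * same_pairs(Counter(b)))
    return tp / denom if denom else 0.0


def prediction_score(data, k, rng, n_splits=10, test_frac=1 / 3):
    """Median agreement between predicted and re-clustered labels on held-out data."""
    scores = []
    for _ in range(n_splits):
        idx = list(range(len(data)))
        rng.shuffle(idx)
        n_test = max(k, round(len(data) * test_frac))
        test = [data[i] for i in idx[:n_test]]
        train = [data[i] for i in idx[n_test:]]
        _, cents = kmeans(train, k, rng)               # cluster the learning set
        predicted = [nearest(cents, p) for p in test]  # classify the test set
        clustered, _ = kmeans(test, k, rng)            # cluster the test set directly
        scores.append(fowlkes_mallows(predicted, clustered))
    return statistics.median(scores)


def uniform_reference(data, rng):
    """Null dataset: uniform draws over each feature's observed range (an assumption)."""
    bounds = [(min(col), max(col)) for col in zip(*data)]
    return [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in data]


def estimate_k(data, k_max=5, n_null=5, p_max=0.05, d_min=0.05, seed=0):
    """Return the k with the largest observed-minus-null score, or 1 if none qualifies."""
    rng = random.Random(seed)
    nulls = [uniform_reference(data, rng) for _ in range(n_null)]
    best_k, best_d = 1, None
    for k in range(2, k_max + 1):
        t_obs = prediction_score(data, k, rng)
        t_null = [prediction_score(ref, k, rng) for ref in nulls]
        d_k = t_obs - statistics.mean(t_null)             # gain over the null
        p_k = sum(t >= t_obs for t in t_null) / n_null    # resampling p-value
        if p_k <= p_max and d_k >= d_min and (best_d is None or d_k > best_d):
            best_k, best_d = k, d_k
    return best_k
```

Calling `estimate_k(data)` on a list of coordinate tuples returns an integer between 1 and `k_max`. A result of 1 means no k passed both thresholds.

With only `n_null=5` reference datasets, `p_k <= 0.05` effectively requires every null score to fall below the observed one. The small numbers of splits and null datasets are chosen for speed, not statistical power.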

journal_name

Genome Biol

journal_title

Genome biology

authors

Dudoit S,Fridlyand J

doi

10.1186/gb-2002-3-7-research0036

keywords:

subject

Has Abstract

pub_date

2002-06-25 00:00:00

pages

RESEARCH0036

issue

7

eissn

1474-7596

issn

1474-760X

journal_volume

3

pub_type

Journal Article
  • FAN-C: a feature-rich framework for the analysis and visualisation of chromosome conformation capture data.

    abstract: Chromosome conformation capture data, particularly from high-throughput approaches such as Hi-C, are typically very complex to analyse. Existing analysis tools are often single-purpose, or limited in compatibility to a small number of data formats, frequently making Hi-C analyses tedious and time-consuming. Here, we p...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/s13059-020-02215-9

    authors: Kruse K,Hug CB,Vaquerizas JM

    update_date: 2020-12-17 00:00:00

  • Systematic identification of genetic influences on methylation across the human life course.

    abstract: BACKGROUND: The influence of genetic variation on complex diseases is potentially mediated through a range of highly dynamic epigenetic processes exhibiting temporal variation during development and later life. Here we present a catalogue of the genetic influences on DNA methylation (methylation quantitative trait loci ...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/s13059-016-0926-z

    authors: Gaunt TR,Shihab HA,Hemani G,Min JL,Woodward G,Lyttleton O,Zheng J,Duggirala A,McArdle WL,Ho K,Ring SM,Evans DM,Davey Smith G,Relton CL

    update_date: 2016-03-31 00:00:00

  • Investigating enhancer evolution with massively parallel reporter assays.

    abstract: A recent study in Genome Biology has characterized the evolution of candidate hominoid-specific liver enhancers by using massively parallel reporter assays (MPRAs). ...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/s13059-018-1502-5

    authors: Kwon SB,Ernst J

    update_date: 2018-08-14 00:00:00

  • Clustered CTCF binding is an evolutionary mechanism to maintain topologically associating domains.

    abstract: BACKGROUND: CTCF binding contributes to the establishment of a higher-order genome structure by demarcating the boundaries of large-scale topologically associating domains (TADs). However, despite the importance and conservation of TADs, the role of CTCF binding in their evolution and stability remains elusive. RESULTS...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/s13059-019-1894-x

    authors: Kentepozidou E,Aitken SJ,Feig C,Stefflova K,Ibarra-Soria X,Odom DT,Roller M,Flicek P

    update_date: 2020-01-07 00:00:00

  • Discovery and functional prioritization of Parkinson's disease candidate genes from large-scale whole exome sequencing.

    abstract: BACKGROUND: Whole-exome sequencing (WES) has been successful in identifying genes that cause familial Parkinson's disease (PD). However, until now this approach has not been deployed to study large cohorts of unrelated participants. To discover rare PD susceptibility variants, we performed WES in 1148 unrelated cases an...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/s13059-017-1147-9

    authors: Jansen IE,Ye H,Heetveld S,Lechler MC,Michels H,Seinstra RI,Lubbe SJ,Drouet V,Lesage S,Majounie E,Gibbs JR,Nalls MA,Ryten M,Botia JA,Vandrovcova J,Simon-Sanchez J,Castillo-Lizardo M,Rizzu P,Blauwendraat C,Chouhan AK

    update_date: 2017-01-30 00:00:00

  • Genetic analysis of the human infective trypanosome Trypanosoma brucei gambiense: chromosomal segregation, crossing over, and the construction of a genetic map.

    abstract: BACKGROUND: Trypanosoma brucei is the causative agent of human sleeping sickness and animal trypanosomiasis in sub-Saharan Africa, and it has been subdivided into three subspecies: Trypanosoma brucei gambiense and Trypanosoma brucei rhodesiense, which cause sleeping sickness in humans, and the nonhuman infective Trypano...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/gb-2008-9-6-r103

    authors: Cooper A,Tait A,Sweeney L,Tweedie A,Morrison L,Turner CM,MacLeod A

    update_date: 2008-01-01 00:00:00

  • Bayesian analysis of gene expression levels: statistical quantification of relative mRNA level across multiple strains or treatments.

    abstract: BACKGROUND: Methods of microarray analysis that suit experimentalists using the technology are vital. Many methodologies discard the quantitative results inherent in cDNA microarray comparisons or cannot be flexibly applied to multifactorial experimental design. Here we present a flexible, quantitative Bayesian framewor...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/gb-2002-3-12-research0071

    authors: Townsend JP,Hartl DL

    update_date: 2002-01-01 00:00:00

  • Comparative genomics reveals the distinct evolutionary trajectories of the robust and complex coral lineages.

    abstract: BACKGROUND: Despite the biological and economic significance of scleractinian reef-building corals, the lack of large molecular datasets for a representative range of species limits understanding of many aspects of their biology. Within the Scleractinia, based on molecular evidence, it is generally recognised that there...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/s13059-018-1552-8

    authors: Ying H,Cooke I,Sprungala S,Wang W,Hayward DC,Tang Y,Huttley G,Ball EE,Forêt S,Miller DJ

    update_date: 2018-11-02 00:00:00

  • Supervised harvesting of expression trees.

    abstract: BACKGROUND: We propose a new method for supervised learning from gene expression data. We call it 'tree harvesting'. This technique starts with a hierarchical clustering of genes, then models the outcome variable as a sum of the average expression profiles of chosen clusters and their products. It can be applied to many...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/gb-2001-2-1-research0003

    authors: Hastie T,Tibshirani R,Botstein D,Brown P

    update_date: 2001-01-01 00:00:00

  • Multiclass classification of microarray data with repeated measurements: application to cancer.

    abstract: Prediction of the diagnostic category of a tissue sample from its gene-expression profile and selection of relevant genes for class prediction have important applications in cancer research. We have developed the uncorrelated shrunken centroid (USC) and error-weighted, uncorrelated shrunken centroid (EWUSC) algorithms...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/gb-2003-4-12-r83

    authors: Yeung KY,Bumgarner RE

    update_date: 2003-01-01 00:00:00

  • Mapping-by-sequencing accelerates forward genetics in barley.

    abstract: Mapping-by-sequencing has emerged as a powerful technique for genetic mapping in several plant and animal species. As this resequencing-based method requires a reference genome, its application to complex plant genomes with incomplete and fragmented sequence resources remains challenging. We perform exome sequencing o...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/gb-2014-15-6-r78

    authors: Mascher M,Jost M,Kuon JE,Himmelbach A,Aßfalg A,Beier S,Scholz U,Graner A,Stein N

    update_date: 2014-06-10 00:00:00

  • Comprehensive miRNA sequence analysis reveals survival differences in diffuse large B-cell lymphoma patients.

    abstract: BACKGROUND: Diffuse large B-cell lymphoma (DLBCL) is an aggressive disease, with 30% to 40% of patients failing to be cured with available primary therapy. microRNAs (miRNAs) are RNA molecules that attenuate expression of their mRNA targets. To characterize the DLBCL miRNome, we sequenced miRNAs from 92 DLBCL and 15 ben...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/s13059-014-0568-y

    authors: Lim EL,Trinh DL,Scott DW,Chu A,Krzywinski M,Zhao Y,Robertson AG,Mungall AJ,Schein J,Boyle M,Mottok A,Ennishi D,Johnson NA,Steidl C,Connors JM,Morin RD,Gascoyne RD,Marra MA

    update_date: 2015-01-29 00:00:00

  • Using ontologies to describe mouse phenotypes.

    abstract: The mouse is an important model of human genetic disease. Describing phenotypes of mutant mice in a standard, structured manner that will facilitate data mining is a major challenge for bioinformatics. Here we describe a novel, compositional approach to this problem which combines core ontologies from a variety of sou...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/gb-2004-6-1-r8

    authors: Gkoutos GV,Green EC,Mallon AM,Hancock JM,Davidson D

    update_date: 2005-01-01 00:00:00

  • Comparative genomics of mutualistic viruses of Glyptapanteles parasitic wasps.

    abstract: BACKGROUND: Polydnaviruses, double-stranded DNA viruses with segmented genomes, have evolved as obligate endosymbionts of parasitoid wasps. Virus particles are replication deficient and produced by female wasps from proviral sequences integrated into the wasp genome. These particles are co-injected with eggs into caterp...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/gb-2008-9-12-r183

    authors: Desjardins CA,Gundersen-Rindal DE,Hostetler JB,Tallon LJ,Fadrosh DW,Fuester RW,Pedroni MJ,Haas BJ,Schatz MC,Jones KM,Crabtree J,Forberger H,Nene V

    update_date: 2008-01-01 00:00:00

  • Prediction of synergistic transcription factors by function conservation.

    abstract: BACKGROUND: Previous methods employed for the identification of synergistic transcription factors (TFs) are based on either TF enrichment from co-regulated genes or phylogenetic footprinting. Despite the success of these methods, both have limitations. RESULTS: We propose a new strategy to identify synergistic TFs by fu...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/gb-2007-8-12-r257

    authors: Hu Z,Hu B,Collins JF

    update_date: 2007-01-01 00:00:00

  • The Adult Mouse Anatomical Dictionary: a tool for annotating and integrating data.

    abstract: We have developed an ontology to provide standardized nomenclature for anatomical terms in the postnatal mouse. The Adult Mouse Anatomical Dictionary is structured as a directed acyclic graph, and is organized hierarchically both spatially and functionally. The ontology will be used to annotate and integrate different...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/gb-2005-6-3-r29

    authors: Hayamizu TF,Mangan M,Corradi JP,Kadin JA,Ringwald M

    update_date: 2005-01-01 00:00:00

  • iRegNet3D: three-dimensional integrated regulatory network for the genomic analysis of coding and non-coding disease mutations.

    abstract: The mechanistic details of most disease-causing mutations remain poorly explored within the context of regulatory networks. We present a high-resolution three-dimensional integrated regulatory network (iRegNet3D) in the form of a web tool, where we resolve the interfaces of all known transcription factor (TF)-TF, TF-D...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/s13059-016-1138-2

    authors: Liang S,Tippens ND,Zhou Y,Mort M,Stenson PD,Cooper DN,Yu H

    update_date: 2017-01-18 00:00:00

  • THoR: a tool for domain discovery and curation of multiple alignments.

    abstract: We describe a tool, THoR, that automatically creates and curates multiple sequence alignments representing protein domains. This exploits both PSI-BLAST and HMMER algorithms and provides an accurate and comprehensive alignment for any domain family. The entire process is designed for use via a web-browser, with simple...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/gb-2003-4-8-r52

    authors: Dickens NJ,Ponting CP

    update_date: 2003-01-01 00:00:00

  • Assessing taxonomic metagenome profilers with OPAL.

    abstract: The explosive growth in taxonomic metagenome profiling methods over the past years has created a need for systematic comparisons using relevant performance criteria. The Open-community Profiling Assessment tooL (OPAL) implements commonly used performance metrics, including those of the first challenge of the initiativ...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/s13059-019-1646-y

    authors: Meyer F,Bremges A,Belmann P,Janssen S,McHardy AC,Koslicki D

    update_date: 2019-03-04 00:00:00

  • Genomic analysis of the domestication and post-Spanish conquest evolution of the llama and alpaca.

    abstract: BACKGROUND: Despite their regional economic importance and being increasingly reared globally, the origins and evolution of the llama and alpaca remain poorly understood. Here we report reference genomes for the llama, and for the guanaco and vicuña (their putative wild progenitors), compare these with the published alp...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/s13059-020-02080-6

    authors: Fan R,Gu Z,Guang X,Marín JC,Varas V,González BA,Wheeler JC,Hu Y,Li E,Sun X,Yang X,Zhang C,Gao W,He J,Munch K,Corbett-Detig R,Barbato M,Pan S,Zhan X,Bruford MW,Dong C

    update_date: 2020-07-02 00:00:00

  • Minimal genome-wide human CRISPR-Cas9 library.

    abstract: CRISPR guide RNA libraries have been iteratively improved to provide increasingly efficient reagents, although their large size is a barrier for many applications. We design an optimised minimal genome-wide human CRISPR-Cas9 library (MinLibCas9) by mining existing large-scale gene loss-of-function datasets, resulting ...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/s13059-021-02268-4

    authors: Gonçalves E,Thomas M,Behan FM,Picco G,Pacini C,Allen F,Vinceti A,Sharma M,Jackson DA,Price S,Beaver CM,Dovey O,Parry-Smith D,Iorio F,Parts L,Yusa K,Garnett MJ

    update_date: 2021-01-21 00:00:00

  • quantro: a data-driven approach to guide the choice of an appropriate normalization method.

    abstract: Normalization is an essential step in the analysis of high-throughput data. Multi-sample global normalization methods, such as quantile normalization, have been successfully used to remove technical variation. However, these methods rely on the assumption that observed global changes across samples are due to unwanted...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/s13059-015-0679-0

    authors: Hicks SC,Irizarry RA

    update_date: 2015-06-04 00:00:00

  • SEPATH: benchmarking the search for pathogens in human tissue whole genome sequence data leads to template pipelines.

    abstract: BACKGROUND: Human tissue is increasingly being whole genome sequenced as we transition into an era of genomic medicine. With this arises the potential to detect sequences originating from microorganisms, including pathogens amid the plethora of human sequencing reads. In cancer research, the tumorigenic ability of patho...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/s13059-019-1819-8

    authors: Gihawi A,Rallapalli G,Hurst R,Cooper CS,Leggett RM,Brewer DS

    update_date: 2019-10-22 00:00:00

  • The draft genome of the C3 panicoid grass species Dichanthelium oligosanthes.

    abstract: BACKGROUND: Comparisons between C3 and C4 grasses often utilize C3 species from the subfamilies Ehrhartoideae or Pooideae and C4 species from the subfamily Panicoideae, two clades that diverged over 50 million years ago. The divergence of the C3 panicoid grass Dichanthelium oligosanthes from the independent C4 lineages ...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/s13059-016-1080-3

    authors: Studer AJ,Schnable JC,Weissmann S,Kolbe AR,McKain MR,Shao Y,Cousins AB,Kellogg EA,Brutnell TP

    update_date: 2016-10-28 00:00:00

  • Nucleosome deposition and DNA methylation at coding region boundaries.

    abstract: BACKGROUND: Nucleosome deposition downstream of transcription initiation and DNA methylation in the gene body suggest that control of transcription elongation is a key aspect of epigenetic regulation. RESULTS: Here we report a genome-wide observation of distinct peaks of nucleosomes and methylation at both ends of a pro...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/gb-2009-10-9-r89

    authors: Choi JK,Bae JB,Lyu J,Kim TY,Kim YJ

    update_date: 2009-01-01 00:00:00

  • The promise and limitations of population exomics for human evolution studies.

    abstract: Exome sequencing is poised to yield substantial insights into human genetic variation and evolutionary history, but there are significant challenges to overcome before this becomes a reality. ...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/gb-2011-12-9-127

    authors: Tennessen JA,O'Connor TD,Bamshad MJ,Akey JM

    update_date: 2011-09-14 00:00:00

  • Where is genomics going next?

    abstract: We polled the Editorial Board of Genome Biology to ask where they see genomics going in the next few years. Here are some of their responses. ...

    journal_title: Genome biology

    pub_type: Editorial, Interview

    doi: 10.1186/s13059-019-1626-2

    authors: Cheifet B

    update_date: 2019-01-22 00:00:00

  • Cytoscape Automation: empowering workflow-based network analysis.

    abstract: Cytoscape is one of the most successful network biology analysis and visualization tools, but because of its interactive nature, its role in creating reproducible, scalable, and novel workflows has been limited. We describe Cytoscape Automation (CA), which marries Cytoscape to highly productive workflow systems, for e...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/s13059-019-1758-4

    authors: Otasek D,Morris JH,Bouças J,Pico AR,Demchak B

    update_date: 2019-09-02 00:00:00

  • Methylome evolution in plants.

    abstract: Despite major progress in dissecting the molecular pathways that control DNA methylation patterns in plants, little is known about the mechanisms that shape plant methylomes over evolutionary time. Drawing on recent intra- and interspecific epigenomic studies, we show that methylome evolution over long timescales is l...

    journal_title: Genome biology

    pub_type: Journal Article, Review

    doi: 10.1186/s13059-016-1127-5

    authors: Vidalis A,Živković D,Wardenaar R,Roquis D,Tellier A,Johannes F

    update_date: 2016-12-20 00:00:00

  • Co-binding by YY1 identifies the transcriptionally active, highly conserved set of CTCF-bound regions in primate genomes.

    abstract: BACKGROUND: The genomic binding of CTCF is highly conserved across mammals, but the mechanisms that underlie its stability are poorly understood. One transcription factor known to functionally interact with CTCF in the context of X-chromosome inactivation is the ubiquitously expressed YY1. Because combinatorial transcri...

    journal_title: Genome biology

    pub_type: Journal Article

    doi: 10.1186/gb-2013-14-12-r148

    authors: Schwalie PC,Ward MC,Cain CE,Faure AJ,Gilad Y,Odom DT,Flicek P

    update_date: 2013-12-31 00:00:00