Judging the quality of gene expression-based clustering methods using gene annotation.

Abstract:

We compare several commonly used expression-based gene clustering algorithms using a figure of merit based on the mutual information between cluster membership and known gene attributes. By studying various publicly available expression data sets we conclude that enrichment of clusters for biological function is, in general, highest at rather low cluster numbers. As a measure of dissimilarity between the expression patterns of two genes, no method outperforms Euclidean distance for ratio-based measurements, or Pearson distance for non-ratio-based measurements at the optimal choice of cluster number. We show the self-organized-map approach to be best for both measurement types at higher numbers of clusters. Clusters of genes derived from single- and average-linkage hierarchical clustering tend to produce worse-than-random results.
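
A minimal Python sketch of a mutual-information figure of merit of the kind the abstract describes. It assumes each known gene attribute is a binary vector aligned with the genes (for example, membership in one annotation category), sums the per-attribute mutual information, and compares that total against cluster labels shuffled at fixed cluster sizes to give a z-score; a negative score would correspond to the "worse-than-random" behavior reported for some methods. The function names, the summation across attributes, and the shuffle-based baseline are illustrative assumptions, not the authors' published implementation.

```python
import math
import random
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(labels: Sequence[Hashable], attribute: Sequence[int]) -> float:
    """Mutual information (in bits) between cluster membership and one gene attribute."""
    if len(labels) != len(attribute):
        raise ValueError("labels and attribute must describe the same genes")
    n = len(labels)
    cluster_counts = Counter(labels)
    attr_counts = Counter(attribute)
    joint_counts = Counter(zip(labels, attribute))
    mi = 0.0
    for (cluster, value), n_joint in joint_counts.items():
        # p(c,a) * log2(p(c,a) / (p(c) p(a))), rewritten with raw counts
        ratio = n_joint * n / (cluster_counts[cluster] * attr_counts[value])
        mi += (n_joint / n) * math.log2(ratio)
    return mi


def total_mi(labels: Sequence[Hashable], attributes: Sequence[Sequence[int]]) -> float:
    """Sum of per-attribute mutual information (hypothetical aggregation rule)."""
    return sum(mutual_information(labels, attr) for attr in attributes)


def mi_z_score(labels, attributes, n_shuffles=200, seed=0):
    """z-score of the observed total MI against shuffled labels (cluster sizes kept)."""
    rng = random.Random(seed)
    observed = total_mi(labels, attributes)
    shuffled = list(labels)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # a permutation keeps every cluster's size unchanged
        null.append(total_mi(shuffled, attributes))
    mean = sum(null) / len(null)
    sd = math.sqrt(sum((x - mean) ** 2 for x in null) / (len(null) - 1))
    if sd == 0.0:
        return 0.0 if observed == mean else math.copysign(math.inf, observed - mean)
    return (observed - mean) / sd


if __name__ == "__main__":
    clusters = [0] * 20 + [1] * 20
    annotated = [1] * 20 + [0] * 20  # attribute that tracks the clusters exactly
    print(mutual_information(clusters, annotated))  # prints 1.0
    print(mi_z_score(clusters, [annotated]) > 0)    # prints True
```

Scanning this score across candidate cluster numbers is one way to look for the low-cluster-number optimum the abstract reports; the authors' actual normalization and null model may differ from this sketch.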

journal_name

Genome Res

journal_title

Genome research

authors

Gibbons FD,Roth FP

doi

10.1101/gr.397002

subject

Has Abstract

pub_date

2002-10-01 00:00:00

pages

1574-81

issue

10

eissn

1549-5469

issn

1088-9051

journal_volume

12

pub_type

Journal Article
  • Arboretum: reconstruction and analysis of the evolutionary history of condition-specific transcriptional modules.

    abstract::Comparative functional genomics studies the evolution of biological processes by analyzing functional data, such as gene expression profiles, across species. A major challenge is to compare profiles collected in a complex phylogeny. Here, we present Arboretum, a novel scalable computational algorithm that integrates e...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.146233.112

    authors: Roy S,Wapinski I,Pfiffner J,French C,Socha A,Konieczka J,Habib N,Kellis M,Thompson D,Regev A

    update_date:2013-06-01 00:00:00

  • Spatial enhancer clustering and regulation of enhancer-proximal genes by cohesin.

    abstract::In addition to mediating sister chromatid cohesion during the cell cycle, the cohesin complex associates with CTCF and with active gene regulatory elements to form long-range interactions between its binding sites. Genome-wide chromosome conformation capture had shown that cohesin's main role in interphase genome orga...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.184986.114

    authors: Ing-Simmons E,Seitan VC,Faure AJ,Flicek P,Carroll T,Dekker J,Fisher AG,Lenhard B,Merkenschlager M

    update_date:2015-04-01 00:00:00

  • Integrative functional genomics identifies an enhancer looping to the SOX9 gene disrupted by the 17q24.3 prostate cancer risk locus.

    abstract::Genome-wide association studies (GWAS) are identifying genetic predisposition to various diseases. The 17q24.3 locus harbors the single nucleotide polymorphism (SNP) rs1859962 that is statistically associated with prostate cancer (PCa). It defines a 130-kb linkage disequilibrium (LD) block that lies in an ∼2-Mb gene d...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.135665.111

    authors: Zhang X,Cowper-Sal·lari R,Bailey SD,Moore JH,Lupien M

    update_date:2012-08-01 00:00:00

  • A unified model for yeast transcript definition.

    abstract::Identifying genes in the genomic context is central to a cell's ability to interpret the genome. Yet, in general, the signals used to define eukaryotic genes are poorly described. Here, we derived simple classifiers that identify where transcription will initiate and terminate using nucleic acid sequence features dete...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.164327.113

    authors: de Boer CG,van Bakel H,Tsui K,Li J,Morris QD,Nislow C,Greenblatt JF,Hughes TR

    update_date:2014-01-01 00:00:00

  • Caenorhabditis elegans has scores of hedgehog-related genes: sequence and expression analysis.

    abstract::Previously, we have described novel families of genes, warthog (wrt) and groundhog (grd), in Caenorhabditis elegans. They are related to Hedgehog (Hh) through the carboxy-terminal autoprocessing domain (called Hog or Hint). A comprehensive survey revealed 10 genes with Hog/Hint modules in C. elegans. Five of these are...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.9.10.909

    authors: Aspöck G,Kagoshima H,Niklaus G,Bürglin TR

    update_date:1999-10-01 00:00:00

  • Telomeric organization of a variable and inducible toxin gene family in the ancient eukaryote Giardia duodenalis.

    abstract::Giardia duodenalis is the best-characterized example of the most ancient eukaryotes, which are primitively amitochondrial and anaerobic. The surface of Giardia is coated with cysteine-rich proteins. One family of these proteins, CRP136, varies among isolates and upon environmental stress. A repeat region within the CR...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.7.1.37

    authors: Upcroft P,Chen N,Upcroft JA

    update_date:1997-01-01 00:00:00

  • RNA expression profiling at the single molecule level.

    abstract::We developed a microarray platform for PCR amplification-independent expression profiling of minute samples. A novel scanning system combined with specialized biochips enables detection down to individual fluorescent oligonucleotide molecules specifically hybridized to their complementary sequence over the entire bioc...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.4999906

    authors: Hesse J,Jacak J,Kasper M,Regl G,Eichberger T,Winklmayr M,Aberger F,Sonnleitner M,Schlapak R,Howorka S,Muresan L,Frischauf AM,Schütz GJ

    update_date:2006-08-01 00:00:00

  • Copy number variation at the breakpoint region of isochromosome 17q.

    abstract::Isochromosome 17q, or i(17q), is one of the most frequent nonrandom changes occurring in human neoplasia. Most of the i(17q) breakpoints cluster within an approximately 240-kb interval located in the Smith-Magenis syndrome common deletion region in 17p11.2. The breakpoint cluster region is characterized by a complex ar...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.080697.108

    authors: Carvalho CM,Lupski JR

    update_date:2008-11-01 00:00:00

  • The mouse Aire gene: comparative genomic sequencing, gene organization, and expression.

    abstract::Mutations in the human AIRE gene (hAIRE) result in the development of an autoimmune disease named APECED (autoimmune polyendocrinopathy candidiasis ectodermal dystrophy; OMIM 240300). Previously, we have cloned hAIRE and shown that it codes for a putative transcription-associated factor. Here we report the cloning and...

    journal_title:Genome research

    pub_type: Journal Article

    doi:

    authors: Blechschmidt K,Schweiger M,Wertz K,Poulson R,Christensen HM,Rosenthal A,Lehrach H,Yaspo ML

    update_date:1999-02-01 00:00:00

  • A combinatorial partitioning method to identify multilocus genotypic partitions that predict quantitative trait variation.

    abstract::Recent advances in genome research have accelerated the process of locating candidate genes and the variable sites within them and have simplified the task of genotype measurement. The development of statistical and computational strategies to utilize information on hundreds -- soon thousands -- of variable loci to in...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.172901

    authors: Nelson MR,Kardia SL,Ferrell RE,Sing CF

    update_date:2001-03-01 00:00:00

  • Comparative analysis of gene-expression patterns in human and African great ape cultured fibroblasts.

    abstract::Although much is known about genetic variation in human and African great ape (chimpanzee, bonobo, and gorilla) genomes, substantially less is known about variation in gene-expression profiles within and among these species. This information is necessary for defining transcriptional regulatory networks that contribute...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.1289803

    authors: Karaman MW,Houck ML,Chemnick LG,Nagpal S,Chawannakul D,Sudano D,Pike BL,Ho VV,Ryder OA,Hacia JG

    update_date:2003-07-01 00:00:00

  • A nuclear matrix attachment site in the 4q35 locus has an enhancer-blocking activity in vivo: implications for the facio-scapulo-humeral dystrophy.

    abstract::Facio-scapulo-humeral dystrophy (FSHD), a muscular hereditary disease with a prevalence of 1 in 20,000, is caused by a partial deletion of a subtelomeric repeat array on chromosome 4q. Earlier, we demonstrated the existence in the vicinity of the D4Z4 repeat of a nuclear matrix attachment site, FR-MAR, efficient in no...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.6620908

    authors: Petrov A,Allinne J,Pirozhkova I,Laoudj D,Lipinski M,Vassetzky YS

    update_date:2008-01-01 00:00:00

  • Gene loss and movement in the maize genome.

    abstract::Maize (Zea mays L. ssp. mays), one of the most important agricultural crops in the world, originated by hybridization of two closely related progenitors. To investigate the fate of its genes after tetraploidization, we analyzed the sequence of five duplicated regions from different chromosomal locations. We also compa...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.2701104

    authors: Lai J,Ma J,Swigonová Z,Ramakrishna W,Linton E,Llaca V,Tanyolac B,Park YJ,Jeong OY,Bennetzen JL,Messing J

    update_date:2004-10-01 00:00:00

  • Pervasive, genome-wide positive selection leading to functional divergence in the bacterial genus Campylobacter.

    abstract::An open question in bacterial genomics is the role that adaptive evolution of the core genome plays in diversification and adaptation of bacterial species, and how this might differ between groups of bacteria occupying different environmental circumstances. The genus Campylobacter encompasses several important human a...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.089250.108

    authors: Lefébure T,Stanhope MJ

    update_date:2009-07-01 00:00:00

  • HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient.

    abstract::Hi-C is a powerful technology for studying genome-wide chromatin interactions. However, current methods for assessing Hi-C data reproducibility can produce misleading results because they ignore spatial features in Hi-C data, such as domain structure and distance dependence. We present HiCRep, a framework for assessin...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.220640.117

    authors: Yang T,Zhang F,Yardımcı GG,Song F,Hardison RC,Noble WS,Yue F,Li Q

    update_date:2017-11-01 00:00:00

  • Reprogramming of the human intestinal epigenome by surgical tissue transposition.

    abstract::Extracellular cues play critical roles in the establishment of the epigenome during development and may also contribute to epigenetic perturbations found in disease states. The direct role of the local tissue environment on the post-development human epigenome, however, remains unclear due to limitations in studies of...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.166439.113

    authors: Lay FD,Triche TJ Jr,Tsai YC,Su SF,Martin SE,Daneshmand S,Skinner EC,Liang G,Chihara Y,Jones PA

    update_date:2014-04-01 00:00:00

  • CBX3 regulates efficient RNA processing genome-wide.

    abstract::CBX5, CBX1, and CBX3 (HP1α, β, and γ, respectively) play an evolutionarily conserved role in the formation and maintenance of heterochromatin. In addition, CBX5, CBX1, and CBX3 may also participate in transcriptional regulation of genes. Recently, CBX3 binding to the bodies of a subset of genes has been observed in hu...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.124818.111

    authors: Smallwood A,Hon GC,Jin F,Henry RE,Espinosa JM,Ren B

    update_date:2012-08-01 00:00:00

  • Murine single-cell RNA-seq reveals cell-identity- and tissue-specific trajectories of aging.

    abstract::Aging is a pleiotropic process affecting many aspects of mammalian physiology. Mammals are composed of distinct cell type identities and tissue environments, but the influence of these cell identities and environments on the trajectory of aging in individual cells remains unclear. Here, we performed single-cell RNA-se...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.253880.119

    authors: Kimmel JC,Penland L,Rubinstein ND,Hendrickson DG,Kelley DR,Rosenthal AZ

    update_date:2019-12-01 00:00:00

  • Antisense transcripts with FANTOM2 clone set and their implications for gene regulation.

    abstract::We have used the FANTOM2 mouse cDNA set (60,770 clones), public mRNA data, and mouse genome sequence data to identify 2481 pairs of sense-antisense transcripts and 899 further pairs of nonantisense bidirectional transcription based upon genomic mapping. The analysis greatly expands the number of known examples of sens...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.982903

    authors: Kiyosawa H,Yamanaka I,Osato N,Kondo S,Hayashizaki Y,RIKEN GER Group.,GSL Members.

    update_date:2003-06-01 00:00:00

  • The mRNA-bound proteome of the early fly embryo.

    abstract::Early embryogenesis is characterized by the maternal to zygotic transition (MZT), in which maternally deposited messenger RNAs are degraded while zygotic transcription begins. Before the MZT, post-transcriptional gene regulation by RNA-binding proteins (RBPs) is the dominant force in embryo patterning. We used two mRN...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.200386.115

    authors: Wessels HH,Imami K,Baltz AG,Kolinski M,Beldovskaya A,Selbach M,Small S,Ohler U,Landthaler M

    update_date:2016-07-01 00:00:00

  • Relationship between histone modifications and transcription factor binding is protein family specific.

    abstract::The very small fraction of putative binding sites (BSs) that are occupied by transcription factors (TFs) in vivo can be highly variable across different cell types. This observation has been partly attributed to changes in chromatin accessibility and histone modification (HM) patterns surrounding BSs. Previous studies...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.220079.116

    authors: Xin B,Rohs R

    update_date:2018-01-11 00:00:00

  • Identification and analysis of internal promoters in Caenorhabditis elegans operons.

    abstract::The current Caenorhabditis elegans genomic annotation has many genes organized in operons. Using directionally stitched promoterGFP methodology, we have conducted the largest survey to date on the regulatory regions of annotated C. elegans operons and identified 65, over 25% of those studied, with internal promoters. ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.6824707

    authors: Huang P,Pleasance ED,Maydan JS,Hunt-Newbury R,O'Neil NJ,Mah A,Baillie DL,Marra MA,Moerman DG,Jones SJ

    update_date:2007-10-01 00:00:00

  • The landscape of histone modifications across 1% of the human genome in five human cell lines.

    abstract::We generated high-resolution maps of histone H3 lysine 9/14 acetylation (H3ac), histone H4 lysine 5/8/12/16 acetylation (H4ac), and histone H3 at lysine 4 mono-, di-, and trimethylation (H3K4me1, H3K4me2, H3K4me3, respectively) across the ENCODE regions. Studying each modification in five human cell lines including th...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.5704207

    authors: Koch CM,Andrews RM,Flicek P,Dillon SC,Karaöz U,Clelland GK,Wilcox S,Beare DM,Fowler JC,Couttet P,James KD,Lefebvre GC,Bruce AW,Dovey OM,Ellis PD,Dhami P,Langford CF,Weng Z,Birney E,Carter NP,Vetrie D,Dunham I

    update_date:2007-06-01 00:00:00

  • BLAT--the BLAST-like alignment tool.

    abstract::Analyzing vertebrate genomes requires rapid mRNA/DNA and cross-species protein alignments. A new tool, BLAT, is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences. B...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.229202

    authors: Kent WJ

    update_date:2002-04-01 00:00:00

  • Uncovering cis-regulatory sequence requirements for context-specific transcription factor binding.

    abstract::The regulation of gene expression is mediated at the transcriptional level by enhancer regions that are bound by sequence-specific transcription factors (TFs). Recent studies have shown that the in vivo binding sites of single TFs differ between developmental or cellular contexts. How this context-specific binding is ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.132811.111

    authors: Yáñez-Cuna JO,Dinh HQ,Kvon EZ,Shlyueva D,Stark A

    update_date:2012-10-01 00:00:00

  • Adenoviral vectors expressing siRNAs for discovery and validation of gene function.

    abstract::RNA interference is a powerful tool for studying gene function and for drug target discovery in diverse organisms and cell types. In mammalian systems, small interfering RNAs (siRNAs), or DNA plasmids expressing these siRNAs, have been used to down-modulate gene expression. However, inefficient transfection protocols,...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.1332603

    authors: Arts GJ,Langemeijer E,Tissingh R,Ma L,Pavliska H,Dokic K,Dooijes R,Mesić E,Clasen R,Michiels F,van der Schueren J,Lambrecht M,Herman S,Brys R,Thys K,Hoffmann M,Tomme P,van Es H

    update_date:2003-10-01 00:00:00

  • Spidey: a tool for mRNA-to-genomic alignments.

    abstract::We have developed a computer program that aligns spliced sequences to genomic sequences, using local alignment algorithms and heuristics to put together a global spliced alignment. Spidey can produce reliable alignments quickly, even when confronted with noise from alternative splicing, polymorphisms, sequencing error...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.195301

    authors: Wheelan SJ,Church DM,Ostell JM

    update_date:2001-11-01 00:00:00

  • Functional genomic analysis of chromosomal aberrations in a compendium of 8000 cancer genomes.

    abstract::A large database of copy number profiles from cancer genomes can facilitate the identification of recurrent chromosomal alterations that often contain key cancer-related genes. It can also be used to explore low-prevalence genomic events such as chromothripsis. In this study, we report an analysis of 8227 human cancer...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.140301.112

    authors: Kim TM,Xi R,Luquette LJ,Park RW,Johnson MD,Park PJ

    update_date:2013-02-01 00:00:00

  • Dynamic effects of interacting genes underlying rice flowering-time phenotypic plasticity and global adaptation.

    abstract::The phenotypic variation of living organisms is shaped by genetics, environment, and their interaction. Understanding phenotypic plasticity under natural conditions is hindered by the apparently complex environment and the interacting genes and pathways. Herein, we report findings from the dissection of rice flowering...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.255703.119

    authors: Guo T,Mu Q,Wang J,Vanous AE,Onogi A,Iwata H,Li X,Yu J

    update_date:2020-05-01 00:00:00

  • Capture of a functionally active methyl-CpG binding domain by an arthropod retrotransposon family.

    abstract::The repressive capacity of cytosine DNA methylation is mediated by recruitment of silencing complexes by methyl-CpG binding domain (MBD) proteins. Despite MBD proteins being associated with silencing, we discovered that a family of arthropod Copia retrotransposons have incorporated a host-derived MBD. We functionally ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.243774.118

    authors: de Mendoza A,Pflueger J,Lister R

    update_date:2019-08-01 00:00:00