A mathematical and computational framework for quantitative comparison and integration of large-scale gene expression data.

Abstract:

Analysis of large-scale gene expression studies usually begins with gene clustering. A ubiquitous problem is that different algorithms applied to the same data inevitably give different results, and the differences are often substantial, involving a quarter or more of the genes analyzed. This raises a series of important but nettlesome questions: How are different clustering results related to each other and to the underlying data structure? Is one clustering objectively superior to another? Which differences, if any, are likely candidates to be biologically important? A systematic and quantitative way to address these questions is needed, together with an effective way to integrate and leverage expression results with other kinds of large-scale data and annotations. We developed a mathematical and computational framework to help quantify, compare, visualize and interactively mine clusterings. We show that by coupling confusion matrices with appropriate metrics (linear assignment and normalized mutual information scores), one can quantify and map differences between clusterings. A version of receiver operator characteristic analysis proved effective for quantifying and visualizing cluster quality and overlap. These methods, plus a flexible library of clustering algorithms, can be called from a new expandable set of software tools called CompClust 1.0 (http://woldlab.caltech.edu/compClust/). CompClust also makes it possible to relate expression clustering patterns to DNA sequence motif occurrences, protein-DNA interaction measurements and various kinds of functional annotations. Test analyses used yeast cell cycle data and revealed data structure not obvious under all algorithms. These results were then integrated with transcription motif and global protein-DNA interaction data to identify G1 regulatory modules.
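The comparison idea in the abstract (a confusion matrix scored by linear assignment and by normalized mutual information) can be illustrated with a minimal Python sketch. This is not CompClust code and does not use its API: the function names, the toy labels, the brute-force assignment search, and the choice to normalize mutual information by the mean of the two cluster entropies are all assumptions made for illustration, and the paper's exact normalization may differ.

```python
from collections import Counter
from itertools import permutations
from math import log


def confusion_matrix(a, b):
    """Count genes falling in each (cluster of a, cluster of b) pair."""
    rows, cols = sorted(set(a)), sorted(set(b))
    counts = Counter(zip(a, b))
    return [[counts[(r, c)] for c in cols] for r in rows]


def linear_assignment_score(cm):
    """Fraction of genes covered by the best one-to-one cluster pairing.

    Brute force over permutations (illustrative only; practical for a
    handful of clusters, not for large cluster counts).
    """
    n_rows, n_cols = len(cm), len(cm[0])
    size = max(n_rows, n_cols)
    # Pad with zeros so the matrix is square and every cluster can be paired.
    padded = [[cm[i][j] if i < n_rows and j < n_cols else 0
               for j in range(size)] for i in range(size)]
    total = sum(map(sum, cm))
    best = max(sum(padded[i][p[i]] for i in range(size))
               for p in permutations(range(size)))
    return best / total


def normalized_mutual_information(cm):
    """Mutual information divided by the mean entropy (assumed convention)."""
    total = sum(map(sum, cm))
    row_p = [sum(row) / total for row in cm]
    col_p = [sum(col) / total for col in zip(*cm)]
    mi = 0.0
    for i, row in enumerate(cm):
        for j, n in enumerate(row):
            if n:
                p = n / total
                mi += p * log(p / (row_p[i] * col_p[j]))
    h_a = -sum(p * log(p) for p in row_p if p)
    h_b = -sum(p * log(p) for p in col_p if p)
    mean_h = (h_a + h_b) / 2
    return mi / mean_h if mean_h else 1.0


# Hypothetical cluster labels for eight genes under two algorithms.
labels_a = [0, 0, 0, 1, 1, 1, 2, 2]
labels_b = [1, 1, 0, 0, 0, 0, 2, 2]
cm = confusion_matrix(labels_a, labels_b)
print(cm)
print(linear_assignment_score(cm))
print(normalized_mutual_information(cm))
```

Both scores reach 1 when the two clusterings agree up to a relabeling of clusters. The linear assignment score reports how many genes fall in the best matched cluster pairs, while the mutual information score also reflects structured disagreement such as one cluster being split into two.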

journal_name

Nucleic Acids Res

journal_title

Nucleic acids research

authors

Hart CE,Sharenbroich L,Bornstein BJ,Trout D,King B,Mjolsness E,Wold BJ

doi

10.1093/nar/gki536

subject

Has Abstract

pub_date

2005-05-10 00:00:00

pages

2580-94

issue

8

issn

0305-1048

eissn

1362-4962

pii

33/8/2580

journal_volume

33

pub_type

Journal Article
  • Human repair gene restores normal pattern of preferential DNA repair in repair defective CHO cells.

    abstract::The pattern of preferential DNA repair of UV-induced pyrimidine dimers was studied in repair-deficient Chinese hamster ovary (CHO) cells transfected with the human excision repair gene, ERCC-1. Repair efficiency was measured in the active dihydrofolate reductase (DHFR) gene and in its flanking, non-transcribed sequenc...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/16.15.7397

    authors: Bohr VA,Chu EH,van Duin M,Hanawalt PC,Okumoto DS

    updated: 1988-08-11 00:00:00

  • Mitochondrial transcription termination factor 1 directs polar replication fork pausing.

    abstract::During replication of nuclear ribosomal DNA (rDNA), clashes with the transcription apparatus can cause replication fork collapse and genomic instability. To avoid this problem, a replication fork barrier protein is situated downstream of rDNA, there preventing replication in the direction opposite rDNA transcription. ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkw302

    authors: Shi Y,Posse V,Zhu X,Hyvärinen AK,Jacobs HT,Falkenberg M,Gustafsson CM

    updated: 2016-07-08 00:00:00

  • Molecular basis of artifacts in the detection of telomerase activity and a modified primer for a more robust 'TRAP' assay.

    abstract::Human somatic cells have essentially no telomerase activity. Telomerase is linked to tumor genesis and is a valuable marker for malignant growth. Extreme paucity of the enzyme necessitated development of a PCR-based assay, 'telomeric repeat amplification protocol' (TRAP). Unfortunately, this method is not without dif...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/25.4.919

    authors: Krupp G,Kühne K,Tamm S,Klapper W,Heidorn K,Rott A,Parwaresch R

    updated: 1997-02-15 00:00:00

  • MetaQC: objective quality control and inclusion/exclusion criteria for genomic meta-analysis.

    abstract::Genomic meta-analysis to combine relevant and homogeneous studies has been widely applied, but the quality control (QC) and objective inclusion/exclusion criteria have been largely overlooked. Currently, the inclusion/exclusion criteria mostly depend on ad-hoc expert opinion or naïve threshold by sample size or platfo...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkr1071

    authors: Kang DD,Sibille E,Kaminski N,Tseng GC

    updated: 2012-01-01 00:00:00

  • Complete disproportionation of duplex poly(dT)*poly(dA) into triplex poly(dT)*poly(dA)*poly(dT) and poly(dA) by coralyne.

    abstract::Coralyne is a small crescent-shaped molecule known to intercalate duplex and triplex DNA. We report that coralyne can cause the complete and irreversible disproportionation of duplex poly(dT)*poly(dA). That is, coralyne causes the strands of duplex poly(dT)*poly(dA) to repartition into equal molar equivalents of tripl...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/30.4.983

    authors: Polak M,Hud NV

    updated: 2002-02-15 00:00:00

  • DAP-like kinase interacts with the rat homolog of Schizosaccharomyces pombe CDC5 protein, a factor involved in pre-mRNA splicing and required for G2/M phase transition.

    abstract::DAP-like kinase (Dlk, also termed ZIP kinase) is a leucine zipper-containing serine/threonine-specific protein kinase with as yet unknown biological function(s). Interaction partners so far identified are either transcription factors or proteins that can support or counteract apoptosis. Thus, Dlk might be involved in ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/30.6.1408

    authors: Engemann H,Heinzel V,Page G,Preuss U,Scheidtmann KH

    updated: 2002-03-15 00:00:00

  • Partial inhibition of histone deacetylase in active chromatin by HMG 14 and HMG 17.

    abstract::Digestion of isolated Friend erythroleukemic cell nuclei with DNase I under conditions which selectively destroy the DNA of transcriptionally "active" genes releases into the supernatant fraction proteins of the non-histone "High Mobility Group" (HMGs). Two of these, HMG-14 and HMG-17 (identified by solubility in trich...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/8.9.1947

    authors: Reeves R,Candido EP

    updated: 1980-05-10 00:00:00

  • Changes in transfer ribonucleic acids of Bacillus subtilis during different growth phases.

    abstract::The transfer ribonucleic acids (tRNAs) of B. subtilis at different growth phases are examined for changes in the composition and the methylation of minor constituents. The composition of the tRNAs indicates about equal amounts of adenosine and uridine, and of guanosine and cytidine. About 3-4 residues are present as m...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/3.5.1249

    authors: Singhal RP,Vold B

    updated: 1976-05-01 00:00:00

  • The high diversity of snoRNAs in plants: identification and comparative study of 120 snoRNA genes from Oryza sativa.

    abstract::Using a powerful computer-assisted analysis strategy, a large-scale search of small nucleolar RNA (snoRNA) genes in the recently released draft sequence of the rice genome was carried out. This analysis identified 120 different box C/D snoRNA genes with a total of 346 gene variants, which were predicted to guide 135 2...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkg373

    authors: Chen CL,Liang D,Zhou H,Zhuo M,Chen YQ,Qu LH

    updated: 2003-05-15 00:00:00

  • Modulation of glutathione peroxidase expression by selenium: effect on human MCF-7 breast cancer cell transfectants expressing a cellular glutathione peroxidase cDNA and doxorubicin-resistant MCF-7 cells.

    abstract::We have studied the effect of selenium on the expression of a cellular glutathione peroxidase, GSHPx-1, in transfected MCF-7 cells and in doxorubicin-resistant (Adrr) MCF-7 cells. A GSHPx-1 cDNA with a Rous Sarcoma virus promoter was transfected into a human mammary carcinoma cell line, MCF-7, which has very low endog...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/18.6.1531

    authors: Chu FF,Esworthy RS,Akman S,Doroshow JH

    updated: 1990-03-25 00:00:00

  • The p16INK4a tumor suppressor controls p21WAF1 induction in response to ultraviolet light.

    abstract::p16INK4a and p21WAF1, two major cyclin-dependent kinase inhibitors, are the products of two tumor suppressor genes that play important roles in various cellular metabolic pathways. p21WAF1 is up-regulated in response to different DNA damaging agents. While the activation of p21WAF1 is p53-dependent following γ-rays, th...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkl1075

    authors: Al-Mohanna MA,Al-Khalaf HH,Al-Yousef N,Aboussekhra A

    updated: 2007-01-01 00:00:00

  • The putative promoter of a Xenopus laevis ribosomal gene is reduplicated.

    abstract::With the aid of a novel poly-dA tailing-partial restriction technique and S1-protection mapping, the 5' terminal coding sequence for the 40S precursor ribosomal RNA of Xenopus laevis has been exactly identified. Since the promoter sequence for the 40S RNA should lie close to its 5' terminal coding sequence, we are abl...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/6.12.3733

    authors: Moss T,Birnstiel ML

    updated: 1979-08-24 00:00:00

  • A scale-space method for detecting recurrent DNA copy number changes with analytical false discovery rate control.

    abstract::Tumor formation is partially driven by DNA copy number changes, which are typically measured using array comparative genomic hybridization, SNP arrays and DNA sequencing platforms. Many techniques are available for detecting recurring aberrations across multiple tumor samples, including CMAR, STAC, GISTIC and KC-SMART...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkt155

    authors: van Dyk E,Reinders MJ,Wessels LF

    updated: 2013-05-01 00:00:00

  • Triple helix formation at distant sites: hybrid oligonucleotides containing a polymeric linker.

    abstract::An oligonucleotide hybrid is described which possesses two triple helix forming oligonucleotides which have been connected by a flexible polymeric linker chain. As a prototype, binding of this class of oligonucleotide to duplex DNA has been studied using a segment of the HSV-1 D-glycoprotein promoter, which possesses ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/21.20.4810

    authors: Kessler DJ,Pettitt BM,Cheng YK,Smith SR,Jayaraman K,Vu HM,Hogan ME

    updated: 1993-10-11 00:00:00

  • Chicken liver TGGCA protein purified by preparative mobility shift electrophoresis (PMSE) shows a 36.8 to 29.8 kd microheterogeneity.

    abstract::The TGGCA protein, the chicken homologue of HeLa cell NF-I, was purified to homogeneity from liver tissue by a procedure which includes preparative mobility shift electrophoresis (PMSE) as the final step. PMSE was here adjusted for the isolation of the TGGCA protein, but can be used as a general method to characterize...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/15.23.9707

    authors: Rupp RA,Sippel AE

    updated: 1987-12-10 00:00:00

  • SUMO conjugation to spliceosomal proteins is required for efficient pre-mRNA splicing.

    abstract::Pre-mRNA splicing is catalyzed by the spliceosome, a multi-megadalton ribonucleoprotein machine. Previous work from our laboratory revealed the splicing factor SRSF1 as a regulator of the SUMO pathway, leading us to explore a connection between this pathway and the splicing machinery. We show here that addition of a r...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkx213

    authors: Pozzi B,Bragado L,Will CL,Mammi P,Risso G,Urlaub H,Lührmann R,Srebrow A

    updated: 2017-06-20 00:00:00

  • The effect of hybridization-induced secondary structure alterations on RNA detection using backscattering interferometry.

    abstract::Backscattering interferometry (BSI) has been used to successfully monitor molecular interactions without labeling and with high sensitivity. These properties suggest that this approach might be useful for detecting biomarkers of infection. In this report, we identify interactions and characteristics of nucleic acid pr...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkt165

    authors: Adams NM,Olmsted IR,Haselton FR,Bornhop DJ,Wright DW

    updated: 2013-05-01 00:00:00

  • Molecular determinants of the hpa regulatory system of Escherichia coli: the HpaR repressor.

    abstract::The HpaR-mediated regulation of the hpa-meta operon (Pg promoter) of the 4-hydroxyphenylacetic acid catabolic pathway of Escherichia coli has been studied. The HpaR regulator was purified to homogeneity showing that it is able to bind selectively to 4-hydroxyphenylacetic, 3-hydroxyphenylacetic and 3,4-dihydroxyphenyla...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkg851

    authors: Galán B,Kolb A,Sanz JM,García JL,Prieto MA

    updated: 2003-11-15 00:00:00

  • A trans-splicing group I intron and tRNA-hyperediting in the mitochondrial genome of the lycophyte Isoetes engelmannii.

    abstract::Plant mitochondrial genomes show much more evolutionary plasticity than those of animals. We analysed the first mitochondrial DNA (mtDNA) of a lycophyte, the quillwort Isoetes engelmannii, which is separated from seed plants by more than 350 million years of evolution. The Isoetes mtDNA is particularly rich in recombi...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkp532

    authors: Grewe F,Viehoever P,Weisshaar B,Knoop V

    updated: 2009-08-01 00:00:00

  • POGO-DB--a database of pairwise-comparisons of genomes and conserved orthologous genes.

    abstract::POGO-DB (http://pogo.ece.drexel.edu/) provides an easy platform for comparative microbial genomics. POGO-DB allows users to compare genomes using pre-computed metrics that were derived from extensive computationally intensive BLAST comparisons of >2000 microbes. These metrics include (i) average protein sequence ident...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkt1094

    authors: Lan Y,Morrison JC,Hershberg R,Rosen GL

    updated: 2014-01-01 00:00:00

  • Sinefungin resistance of Saccharomyces cerevisiae arising from Sam3 mutations that inactivate the AdoMet transporter or from increased expression of AdoMet synthase plus mRNA cap guanine-N7 methyltransferase.

    abstract::The S-adenosylmethionine (AdoMet) analog sinefungin is a natural product antibiotic that inhibits nucleic acid methyltransferases and arrests the growth of unicellular eukarya and eukaryal viruses. The basis for the particular sensitivity of fungi and protozoa to sinefungin is not known. Here we report the isolation a...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkm817

    authors: Zheng S,Shuman S,Schwer B

    updated: 2007-01-01 00:00:00

  • The siRNA suppressor RTL1 is redox-regulated through glutathionylation of a conserved cysteine in the double-stranded-RNA-binding domain.

    abstract::RNase III enzymes cleave double stranded (ds)RNA. This is an essential step for regulating the processing of mRNA, rRNA, snoRNA and other small RNAs, including siRNA and miRNA. Arabidopsis thaliana encodes nine RNase III: four DICER-LIKE (DCL) and five RNASE THREE LIKE (RTL). To better understand the molecular functio...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkx820

    authors: Charbonnel C,Niazi AK,Elvira-Matelot E,Nowak E,Zytnicki M,de Bures A,Jobet E,Opsomer A,Shamandi N,Nowotny M,Carapito C,Reichheld JP,Vaucheret H,Sáez-Vásquez J

    updated: 2017-11-16 00:00:00

  • Detection of base analogs incorporated during DNA replication by nanopore sequencing.

    abstract::DNA synthesis is a fundamental requirement for cell proliferation and DNA repair, but no single method can identify the location, direction and speed of replication forks with high resolution. Mammalian cells have the ability to incorporate thymidine analogs along with the natural A, T, G and C bases during DNA synthe...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkaa517

    authors: Georgieva D,Liu Q,Wang K,Egli D

    updated: 2020-09-04 00:00:00

  • Synthesis, thermal stability and resistance to enzymatic hydrolysis of the oligonucleotides containing 5-(N-aminohexyl)carbamoyl-2'-O-methyluridines.

    abstract::The synthesis of oligonucleotides (ODNs) containing 5-(N-aminohexyl)carbamoyl-2'-O-methyluridine (D) is described, and thermal stability and resistance to enzymatic hydrolysis of the ODNs are compared with ODNs containing 5-(N-aminohexyl)carbamoyl-2'-deoxyuridine (H). The ODNs containing D and the complementary RNA de...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkg374

    authors: Ito T,Ueno Y,Komatsu Y,Matsuda A

    updated: 2003-05-15 00:00:00

  • Increased G + C content of DNA stabilizes methyl CpG dinucleotides.

    abstract::The vertebrate genome is a mosaic of regions differing dramatically in their G + C content. Those regions with a high G + C content contain the expected number of CpG dinucleotides and we propose that following methylation these have been protected from deamination by the increased stability of the surrounding DNA dup...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/12.14.5869

    authors: Adams RL,Eason R

    updated: 1984-07-25 00:00:00

  • The fidelity of template-directed oligonucleotide ligation and its relevance to DNA computation.

    abstract::Several different computational problems have been solved using DNA as a medium. However, the DNA computations that have so far been carried out have examined a relatively small number of possible sequence solutions in order to find correct sequence solutions. We have encoded a search algorithm in DNA that required th...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/26.22.5203

    authors: James KD,Boles AR,Henckel D,Ellington AD

    updated: 1998-11-15 00:00:00

  • Crystal structure of trioxacarcin A covalently bound to DNA.

    abstract::We report a crystal structure that shows an antibiotic that extracts a nucleobase from a DNA molecule 'caught in the act' after forming a covalent bond but before departing with the base. The structure of trioxacarcin A covalently bound to double-stranded d(AACCGGTT) was determined to 1.78 Å resolution by MAD phasing ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkn245

    authors: Pfoh R,Laatsch H,Sheldrick GM

    updated: 2008-06-01 00:00:00

  • CRISPR interference and priming varies with individual spacer sequences.

    abstract::CRISPR-Cas (clustered regularly interspaced short palindromic repeats-CRISPR associated) systems allow bacteria to adapt to infection by acquiring 'spacer' sequences from invader DNA into genomic CRISPR loci. Cas proteins use RNAs derived from these loci to target cognate sequences for destruction through CRISPR inter...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkv1259

    authors: Xue C,Seetharam AS,Musharova O,Severinov K,Brouns SJ,Severin AJ,Sashital DG

    updated: 2015-12-15 00:00:00

  • Rapid and efficient construction of markerless deletions in the Escherichia coli genome.

    abstract::We have developed an improved and rapid genomic engineering procedure for the construction of custom-designed microorganisms. This method, which can be performed in 2 days, permits restructuring of the Escherichia coli genome via markerless deletion of selected genomic regions. The deletion process was mediated by a s...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkn359

    authors: Yu BJ,Kang KH,Lee JH,Sung BH,Kim MS,Kim SC

    updated: 2008-08-01 00:00:00

  • The CUG codon is decoded in vivo as serine and not leucine in Candida albicans.

    abstract::Previous studies have shown that the yeast Candida albicans encodes a unique seryl-tRNA(CAG) that should decode the leucine codon CUG as serine. However, in vitro translation of several different CUG-containing mRNAs in the presence of this unusual seryl-tRNA(CAG) results in an apparent increase in the molecular weight...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/23.9.1481

    authors: Santos MA,Tuite MF

    updated: 1995-05-11 00:00:00