Spectral clustering of protein sequences.

Abstract:

An important problem in genomics is automatically clustering homologous proteins when only sequence information is available. Most methods for clustering proteins are local, and are based on simply thresholding a measure related to sequence distance. We first show how locality limits the performance of such methods by analysing the distribution of distances between protein sequences. We then present a global method based on spectral clustering and provide theoretical justification of why it will have a remarkable improvement over local methods. We extensively tested our method and compared its performance with other local methods on several subsets of the SCOP (Structural Classification of Proteins) database, a gold standard for protein structure classification. We consistently observed that the number of clusters that we obtain for a given set of proteins is close to the number of superfamilies in that set; there are fewer singletons; and the method correctly groups most remote homologs. In our experiments, the quality of the clusters as quantified by a measure that combines sensitivity and specificity was consistently better [on average, improvements were 84% over hierarchical clustering, 34% over Connected Component Analysis (CCA) (similar to GeneRAGE) and 72% over another global method, TribeMCL].
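To make the idea of a global, spectral method concrete, here is a minimal sketch of a two-way spectral split on a pairwise similarity matrix. It is an illustration only, not the authors' algorithm: the function names `spectral_bipartition` and `toy_similarity`, the toy similarity values and the choice of a simple sign-based two-way split are all assumptions made for this example. The split uses every pairwise similarity at once, through the second eigenvector of the normalised similarity matrix, rather than thresholding each pair on its own.

```python
import math

def spectral_bipartition(sim, n_iter=500):
    """Split items into two groups from a symmetric, positive similarity matrix."""
    n = len(sim)
    deg = [sum(row) for row in sim]
    # Shifted, normalised affinity (I + D^-1/2 S D^-1/2) / 2, so all
    # eigenvalues lie in [0, 1] and power iteration is well behaved.
    m = [[0.5 * ((1.0 if i == j else 0.0) + sim[i][j] / math.sqrt(deg[i] * deg[j]))
          for j in range(n)] for i in range(n)]
    # The leading eigenvector is known in closed form: sqrt(degree), normalised.
    norm = math.sqrt(sum(deg))
    top = [math.sqrt(d) / norm for d in deg]
    # Power iteration with `top` projected out converges to the second eigenvector.
    v = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]
        v = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        scale = math.sqrt(sum(a * a for a in v))
        v = [a / scale for a in v]
    # The sign of each entry assigns the item to one of the two clusters.
    return [0 if a >= 0 else 1 for a in v]

def toy_similarity():
    """Hypothetical data: two families of three sequences each (made-up values)."""
    n = 6
    sim = [[0.01] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0
    # Within each family, the two end members are only weakly similar (0.1),
    # mimicking remote homologs linked through an intermediate sequence.
    for a, b, w in [(0, 1, 0.9), (1, 2, 0.9), (0, 2, 0.1),
                    (3, 4, 0.9), (4, 5, 0.9), (3, 5, 0.1)]:
        sim[a][b] = sim[b][a] = w
    return sim

labels = spectral_bipartition(toy_similarity())
print(labels)
```

In this toy case items 0 to 2 receive one label and items 3 to 5 the other, even though the direct similarity between items 0 and 2 is low. A real pipeline would derive the similarities from sequence comparison scores and would need more than two clusters; both are outside the scope of this sketch.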

journal_name

Nucleic Acids Res

journal_title

Nucleic acids research

authors

Paccanaro A,Casbon JA,Saqi MA

doi

10.1093/nar/gkj515

keywords:

subject

Has Abstract

pub_date

2006-03-17 00:00:00

pages

1571-80

issue

5

eissn

1362-4962

issn

0305-1048

pii

34/5/1571

journal_volume

34

pub_type

Journal Article
  • A conserved U-rich RNA region implicated in regulation of translation in Plasmodium female gametocytes.

    abstract::Translational repression (TR) plays an important role in post-transcriptional regulation of gene expression and embryonic development in metazoans. TR also regulates the expression of a subset of the cytoplasmic mRNA population during development of fertilized female gametes of the unicellular malaria parasite, Plasmo...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkm1142

    authors: Braks JA,Mair GR,Franke-Fayard B,Janse CJ,Waters AP

    updated:2008-03-01 00:00:00

  • Purification by DNA affinity precipitation of the cellular factors HEB1-p67 and HEB1-p94 which bind specifically to the human T-cell leukemia virus type-I 21 bp enhancer.

    abstract::Transcription driven by the proviral promoter of the Human T-cell Leukemia Virus type I (HTLV-I) is tightly regulated by the Tax1 transactivator. This viral protein potently induces the enhancer activity of a 21 bp motif repeated three times in the promoter. We have previously shown that this induction results from th...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/21.17.3935

    authors: Lombard-Platet G,Jalinot P

    updated:1993-08-25 00:00:00

  • A new telomerase RNA element that is critical for telomere elongation.

    abstract::The stability of chromosome ends, the telomeres, is dependent on the ribonucleoprotein telomerase. In vitro, telomerase requires at least one RNA molecule and a reverse transcriptase-like protein. However, for telomere homeostasis in vivo, additional proteins are required. Telomerase RNAs of different species vary in ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkt514

    authors: Laterreur N,Eschbach SH,Lafontaine DA,Wellinger RJ

    updated:2013-09-01 00:00:00

  • DNA compaction by the bacteriophage protein Cox studied on the single DNA molecule level using nanofluidic channels.

    abstract::The Cox protein from bacteriophage P2 forms oligomeric filaments and it has been proposed that DNA can be wound up around these filaments, similar to how histones condense DNA. We here use fluorescence microscopy to study single DNA-Cox complexes in nanofluidic channels and compare how the Cox homologs from phages P2 ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkw352

    authors: Frykholm K,Berntsson RP,Claesson M,de Battice L,Odegrip R,Stenmark P,Westerlund F

    updated:2016-09-06 00:00:00

  • Multiplex strand displacement amplification (SDA) and detection of DNA sequences from Mycobacterium tuberculosis and other mycobacteria.

    abstract::Strand Displacement Amplification (SDA) is an isothermal, in vitro method of amplifying a DNA target sequence prior to detection [Walker et al (1992) Nucleic Acids Res., 20, 1691-1693]. Here we describe a multiplex form of SDA that allows two target sequences and an internal amplification control to be co-amplified by...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/22.13.2670

    authors: Walker GT,Nadeau JG,Spears PA,Schram JL,Nycz CM,Shank DD

    updated:1994-07-11 00:00:00

  • MPromDb update 2010: an integrated resource for annotation and visualization of mammalian gene promoters and ChIP-seq experimental data.

    abstract::MPromDb (Mammalian Promoter Database) is a curated database that strives to annotate gene promoters identified from ChIP-seq results with the goal of providing an integrated resource for mammalian transcriptional regulation and epigenetics. We analyzed 507 million uniquely aligned RNAP-II ChIP-seq reads from 26 differ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkq1171

    authors: Gupta R,Bhattacharyya A,Agosto-Perez FJ,Wickramasinghe P,Davuluri RV

    updated:2011-01-01 00:00:00

  • Characterization of human Spartan/C1orf124, an ubiquitin-PCNA interacting regulator of DNA damage tolerance.

    abstract::Unrepaired DNA damage may arrest ongoing replication forks, potentially resulting in fork collapse, increased mutagenesis and genomic instability. Replication through DNA lesions depends on mono- and polyubiquitylation of proliferating cell nuclear antigen (PCNA), which enable translesion synthesis (TLS) and template ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gks850

    authors: Juhasz S,Balogh D,Hajdu I,Burkovics P,Villamil MA,Zhuang Z,Haracska L

    updated:2012-11-01 00:00:00

  • Tissue-dependent isoforms of mammalian Fox-1 homologs are associated with tissue-specific splicing activities.

    abstract::An intronic hexanucleotide UGCAUG has been shown to play a critical role in the regulation of tissue-specific alternative splicing of pre-mRNAs in a wide range of tissues. Vertebrate Fox-1 has been shown to bind to this element, in a highly sequence-specific manner, through its RNA recognition motif (RRM). In mammals,...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gki338

    authors: Nakahata S,Kawamoto S

    updated:2005-04-11 00:00:00

  • Contributions of discrete tRNA(Ser) domains to aminoacylation by E.coli seryl-tRNA synthetase: a kinetic analysis using model RNA substrates.

    abstract::The aminoacylation kinetics of T7 transcripts representing defined regions of Escherichia coli serine tRNAs were determined using purified E.coli seryl-tRNA synthetase (SerRS) and the kinetic values were used to estimate the relative contribution of various tRNA(Ser) domains to recognition by SerRS. The analysis revea...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/21.19.4467

    authors: Sampson JR,Saks ME

    updated:1993-09-25 00:00:00

  • Analysis of the DNA joining repertoire of Chlorella virus DNA ligase and a new crystal structure of the ligase-adenylate intermediate.

    abstract::Chlorella virus DNA ligase is the smallest eukaryotic ATP-dependent DNA ligase known; it suffices for yeast cell growth in lieu of the essential yeast DNA ligase Cdc9. The Chlorella virus ligase-adenylate intermediate has an intrinsic nick sensing function and its DNA footprint extends 8-9 nt on the 3'-hydroxyl (3'-OH...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkg665

    authors: Odell M,Malinina L,Sriskanda V,Teplova M,Shuman S

    updated:2003-09-01 00:00:00

  • Characterization of a DNA binding domain in the C-terminus of HIV-1 integrase by deletion mutagenesis.

    abstract::The integrase (IN) protein of human immunodeficiency virus type 1 (HIV-1) catalyzes site-specific cleavage of 2 bases from the viral long terminal repeat (LTR) sequence yet it binds DNA with little DNA sequence specificity. We have previously demonstrated that the C-terminal half of IN (amino acids 154-288) possesses ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/21.15.3507

    authors: Woerner AM,Marcus-Sekura CJ

    updated:1993-07-25 00:00:00

  • PatternQuery: web application for fast detection of biomacromolecular structural patterns in the entire Protein Data Bank.

    abstract::Well defined biomacromolecular patterns such as binding sites, catalytic sites, specific protein or nucleic acid sequences, etc. precisely modulate many important biological phenomena. We introduce PatternQuery, a web-based application designed for detection and fast extraction of such patterns. The application uses a...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkv561

    authors: Sehnal D,Pravda L,Svobodová Vařeková R,Ionescu CM,Koča J

    updated:2015-07-01 00:00:00

  • LocDB: experimental annotations of localization for Homo sapiens and Arabidopsis thaliana.

    abstract::LocDB is a manually curated database with experimental annotations for the subcellular localizations of proteins in Homo sapiens (HS, human) and Arabidopsis thaliana (AT, thale cress). Currently, it contains entries for 19,604 UniProt proteins (HS: 13,342; AT: 6262). Each database entry contains the experimentally der...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkq927

    authors: Rastogi S,Rost B

    updated:2011-01-01 00:00:00

  • Metal-binding, nucleic acid-binding finger sequences in the CDC16 gene of Saccharomyces cerevisiae.

    abstract::The CDC16 gene is involved in the process of chromosome segregation in mitosis and a cdc16ts mutant accumulates the predominant microtubule-associated protein at the nonpermissive temperature. We find that the CDC16 gene open reading frame (ORF) is capable of encoding a protein whose calculated molecular weight and pI...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/15.20.8439

    authors: Icho T,Wickner RB

    updated:1987-10-26 00:00:00

  • Restriction endonuclease isoschizomers ItaI, BsoFI and Fsp4HI are characterised by differences in their sensitivities to CpG methylation.

    abstract::BsoFI, ItaI and Fsp4HI are isoschizomers of Fnu4HI (5'-GC NGC-3'). Both Fnu4HI and BsoFI have previously been shown to be inhibited by cytosine-specific methylation within the recognition sequence. Fnu4HI is inhibited if either the internal cytosine at position 2 or the external cytosine at position 5 of the restricti...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/25.16.3196

    authors: Ramsahoye BH,Burnett AK,Taylor C

    updated:1997-08-15 00:00:00

  • RNA-binding protein DDX1 is responsible for fatty acid-mediated repression of insulin translation.

    abstract::The molecular mechanism in pancreatic β cells underlying hyperlipidemia and insulin insufficiency remains unclear. Here, we find that the fatty acid-induced decrease in insulin levels occurs due to a decrease in insulin translation. Since regulation at the translational level is generally mediated through RNA-binding ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gky867

    authors: Li Z,Zhou M,Cai Z,Liu H,Zhong W,Hao Q,Cheng D,Hu X,Hou J,Xu P,Xue Y,Zhou Y,Xu T

    updated:2018-12-14 00:00:00

  • Sequence-specific interaction of Hoechst 33258 with the minor groove of an adenine-tract DNA duplex studied in solution by 1H NMR spectroscopy.

    abstract::The interaction of Hoechst 33258 with the minor groove of the adenine-tract DNA duplex d(CTTTTGCAAAAG)2 has been studied in both D2O and H2O solutions by 1D and 2D 1H NMR spectroscopy. Thirty-one nuclear Overhauser effects between drug and nucleotide protons within the minor groove of the duplex, together with ring-cu...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/18.13.3753

    authors: Searle MS,Embrey KJ

    updated:1990-07-11 00:00:00

  • Replisome stall events have shaped the distribution of replication origins in the genomes of yeasts.

    abstract::During S phase, the entire genome must be precisely duplicated, with no sections of DNA left unreplicated. Here, we develop a simple mathematical model to describe the probability of replication failing due to the irreversible stalling of replication forks. We show that the probability of complete genome replication i...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkt728

    authors: Newman TJ,Mamun MA,Nieduszynski CA,Blow JJ

    updated:2013-11-01 00:00:00

  • Post-transcriptional modification of the poly(A) length of galactose-1-phosphate uridyl transferase mRNA in Saccharomyces cerevisiae.

    abstract::Thermal elution poly(U)-Sepharose chromatography was utilized to fractionate yeast mRNA based on poly(A) size. Analysis of the in vitro translation products of the fractionated RNAs in a wheat-embryo cell-free protein synthesis system shows a heterogeneous but equal distribution of these abundant translatable mRNAs in...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/8.17.3841

    authors: Saunders CA,Bostian KA,Halvorson HO

    updated:1980-09-11 00:00:00

  • Pharos: Collating protein information to shed light on the druggable genome.

    abstract::The 'druggable genome' encompasses several protein families, but only a subset of targets within them have attracted significant research attention and thus have information about them publicly available. The Illuminating the Druggable Genome (IDG) program, initiated in 2014, has the goal of developing experimental...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkw1072

    authors: Nguyen DT,Mathias S,Bologa C,Brunak S,Fernandez N,Gaulton A,Hersey A,Holmes J,Jensen LJ,Karlsson A,Liu G,Ma'ayan A,Mandava G,Mani S,Mehta S,Overington J,Patel J,Rouillard AD,Schürer S,Sheils T,Simeonov A,Sklar L

    updated:2017-01-04 00:00:00

  • Detection of uracil within DNA using a sensitive labeling method for in vitro and cellular applications.

    abstract::The role of uracil in genomic DNA has been recently re-evaluated. It is now widely accepted to be a physiologically important DNA element in diverse systems from specific phages to antibody maturation and Drosophila development. Further relevant investigations would largely benefit from a novel reliable and fast metho...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkv977

    authors: Róna G,Scheer I,Nagy K,Pálinkás HL,Tihanyi G,Borsos M,Békési A,Vértessy BG

    updated:2016-02-18 00:00:00

  • Homologous DNA strand exchange activity of the human mitochondrial DNA helicase TWINKLE.

    abstract::A crucial component of the human mitochondrial DNA replisome is the ring-shaped helicase TWINKLE-a phage T7-gene 4-like protein expressed in the nucleus and localized in the human mitochondria. Our previous studies showed that despite being a helicase, TWINKLE has unique DNA annealing activity. At the time, the implic...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkw098

    authors: Sen D,Patel G,Patel SS

    updated:2016-05-19 00:00:00

  • Ribosomal 5S genes in relation to C-value in amphibians.

    abstract::We have measured the amount of 5S-ribosomal DNA in the genomes of Xenopus laevis, Triturus cristatus carnifex and Ambystoma mexicanum, three species of Amphibians which have widely different C-values. Our best estimate is that these organisms have about 24,000, 32,000 and 61,000 5S-genes per haploid genome respectivel...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/11.8.2381

    authors: Hilder VA,Dawson GA,Vlad MT

    updated:1983-04-25 00:00:00

  • Biocuration of functional annotation at the European nucleotide archive.

    abstract::The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is a repository for the submission, maintenance and presentation of nucleotide sequence data and related sample and experimental information. In this article we report on ENA in 2015 regarding general activity, notable published data sets and major achiev...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkv1311

    authors: Gibson R,Alako B,Amid C,Cerdeño-Tárraga A,Cleland I,Goodgame N,Ten Hoopen P,Jayathilaka S,Kay S,Leinonen R,Liu X,Pallreddy S,Pakseresht N,Rajan J,Rosselló M,Silvester N,Smirnov D,Toribio AL,Vaughan D,Zalunin V,Coc

    updated:2016-01-04 00:00:00

  • The proofreading exonuclease subunit epsilon of Escherichia coli DNA polymerase III is tethered to the polymerase subunit alpha via a flexible linker.

    abstract::Escherichia coli DNA polymerase III holoenzyme is composed of 10 different subunits linked by noncovalent interactions. The polymerase activity resides in the alpha-subunit. The epsilon-subunit, which contains the proofreading exonuclease site within its N-terminal 185 residues, binds to alpha via a segment of 57 addi...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkn489

    authors: Ozawa K,Jergic S,Park AY,Dixon NE,Otting G

    updated:2008-09-01 00:00:00

  • Identification of psiB genes of plasmids F and R6-5. Molecular basis for psiB enhanced expression in plasmid R6-5.

    abstract::PsiB protein of plasmid R6-5 inhibits the induction of the SOS pathway. The F sex factor also carries a psiB gene homologous to that of R6-5. Yet, it fails to inhibit SOS induction. In order to solve this difference, we characterized the psiB genes of R6-5 and F. We found that (i) the sequences of the two psiB genes s...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/16.22.10669

    authors: Dutreix M,Bäckman A,Célérier J,Bagdasarian MM,Sommer S,Bailone A,Devoret R,Bagdasarian M

    updated:1988-11-25 00:00:00

  • How data analysis affects power, reproducibility and biological insight of RNA-seq studies in complex datasets.

    abstract::The sequencing of the full transcriptome (RNA-seq) has become the preferred choice for the measurement of genome-wide gene expression. Despite its widespread use, challenges remain in RNA-seq data analysis. One often-overlooked aspect is normalization. Despite the fact that a variety of factors or 'batch effects' can ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkv736

    authors: Peixoto L,Risso D,Poplawski SG,Wimmer ME,Speed TP,Wood MA,Abel T

    updated:2015-09-18 00:00:00

  • Selective binding of actinomycin D and distamycin A to DNA.

    abstract::The exact sites at which a number of drugs inhibit the nick translation of DNA by E.coli DNA polymerase-I have been pinpointed. In order to do this, a method has been developed for sequencing double-stranded plasmid DNA from the site of a specifically induced nick. The initial experiments have concentrated on analysis...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/10.22.7273

    authors: Wilkins RJ

    updated:1982-11-25 00:00:00

  • Transcriptome signature of cellular senescence.

    abstract::Cellular senescence, an integral component of aging and cancer, arises in response to diverse triggers, including telomere attrition, macromolecular damage and signaling from activated oncogenes. At present, senescent cells are identified by the combined presence of multiple traits, such as senescence-associated prote...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkz555

    authors: Casella G,Munk R,Kim KM,Piao Y,De S,Abdelmohsen K,Gorospe M

    updated:2019-08-22 00:00:00

  • Flexibility and stabilization of HgII-mediated C:T and T:T base pairs in DNA duplex.

    abstract::Owing to their great potential in genetic code extension and the development of nucleic acid-based functional nanodevices, DNA duplexes containing HgII-mediated base pairs have been extensively studied during the past 60 years. However, the structural basis underlying these base pairs remains poorly understood. Herein, w...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkw1296

    authors: Liu H,Cai C,Haruehanroengra P,Yao Q,Chen Y,Yang C,Luo Q,Wu B,Li J,Ma J,Sheng J,Gan J

    updated:2017-03-17 00:00:00