Abstract:
An important problem in genomics is automatically clustering homologous proteins when only sequence information is available. Most methods for clustering proteins are local: they simply threshold a measure related to sequence distance. We first show how locality limits the performance of such methods by analysing the distribution of distances between protein sequences. We then present a global method based on spectral clustering and provide a theoretical justification for why it should substantially improve on local methods. We extensively tested our method and compared its performance with that of other methods on several subsets of the SCOP (Structural Classification of Proteins) database, a gold standard for protein structure classification. We consistently observed that the number of clusters we obtain for a given set of proteins is close to the number of superfamilies in that set, that there are fewer singletons, and that the method correctly groups most remote homologs. In our experiments, the quality of the clusters, as quantified by a measure that combines sensitivity and specificity, was consistently better [on average, improvements were 84% over hierarchical clustering, 34% over Connected Component Analysis (CCA) (similar to GeneRAGE) and 72% over another global method, TribeMCL].
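To illustrate the local-versus-global distinction, here is a minimal sketch. It assumes a symmetric similarity matrix `w`, which could for example be derived from pairwise alignment scores. That conversion is an assumption and is not shown. `threshold_components` is a CCA-style local method that keeps edges with similarity at or above a threshold and returns connected components. `fiedler_bipartition` is a generic two-way spectral split by the sign of the Fiedler vector of the unnormalized Laplacian L = D - W, found by power iteration. Both function names are hypothetical. This is not the algorithm of the paper, which produces many clusters rather than a single two-way split.

```python
import random


def threshold_components(w, t):
    """Local (CCA-style) method: connected components using only edges with w[i][j] >= t."""
    n = len(w)
    seen = [False] * n
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        stack, comp = [start], set()
        seen[start] = True
        while stack:  # depth-first search over edges above the threshold
            i = stack.pop()
            comp.add(i)
            for j in range(n):
                if j != i and not seen[j] and w[i][j] >= t:
                    seen[j] = True
                    stack.append(j)
        clusters.append(comp)
    return clusters


def fiedler_bipartition(w, iters=500, seed=0):
    """Global method: split nodes by the sign of the Fiedler vector of L = D - W.

    Assumes at least two nodes and at least one nonzero off-diagonal weight.
    """
    n = len(w)
    deg = [sum(w[i][j] for j in range(n) if j != i) for i in range(n)]
    c = 2.0 * max(deg)  # Gershgorin bound: every eigenvalue of L is <= c

    def apply_m(v):
        # M v = (c I - L) v; its top eigenvector orthogonal to the constant vector is the Fiedler vector
        return [c * v[i] - deg[i] * v[i] + sum(w[i][j] * v[j] for j in range(n) if j != i)
                for i in range(n)]

    def deflate(v):
        # Project out the constant vector (eigenvector of L with eigenvalue 0)
        mean = sum(v) / n
        return [x - mean for x in v]

    def normalize(v):
        norm = sum(x * x for x in v) ** 0.5
        return [x / norm for x in v]

    rng = random.Random(seed)
    v = normalize(deflate([rng.random() for _ in range(n)]))
    for _ in range(iters):  # power iteration restricted to the complement of the constant vector
        v = normalize(deflate(apply_m(v)))
    pos = {i for i in range(n) if v[i] >= 0}
    return [pos, set(range(n)) - pos]
```

For example, take two tightly linked triangles joined by one weak edge (weight 0.3). With a threshold of 0.2, the local method chains everything into a single component. The spectral split, which uses the whole matrix, separates the two triangles:

```python
w = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 0.3, 0, 0],
     [0, 0, 0.3, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
print(threshold_components(w, 0.2))  # one chained cluster containing all six nodes
print(fiedler_bipartition(w))        # the two triangles {0, 1, 2} and {3, 4, 5}
```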
journal_name: Nucleic Acids Res
journal_title: Nucleic acids research
authors: Paccanaro A, Casbon JA, Saqi MA
doi: 10.1093/nar/gkj515
keywords:
subject: Has Abstract
pub_date: 2006-03-17 00:00:00
pages: 1571-80
issue: 5
eissn: 1362-4962
issn: 0305-1048
pii: 34/5/1571
journal_volume: 34
pub_type: Journal article