A Protein Classification Benchmark collection for machine learning.

Abstract:

Protein classification by machine learning algorithms is now widely used in structural and functional annotation of proteins. The Protein Classification Benchmark collection (http://hydra.icgeb.trieste.it/benchmark) was created in order to provide standard datasets on which the performance of machine learning methods can be compared. It is primarily meant for method developers and users interested in comparing methods under standardized conditions. The collection contains datasets of sequences and structures, and each set is subdivided into positive/negative, training/test sets in several ways. There is a total of 6405 classification tasks, 3297 on protein sequences, 3095 on protein structures and 13 on protein coding regions in DNA. Typical tasks include the classification of structural domains in the SCOP and CATH databases based on their sequences or structures, as well as various functional and taxonomic classification problems. In the case of hierarchical classification schemes, the classification tasks can be defined at various levels of the hierarchy (such as classes, folds, superfamilies, etc.). For each dataset there are distance matrices available that contain all vs. all comparison of the data, based on various sequence or structure comparison methods, as well as a set of classification performance measures computed with various classifier algorithms.
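
The way one such classification task could be evaluated from an all vs. all distance matrix can be sketched as follows. This is only an illustrative sketch, not the collection's own code: the abstract does not name the classifiers or performance measures, so a nearest-neighbour score and the ROC AUC are assumed here as common choices. The function names (nn_scores, roc_auc) and the 6 x 6 toy matrix are hypothetical and invented for the example.

```python
from typing import List, Sequence


def nn_scores(dist: Sequence[Sequence[float]],
              pos_train: Sequence[int],
              neg_train: Sequence[int],
              test: Sequence[int]) -> List[float]:
    """Score each test item from a precomputed distance matrix.

    Score = (distance to nearest negative training item)
          - (distance to nearest positive training item),
    so a higher score means "looks more like the positive group".
    """
    scores = []
    for t in test:
        d_pos = min(dist[t][i] for i in pos_train)
        d_neg = min(dist[t][i] for i in neg_train)
        scores.append(d_neg - d_pos)
    return scores


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative test item")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy all vs. all distance matrix (invented values, symmetric, zero diagonal).
# Items 0-2 stand for members of the positive group, items 3-5 for the rest.
D = [
    [0.00, 0.20, 0.30, 0.90, 0.80, 0.90],
    [0.20, 0.00, 0.25, 0.85, 0.90, 0.80],
    [0.30, 0.25, 0.00, 0.70, 0.75, 0.80],
    [0.90, 0.85, 0.70, 0.00, 0.20, 0.30],
    [0.80, 0.90, 0.75, 0.20, 0.00, 0.10],
    [0.90, 0.80, 0.80, 0.30, 0.10, 0.00],
]

# One task = one positive/negative, training/test subdivision of the items.
pos_train, neg_train = [0, 1], [3, 4]
test_items, test_labels = [2, 5], [1, 0]

scores = nn_scores(D, pos_train, neg_train, test_items)
auc = roc_auc(scores, test_labels)
print(scores, auc)
```

Because the classifier only reads rows of the matrix, the same split could be re-evaluated with a different comparison method simply by swapping in another distance matrix, which is presumably why the matrices are distributed alongside the task definitions.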

journal_name

Nucleic Acids Res

journal_title

Nucleic acids research

authors

Sonego P,Pacurar M,Dhir S,Kertész-Farkas A,Kocsor A,Gáspári Z,Leunissen JA,Pongor S

doi

10.1093/nar/gkl812

subject

Has Abstract

pub_date

2007-01-01 00:00:00

pages

D232-6

issue

Database issue

eissn

1362-4962

issn

0305-1048

pii

gkl812

journal_volume

35

pub_type

Journal Article
  • Somatotroph- and lactotroph-specific interactions with the homeobox protein binding sites in the rat growth hormone gene promoter.

    abstract::Nuclear extracts prepared from growth hormone-secreting (GC) and prolactin-secreting (235-1) rat anterior pituitary cell lines were compared for their ability to bind to the DNA sequences conferring tissue-specificity to the expression of the rat growth hormone (rGH) gene promoter. Cell-specific differences in the int...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/18.17.5235

    authors: Schaufele F,West BL,Reudelhuber T

    update_date:1990-09-11 00:00:00

  • ProtoNet 4.0: a hierarchical classification of one million protein sequences.

    abstract::ProtoNet is an automatic hierarchical classification of the protein sequence space. In 2004, the ProtoNet (version 4.0) presents the analysis of over one million proteins merged from SwissProt and TrEMBL databases. In addition to rich visualization and analysis tools to navigate the clustering hierarchy, we incorporat...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gki007

    authors: Kaplan N,Sasson O,Inbar U,Friedlich M,Fromer M,Fleischer H,Portugaly E,Linial N,Linial M

    update_date:2005-01-01 00:00:00

  • Profiling the transcription factor regulatory networks of human cell types.

    abstract::Neph et al. (2012) (Circuitry and dynamics of human transcription factor regulatory networks. Cell, 150: 1274-1286) reported the transcription factor (TF) regulatory networks of 41 human cell types using the DNaseI footprinting technique. This provides a valuable resource for uncovering regulation principles in differ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gku923

    authors: Zhang S,Tian D,Tran NH,Choi KP,Zhang L

    update_date:2014-11-10 00:00:00

  • TcSNP: a database of genetic variation in Trypanosoma cruzi.

    abstract::The TcSNP database (http://snps.tcruzi.org) integrates information on genetic variation (polymorphisms and mutations) for different stocks, strains and isolates of Trypanosoma cruzi, the causative agent of Chagas disease. The database incorporates sequences (genes from the T. cruzi reference genome, mRNAs, ESTs and ge...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkn874

    authors: Ackermann AA,Carmona SJ,Agüero F

    update_date:2009-01-01 00:00:00

  • Analysis of recombination in mammalian cells using SV40 genome segments having homologous overlapping termini.

    abstract::Segments of SV40 DNA having homologous overlapping termini recombine to produce viable genomes in monkey cells. Frequencies of recombination on either side of a deletion marker are non-random; replication and palindromes do not appear to be essential. Since recombination involves host enzymes, a suitable system has be...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/8.12.2725

    authors: Upcroft P,Carter B,Kidson C

    update_date:1980-06-25 00:00:00

  • Contributions of DNA interstrand cross-links to aging of cells and organisms.

    abstract::Impaired DNA damage repair, especially deficient transcription-coupled nucleotide excision repair, leads to segmental progeroid syndromes in human patients as well as in rodent models. Furthermore, DNA double-strand break signalling has been pinpointed as a key inducer of cellular senescence. Several recent findings s...

    journal_title:Nucleic acids research

    pub_type: Journal Article, Review

    doi:10.1093/nar/gkm1065

    authors: Grillari J,Katinger H,Voglauer R

    update_date:2007-01-01 00:00:00

  • Influence of ground-state structure and Mg2+ binding on folding kinetics of the guanine-sensing riboswitch aptamer domain.

    abstract::Riboswitch RNAs fold into complex tertiary structures upon binding to their cognate ligand. Ligand recognition is accomplished by key residues in the binding pocket. In addition, it often crucially depends on the stability of peripheral structural elements. The ligand-bound complex of the guanine-sensing riboswitch fr...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkr664

    authors: Buck J,Wacker A,Warkentin E,Wöhnert J,Wirmer-Bartoschek J,Schwalbe H

    update_date:2011-12-01 00:00:00

  • Invadolysin acts genetically via the SAGA complex to modulate chromosome structure.

    abstract::Identification of components essential to chromosome structure and behaviour remains a vibrant area of study. We have previously shown that invadolysin is essential in Drosophila, with roles in cell division and cell migration. Mitotic chromosomes are hypercondensed in length, but display an aberrant fuzzy appearance....

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkv211

    authors: Rao SG,Janiszewski MM,Duca E,Nelson B,Abhinav K,Panagakou I,Vass S,Heck MM

    update_date:2015-04-20 00:00:00

  • The H19/let-7 double-negative feedback loop contributes to glucose metabolism in muscle cells.

    abstract::The H19 lncRNA has been implicated in development and growth control and is associated with human genetic disorders and cancer. Acting as a molecular sponge, H19 inhibits microRNA (miRNA) let-7. Here we report that H19 is significantly decreased in muscle of human subjects with type-2 diabetes and insulin resistant ro...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gku1160

    authors: Gao Y,Wu F,Zhou J,Yan L,Jurczak MJ,Lee HY,Yang L,Mueller M,Zhou XB,Dandolo L,Szendroedi J,Roden M,Flannery C,Taylor H,Carmichael GG,Shulman GI,Huang Y

    update_date:2014-12-16 00:00:00

  • A Thermus phage protein inhibits host RNA polymerase by preventing template DNA strand loading during open promoter complex formation.

    abstract::RNA polymerase (RNAP) is a major target of gene regulation. Thermus thermophilus bacteriophage P23-45 encodes two RNAP binding proteins, gp39 and gp76, which shut off host gene transcription while allowing orderly transcription of phage genes. We previously reported the structure of the T. thermophilus RNAP•σA holoenz...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkx1162

    authors: Ooi WY,Murayama Y,Mekler V,Minakhin L,Severinov K,Yokoyama S,Sekine SI

    update_date:2018-01-09 00:00:00

  • An on-bead tailing/ligation approach for sequencing resin-bound RNA libraries.

    abstract::Nucleic acids possess the unique property of being enzymatically amplifiable, and have therefore been a popular choice for the combinatorial selection of functional sequences, such as aptamers or ribozymes. However, amplification typically requires known sequence segments that serve as primer binding sites, which can ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gks004

    authors: Wiesmayr A,Fournier P,Jäschke A

    update_date:2012-05-01 00:00:00

  • Purification by DNA affinity precipitation of the cellular factors HEB1-p67 and HEB1-p94 which bind specifically to the human T-cell leukemia virus type-I 21 bp enhancer.

    abstract::Transcription driven by the proviral promoter of the Human T-cell Leukemia Virus type I (HTLV-I) is tightly regulated by the Tax1 transactivator. This viral protein potently induces the enhancer activity of a 21 bp motif repeated three times in the promoter. We have previously shown that this induction results from th...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/21.17.3935

    authors: Lombard-Platet G,Jalinot P

    update_date:1993-08-25 00:00:00

  • One size does not fit all: on how Markov model order dictates performance of genomic sequence analyses.

    abstract::The structural simplicity and ability to capture serial correlations make Markov models a popular modeling choice in several genomic analyses, such as identification of motifs, genes and regulatory elements. A critical, yet relatively unexplored, issue is the determination of the order of the Markov model. Most biolog...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gks1285

    authors: Narlikar L,Mehta N,Galande S,Arjunwadkar M

    update_date:2013-02-01 00:00:00

  • Identification of the minor guanylated tRNA of rabbit reticulocytes.

    abstract::Two of the tRNA's found in rabbit reticulocytes are substrates for a post-transcriptional modification leading to the incorporation of guanine into the polynucleotide chain. The major guanylated tRNA was previously identified as tRNA (His). In the present report we show that the minor guanylated tRNA is tRNA (Asn), an...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/3.10.2521

    authors: Farkas WR,Chernoff D

    update_date:1976-10-01 00:00:00

  • Ab initio gene identification in metagenomic sequences.

    abstract::We describe an algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities. Accurate ab initio gene prediction in a short nucleotide sequence of anonymous origin is hampered by uncertainty in model parameters. While several machine learning approaches could be proposed t...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkq275

    authors: Zhu W,Lomsadze A,Borodovsky M

    update_date:2010-07-01 00:00:00

  • Fine structure mapping of an avian tumor virus RNA by immunoelectron microscopy.

    abstract::The RNA of a deleted strain (lacking Src gene) of an avian sarcoma virus (ASV) was examined by a newly developed immunoelectron microscopic procedure which uses anti-nucleotide antibodies as probes. After denaturation of the RNA and reaction with a high affinity, highly specific anti-7-methylguanosine-5'-phosphate (an...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/8.19.4485

    authors: Castleman H,Meredith RD,Erlanger BF

    update_date:1980-10-10 00:00:00

  • EVEREST: a collection of evolutionary conserved protein domains.

    abstract::Protein domains are subunits of proteins that recur throughout the protein world. There are many definitions attempting to capture the essence of a protein domain, and several systems that identify protein domains and classify them into families. EVEREST, recently described in Portugaly et al. (2006) BMC Bioinformatic...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkl850

    authors: Portugaly E,Linial N,Linial M

    update_date:2007-01-01 00:00:00

  • Specific interactions of distamycin with G-quadruplex DNA.

    abstract::Distamycin binds the minor groove of duplex DNA at AT-rich regions and has been a valuable probe of protein interactions with double-stranded DNA. We find that distamycin can also inhibit protein interactions with G-quadruplex (G4) DNA, a stable four-stranded structure in which the repeating unit is a G-quartet. Using...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkg392

    authors: Cocco MJ,Hanakahi LA,Huber MD,Maizels N

    update_date:2003-06-01 00:00:00

  • The influence of tertiary structural restraints on conformational transitions in superhelical DNA.

    abstract::This paper examines theoretically the effects that restraints on the tertiary structure of a superhelical DNA domain exert on the energetics of linking and the onset of conformational transitions. The most important tertiary constraint arises from the nucleosomal winding of genomic DNA in vivo. Conformational transiti...

    journal_title:Nucleic acids research

    pub_type: Journal Article, Review

    doi:10.1093/nar/15.23.9985

    authors: Benham CJ

    update_date:1987-12-10 00:00:00

  • Identification of endoribonuclease specific cleavage positions reveals novel targets of RNase III in Streptococcus pyogenes.

    abstract::A better understanding of transcriptional and post-transcriptional regulation of gene expression in bacteria relies on studying their transcriptome. RNA sequencing methods are used not only to assess RNA abundance but also the exact boundaries of primary and processed transcripts. Here, we developed a method, called i...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkw1316

    authors: Le Rhun A,Lécrivain AL,Reimegård J,Proux-Wéra E,Broglia L,Della Beffa C,Charpentier E

    update_date:2017-03-17 00:00:00

  • Characterisation of a genomic clone covering the structural mouse MyoD1 gene and its promoter region.

    abstract::We have isolated the mouse MyoD1 gene flanked by its promoter region by screening a genomic library with synthetic oligonucleotides. The structural gene is interrupted by two G + C rich introns. Transfection of the cloned gene inserted into an expression vector converts fibroblasts to myoblasts. Sequence analysis of a...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/19.23.6433

    authors: Zingg JM,Alva GP,Jost JP

    update_date:1991-12-11 00:00:00

  • Nucleotide sequence of satellite DNA contained in the eliminated genome of Ascaris lumbricoides.

    abstract::Several restriction endonuclease fragments isolated from highly repetitive satellite DNA of the chromatin eliminating nematode Ascaris lumbricoides var. suum have been cloned. Each type of restriction fragment corresponds to a different variant of the same related ancestral sequence. These variants differ by small del...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/10.23.7493

    authors: Müller F,Walker P,Aeby P,Neuhaus H,Felder H,Back E,Tobler H

    update_date:1982-12-11 00:00:00

  • Selection of template initiation sites and the lengths of RNA primers synthesized by DNA primase are strongly affected by its organization in a multiprotein DNA polymerase alpha complex.

    abstract::Synthesis of (p)ppRNA-DNA chains by purified HeLa cell DNA primase-DNA polymerase alpha (pol alpha-primase) was compared with those synthesized by a multiprotein form of DNA polymerase alpha (pol alpha 2) using unique single-stranded DNA templates containing the origin of replication for simian virus 40 (SV40) DNA. Th...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/14.18.7305

    authors: Vishwanatha JK,Yamaguchi M,DePamphilis ML,Baril EF

    update_date:1986-09-25 00:00:00

  • Comparative calorimetric studies on the dynamic conformation of plant 5S rRNA. I. Thermal unfolding pattern of lupin seeds and wheat germ 5S rRNAs, also in the presence of magnesium and sperminium cations.

    abstract::An attempt has been made to correlate differential scanning calorimetry melting profiles of 5S rRNAs from lupin seeds (L.s.) and wheat germ (W.g.) with their structure. It is suggested that the observed differences in thermal unfolding are due to differences in RNA nucleotide sequence and as a consequence in higher or...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/16.2.685

    authors: Barciszewski J,Bratek-Wiewiórowska MD,Górnicki P,Naskret-Barciszewska M,Wiewiórowski M,Zielenkiewicz A,Zielenkiewicz W

    update_date:1988-01-25 00:00:00

  • Recent improvements to Binding MOAD: a resource for protein-ligand binding affinities and structures.

    abstract::For over 10 years, Binding MOAD (Mother of All Databases; http://www.BindingMOAD.org) has been one of the largest resources for high-quality protein-ligand complexes and associated binding affinity data. Binding MOAD has grown at the rate of 1994 complexes per year, on average. Currently, it contains 23,269 complexes ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gku1088

    authors: Ahmed A,Smith RD,Clark JJ,Dunbar JB Jr,Carlson HA

    update_date:2015-01-01 00:00:00

  • A Cas9-based toolkit to program gene expression in Saccharomyces cerevisiae.

    abstract::Despite the extensive use of Saccharomyces cerevisiae as a platform for synthetic biology, strain engineering remains slow and laborious. Here, we employ CRISPR/Cas9 technology to build a cloning-free toolkit that addresses commonly encountered obstacles in metabolic engineering, including chromosomal integration locu...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkw1023

    authors: Reider Apel A,d'Espaux L,Wehrs M,Sachs D,Li RA,Tong GJ,Garber M,Nnadi O,Zhuang W,Hillson NJ,Keasling JD,Mukhopadhyay A

    update_date:2017-01-09 00:00:00

  • Activation of the Bcl-2 promoter by nerve growth factor is mediated by the p42/p44 MAPK cascade.

    abstract::The Bcl-2 protein has an anti-apoptotic effect in neuronal and other cell types. We show for the first time that the Bcl-2 promoter is activated by the neuronal survival factor nerve growth factor (NGF) and that this effect is dependent on a region of the promoter from -1472 to -1414. This activation requires the Rap-...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/27.10.2086

    authors: Liu YZ,Boxer LM,Latchman DS

    update_date:1999-05-15 00:00:00

  • Europe PMC: a full-text literature database for the life sciences and platform for innovation.

    abstract::This article describes recent developments of Europe PMC (http://europepmc.org), the leading database for life science literature. Formerly known as UKPMC, the service was rebranded in November 2012 as Europe PMC to reflect the scope of the funding agencies that support it. Several new developments have enriched Europ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gku1061

    authors: Europe PMC Consortium.

    update_date:2015-01-01 00:00:00

  • Quantitative quality control in microarray image processing and data acquisition.

    abstract::A new integrated image analysis package with quantitative quality control schemes is described for cDNA microarray technology. The package employs an iterative algorithm that utilizes both intensity characteristics and spatial information of the spots on a microarray image for signal-background segmentation and define...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/29.15.e75

    authors: Wang X,Ghosh S,Guo SW

    update_date:2001-08-01 00:00:00

  • High-throughput single-molecule mapping links subtelomeric variants and long-range haplotypes with specific telomeres.

    abstract::Accurate maps and DNA sequences for human subtelomere regions, along with detailed knowledge of subtelomere variation and long-range telomere-terminal haplotypes in individuals, are critical for understanding telomere function and its roles in human biology. Here, we use a highly automated whole genome mapping technol...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkx017

    authors: Young E,Pastor S,Rajagopalan R,McCaffrey J,Sibert J,Mak ACY,Kwok PY,Riethman H,Xiao M

    update_date:2017-05-19 00:00:00