Abstract:
BACKGROUND: Caspases are a family of proteases with central functions in programmed cell death (apoptosis) and inflammation. Caspases mediate their effects through aspartate-specific cleavage of their target proteins, and at present almost 400 caspase substrates are known. Several methods have been developed to predict caspase cleavage sites in individual proteins, but currently none of them can be used to predict caspase cleavage sites in multiple proteins or entire proteomes, or to use several classifiers in combination. A database of predicted caspase cleavage products for the whole genome could significantly aid in identifying novel caspase targets from tandem mass spectrometry based proteomic experiments.
RESULTS: Three different pattern recognition classifiers were developed for predicting caspase cleavage sites from protein sequences. Evaluation of the classifiers with quality measures indicated that all three classifiers performed well in predicting caspase cleavage sites, and combining different classifiers increased the accuracy further. A new tool, Pripper, was developed to utilize the classifiers and predict the caspase cut sites from an arbitrary number of input sequences. A database was constructed with the developed tool, and it was used to identify caspase target proteins from tandem mass spectrometry data from two different proteomic experiments. Both known and novel caspase cleavage products were identified using the database, demonstrating the usefulness of the tool. Pripper is not restricted to predicting caspase cut sites: it can also scan protein sequences for any given motif(s), and it can predict cut sites for any other protease once a suitable cut site prediction model has been developed. Pripper is freely available and can be downloaded from http://users.utu.fi/mijopi/Pripper.
CONCLUSIONS: We have developed Pripper, a tool for reading an arbitrary number of proteins in FASTA format, predicting their caspase cleavage sites and outputting the cleaved sequences to a new FASTA format sequence file. We show that Pripper is a valuable tool in identifying novel caspase target proteins from modern proteomics experiments.
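The read, predict and write workflow described in the conclusions can be illustrated with a minimal sketch. This is not Pripper's implementation: the abstract says Pripper uses trained pattern recognition classifiers, whereas the sketch substitutes a plain motif scan. The simplified D-x-x-D pattern (cut after the final Asp), the function names and the output header format are all assumptions made for illustration.

```python
import re
from typing import Iterator, List, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Assumption: a simplified caspase-like motif D-x-x-D. The lookahead lets
# overlapping occurrences be found. A real predictor would score each
# candidate site with a trained classifier instead.
MOTIF = re.compile(r"(?=(D..D))")
MOTIF_LENGTH = 4


def cut_sites(seq: str) -> List[int]:
    """Return the 0-based index of the first residue after each cut.

    Cuts that would leave an empty C-terminal fragment are skipped.
    """
    sites = []
    for match in MOTIF.finditer(seq):
        pos = match.start() + MOTIF_LENGTH
        if pos < len(seq):
            sites.append(pos)
    return sites


def cleaved_fasta(text: str) -> str:
    """Return FASTA text holding the N- and C-terminal fragment of every cut."""
    lines = []
    for header, seq in read_fasta(text):
        for pos in cut_sites(seq):
            # Hypothetical header convention: original header, cut index, side.
            lines += [f">{header}|cut{pos}|N", seq[:pos]]
            lines += [f">{header}|cut{pos}|C", seq[pos:]]
    return "\n".join(lines)


if __name__ == "__main__":
    demo = ">p1 demo\nMKDEVDGAL\n>p2\nAAAA\n"
    print(cleaved_fasta(demo))
```

Writing each cleavage product as its own FASTA record is what would let a mass spectrometry search engine treat the predicted fragments as ordinary database entries, which is how the abstract describes the database being used.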
journal_name: BMC Bioinformatics
journal_title: BMC bioinformatics
authors: Piippo M, Lietzén N, Nevalainen OS, Salmi J, Nyman TA
doi: 10.1186/1471-2105-11-320
subject: Has Abstract
pub_date: 2010-06-15 00:00:00
pages: 320
issn: 1471-2105
pii: 1471-2105-11-320
journal_volume: 11
pub_type: Journal article

abstract:BACKGROUND:One of the most powerful methods for the prediction of protein structure from sequence information alone is the iterative construction of profile-type models. Because profiles are built from sequence alignments, the sequences included in the alignment and the method used to align them will be important to th...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/1471-2105-7-410
update_date:2006-09-14 00:00:00
abstract:BACKGROUND:Detecting local correlations in expression between neighboring genes along the genome has proved to be an effective strategy to identify possible causes of transcriptional deregulation in cancer. It has been successfully used to illustrate the role of mechanisms such as copy number variation (CNV) or epigene...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/s12859-017-1742-5
update_date:2017-07-11 00:00:00
abstract:BACKGROUND:Nearly all cellular processes involve proteins structurally rearranging to accommodate molecular partners. The energy landscape underscores the inherent nature of proteins as dynamic molecules interconverting between structures with varying energies. In principle, reconstructing a protein's energy landscape ...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/s12859-019-2822-5
update_date:2019-06-06 00:00:00
abstract:BACKGROUND:Numerous models for use in interpreting quantitative PCR (qPCR) data are present in recent literature. The most commonly used models assume the amplification in qPCR is exponential and fit an exponential model with a constant rate of increase to a select part of the curve. Kinetic theory may be used to model...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/1471-2105-13-203
update_date:2012-08-16 00:00:00
abstract:BACKGROUND:In real-time PCR, it is necessary to consider the efficiency of amplification (EA) of amplicons in order to determine initial target levels properly. EAs can be deduced from standard curves, but these involve extra effort and cost and may yield invalid EAs. Alternatively, EA can be extracted from individual ...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/1471-2105-9-95
update_date:2008-02-12 00:00:00
abstract:BACKGROUND:Vast progress in sequencing projects has called for annotation on a large scale. A number of methods have been developed to address this challenging task. These methods, however, either apply to specific subsets, or their predictions are not formalised, or they do not provide precise confidence values for th...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/1471-2105-7-161
update_date:2006-03-20 00:00:00
abstract:BACKGROUND:Since their first commercialization, the diversity of taxa and the genetic composition of transgene sequences in genetically modified plants (GMOs) are constantly increasing. To date, the detection of GMOs and derived products is commonly performed by PCR-based methods targeting specific DNA sequences introd...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/1471-2105-14-256
update_date:2013-08-22 00:00:00
abstract:BACKGROUND:New web-based technologies provide an excellent opportunity for sharing and accessing information and using the web as a platform for interaction and collaboration. Although several specialized tools are available for analyzing DNA sequence information, conventional web-based tools have not been utilized for bio...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/1471-2105-10-S14-S4
update_date:2009-11-10 00:00:00
abstract:BACKGROUND:The binding of regulatory proteins to their specific DNA targets determines the accurate expression of the neighboring genes. The in silico prediction of new binding sites in completely sequenced genomes is a key aspect in the deeper understanding of gene regulatory networks. Several algorithms have been des...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/1471-2105-11-554
update_date:2010-11-11 00:00:00
abstract:BACKGROUND:Most gene finders score candidate gene models with state-based methods, typically HMMs, by combining local properties (coding potential, splice donor and acceptor patterns, etc). Competing models with similar state-based scores may be distinguishable with additional information. In particular, functional and...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/1471-2105-9-433
update_date:2008-10-14 00:00:00
abstract:BACKGROUND:As a consequence of the evolutionary process, data collected from related species tend to be similar. This similarity by descent can obscure subtler signals in the data such as the evidence of constraint on variation due to shared selective pressures. In comparative sequence analysis, for example, sequence s...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/1471-2105-8-222
update_date:2007-06-26 00:00:00
abstract:BACKGROUND:Protein function in eukaryotic cells is often controlled in a cell cycle-dependent manner. Therefore, the correct assignment of cellular phenotypes to cell cycle phases is a crucial task in cell biology research. Nuclear proteins whose localization varies during the cell cycle are valuable and frequently use...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/s12859-015-0618-9
update_date:2015-05-29 00:00:00
abstract:BACKGROUND:Modules of interacting components arranged in specific network topologies have evolved to perform a diverse array of cellular functions. For a network with a constant topological structure, its function within a cell may still be tuned by changing the number of instances of a particular component (e.g., gene...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/s12859-019-2866-6
update_date:2019-05-14 00:00:00
abstract:BACKGROUND:Contact-guided protein structure prediction methods are becoming more and more successful because of the latest advances in residue-residue contact prediction. To support contact-driven structure prediction, effective tools that can quickly build tertiary structural models of good quality from predicted cont...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/s12859-018-2032-6
update_date:2018-01-25 00:00:00
abstract:BACKGROUND:Cryo-electron microscopy (Cryo-EM) is widely used in the determination of the three-dimensional (3D) structures of macromolecules. Particle picking from 2D micrographs remains a challenging early step in the Cryo-EM pipeline due to the diversity of particle shapes and the extremely low signal-to-noise ratio ...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/s12859-020-03809-7
update_date:2020-11-09 00:00:00
abstract:BACKGROUND:Many readers will sympathize with the following story. You are viewing a gene sequence in Entrez, and you want to find whether it contains a particular sequence motif. You reach for the browser's "find in page" button, but those darn spaces every 10 bp get in the way. And what if the motif is on the opposite...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/1471-2105-4-1
update_date:2003-01-04 00:00:00
abstract:BACKGROUND:Wise et al. introduced a rank-based statistical technique for meta-analysis of genome scans, the Genome Scan Meta-Analysis (GSMA) method. Levinson et al. recently described two generalizations of the GSMA statistic: (i) a weighted version of the GSMA statistic, so that different studies could be ascribed dif...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/1471-2105-6-32
update_date:2005-02-17 00:00:00
abstract:BACKGROUND:The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of t...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/1471-2105-9-525
update_date:2008-12-08 00:00:00
abstract:BACKGROUND:Protein aggregation is a significant problem in the biopharmaceutical industry (protein drug stability) and is associated medically with over 40 human diseases. Although a number of computational models have been developed for predicting aggregation propensity and identifying aggregation-prone regions in pro...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/1471-2105-14-314
update_date:2013-10-28 00:00:00
abstract:BACKGROUND:Data extraction and integration methods are becoming essential to effectively access and take advantage of the huge amounts of heterogeneous genomics and clinical data increasingly available. In this work, we focus on The Cancer Genome Atlas, a comprehensive archive of tumoral data containing the results of ...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/s12859-016-1419-5
update_date:2017-01-03 00:00:00
abstract:BACKGROUND:Lineage rate heterogeneity can be a major source of bias, especially in multi-gene phylogeny inference. We had previously tackled this issue by developing LS3, a data subselection algorithm that, by removing fast-evolving sequences in a gene-specific manner, identifies subsets of sequences that evolve at a r...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/s12859-019-3020-1
update_date:2019-08-13 00:00:00
abstract:BACKGROUND:In template-based modeling when using a single template, inter-atomic distances of an unknown protein structure are assumed to be distributed by Gaussian probability density functions, whose center peaks are located at the distances between corresponding atoms in the template structure. The width of the Gaus...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/s12859-015-0526-z
update_date:2015-03-21 00:00:00
abstract:BACKGROUND:Integration of heterogeneous data types is a challenging problem, especially in biology, where the number of databases and data types increases rapidly. Amongst the problems that one has to face are integrity, consistency, redundancy, connectivity, expressiveness and updatability. DESCRIPTION:Here we present...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/1471-2105-7-70
update_date:2006-02-15 00:00:00
abstract:BACKGROUND:Microarrays permit biologists to simultaneously measure the mRNA abundance of thousands of genes. An important issue facing investigators planning microarray experiments is how to estimate the sample size required for good statistical power. What is the projected sample size or number of replicate chips need...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/1471-2105-7-84
update_date:2006-02-22 00:00:00
abstract:BACKGROUND:Stable isotope tracing can follow individual atoms through metabolic transformations through the detection of the incorporation of stable isotope within metabolites. This resulting data can be interpreted in terms related to metabolic flux. However, detection of a stable isotope in metabolites by mass spectr...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/s12859-019-3096-7
update_date:2019-10-28 00:00:00
abstract:BACKGROUND:Protein sequence motifs are by definition short fragments of conserved amino acids, often associated with a specific function. Accordingly protein sequence profiles derived from multiple sequence alignments provide an alternative description of functional motifs characterizing families of related sequences. ...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/1471-2105-6-164
update_date:2005-06-29 00:00:00
abstract:BACKGROUND:In the adaptive immune system, variable regions of immunoglobulin (IG) are encoded by random recombination of variable (V), diversity (D), and joining (J) gene segments in the germline. Partitioning the functional antibody sequences to their sourcing germline gene segments is vital not only for understanding...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/1471-2105-9-S12-S20
update_date:2008-12-12 00:00:00
abstract:BACKGROUND:MHC/HLA class II molecules are important components of the immune system and play a critical role in processes such as phagocytosis. Understanding peptide recognition properties of the hundreds of MHC class II alleles is essential to appreciate determinants of antigenicity and ultimately to predict epitopes....
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/1471-2105-11-S1-S55
update_date:2010-01-18 00:00:00
abstract:BACKGROUND:The Acel_2062 protein from Acidothermus cellulolyticus is a protein of unknown function. Initial sequence analysis predicted that it was a metallopeptidase from the presence of a motif conserved amongst the Asp-zincins, which are peptidases that contain a single, catalytic zinc ion ligated by the histidines ...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/1471-2105-15-1
update_date:2014-01-03 00:00:00
abstract:BACKGROUND:Distance-based methods are popular for reconstructing evolutionary trees thanks to their speed and generality. A number of methods exist for estimating distances from sequence alignments, which often involves some sort of correction for multiple substitutions. The problem is to accurately estimate the number...
journal_title:BMC bioinformatics
pub_type: Journal article
doi:10.1186/1471-2105-6-108
update_date:2005-04-27 00:00:00