ProbPS: a new model for peak selection based on quantifying the dependence of the existence of derivative peaks on primary ion intensity.

Abstract:

BACKGROUND: The analysis of mass spectra suggests that the existence of derivative peaks is strongly dependent on the intensity of the primary peaks. Peak selection from tandem mass spectra is used to filter out noise and contaminant peaks. It is widely accepted that a valid primary peak tends to have high intensity and to be accompanied by derivative peaks, including isotopic peaks, neutral loss peaks, and complementary peaks. Existing models for peak selection ignore the dependence between the existence of the derivative peaks and the intensity of the primary peaks. Simple models for peak selection assume that these two attributes are independent; however, this assumption is contrary to real data and prone to error. RESULTS: In this paper, we present a statistical model, named ProbPS, that quantitatively measures the dependence of a derivative peak's existence on the primary peak's intensity, and we describe a statistical model for peak selection. Our results show that this quantitative understanding can successfully guide the peak selection process. By comparing ProbPS with AuDeNS, we demonstrate the advantages of our method both in filtering out noise peaks and in improving de novo identification. In addition, we present a tag identification approach based on our peak selection method. Our results, using a test data set, suggest that our tag identification method (876 correct tags in 1000 spectra) outperforms PepNovoTag (790 correct tags in 1000 spectra). CONCLUSIONS: We have shown that ProbPS improves the accuracy of peak selection, which further enhances the performance of de novo sequencing and tag identification. Thus, our model saves valuable computation time and improves the accuracy of the results.
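
The dependence described in the abstract can be illustrated with a small sketch. This is not the ProbPS model, whose details the abstract does not give. It only shows one simple way to estimate the probability that a derivative peak exists given the intensity of the primary peak, by binning intensities and counting. The function name `conditional_existence`, the relative-intensity scale in [0, 1], the number of bins, and the pseudocount are all assumptions made for illustration.

```python
def conditional_existence(observations, n_bins=4, pseudocount=1.0):
    """Estimate P(derivative peak present | intensity bin of the primary peak).

    observations: iterable of (relative_intensity, has_derivative) pairs,
    where relative_intensity lies in [0, 1] (hypothetical normalisation)
    and has_derivative is a bool.
    Returns one smoothed probability per intensity bin.
    """
    present = [0.0] * n_bins
    total = [0.0] * n_bins
    for intensity, has_derivative in observations:
        # Map the intensity to a bin; an intensity of 1.0 goes to the last bin.
        b = min(int(intensity * n_bins), n_bins - 1)
        total[b] += 1
        if has_derivative:
            present[b] += 1
    # Additive smoothing so that sparsely populated bins avoid 0 or 1.
    return [
        (present[b] + pseudocount) / (total[b] + 2 * pseudocount)
        for b in range(n_bins)
    ]


if __name__ == "__main__":
    toy = [(0.1, False), (0.1, False), (0.9, True), (0.9, True), (0.9, False)]
    print(conditional_existence(toy, n_bins=2))
```

A model that assumes independence would use a single probability for all bins. Per-bin estimates like these let a scoring function weight the presence or absence of a derivative peak differently for weak and strong primary peaks.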

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Zhang S,Wang Y,Bu D,Zhang H,Sun S

doi

10.1186/1471-2105-12-346

subject

Has Abstract

pub_date

2011-08-17 00:00:00

pages

346

issn

1471-2105

pii

1471-2105-12-346

journal_volume

12

pub_type

Journal Article
  • Toward an interactive article: integrating journals and biological databases.

    abstract:BACKGROUND:Journal articles and databases are two major modes of communication in the biological sciences, and thus integrating these critical resources is of urgent importance to increase the pace of discovery. Projects focused on bridging the gap between journals and databases have been on the rise over the last five...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-175

    authors: Rangarajan A,Schedl T,Yook K,Chan J,Haenel S,Otis L,Faelten S,DePellegrin-Connelly T,Isaacson R,Skrzypek MS,Marygold SJ,Stefancsik R,Cherry JM,Sternberg PW,Müller HM

    update_date:2011-05-19 00:00:00

  • ADS-HCSpark: A scalable HaplotypeCaller leveraging adaptive data segmentation to accelerate variant calling on Spark.

    abstract:BACKGROUND:The advance of next generation sequencing enables higher throughput with lower price, and as the basis of high-throughput sequencing data analysis, variant calling is widely used in disease research, clinical treatment and medicine research. However, current mainstream variant caller tools have a serious pro...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2665-0

    authors: Xiao A,Wu Z,Dong S

    update_date:2019-02-14 00:00:00

  • A multifaceted analysis of HIV-1 protease multidrug resistance phenotypes.

    abstract:BACKGROUND:Great strides have been made in the effective treatment of HIV-1 with the development of second-generation protease inhibitors (PIs) that are effective against historically multi-PI-resistant HIV-1 variants. Nevertheless, mutation patterns that confer decreasing susceptibility to available PIs continue to ar...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-477

    authors: Doherty KM,Nakka P,King BM,Rhee SY,Holmes SP,Shafer RW,Radhakrishnan ML

    update_date:2011-12-15 00:00:00

  • fastJT: An R package for robust and efficient feature selection for machine learning and genome-wide association studies.

    abstract:BACKGROUND:Parametric feature selection methods for machine learning and association studies based on genetic data are not robust with respect to outliers or influential observations. While rank-based, distribution-free statistics offer a robust alternative to parametric methods, their practical utility can be limited,...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2869-3

    authors: Lin J,Sibley A,Shterev I,Nixon A,Innocenti F,Chan C,Owzar K

    update_date:2019-06-13 00:00:00

  • Quantiprot - a Python package for quantitative analysis of protein sequences.

    abstract:BACKGROUND:The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties define a multidimensional solution space, where seq...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1751-4

    authors: Konopka BM,Marciniak M,Dyrka W

    update_date:2017-07-17 00:00:00

  • The COG database: an updated version includes eukaryotes.

    abstract:BACKGROUND:The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appea...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-4-41

    authors: Tatusov RL,Fedorova ND,Jackson JD,Jacobs AR,Kiryutin B,Koonin EV,Krylov DM,Mazumder R,Mekhedov SL,Nikolskaya AN,Rao BS,Smirnov S,Sverdlov AV,Vasudevan S,Wolf YI,Yin JJ,Natale DA

    update_date:2003-09-11 00:00:00

  • Cell subset prediction for blood genomic studies.

    abstract:BACKGROUND:Genome-wide transcriptional profiling of patient blood samples offers a powerful tool to investigate underlying disease mechanisms and personalized treatment decisions. Most studies are based on analysis of total peripheral blood mononuclear cells (PBMCs), a mixed population. In this case, accuracy is inhere...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-258

    authors: Bolen CR,Uduman M,Kleinstein SH

    update_date:2011-06-24 00:00:00

  • COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project.

    abstract:BACKGROUND:With the ever increasing use of computational models in the biosciences, the need to share models and reproduce the results of published studies efficiently and easily is becoming more important. To this end, various standards have been proposed that can be used to describe models, simulations, data or other...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-014-0369-z

    authors: Bergmann FT,Adams R,Moodie S,Cooper J,Glont M,Golebiewski M,Hucka M,Laibe C,Miller AK,Nickerson DP,Olivier BG,Rodriguez N,Sauro HM,Scharm M,Soiland-Reyes S,Waltemath D,Yvon F,Le Novère N

    update_date:2014-12-14 00:00:00

  • Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method.

    abstract:BACKGROUND:Many processes in molecular biology involve the recognition of short sequences of nucleic-or amino acids, such as the binding of immunogenic peptides to major histocompatibility complex (MHC) molecules. From experimental data, a model of the sequence specificity of these processes can be constructed, such as...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-132

    authors: Peters B,Sette A

    update_date:2005-05-31 00:00:00

  • Finite mixture clustering of human tissues with different levels of IGF-1 splice variants mRNA transcripts.

    abstract:BACKGROUND:This study addresses a recurrent biological problem, that is to define a formal clustering structure for a set of tissues on the basis of the relative abundance of multiple alternatively spliced isoforms mRNAs generated by the same gene. To this aim, we have used a model-based clustering approach, based on a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0689-7

    authors: Pelosi M,Alfò M,Martella F,Pappalardo E,Musarò A

    update_date:2015-09-15 00:00:00

  • Repliscan: a tool for classifying replication timing regions.

    abstract:BACKGROUND:Replication timing experiments that use label incorporation and high throughput sequencing produce peaked data similar to ChIP-Seq experiments. However, the differences in experimental design, coverage density, and possible results make traditional ChIP-Seq analysis methods inappropriate for use with replica...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1774-x

    authors: Zynda GJ,Song J,Concia L,Wear EE,Hanley-Bowdoin L,Thompson WF,Vaughn MW

    update_date:2017-08-07 00:00:00

  • Combining techniques for screening and evaluating interaction terms on high-dimensional time-to-event data.

    abstract:BACKGROUND:Molecular data, e.g. arising from microarray technology, is often used for predicting survival probabilities of patients. For multivariate risk prediction models on such high-dimensional data, there are established techniques that combine parameter estimation and variable selection. One big challenge is to i...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-58

    authors: Sariyar M,Hoffmann I,Binder H

    update_date:2014-02-26 00:00:00

  • Network hub-node prioritization of gene regulation with intra-network association.

    abstract:BACKGROUND:To identify and prioritize the influential hub genes in a gene-set or biological pathway, most analyses rely on calculation of marginal effects or tests of statistical significance. These procedures may be inappropriate since hub nodes are common connection points and therefore may interact with other nodes ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3444-7

    authors: Chang HC,Chu CP,Lin SJ,Hsiao CK

    update_date:2020-03-12 00:00:00

  • SPIDer: Saccharomyces protein-protein interaction database.

    abstract:BACKGROUND:Since proteins perform their functions by interacting with one another and with other biomolecules, reconstructing a map of the protein-protein interactions of a cell, experimentally or computationally, is an important first step toward understanding cellular function and machinery of a proteome. Solely deri...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-S5-S16

    authors: Wu X,Zhu L,Guo J,Fu C,Zhou H,Dong D,Li Z,Zhang DY,Lin K

    update_date:2006-12-18 00:00:00

  • imputeqc: an R package for assessing imputation quality of genotypes and optimizing imputation parameters.

    abstract:BACKGROUND:The imputation of genotypes increases the power of genome-wide association studies. However, the imputation quality should be assessed in each particular case. Nevertheless, not all imputation software controls the error of output, e.g., the last release of fastPHASE program (1.4.8) lacks such an option. In ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03589-0

    authors: Khvorykh GV,Khrunin AV

    update_date:2020-07-24 00:00:00

  • Inferring gene expression dynamics via functional regression analysis.

    abstract:BACKGROUND:Temporal gene expression profiles characterize the time-dynamics of expression of specific genes and are increasingly collected in current gene expression experiments. In the analysis of experiments where gene expression is obtained over the life cycle, it is of interest to relate temporal patterns of gene e...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-60

    authors: Müller HG,Chiou JM,Leng X

    update_date:2008-01-28 00:00:00

  • Homology modeling, molecular docking, and molecular dynamics simulations elucidated α-fetoprotein binding modes.

    abstract:BACKGROUND:An important mechanism of endocrine activity is chemicals entering target cells via transport proteins and then interacting with hormone receptors such as the estrogen receptor (ER). α-Fetoprotein (AFP) is a major transport protein in rodent serum that can bind and sequester estrogens, thus preventing entry ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-S14-S6

    authors: Shen J,Zhang W,Fang H,Perkins R,Tong W,Hong H

    update_date:2013-01-01 00:00:00

  • Protein-DNA docking with a coarse-grained force field.

    abstract:BACKGROUND:Protein-DNA interactions are important for many cellular processes, however structural knowledge for a large fraction of known and putative complexes is still lacking. Computational docking methods aim at the prediction of complex architecture given detailed structures of its constituents. They are becoming ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-228

    authors: Setny P,Bahadur RP,Zacharias M

    update_date:2012-09-11 00:00:00

  • Rearrangement analysis of multiple bacterial genomes.

    abstract:BACKGROUND:Genomes are subjected to rearrangements that change the orientation and ordering of genes during evolution. The most common rearrangements that occur in uni-chromosomal genomes are inversions (or reversals) to adapt to the changing environment. Since genome rearrangements are rarer than point mutations, gene...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3293-4

    authors: Noureen M,Tada I,Kawashima T,Arita M

    update_date:2019-12-27 00:00:00

  • CoryneRegNet 4.0 - A reference database for corynebacterial gene regulatory networks.

    abstract:BACKGROUND:Detailed information on DNA-binding transcription factors (the key players in the regulation of gene expression) and on transcriptional regulatory interactions of microorganisms deduced from literature-derived knowledge, computer predictions and global DNA microarray hybridization experiments, has opened the...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-429

    authors: Baumbach J

    update_date:2007-11-06 00:00:00

  • Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations.

    abstract:BACKGROUND:Estimation of individual ancestry from genetic data is useful for the analysis of disease association studies, understanding human population history and interpreting personal genomic variation. New, computationally efficient methods are needed for ancestry inference that can effectively utilize existing inf...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-014-0418-7

    authors: Bansal V,Libiger O

    update_date:2015-01-16 00:00:00

  • Deconvolution of gene expression from cell populations across the C. elegans lineage.

    abstract:BACKGROUND:Knowledge of when and in which cells each gene is expressed across multicellular organisms is critical in understanding both gene function and regulation of cell type diversity. However, methods for measuring expression typically involve a trade-off between imaging-based methods, which give the precise locat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-204

    authors: Burdick JT,Murray JI

    update_date:2013-06-22 00:00:00

  • A comprehensive comparison of comparative RNA structure prediction approaches.

    abstract:BACKGROUND:An increasing number of researchers have released novel RNA structure analysis and prediction algorithms for comparative approaches to structure prediction. Yet, independent benchmarking of these algorithms is rarely performed as is now common practice for protein-folding, gene-finding and multiple-sequence-...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-5-140

    authors: Gardner PP,Giegerich R

    update_date:2004-09-30 00:00:00

  • Use of physiological constraints to identify quantitative design principles for gene expression in yeast adaptation to heat shock.

    abstract:BACKGROUND:Understanding the relationship between gene expression changes, enzyme activity shifts, and the corresponding physiological adaptive response of organisms to environmental cues is crucial in explaining how cells cope with stress. For example, adaptation of yeast to heat shock involves a characteristic profil...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-184

    authors: Vilaprinyo E,Alves R,Sorribas A

    update_date:2006-04-03 00:00:00

  • SCOPA and META-SCOPA: software for the analysis and aggregation of genome-wide association studies of multiple correlated phenotypes.

    abstract:BACKGROUND:Genome-wide association studies (GWAS) of single nucleotide polymorphisms (SNPs) have been successful in identifying loci contributing genetic effects to a wide range of complex human diseases and quantitative traits. The traditional approach to GWAS analysis is to consider each phenotype separately, despite...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1437-3

    authors: Mägi R,Suleimanov YV,Clarke GM,Kaakinen M,Fischer K,Prokopenko I,Morris AP

    update_date:2017-01-11 00:00:00

  • Improving ontologies by automatic reasoning and evaluation of logical definitions.

    abstract:BACKGROUND:Ontologies are widely used to represent knowledge in biomedicine. Systematic approaches for detecting errors and disagreements are needed for large ontologies with hundreds or thousands of terms and semantic relationships. A recent approach of defining terms using logical definitions is now increasingly bein...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-418

    authors: Köhler S,Bauer S,Mungall CJ,Carletti G,Smith CL,Schofield P,Gkoutos GV,Robinson PN

    update_date:2011-10-27 00:00:00

  • HAT: hypergeometric analysis of tiling-arrays with application to promoter-GeneChip data.

    abstract:BACKGROUND:Tiling-arrays are applicable to multiple types of biological research questions. Due to its advantages (high sensitivity, resolution, unbiased), the technology is often employed in genome-wide investigations. A major challenge in the analysis of tiling-array data is to define regions-of-interest, i.e., conti...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-275

    authors: Taskesen E,Beekman R,de Ridder J,Wouters BJ,Peeters JK,Touw IP,Reinders MJ,Delwel R

    update_date:2010-05-21 00:00:00

  • 'Unite and conquer': enhanced prediction of protein subcellular localization by integrating multiple specialized tools.

    abstract:BACKGROUND:Knowing the subcellular location of proteins provides clues to their function as well as the interconnectivity of biological processes. Dozens of tools are available for predicting protein location in the eukaryotic cell. Each tool performs well on certain data sets, but their predictions often disagree for ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-420

    authors: Shen YQ,Burger G

    update_date:2007-10-29 00:00:00

  • VITCOMIC: visualization tool for taxonomic compositions of microbial communities based on 16S rRNA gene sequences.

    abstract:BACKGROUND:Understanding the community structure of microbes is typically accomplished by sequencing 16S ribosomal RNA (16S rRNA) genes. These community data can be represented by constructing a phylogenetic tree and comparing it with other samples using statistical methods. However, owing to high computational complex...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-332

    authors: Mori H,Maruyama F,Kurokawa K

    update_date:2010-06-18 00:00:00

  • Graph based fusion of miRNA and mRNA expression data improves clinical outcome prediction in prostate cancer.

    abstract:BACKGROUND:One of the main goals in cancer studies including high-throughput microRNA (miRNA) and mRNA data is to find and assess prognostic signatures capable of predicting clinical outcome. Both mRNA and miRNA expression changes in cancer diseases are described to reflect clinical characteristics like staging and pro...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-488

    authors: Gade S,Porzelius C,Fälth M,Brase JC,Wuttig D,Kuner R,Binder H,Sültmann H,Beissbarth T

    update_date:2011-12-21 00:00:00