Abstract:
:Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high throughput nature, shotgun proteomics faces challenges with respect to the analysis and interpretation of experimental data. Among such challenges, the identification of proteins present in a sample has been recognized as an important computational task. This task generally consists of (1) assigning experimental tandem mass spectra to peptides derived from a protein database, and (2) mapping assigned peptides to proteins and quantifying the confidence of identified proteins. Protein identification is fundamentally a statistical inference problem with a number of methods proposed to address its challenges. In this review we categorize current approaches into rule-based, combinatorial optimization and probabilistic inference techniques, and present them using integer programming and Bayesian inference frameworks. We also discuss the main challenges of protein identification and propose potential solutions with the goal of spurring innovative research in this area.
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Li YF,Radivojac Pdoi
10.1186/1471-2105-13-S16-S4subject
Has Abstractpub_date
2012-01-01 00:00:00pages
S4issn
1471-2105pii
1471-2105-13-S16-S4journal_volume
13 Suppl 16pub_type
杂志文章,评审abstract:BACKGROUND:Protein subcellular localization is crucial for genome annotation, protein function prediction, and drug discovery. Determination of subcellular localization using experimental approaches is time-consuming; thus, computational approaches become highly desirable. Extensive studies of localization prediction h...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-330
更新日期:2007-09-08 00:00:00
abstract:BACKGROUND:The cost efficient two-stage design is often used in genome-wide association studies (GWASs) in searching for genetic loci underlying the susceptibility for complex diseases. Replication-based analysis, which considers data from each stage separately, often suffers from loss of efficiency. Joint test that co...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-9
更新日期:2011-01-07 00:00:00
abstract:BACKGROUND:Guide-trees are used as part of an essential heuristic to enable the calculation of multiple sequence alignments. They have been the focus of much method development but there has been little effort at determining systematically, which guide-trees, if any, give the best alignments. Some guide-tree constructi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-338
更新日期:2014-10-04 00:00:00
abstract:BACKGROUND:Several features are known to correlate with the GC-content in the human genome, including recombination rate, gene density and distance to telomere. However, by testing for pairwise correlation only, it is impossible to distinguish direct associations from indirect ones and to distinguish between causes and...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-S1-S66
更新日期:2009-01-30 00:00:00
abstract:BACKGROUND:Genome scale data on protein interactions are generally represented as large networks, or graphs, where hundreds or thousands of proteins are linked to one another. Since proteins tend to function in groups, or complexes, an important goal has been to reliably identify protein complexes from these graphs. Th...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-99
更新日期:2009-03-30 00:00:00
abstract:BACKGROUND:Finishing is the process of improving the quality and utility of draft genome sequences generated by shotgun sequencing and computational assembly. Finishing can involve targeted sequencing. Finishing reads may be incorporated by manual or automated means. One automated method uses targeted addition by local...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-457
更新日期:2010-09-10 00:00:00
abstract:BACKGROUND:Phosphorylated histone H2AX, also known as γH2AX, forms μm-sized nuclear foci at the sites of DNA double-strand breaks (DSBs) induced by ionizing radiation and other agents. Due to their specificity and sensitivity, γH2AX immunoassays have become the gold standard for studying DSB induction and repair. One o...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-3370-8
更新日期:2020-01-28 00:00:00
abstract:BACKGROUND:Clustering techniques are routinely used in gene expression data analysis to organize the massive data. Clustering techniques arrange a large number of genes or assays into a few clusters while maximizing the intra-cluster similarity and inter-cluster separation. While clustering of genes facilitates learnin...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-40
更新日期:2009-01-30 00:00:00
abstract:BACKGROUND:Reliable prediction of antibody, or B-cell, epitopes remains challenging yet highly desirable for the design of vaccines and immunodiagnostics. A correlation between antigenicity, solvent accessibility, and flexibility in proteins was demonstrated. Subsequently, Thornton and colleagues proposed a method for ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-514
更新日期:2008-12-02 00:00:00
abstract:BACKGROUND:The rapid pace of bioscience research makes it very challenging to track relevant articles in one's area of interest. MEDLINE, a primary source for biomedical literature, offers access to more than 20 million citations with three-quarters of a million new ones added each year. Thus it is not surprising to se...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0630-0
更新日期:2015-06-20 00:00:00
abstract:BACKGROUND:Microarray technologies produced large amount of data. The hierarchical clustering is commonly used to identify clusters of co-expressed genes. However, microarray datasets often contain missing values (MVs) representing a major drawback for the use of the clustering methods. Usually the MVs are not treated,...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-5-114
更新日期:2004-08-23 00:00:00
abstract:BACKGROUND:Inferring gene regulatory network (GRN) has been an important topic in Bioinformatics. Many computational methods infer the GRN from high-throughput expression data. Due to the presence of time delays in the regulatory relationships, High-Order Dynamic Bayesian Network (HO-DBN) is a good model of GRN. Howeve...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0823-6
更新日期:2015-11-25 00:00:00
abstract:BACKGROUND:Transcriptome sequencing is a powerful tool for measuring gene expression, but as well as some other technologies, various artifacts and biases affect the quantification. In order to correct some of them, several normalization approaches have emerged, differing both in the statistical strategy employed and i...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-188
更新日期:2014-06-14 00:00:00
abstract:BACKGROUND:Journal articles and databases are two major modes of communication in the biological sciences, and thus integrating these critical resources is of urgent importance to increase the pace of discovery. Projects focused on bridging the gap between journals and databases have been on the rise over the last five...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-175
更新日期:2011-05-19 00:00:00
abstract:BACKGROUND:A standard procedure in many areas of bioinformatics is to use a multiple sequence alignment (MSA) as the basis for various types of homology-based inference. Applications include 3D structure modelling, protein functional annotation, prediction of molecular interactions, etc. These applications, however sop...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-1146-y
更新日期:2016-07-07 00:00:00
abstract:BACKGROUND:A cross-correlation (XCorr) score function is one of the most popular score functions utilized to search peptide identifications in databases, and many computer programs, such as SEQUEST, Comet, and Tide, currently use this score function. Recently, the HiXCorr algorithm was developed to speed up this score ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2559-6
更新日期:2018-12-12 00:00:00
abstract:BACKGROUND:Stable isotope tracing can follow individual atoms through metabolic transformations through the detection of the incorporation of stable isotope within metabolites. This resulting data can be interpreted in terms related to metabolic flux. However, detection of a stable isotope in metabolites by mass spectr...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3096-7
更新日期:2019-10-28 00:00:00
abstract:BACKGROUND:Quadruplexes are specific structure motifs occurring, e.g., in telomeres and transcriptional regulatory regions. Recent discoveries confirmed their importance in biomedicine and led to an intensified examination of their properties. So far, the study of these motifs has focused mainly on the sequence and the...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-3385-1
更新日期:2020-01-31 00:00:00
abstract::Many different approaches have been developed to model and simulate gene regulatory networks. We proposed the following categories for gene regulatory network models: network parts lists, network topology models, network control logic models, and dynamic models. Here we will describe some examples for each of these ca...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-S6-S9
更新日期:2007-09-27 00:00:00
abstract:BACKGROUND:The program InDeVal was originally developed to help researchers find known regions of insertion/deletion activity (with the exception of isolated single-base indels) in newly determined Poaceae trnL-F sequences and compare them with 533 previously determined sequences. It is supplied with input files design...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-5-173
更新日期:2004-10-29 00:00:00
abstract:BACKGROUND:Complex networks are studied across many fields of science and are particularly important to understand biological processes. Motifs in networks are small connected sub-graphs that occur significantly in higher frequencies than in random networks. They have recently gathered much attention as a useful concep...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-318
更新日期:2009-10-04 00:00:00
abstract:BACKGROUND:Different human responses to the same vaccine were frequently observed. For example, independent studies identified overlapping but different transcriptomic gene expression profiles in Yellow Fever vaccine 17D (YF-17D) immunized human subjects. Different experimental and analysis conditions were likely contr...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3194-6
更新日期:2019-12-23 00:00:00
abstract:BACKGROUND:Understanding the relationship between gene expression changes, enzyme activity shifts, and the corresponding physiological adaptive response of organisms to environmental cues is crucial in explaining how cells cope with stress. For example, adaptation of yeast to heat shock involves a characteristic profil...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-184
更新日期:2006-04-03 00:00:00
abstract:BACKGROUND:As whole-genome sequencing is becoming a routine technique, it is important to identify a cost-effective depth of sequencing for such studies. However, the relationship between sequencing depth and biological results from the aspects of whole-genome coverage, variant discovery power and the quality of varian...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3164-z
更新日期:2019-11-08 00:00:00
abstract:BACKGROUND:As numerous diseases involve errors in signal transduction, modern therapeutics often target proteins involved in cellular signaling. Interpretation of the activity of signaling pathways during disease development or therapeutic intervention would assist in drug development, design of therapy, and target ide...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-99
更新日期:2006-02-28 00:00:00
abstract:BACKGROUND:Genes encoding transcription factors that constitute gene-regulatory networks and maternal factors accumulating in egg cytoplasm are two classes of essential genes that play crucial roles in developmental processes. Transcription factors control the expression of their downstream target genes by interacting ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0552-x
更新日期:2015-04-10 00:00:00
abstract:BACKGROUND:Most single stranded RNA (ssRNA) viruses mutate rapidly to generate large number of strains having highly divergent capsid sequences. Accurate strain recognition in uncharacterized target capsid sequences is essential for epidemiology, diagnostics, and vaccine development. Strain recognition based on similar...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-379
更新日期:2007-10-10 00:00:00
abstract:BACKGROUND:Bacterial pan-genomes, comprised of conserved and variable genes across multiple sequenced bacterial genomes, allow for identification of genomic regions that are phylogenetically discriminating or functionally important. Pan-genomes consist of large amounts of data, which can restrict researchers ability to...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2250-y
更新日期:2018-06-27 00:00:00
abstract:BACKGROUND:In the last decade, techniques were established for the large scale genome-wide analysis of proteins, RNA, and metabolites, and database solutions have been developed to manage the generated data sets. The Golm Metabolome Database for metabolite data (GMD) represents one such effort to make these data broadl...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-216
更新日期:2007-06-23 00:00:00
abstract:BACKGROUND:Analyzing the amino acid sequence of an intrinsically disordered protein (IDP) in an evolutionary context can yield novel insights on the functional role of disordered regions and sequence element(s). However, in the case of many IDPs, the lack of evolutionary conservation of the primary sequence can hamper ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0592-2
更新日期:2015-05-13 00:00:00