Abstract:
BACKGROUND: Metabolomics is one of the most recent omics technologies. It has been applied in fields such as food science, nutrition, drug discovery and systems biology. For this, gas chromatography-mass spectrometry (GC-MS) has been widely used, and many computational tools have been developed to support the analysis of metabolomics data. Among them, AMDIS is perhaps the most widely used tool for identifying and quantifying metabolites. However, AMDIS generates a high number of false positives and does not have an interface amenable to high-throughput data analysis. Although additional computational tools have been developed to process AMDIS results and to perform normalisations and statistical analysis of metabolomics data, there is not yet a single free software tool or package able to reliably identify and quantify metabolites analysed by GC-MS. RESULTS: Here we introduce a new algorithm, PScore, able to score peaks according to their likelihood of representing metabolites defined in a mass spectral library. We implemented PScore in an R package called MetaBox and evaluated the applicability and potential of MetaBox by comparing its performance against AMDIS results when analysing volatile organic compounds (VOC) from standard mixtures of metabolites and from female and male mice faecal samples. MetaBox reported lower percentages of false positives and false negatives, and was able to report a higher number of potential biomarkers associated with the metabolism of female and male mice. CONCLUSIONS: Identification and quantification of metabolites are among the most critical and time-consuming steps in GC-MS metabolome analysis. Here we present an algorithm implemented in an R package, which allows users to construct flexible pipelines and analyse metabolomics data in a high-throughput manner.
journal_name: BMC Bioinformatics
journal_title: BMC bioinformatics
authors: Aggio RB, Mayor A, Reade S, Probert CS, Ruggiero K
doi: 10.1186/s12859-014-0374-2
subject: Has Abstract
pub_date: 2014-12-10 00:00:00
pages: 374
issn: 1471-2105
pii: s12859-014-0374-2
journal_volume: 15
pub_type: Journal Article
abstract:BACKGROUND:We have previously described an approach to predicting the substrate specificity of serine-threonine protein kinases. The method, named Predikin, identifies key conserved substrate-determining residues in the kinase catalytic domain that contact the substrate in the region of the phosphorylation site and so ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-9-245
update_date:2008-05-26 00:00:00
abstract:BACKGROUND:A number of software packages are available to generate DNA multiple sequence alignments (MSAs) evolved under continuous-time Markov processes on phylogenetic trees. On the other hand, methods of simulating the DNA MSA directly from the transition matrices do not exist. Moreover, existing software restricts ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-13-216
update_date:2012-08-28 00:00:00
abstract:BACKGROUND:Many readers will sympathize with the following story. You are viewing a gene sequence in Entrez, and you want to find whether it contains a particular sequence motif. You reach for the browser's "find in page" button, but those darn spaces every 10 bp get in the way. And what if the motif is on the opposite...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-4-1
update_date:2003-01-04 00:00:00
abstract:BACKGROUND:Protein-coding gene detection in prokaryotic genomes is considered a much simpler problem than in intron-containing eukaryotic genomes. However there have been reports that prokaryotic gene finder programs have problems with small genes (either over-predicting or under-predicting). Therefore the question ari...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-11-131
update_date:2010-03-15 00:00:00
abstract:BACKGROUND:Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data analysis, little attention has been paid to uncertainty in the results obtained. RESULTS:We present an R/Bioconductor port of a fast novel algorithm for...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-10-242
update_date:2009-08-06 00:00:00
abstract:BACKGROUND:Protein-protein interactions (PPIs) are of great importance in cellular systems of organisms, since they are the basis of cellular structure and function and many essential cellular processes are related to that. Most proteins perform their functions by interacting with other proteins, so predicting PPIs acc...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-020-03896-6
update_date:2020-12-16 00:00:00
abstract:BACKGROUND:The knowledge base-driven pathway analysis is becoming the first choice for many investigators, in that it not only can reduce the complexity of functional analysis by grouping thousands of genes into just several hundred pathways, but also can increase the explanatory power for the experiment by identifying...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-016-1285-1
update_date:2016-10-06 00:00:00
abstract:BACKGROUND:Protein function is closely intertwined with protein structure. Discovery of meaningful structure-function relationships is of utmost importance in protein biochemistry and has led to creation of high-quality, manually curated classification databases, such as the gold-standard SCOP (Structural Classificatio...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-15-S2-S1
update_date:2014-01-01 00:00:00
abstract:BACKGROUND:Phylogenetic trees are an important tool to study the evolutionary relationships among organisms. The huge amount of available taxa poses difficulties in their interactive visualization. This hampers the interaction with the users to provide feedback for the further improvement of the taxonomic framework. R...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-017-1841-3
update_date:2017-09-30 00:00:00
abstract:BACKGROUND:The Immunoglobulins (IG) and the T cell receptors (TR) play the key role in antigen recognition during the adaptive immune response. Recent progress in next-generation sequencing technologies has provided an opportunity for the deep T cell receptor repertoire profiling. However, a specialised software is req...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-015-0613-1
update_date:2015-05-28 00:00:00
abstract:BACKGROUND:Improvements in protein sequence annotation and an increase in the number of annotated protein databases has fueled development of an increasing number of software tools to predict secreted proteins. Six software programs capable of high throughput and employing a wide range of prediction methods, SignalP 3....
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-6-256
update_date:2005-10-14 00:00:00
abstract:BACKGROUND:Despite marked recent improvements in long-read sequencing technology, the assembly of diploid genomes remains a difficult task. A major obstacle is distinguishing between alternative contigs that represent highly heterozygous regions. If primary and secondary contigs are not properly identified, the primary...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-020-03939-y
update_date:2021-01-06 00:00:00
abstract:BACKGROUND:Proteins are dynamic molecules with motions ranging from picoseconds to longer than seconds. Many protein functions, however, appear to occur on the micro to millisecond timescale and therefore there has been intense research of the importance of these motions in catalysis and molecular interactions. Nuclear...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-12-421
update_date:2011-10-27 00:00:00
abstract:BACKGROUND:Visualization tools for deep learning models typically focus on discovering key input features without considering how such low level features are combined in intermediate layers to make decisions. Moreover, many of these methods examine a network's response to specific input examples that may be insufficien...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-019-2957-4
update_date:2019-07-19 00:00:00
abstract:BACKGROUND:Logic Learning Machine (LLM) is an innovative method of supervised analysis capable of constructing models based on simple and intelligible rules. In this investigation the performance of LLM in classifying patients with cancer was evaluated using a set of eight publicly available gene expression databases f...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-019-2953-8
update_date:2019-11-22 00:00:00
abstract:BACKGROUND:New "next generation" DNA sequencing technologies offer individual researchers the ability to rapidly generate large amounts of genome sequence data at dramatically reduced costs. As a result, a need has arisen for new software tools for storage, management and analysis of genome sequence data. Although bioi...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-11-100
update_date:2010-02-22 00:00:00
abstract:BACKGROUND:Understanding the relationship between the protein sequence and the 3D structure is a major research area in bioinformatics. The prediction of complete protein tertiary structure based only on sequence information is still an impractical work. This paper aims at revealing the hidden knowledge of the sequence...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-10-S11-S15
update_date:2009-10-08 00:00:00
abstract:BACKGROUND:Transposable elements (TEs) are DNA sequences that are able to move from their location in the genome by cutting or copying themselves to another locus. As such, they are increasingly recognized as impacting all aspects of genome function. With the dramatic reduction in cost of DNA sequencing, it is now poss...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-014-0377-z
update_date:2014-11-19 00:00:00
abstract:BACKGROUND:A recently proposed method for estimating qPCR amplification efficiency E analyzes fluorescence intensity ratios from pairs of points deemed to lie in the exponential growth region on the amplification curves for all reactions in a dilution series. This method suffers from a serious problem: The resulting ra...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-020-03604-4
update_date:2020-07-08 00:00:00
abstract:BACKGROUND:One of the most crucial steps in high-throughput sequence-based microbiome studies is the taxonomic assignment of sequences belonging to operational taxonomic units (OTUs). Without taxonomic classification, functional and biological information of microbial communities cannot be inferred or interpreted. The ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-017-1952-x
update_date:2017-12-06 00:00:00
abstract:BACKGROUND:Gene expression experiments are common in molecular biology, for example in order to identify genes which play a certain role in a specified biological framework. For that purpose expression levels of several thousand genes are measured simultaneously using DNA microarrays. Comparing two distinct groups of t...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-12-288
update_date:2011-07-15 00:00:00
abstract:BACKGROUND:Transposable elements (TEs) are DNA sequences able to mobilize themselves and to increase their copy-number in the host genome. In the past, they have been considered mainly selfish DNA without evident functions. Nevertheless, currently they are believed to have been extensively involved in the evolution of ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-019-3113-x
update_date:2019-11-22 00:00:00
abstract:BACKGROUND:For successful protein structure prediction by comparative modeling, in addition to identifying a good template protein with known structure, obtaining an accurate sequence alignment between a query protein and a template protein is critical. It has been known that the alignment accuracy can vary significant...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-8-471
update_date:2007-12-03 00:00:00
abstract:BACKGROUND:Microarray experiments, as well as other genomic analyses, often result in large gene sets containing up to several hundred genes. The biological significance of such sets of genes is, usually, not readily apparent. Identification of the functions of the genes in the set can help highlight features of intere...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-6-189
update_date:2005-07-25 00:00:00
abstract:BACKGROUND:Formal classification of a large collection of protein structures aids the understanding of evolutionary relationships among them. Classifications involving manual steps, such as SCOP and CATH, face the challenge of increasing volume of available structures. Automatic methods such as FSSP or Dali Domain Dict...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-9-74
update_date:2008-01-31 00:00:00
abstract:BACKGROUND:High-throughput experiments, such as with DNA microarrays, typically result in hundreds of genes potentially relevant to the process under study, rendering the interpretation of these experiments problematic. Here, we propose and evaluate an approach to find functional associations between large numbers of g...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-8-14
update_date:2007-01-18 00:00:00
abstract:BACKGROUND:Patient records contain valuable information regarding explanation of diagnosis, progression of disease, prescription and/or effectiveness of treatment, and more. Automatic recognition of clinically important concepts and the identification of relationships between those concepts in patient records are preli...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-12-S3-S3
update_date:2011-06-09 00:00:00
abstract:BACKGROUND:Current development of sequencing technologies is towards generating longer and noisier reads. Evidently, accurate alignment of these reads play an important role in any downstream analysis. Similarly, reducing the overall cost of sequencing is related to the time consumption of the aligner. The tradeoff bet...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-017-1518-y
update_date:2017-02-23 00:00:00
abstract:BACKGROUND:Since their introduction in 2009, the BioNLP Shared Task events have been instrumental in advancing the development of methods and resources for the automatic extraction of information from the biomedical literature. In this paper, we present the Cancer Genetics (CG) and Pathway Curation (PC) tasks, two even...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-16-S10-S2
update_date:2015-01-01 00:00:00
abstract:BACKGROUND:Elucidating gene regulatory networks is crucial for understanding normal cell physiology and complex pathologic phenotypes. Existing computational methods for the genome-wide "reverse engineering" of such networks have been successful only for lower eukaryotes with simple genomes. Here we present ARACNE, a n...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-7-S1-S7
update_date:2006-03-20 00:00:00