Identifying and quantifying metabolites by scoring peaks of GC-MS data.

Abstract:

BACKGROUND: Metabolomics is one of the most recent omics technologies. It has been applied in fields such as food science, nutrition, drug discovery and systems biology. Gas chromatography-mass spectrometry (GC-MS) has been widely used for this purpose, and many computational tools have been developed to support the analysis of metabolomics data. Among them, AMDIS is perhaps the most widely used tool for identifying and quantifying metabolites. However, AMDIS generates a high number of false positives and does not have an interface amenable to high-throughput data analysis. Although additional computational tools have been developed to process AMDIS results and to perform normalisation and statistical analysis of metabolomics data, there is not yet a single free software tool or package able to reliably identify and quantify metabolites analysed by GC-MS.

RESULTS: Here we introduce a new algorithm, PScore, which scores peaks according to their likelihood of representing metabolites defined in a mass spectral library. We implemented PScore in an R package called MetaBox and evaluated the applicability and potential of MetaBox by comparing its performance against AMDIS results when analysing volatile organic compounds (VOC) from standard mixtures of metabolites and from faecal samples of female and male mice. MetaBox reported lower percentages of false positives and false negatives, and reported a higher number of potential biomarkers associated with the metabolism of female and male mice.

CONCLUSIONS: Identification and quantification of metabolites are among the most critical and time-consuming steps in GC-MS metabolome analysis. Here we present an algorithm implemented in an R package, which allows users to construct flexible pipelines and analyse metabolomics data in a high-throughput manner.

journal_name

BMC Bioinformatics

authors

Aggio RB,Mayor A,Reade S,Probert CS,Ruggiero K

doi

10.1186/s12859-014-0374-2

pub_date

2014-12-10 00:00:00

pages

374

issn

1471-2105

pii

s12859-014-0374-2

journal_volume

15

pub_type

Journal Article