Performance of a genetic algorithm for mass spectrometry proteomics.

Abstract:

BACKGROUND: Recently, mass spectrometry data have been mined using a genetic algorithm to produce discriminatory models that distinguish healthy individuals from those with cancer. This algorithm is the basis for claims of 100% sensitivity and specificity in two related publicly available datasets. To date, no detailed attempts have been made to explore the properties of this genetic algorithm within proteomic applications. Here the algorithm's performance on these datasets is evaluated relative to other methods.

RESULTS: In reproducing the method, some modifications of the algorithm as it is described are necessary to get good performance. After modification, a cross-validation approach to model selection is used. The overall classification accuracy is comparable though not superior to other approaches considered. Also, some aspects of the process rely upon random sampling and thus for a fixed dataset the algorithm can produce many different models. This raises questions about how to choose among competing models. How this choice is made is important for interpreting sensitivity and specificity results as merely choosing the model with lowest test set error rate leads to overestimates of model performance.

CONCLUSIONS: The algorithm needs to be modified to reduce variability and care must be taken in how to choose among competing models. Results derived from this algorithm must be accompanied by a full description of model selection procedures to give confidence that the reported accuracy is not overstated.
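The abstract's warning about choosing the model with the lowest test set error can be illustrated with a small simulation. This is a minimal sketch, not the paper's genetic algorithm or its data: it assumes a set of hypothetical competing models that all share the same true accuracy (the names `simulate_selection_bias`, `n_models`, `n_test`, and `true_accuracy` are invented for the illustration), scores each on one finite test set, picks the best scorer, and then re-scores that model on a large fresh sample.

```python
import random


def simulate_selection_bias(n_models=200, n_test=50, true_accuracy=0.7, seed=0):
    """Compare the test-set accuracy of the 'winning' model with its accuracy
    on fresh data. Assumption: every hypothetical model classifies each sample
    correctly with the same probability, so no model is genuinely better."""
    rng = random.Random(seed)

    def observed_accuracy(n_samples):
        # Fraction of n_samples classified correctly by one model.
        correct = sum(rng.random() < true_accuracy for _ in range(n_samples))
        return correct / n_samples

    # Score every competing model on a finite test set of the same size.
    test_scores = [observed_accuracy(n_test) for _ in range(n_models)]

    # Select the model with the lowest test error (highest test accuracy).
    best_on_test = max(test_scores)

    # Re-evaluate the selected model on a large independent sample. All models
    # share one true accuracy, so a new draw stands in for the selected model.
    fresh_score = observed_accuracy(10_000)
    return best_on_test, fresh_score


best_on_test, fresh_score = simulate_selection_bias()
print(f"accuracy of selected model on the test set: {best_on_test:.2f}")
print(f"accuracy of the same model on fresh data:   {fresh_score:.2f}")
```

Under these assumptions the selected model looks better on the test set than it does on fresh data, purely because it was chosen for its test-set score. That gap is why the selection procedure needs to be reported alongside any accuracy figure.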

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Jeffries NO

doi

10.1186/1471-2105-5-180

pub_date

2004-11-19 00:00:00

pages

180

issn

1471-2105

pii

1471-2105-5-180

journal_volume

5

pub_type

Journal Article
  • EVA: Exome Variation Analyzer, an efficient and versatile tool for filtering strategies in medical genomics.

    abstract:BACKGROUND:Whole exome sequencing (WES) has become the strategy of choice to identify a coding allelic variant for a rare human monogenic disorder. This approach is a revolution in medical genetics history, impacting both fundamental research, and diagnostic methods leading to personalized medicine. A plethora of effic...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-S14-S9

    authors: Coutant S,Cabot C,Lefebvre A,Léonard M,Prieur-Gaston E,Campion D,Lecroq T,Dauchel H

    update_date:2012-01-01 00:00:00

  • Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data.

    abstract:BACKGROUND:This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulatio...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-310

    authors: Barros RC,Winck AT,Machado KS,Basgalupp MP,de Carvalho AC,Ruiz DD,de Souza ON

    update_date:2012-11-21 00:00:00

  • Integrating diverse biological and computational sources for reliable protein-protein interactions.

    abstract:BACKGROUND:Protein-protein interactions (PPIs) play important roles in various cellular processes. However, the low quality of current PPI data detected from high-throughput screening techniques has diminished the potential usefulness of the data. We need to develop a method to address the high data noise and incomplet...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-S7-S8

    authors: Wu M,Li X,Chua HN,Kwoh CK,Ng SK

    update_date:2010-10-15 00:00:00

  • PuFFIN--a parameter-free method to build nucleosome maps from paired-end reads.

    abstract:BACKGROUND:We introduce a novel method, called PuFFIN, that takes advantage of paired-end short reads to build genome-wide nucleosome maps with larger numbers of detected nucleosomes and higher accuracy than existing tools. In contrast to other approaches that require users to optimize several parameters according to t...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-S9-S11

    authors: Polishko A,Bunnik EM,Le Roch KG,Lonardi S

    update_date:2014-01-01 00:00:00

  • A multiresolution approach to automated classification of protein subcellular location images.

    abstract:BACKGROUND:Fluorescence microscopy is widely used to determine the subcellular location of proteins. Efforts to determine location on a proteome-wide basis create a need for automated methods to analyze the resulting images. Over the past ten years, the feasibility of using machine learning methods to recognize all maj...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-210

    authors: Chebira A,Barbotin Y,Jackson C,Merryman T,Srinivasa G,Murphy RF,Kovacević J

    update_date:2007-06-19 00:00:00

  • JISTIC: identification of significant targets in cancer.

    abstract:BACKGROUND:Cancer is caused through a multistep process, in which a succession of genetic changes, each conferring a competitive advantage for growth and proliferation, leads to the progressive conversion of normal human cells into malignant cancer cells. Interrogation of cancer genomes holds the promise of understandi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-189

    authors: Sanchez-Garcia F,Akavia UD,Mozes E,Pe'er D

    update_date:2010-04-14 00:00:00

  • PseUI: Pseudouridine sites identification based on RNA sequence information.

    abstract:BACKGROUND:Pseudouridylation is the most prevalent type of posttranscriptional modification in various stable RNAs of all organisms, which significantly affects many cellular processes that are regulated by RNA. Thus, accurate identification of pseudouridine (Ψ) sites in RNA will be of great benefit for understanding t...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2321-0

    authors: He J,Fang T,Zhang Z,Huang B,Zhu X,Xiong Y

    update_date:2018-08-29 00:00:00

  • ElTetrado: a tool for identification and classification of tetrads and quadruplexes.

    abstract:BACKGROUND:Quadruplexes are specific structure motifs occurring, e.g., in telomeres and transcriptional regulatory regions. Recent discoveries confirmed their importance in biomedicine and led to an intensified examination of their properties. So far, the study of these motifs has focused mainly on the sequence and the...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3385-1

    authors: Zok T,Popenda M,Szachniuk M

    update_date:2020-01-31 00:00:00

  • PVT: an efficient computational procedure to speed up next-generation sequence analysis.

    abstract:BACKGROUND:High-throughput Next-Generation Sequencing (NGS) techniques are advancing genomics and molecular biology research. This technology generates substantially large data which puts up a major challenge to the scientists for an efficient, cost and time effective solution to analyse such data. Further, for the dif...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-167

    authors: Maji RK,Sarkar A,Khatua S,Dasgupta S,Ghosh Z

    update_date:2014-06-04 00:00:00

  • ORdensity: user-friendly R package to identify differentially expressed genes.

    abstract:BACKGROUND:Microarray technology provides the expression level of many genes. Nowadays, an important issue is to select a small number of informative differentially expressed genes that provide biological knowledge and may be key elements for a disease. With the increasing volume of data generated by modern biomedical ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3463-4

    authors: Martínez-Otzeta JM,Irigoien I,Sierra B,Arenas C

    update_date:2020-04-07 00:00:00

  • PIGS: improved estimates of identity-by-descent probabilities by probabilistic IBD graph sampling.

    abstract:Identifying segments in the genome of different individuals that are identical-by-descent (IBD) is a fundamental element of genetics. IBD data is used for numerous applications including demographic inference, heritability estimation, and mapping disease loci. Simultaneous detection of IBD over multiple haplotypes has...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-16-S5-S9

    authors: Park DS,Baran Y,Hormozdiari F,Eng C,Torgerson DG,Burchard EG,Zaitlen N

    update_date:2015-01-01 00:00:00

  • Identification of conserved gene clusters in multiple genomes based on synteny and homology.

    abstract:BACKGROUND:Uncovering the relationship between the conserved chromosomal segments and the functional relatedness of elements within these segments is an important question in computational genomics. We build upon the series of works on gene teams and homology teams. RESULTS:Our primary contribution is a local sliding-...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-S9-S18

    authors: Sarkar A,Soueidan H,Nikolski M

    update_date:2011-10-05 00:00:00

  • Computational algorithms to predict Gene Ontology annotations.

    abstract:BACKGROUND:Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as the ones provided by the Gene Ontology Consortium, are used to design novel biologi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-16-S6-S4

    authors: Pinoli P,Chicco D,Masseroli M

    update_date:2015-01-01 00:00:00

  • Bioinformatics research in the Asia Pacific: a 2007 update.

    abstract:We provide a 2007 update on the bioinformatics research in the Asia-Pacific from the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998. From 2002, APBioNet has organized the first International Conference on Bioinformatics (InCoB) bringing together scientists work...

    journal_title:BMC bioinformatics

    pub_type:

    doi:10.1186/1471-2105-9-S1-S1

    authors: Ranganathan S,Gribskov M,Tan TW

    update_date:2008-01-01 00:00:00

  • CellSim: a novel software to calculate cell similarity and identify their co-regulation networks.

    abstract:BACKGROUND:Cell direct reprogramming technology has been rapidly developed with its low risk of tumor risk and avoidance of ethical issues caused by stem cells, but it is still limited to specific cell types. Direct reprogramming from an original cell to target cell type needs the cell similarity and cell specific regu...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2699-3

    authors: Li L,Che D,Wang X,Zhang P,Rahman SU,Zhao J,Yu J,Tao S,Lu H,Liao M

    update_date:2019-03-04 00:00:00

  • An evaluation of copy number variation detection tools for cancer using whole exome sequencing data.

    abstract:BACKGROUND:Recently copy number variation (CNV) has gained considerable interest as a type of genomic/genetic variation that plays an important role in disease susceptibility. Advances in sequencing technology have created an opportunity for detecting CNVs more accurately. Recently whole exome sequencing (WES) has beco...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1705-x

    authors: Zare F,Dow M,Monteleone N,Hosny A,Nabavi S

    update_date:2017-05-31 00:00:00

  • GenHtr: a tool for comparative assessment of genetic heterogeneity in microbial genomes generated by massive short-read sequencing.

    abstract:BACKGROUND:Microevolution is the study of short-term changes of alleles within a population and their effects on the phenotype of organisms. The result of the below-species-level evolution is heterogeneity, where populations consist of subpopulations with a large number of structural variations. Heterogeneity analysis ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-508

    authors: Yu G

    update_date:2010-10-12 00:00:00

  • Bayesian semiparametric regression models to characterize molecular evolution.

    abstract:BACKGROUND:Statistical models and methods that associate changes in the physicochemical properties of amino acids with natural selection at the molecular level typically do not take into account the correlations between such properties. We propose a Bayesian hierarchical regression model with a generalization of the Di...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-278

    authors: Datta S,Rodriguez A,Prado R

    update_date:2012-10-30 00:00:00

  • An assessment of catalytic residue 3D ensembles for the prediction of enzyme function.

    abstract:BACKGROUND:The central element of each enzyme is the catalytic site, which commonly catalyzes a single biochemical reaction with high specificity. It was unclear to us how often sites that catalyze the same or highly similar reactions evolved on different, i. e. non-homologous protein folds and how similar their 3D pos...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0807-6

    authors: Žváček C,Friedrichs G,Heizinger L,Merkl R

    update_date:2015-11-04 00:00:00

  • ProCKSI: a decision support system for Protein (structure) Comparison, Knowledge, Similarity and Information.

    abstract:BACKGROUND:We introduce the decision support system for Protein (Structure) Comparison, Knowledge, Similarity and Information (ProCKSI). ProCKSI integrates various protein similarity measures through an easy to use interface that allows the comparison of multiple proteins simultaneously. It employs the Universal Simila...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-416

    authors: Barthel D,Hirst JD,Błazewicz J,Burke EK,Krasnogor N

    update_date:2007-10-26 00:00:00

  • Predicting protein functions by relaxation labelling protein interaction network.

    abstract:BACKGROUND:One of key issues in the post-genomic era is to assign functions to uncharacterized proteins. Since proteins seldom act alone; rather, they must interact with other biomolecular units to execute their functions. Thus, the functions of unknown proteins may be discovered through studying their interactions wit...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-S1-S64

    authors: Hu P,Jiang H,Emili A

    update_date:2010-01-18 00:00:00

  • Detection of gene pathways with predictive power for breast cancer prognosis.

    abstract:BACKGROUND:Prognosis is of critical interest in breast cancer research. Biomedical studies suggest that genomic measurements may have independent predictive power for prognosis. Gene profiling studies have been conducted to search for predictive genomic measurements. Genes have the inherent pathway structure, where pat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-1

    authors: Ma S,Kosorok MR

    update_date:2010-01-01 00:00:00

  • CLU: a new algorithm for EST clustering.

    abstract:BACKGROUND:The continuous flow of EST data remains one of the richest sources for discoveries in modern biology. The first step in EST data mining is usually associated with EST clustering, the process of grouping of original fragments according to their annotation, similarity to known genomic DNA or each other. Cluste...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-S2-S3

    authors: Ptitsyn A,Hide W

    update_date:2005-07-15 00:00:00

  • Learning smoothing models of copy number profiles using breakpoint annotations.

    abstract:BACKGROUND:Many models have been proposed to detect copy number alterations in chromosomal copy number profiles, but it is usually not obvious to decide which is most effective for a given data set. Furthermore, most methods have a smoothing parameter that determines the number of breakpoints and must be chosen using v...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-164

    authors: Hocking TD,Schleiermacher G,Janoueix-Lerosey I,Boeva V,Cappo J,Delattre O,Bach F,Vert JP

    update_date:2013-05-22 00:00:00

  • Genome Projector: zoomable genome map with multiple views.

    abstract:BACKGROUND:Molecular biology data exist on diverse scales, from the level of molecules to -omics. At the same time, the data at each scale can be categorised into multiple layers, such as the genome, transcriptome, proteome, metabolome, and biochemical pathways. Due to the highly multi-layer and multi-dimensional natur...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-31

    authors: Arakawa K,Tamaki S,Kono N,Kido N,Ikegami K,Ogawa R,Tomita M

    update_date:2009-01-23 00:00:00

  • On the consistency of orthology relationships.

    abstract:BACKGROUND:Orthologs inference is the starting point of most comparative genomics studies, and a plethora of methods have been designed in the last decade to address this challenging task. In this paper we focus on the problems of deciding consistency with a species tree (known or not) of a partial set of orthology/par...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1267-3

    authors: Jones M,Paul C,Scornavacca C

    update_date:2016-11-11 00:00:00

  • FocAn: automated 3D analysis of DNA repair foci in image stacks acquired by confocal fluorescence microscopy.

    abstract:BACKGROUND:Phosphorylated histone H2AX, also known as γH2AX, forms μm-sized nuclear foci at the sites of DNA double-strand breaks (DSBs) induced by ionizing radiation and other agents. Due to their specificity and sensitivity, γH2AX immunoassays have become the gold standard for studying DSB induction and repair. One o...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3370-8

    authors: Memmel S,Sisario D,Zimmermann H,Sauer M,Sukhorukov VL,Djuzenova CS,Flentje M

    update_date:2020-01-28 00:00:00

  • mRNA:guanine-N7 cap methyltransferases: identification of novel members of the family, evolutionary analysis, homology modeling, and analysis of sequence-structure-function relationships.

    abstract:BACKGROUND:The 5'-terminal cap structure plays an important role in many aspects of mRNA metabolism. Capping enzymes encoded by viruses and pathogenic fungi are attractive targets for specific inhibitors. There is a large body of experimental data on viral and cellular methyltransferases (MTases) that carry out guanine...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-2-2

    authors: Bujnicki JM,Feder M,Radlinska M,Rychlewski L

    update_date:2001-01-01 00:00:00

  • Maximum expected accuracy structural neighbors of an RNA secondary structure.

    abstract:BACKGROUND:Since RNA molecules regulate genes and control alternative splicing by allostery, it is important to develop algorithms to predict RNA conformational switches. Some tools, such as paRNAss, RNAshapes and RNAbor, can be used to predict potential conformational switches; nevertheless, no existent tool can detec...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-S5-S6

    authors: Clote P,Lou F,Lorenz WA

    update_date:2012-04-12 00:00:00

  • A novel method to identify cooperative functional modules: study of module coordination in the Saccharomyces cerevisiae cell cycle.

    abstract:BACKGROUND:Identifying key components in biological processes and their associations is critical for deciphering cellular functions. Recently, numerous gene expression and molecular interaction experiments have been reported in Saccharomyces cerevisiae, and these have enabled systematic studies. Although a number of ap...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-281

    authors: Hsu JT,Peng CH,Hsieh WP,Lan CY,Tang CY

    update_date:2011-07-12 00:00:00