Stable feature selection and classification algorithms for multiclass microarray data.

Abstract:

BACKGROUND: Recent studies suggest that gene expression profiles are a promising alternative for clinical cancer classification. One major problem in applying DNA microarrays to classification is the high dimensionality of the resulting data sets. In this paper we propose a multiclass gene selection method based on Partial Least Squares (PLS). The new idea is to solve the multiclass selection problem with the PLS method by decomposing it into a set of two-class sub-problems: one versus rest (OvR) and one versus one (OvO). We also apply the OvR and OvO two-class decompositions to another recently published gene selection method. Ranked gene lists are highly unstable, in the sense that a small change in the data set often leads to large changes in the resulting ordered lists. In this paper we therefore also assess the stability of the proposed methods. As classifiers we use linear support vector machines (SVM) in several variants (one versus one, one versus rest, and multiclass SVM (MSVM)) as well as linear discriminant analysis (LDA). We use the balanced bootstrap to estimate the prediction error and to test the variability of the obtained ordered lists. RESULTS: This paper focuses on the effective identification of informative genes. As a result, a new strategy for finding a small subset of significant genes is designed. Our results on real multiclass cancer data show that our method achieves a very high accuracy rate across different combinations of classification methods, while also producing very stable feature rankings. CONCLUSIONS: This paper shows that the proposed strategies can substantially improve the performance of selected gene sets. Applying the OvR and OvO techniques to existing gene selection methods improves their results as well. The presented method makes it possible to obtain a more reliable classifier with lower classification error. At the same time, the method generates more stable ordered feature lists than existing methods.
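The OvR/OvO decomposition, balanced bootstrap and ranking-stability check described in the abstract can be sketched in dependency-free Python. This is a minimal sketch under stated assumptions, not the authors' method. Each two-class sub-problem scores genes by the magnitude of the first PLS weight vector. For one centred response, that weight vector is proportional to X^T y, the per-gene covariance with the label. Per-sub-problem scores are combined by taking their maximum, which is a hypothetical rule. The mean pairwise Jaccard index of the top-k gene sets stands in for whatever stability measure the paper actually uses. All function names (`pls1_weights`, `binary_subproblems`, `rank_genes`, `balanced_bootstrap`, `top_k_stability`) are hypothetical.

```python
import itertools
import random
from statistics import mean


def pls1_weights(X, y):
    """Unnormalised first PLS weight vector for a single response.

    With a centred response yc, the first PLS weight is proportional to
    Xc^T yc, i.e. to each gene's covariance with the label; we divide by n so
    that sub-problems with different sample sizes are on a comparable scale.
    """
    n, p = len(X), len(X[0])
    y_bar = mean(y)
    yc = [v - y_bar for v in y]
    weights = []
    for j in range(p):
        x_bar = mean(row[j] for row in X)
        weights.append(sum((X[i][j] - x_bar) * yc[i] for i in range(n)) / n)
    return weights


def binary_subproblems(labels, scheme):
    """Yield (sample indices, 0/1 response) for each two-class sub-problem."""
    classes = sorted(set(labels))
    if scheme == "ovr":  # one versus rest: class c against all other samples
        for c in classes:
            idx = list(range(len(labels)))
            yield idx, [1.0 if labels[i] == c else 0.0 for i in idx]
    elif scheme == "ovo":  # one versus one: only samples from classes a and b
        for a, b in itertools.combinations(classes, 2):
            idx = [i for i, lab in enumerate(labels) if lab in (a, b)]
            yield idx, [1.0 if labels[i] == a else 0.0 for i in idx]
    else:
        raise ValueError("scheme must be 'ovr' or 'ovo'")


def rank_genes(X, labels, scheme="ovr"):
    """Rank genes by their largest |PLS1 weight| over all sub-problems.

    Taking the maximum across sub-problems is a hypothetical aggregation
    rule chosen for this sketch.
    """
    score = [0.0] * len(X[0])
    for idx, y in binary_subproblems(labels, scheme):
        w = pls1_weights([X[i] for i in idx], y)
        score = [max(s, abs(v)) for s, v in zip(score, w)]
    return sorted(range(len(score)), key=lambda j: -score[j])


def balanced_bootstrap(n, B, rng):
    """B balanced bootstrap resamples of the indices 0..n-1.

    Concatenate B copies of the indices, shuffle, and cut into B blocks of
    size n, so every sample is drawn exactly B times in total.
    """
    pool = list(range(n)) * B
    rng.shuffle(pool)
    return [pool[b * n:(b + 1) * n] for b in range(B)]


def top_k_stability(rankings, k):
    """Mean pairwise Jaccard index of the top-k gene sets (needs >= 2 lists)."""
    sets = [set(r[:k]) for r in rankings]
    return mean(len(a & b) / len(a | b)
                for a, b in itertools.combinations(sets, 2))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: 3 classes x 10 samples, 50 genes; gene c is shifted in class c.
    labels = [c for c in range(3) for _ in range(10)]
    X = [[rng.gauss(0, 1) + (5.0 if j == lab else 0.0) for j in range(50)]
         for lab in labels]
    rankings = []
    for idx in balanced_bootstrap(len(labels), B=20, rng=rng):
        rankings.append(rank_genes([X[i] for i in idx],
                                   [labels[i] for i in idx], scheme="ovo"))
    print("top-3 genes, first replicate:", rankings[0][:3])
    print("top-3 Jaccard stability:", round(top_k_stability(rankings, 3), 3))
```

With K classes, OvR produces K sub-problems, each using every sample. OvO produces K(K-1)/2 sub-problems, each restricted to the samples of two classes. The balanced bootstrap differs from the ordinary bootstrap only in guaranteeing that every sample appears exactly B times across all replicates.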

journal_name

Biol Direct

journal_title

Biology direct

authors

Student S,Fujarewicz K

doi

10.1186/1745-6150-7-33

subject

Has Abstract

pub_date

2012-10-02 00:00:00

pages

33

issn

1745-6150

pii

1745-6150-7-33

journal_volume

7

pub_type

Journal Article

  • Elusive data underlying debate at the prokaryote-eukaryote divide.

    abstract:BACKGROUND:The origin of eukaryotic cells was an important transition in evolution. The factors underlying the origin and evolutionary success of the eukaryote lineage are still discussed. One camp argues that mitochondria were essential for eukaryote origin because of the unique configuration of internalized bioenerge...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-018-0221-x

    authors: Gerlitz M,Knopp M,Kapust N,Xavier JC,Martin WF

    updated: 2018-10-03 00:00:00

  • The multiple personalities of Watson and Crick strands.

    abstract:BACKGROUND:In genetics it is customary to refer to double-stranded DNA as containing a "Watson strand" and a "Crick strand." However, there seems to be no consensus in the literature on the exact meaning of these two terms, and the many usages contradict one another as well as the original definition. Here, we review t...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-6-7

    authors: Cartwright RA,Graur D

    updated: 2011-02-08 00:00:00

  • Origin of the nuclear proteome on the basis of pre-existing nuclear localization signals in prokaryotic proteins.

    abstract:BACKGROUND:The origin of the selective nuclear protein import machinery, which consists of nuclear pore complexes and adaptor molecules interacting with the nuclear localization signals (NLSs) of cargo molecules, is one of the most important events in the evolution of eukaryotic cells. How proteins were selected for im...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-020-00263-6

    authors: Lisitsyna OM,Kurnaeva MA,Arifulin EA,Shubina MY,Musinova YR,Mironov AA,Sheval EV

    updated: 2020-04-28 00:00:00

  • Assessment of urban microbiome assemblies with the help of targeted in silico gold standards.

    abstract:BACKGROUND:Microbial communities play a crucial role in our environment and may influence human health tremendously. Despite being the place where human interaction is most abundant we still know little about the urban microbiome. This is highlighted by the large amount of unclassified DNA reads found in urban metageno...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-018-0225-6

    authors: Gerner SM,Rattei T,Graf AB

    updated: 2018-10-12 00:00:00

  • Infinitely long branches and an informal test of common ancestry.

    abstract:BACKGROUND:The evidence for universal common ancestry (UCA) is vast and persuasive. A phylogenetic test has been proposed for quantifying its odds against independently originated sequences based on the comparison between one versus several trees. This test was successfully applied to a well-supported homologous sequen...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-016-0120-y

    authors: de Oliveira Martins L,Posada D

    updated: 2016-04-07 00:00:00

  • The archaeo-eukaryotic GINS proteins and the archaeal primase catalytic subunit PriS share a common domain.

    abstract:UNLABELLED:Primase and GINS are essential factors for chromosomal DNA replication in eukaryotic and archaeal cells. Here we describe a previously undetected relationship between the C-terminal domain of the catalytic subunit (PriS) of archaeal primase and the B-domains of the archaeo-eukaryotic GINS proteins in the for...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-5-17

    authors: Swiatek A,Macneill SA

    updated: 2010-04-12 00:00:00

  • A highly conserved family of inactivated archaeal B family DNA polymerases.

    abstract:A widespread and highly conserved family of apparently inactivated derivatives of archaeal B-family DNA polymerases is described. Phylogenetic analysis shows that the inactivated forms comprise a distinct clade among archaeal B-family polymerases and that, within this clade, Euryarchaea and Crenarchaea are clearly sep...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-3-32

    authors: Rogozin IB,Makarova KS,Pavlov YI,Koonin EV

    updated: 2008-08-06 00:00:00

  • Structural analysis of hubs in human NR-RTK network.

    abstract:BACKGROUND:Currently a huge amount of protein-protein interaction data is available; therefore, extracting meaningful interactions is a challenging task. In a protein-protein interaction network, hubs are considered as key proteins maintaining function and stability of the network. Therefore, studying protein-protein complexes ...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-6-49

    authors: Choura M,Rebaï A

    updated: 2011-10-05 00:00:00

  • Trees and networks before and after Darwin.

    abstract:It is well-known that Charles Darwin sketched abstract trees of relationship in his 1837 notebook, and depicted a tree in the Origin of Species (1859). Here I attempt to place Darwin's trees in historical context. By the mid-Eighteenth century the Great Chain of Being was increasingly seen to be an inadequate descript...

    journal_title:Biology direct

    pub_type: Historical Article, Journal Article, Review

    doi:10.1186/1745-6150-4-43

    authors: Ragan MA

    updated: 2009-11-16 00:00:00

  • Interplay of recombination and selection in the genomes of Chlamydia trachomatis.

    abstract:BACKGROUND:Chlamydia trachomatis is an obligate intracellular bacterial parasite, which causes several severe and debilitating diseases in humans. This study uses comparative genomic analyses of 12 complete published C. trachomatis genomes to assess the contribution of recombination and selection in this pathogen and t...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-6-28

    authors: Joseph SJ,Didelot X,Gandhi K,Dean D,Read TD

    updated: 2011-05-26 00:00:00

  • Impairment of translation in neurons as a putative causative factor for autism.

    abstract:BACKGROUND:A dramatic increase in the prevalence of autism and Autistic Spectrum Disorders (ASD) has been observed over the last two decades in USA, Europe and Asia. Given the accumulating data on the possible role of translation in the etiology of ASD, we analyzed potential effects of rare synonymous substitutions ass...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-9-16

    authors: Poliakov E,Koonin EV,Rogozin IB

    updated: 2014-07-10 00:00:00

  • Orphan SelD proteins and selenium-dependent molybdenum hydroxylases.

    abstract:Bacterial and Archaeal cells use selenium structurally in selenouridine-modified tRNAs, in proteins translated with selenocysteine, and in the selenium-dependent molybdenum hydroxylases (SDMH). The first two uses both require the selenophosphate synthetase gene, selD. Examining over 500 complete prokaryotic genomes fi...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-3-4

    authors: Haft DH,Self WT

    updated: 2008-02-20 00:00:00

  • Pathophysiology of Crohn's disease inflammation and recurrence.

    abstract:Crohn's disease is a chronic inflammatory intestinal disease, first described at the beginning of the last century. The disease is characterized by the alternation of periods of flares and remissions influenced by a complex pathogenesis in which inflammation plays a key role. Crohn's disease evolution is mediated by a...

    journal_title:Biology direct

    pub_type: Journal Article, Review

    doi:10.1186/s13062-020-00280-5

    authors: Petagna L,Antonelli A,Ganini C,Bellato V,Campanelli M,Divizia A,Efrati C,Franceschilli M,Guida AM,Ingallinella S,Montagnese F,Sensi B,Siragusa L,Sica GS

    updated: 2020-11-07 00:00:00

  • Diverse bacterial genomes encode an operon of two genes, one of which is an unusual class-I release factor that potentially recognizes atypical mRNA signals other than normal stop codons.

    abstract:BACKGROUND:While all codons that specify amino acids are universally recognized by tRNA molecules, codons signaling termination of translation are recognized by proteins known as class-I release factors (RF). In most eukaryotes and archaea a single RF accomplishes termination at all three stop codons. In most bacteria,...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-1-28

    authors: Baranov PV,Vestergaard B,Hamelryck T,Gesteland RF,Nyborg J,Atkins JF

    updated: 2006-09-13 00:00:00

  • Stringent homology-based prediction of H. sapiens-M. tuberculosis H37Rv protein-protein interactions.

    abstract:BACKGROUND:H. sapiens-M. tuberculosis H37Rv protein-protein interaction (PPI) data are essential for understanding the infection mechanism of the formidable pathogen M. tuberculosis H37Rv. Computational prediction is an important strategy to fill the gap in experimental H. sapiens-M. tuberculosis H37Rv PPI data. Homolo...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-9-5

    authors: Zhou H,Gao S,Nguyen NN,Fan M,Jin J,Liu B,Zhao L,Xiong G,Tan M,Li S,Wong L

    updated: 2014-04-08 00:00:00

  • The fundamental units, processes and patterns of evolution, and the tree of life conundrum.

    abstract:BACKGROUND:The elucidation of the dominant role of horizontal gene transfer (HGT) in the evolution of prokaryotes led to a severe crisis of the Tree of Life (TOL) concept and intense debates on this subject. CONCEPT:Prompted by the crisis of the TOL, we attempt to define the primary units and the fundamental patterns ...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-4-33

    authors: Koonin EV,Wolf YI

    updated: 2009-09-29 00:00:00

  • A computational approach to candidate gene prioritization for X-linked mental retardation using annotation-based binary filtering and motif-based linear discriminatory analysis.

    abstract:BACKGROUND:Several computational candidate gene selection and prioritization methods have recently been developed. These in silico selection and prioritization techniques are usually based on two central approaches--the examination of similarities to known disease genes and/or the evaluation of functional annotation of...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-6-30

    authors: Lombard Z,Park C,Makova KD,Ramsay M

    updated: 2011-06-13 00:00:00

  • A web server for analysis, comparison and prediction of protein ligand binding sites.

    abstract:BACKGROUND:One of the major challenges in the field of system biology is to understand the interaction between a wide range of proteins and ligands. In the past, methods have been developed for predicting binding sites in a protein for a limited number of ligands. RESULTS:In order to address this problem, we developed...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-016-0118-5

    authors: Singh H,Srivastava HK,Raghava GP

    updated: 2016-03-25 00:00:00

  • Optimal treatment and stochastic modeling of heterogeneous tumors.

    abstract:UNLABELLED:In this work we review past articles that have mathematically studied cancer heterogeneity and the impact of this heterogeneity on the structure of optimal therapy. We look at past works on modeling how heterogeneous tumors respond to radiotherapy, and take a particularly close look at how the optimal radiot...

    journal_title:Biology direct

    pub_type: Journal Article, Review

    doi:10.1186/s13062-016-0142-5

    authors: Badri H,Leder K

    updated: 2016-08-23 00:00:00

  • Once upon a time the cell membranes: 175 years of cell boundary research.

    abstract:All modern cells are bounded by cell membranes best described by the fluid mosaic model. This statement is so widely accepted by biologists that little attention is generally given to the theoretical importance of cell membranes in describing the cell. This has not always been the case. When the Cell Theory was first ...

    journal_title:Biology direct

    pub_type: Historical Article, Journal Article, Review

    doi:10.1186/s13062-014-0032-7

    authors: Lombard J

    updated: 2014-12-19 00:00:00

  • A network-based approach to classify the three domains of life.

    abstract:BACKGROUND:Identifying group-specific characteristics in metabolic networks can provide better insight into evolutionary developments. Here, we present an approach to classify the three domains of life using topological information about the underlying metabolic networks. These networks have been shown to share domain-...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-6-53

    authors: Mueller LA,Kugler KG,Netzer M,Graber A,Dehmer M

    updated: 2011-10-13 00:00:00

  • From tumors to species: a SCANDAL hypothesis.

    abstract:Some tumor cells can evolve into transmissible parasites. Notable examples include the Tasmanian devil facial tumor disease, the canine transmissible venereal tumor and transmissible cancers of mollusks. We present a hypothesis that such transmissible tumors existed in the past and that some modern animal taxa are ...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-019-0233-1

    authors: Panchin AY,Aleoshin VV,Panchin YV

    updated: 2019-01-23 00:00:00

  • Activating and inhibiting connections in biological network dynamics.

    abstract:BACKGROUND:Many studies of biochemical networks have analyzed network topology. Such work has suggested that specific types of network wiring may increase network robustness and therefore confer a selective advantage. However, knowledge of network topology does not allow one to predict network dynamical behavior--for e...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-3-49

    authors: McDonald D,Waterbury L,Knight R,Betterton MD

    updated: 2008-12-04 00:00:00

  • Prediction and mechanistic analysis of drug-induced liver injury (DILI) based on chemical structure.

    abstract:BACKGROUND:Drug-induced liver injury (DILI) is a major safety concern characterized by a complex and diverse pathogenesis. In order to identify DILI early in drug development, a better understanding of the injury and models with better predictivity are urgently needed. One approach in this regard are in silico models w...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-020-00285-0

    authors: Liu A,Walter M,Wright P,Bartosik A,Dolciami D,Elbasir A,Yang H,Bender A

    updated: 2021-01-18 00:00:00

  • Hereditary profiles of disorderly transcription?

    abstract:BACKGROUND:Microscopic examination of living cells often reveals that cells from some cell strains appear to be in a permanent state of disarray without obvious reason. In all probability such a disorderly state affects cell functioning. The aim of this study was to establish whether a disorderly state could occur that...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-1-9

    authors: Simons JW

    updated: 2006-04-02 00:00:00

  • The mechanistic and evolutionary aspects of the 2'- and 3'-OH paradigm in biosynthetic machinery.

    abstract:BACKGROUND:The translation machinery underlies a multitude of biological processes within the cell. The design and implementation of the modern translation apparatus on even the simplest course of action is extremely complex, and involves different RNA and protein factors. According to the "RNA world" idea, the critica...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-8-17

    authors: Safro M,Klipcan L

    updated: 2013-07-08 00:00:00

  • Human gammadelta T cell recognition of lipid A is predominately presented by CD1b or CD1c on dendritic cells.

    abstract:BACKGROUND:The gammadelta T cells serve as early immune defense against certain encountered microbes. Only a few gammadelta T cell-recognized ligands from microbial antigens have been identified so far and the mechanisms by which gammadelta T cells recognize these ligands remain unknown. Here we explored the mechanism ...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-4-47

    authors: Cui Y,Kang L,Cui L,He W

    updated: 2009-12-01 00:00:00

  • Rotational restriction of nascent peptides as an essential element of co-translational protein folding: possible molecular players and structural consequences.

    abstract:BACKGROUND:A basic tenet of protein science is that all information about the spatial structure of proteins is present in their sequences. Nonetheless, many proteins fail to attain native structure upon experimental denaturation and refolding in vitro, raising the question of the specific role of cellular machinery in ...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-017-0186-1

    authors: Sorokina I,Mushegian A

    updated: 2017-05-31 00:00:00

  • Component retention in principal component analysis with application to cDNA microarray data.

    abstract:Shannon entropy is used to provide an estimate of the number of interpretable components in a principal component analysis. In addition, several ad hoc stopping rules for dimension determination are reviewed and a modification of the broken stick model is presented. The modification incorporates a test for the presenc...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-2-2

    authors: Cangelosi R,Goriely A

    updated: 2007-01-17 00:00:00

  • MutL homologs in restriction-modification systems and the origin of eukaryotic MORC ATPases.

    abstract:The provenance and biochemical roles of eukaryotic MORC proteins have remained poorly understood since the discovery of their prototype MORC1, which is required for meiotic nuclear division in animals. The MORC family contains a combination of a gyrase, histidine kinase, and MutL (GHKL) and S5 domains that together co...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-3-8

    authors: Iyer LM,Abhiman S,Aravind L

    updated: 2008-03-17 00:00:00