Selecting subsets of newly extracted features from PCA and PLS in microarray data analysis.

Abstract:

BACKGROUND: Dimension reduction is a critical issue in the analysis of microarray data, because the high dimensionality of gene expression microarray data sets hurts the generalization performance of classifiers. Dimension reduction methods fall into two types: feature selection and feature extraction. Principal component analysis (PCA) and partial least squares (PLS) are two frequently used feature extraction methods, and in previous work the top several components of PCA or PLS have been selected for modeling according to the descending order of eigenvalues. In this paper, we show that not all of the top features are useful, and that features should instead be selected from all the components by feature selection methods.

RESULTS: We demonstrate a framework for selecting feature subsets from all the newly extracted components, leading to reduced classification error rates on gene expression microarray data. Here we consider an unsupervised method (PCA) and a supervised method (PLS) for extracting new components, genetic algorithms for feature selection, and support vector machines and k-nearest neighbor for classification. Experimental results illustrate that the proposed framework is effective at selecting feature subsets and reducing classification error rates.

CONCLUSION: The top features newly extracted by PCA or PLS are not the only important ones; feature selection should therefore be performed to select subsets from the new features to improve the generalization performance of classifiers.
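The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the component scores are already available (here they are synthetic stand-ins for PCA or PLS output, ordered by decreasing variance), uses leave-one-out 1-nearest-neighbor error as the fitness, and uses truncation selection with one-point crossover as the genetic operators. The names `make_scores`, `loo_knn_error` and `ga_select` are hypothetical, and all parameter values are illustrative choices.

```python
import random

def make_scores(n_per_class=15, seed=1):
    """Synthetic stand-in for component scores, ordered by decreasing variance."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([
                rng.gauss(0.0, 10.0),               # component 1: largest variance, no class signal
                rng.gauss(0.0, 5.0),                # component 2: noise
                rng.gauss(2.0 * label - 1.0, 0.2),  # component 3: lower variance, separates classes
                rng.gauss(0.0, 0.1),                # component 4: noise
            ])
            y.append(label)
    return X, y

def loo_knn_error(X, y, mask, k=1):
    """Leave-one-out k-NN error using only the components where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0  # an empty subset cannot classify anything
    errors = 0
    for i in range(len(X)):
        # squared Euclidean distance from sample i to every other sample
        dists = sorted(
            (sum((X[i][j] - X[m][j]) ** 2 for j in cols), y[m])
            for m in range(len(X)) if m != i
        )
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)
        errors += predicted != y[i]
    return errors / len(X)

def ga_select(X, y, generations=30, pop_size=20, p_mut=0.1, seed=0):
    """Search component subsets (bit masks) with a small elitist genetic algorithm."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(mask):
        # lower error first, then fewer components as a tie-breaker
        return (loo_knn_error(X, y, mask), sum(mask))

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [1] * n  # assumption: seed with "use every component" as a baseline
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]  # truncation selection; best masks survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)   # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)

X, y = make_scores()
best = ga_select(X, y)
print("top-2 components error:", loo_knn_error(X, y, [1, 1, 0, 0]))
print("GA-selected mask:", best, "error:", loo_knn_error(X, y, best))
```

In this synthetic setup the class signal sits in the third component rather than in the two highest-variance ones, so a top-k rule that keeps only the first two components has nothing useful to classify with, while a search over all components is free to pick the informative one. In a real analysis the fitness would be a cross-validated error of the chosen classifier (SVM or k-NN) on actual PCA or PLS scores.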

journal_name

BMC Genomics

journal_title

BMC genomics

authors

Li GZ,Bu HL,Yang MQ,Zeng XQ,Yang JY

doi

10.1186/1471-2164-9-S2-S24

pub_date

2008-09-16 00:00:00

pages

S24

issn

1471-2164

pii

1471-2164-9-S2-S24

journal_volume

9 Suppl 2

pub_type

Journal Article
  • Deep and comparative analysis of the mycelium and appressorium transcriptomes of Magnaporthe grisea using MPSS, RL-SAGE, and oligoarray methods.

    abstract:BACKGROUND:Rice blast, caused by the fungal pathogen Magnaporthe grisea, is a devastating disease causing tremendous yield loss in rice production. The public availability of the complete genome sequence of M. grisea provides ample opportunities to understand the molecular mechanism of its pathogenesis on rice plants a...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-7-310

    authors: Gowda M,Venu RC,Raghupathy MB,Nobuta K,Li H,Wing R,Stahlberg E,Couglan S,Haudenschild CD,Dean R,Nahm BH,Meyers BC,Wang GL

    update_date:2006-12-08 00:00:00

  • Transcriptomic response of the Antarctic pteropod Limacina helicina antarctica to ocean acidification.

    abstract:BACKGROUND:Ocean acidification (OA), a change in ocean chemistry due to the absorption of atmospheric CO2 into surface oceans, challenges biogenic calcification in many marine organisms. Ocean acidification is expected to rapidly progress in polar seas, with regions of the Southern Ocean expected to experience severe O...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-4161-0

    authors: Johnson KM,Hofmann GE

    update_date:2017-10-23 00:00:00

  • Functional and gene network analyses of transcriptional signatures characterizing pre-weaned bovine mammary parenchyma or fat pad uncovered novel inter-tissue signaling networks during development.

    abstract:BACKGROUND:The neonatal bovine mammary fat pad (MFP) surrounding the mammary parenchyma (PAR) is thought to exert proliferative effects on the PAR through secretion of local modulators of growth induced by systemic hormones. We used bioinformatics to characterize transcriptomics differences between PAR and MFP from app...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-331

    authors: Piantoni P,Bionaz M,Graugnard DE,Daniels KM,Everts RE,Rodriguez-Zas SL,Lewin HA,Hurley HL,Akers M,Loor JJ

    update_date:2010-05-26 00:00:00

  • RNA sequencing and transcriptome arrays analyses show opposing results for alternative splicing in patient derived samples.

    abstract:BACKGROUND:RNA sequencing (RNA-seq) and microarrays are two transcriptomics techniques aimed at the quantification of transcribed genes and their isoforms. Here we compare the latest Affymetrix HTA 2.0 microarray with Illumina 2000 RNA-seq for the analysis of patient samples - normal lung epithelium tissue and squamous...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-3819-y

    authors: Nazarov PV,Muller A,Kaoma T,Nicot N,Maximo C,Birembaut P,Tran NL,Dittmar G,Vallar L

    update_date:2017-06-06 00:00:00

  • Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution.

    abstract:BACKGROUND:New gene emergence is so far assumed to be mostly driven by duplication and divergence of existing genes. The possibility that entirely new genes could emerge out of the non-coding genomic background was long thought to be almost negligible. With the increasing availability of fully sequenced genomes across ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-117

    authors: Neme R,Tautz D

    update_date:2013-02-21 00:00:00

  • Genome-based analysis for the identification of genes involved in o-xylene degradation in Rhodococcus opacus R7.

    abstract:BACKGROUND:Bacteria belonging to the Rhodococcus genus play an important role in the degradation of many contaminants, including methylbenzenes. These bacteria, widely distributed in the environment, are known to be a powerhouse of numerous degradation functions, due to their ability to metabolize a wide range of organ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-4965-6

    authors: Di Canito A,Zampolli J,Orro A,D'Ursi P,Milanesi L,Sello G,Steinbüchel A,Di Gennaro P

    update_date:2018-08-06 00:00:00

  • Analysis of 4,664 high-quality sequence-finished poplar full-length cDNA clones and their utility for the discovery of genes responding to insect feeding.

    abstract:BACKGROUND:The genus Populus includes poplars, aspens and cottonwoods, which will be collectively referred to as poplars hereafter unless otherwise specified. Poplars are the dominant tree species in many forest ecosystems in the Northern Hemisphere and are of substantial economic value in plantation forestry. Poplar h...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-57

    authors: Ralph SG,Chun HJ,Cooper D,Kirkpatrick R,Kolosova N,Gunter L,Tuskan GA,Douglas CJ,Holt RA,Jones SJ,Marra MA,Bohlmann J

    update_date:2008-01-29 00:00:00

  • Comparing Mycobacterium tuberculosis genomes using genome topology networks.

    abstract:BACKGROUND:Over the last decade, emerging research methods, such as comparative genomic analysis and phylogenetic study, have yielded new insights into genotypes and phenotypes of closely related bacterial strains. Several findings have revealed that genomic structural variations (SVs), including gene gain/loss, gene d...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1259-0

    authors: Jiang J,Gu J,Zhang L,Zhang C,Deng X,Dou T,Zhao G,Zhou Y

    update_date:2015-02-14 00:00:00

  • Proteome-wide analysis of Anopheles culicifacies mosquito midgut: new insights into the mechanism of refractoriness.

    abstract:BACKGROUND:Midgut invasion, a major bottleneck for malaria parasites transmission is considered as a potential target for vector-parasite interaction studies. New intervention strategies are required to explore the midgut proteins and their potential role in refractoriness for malaria control in Anopheles mosquitoes. T...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-4729-3

    authors: Vijay S,Rawal R,Kadian K,Singh J,Adak T,Sharma A

    update_date:2018-05-08 00:00:00

  • Comparison of gene expression of Paramecium bursaria with and without Chlorella variabilis symbionts.

    abstract:BACKGROUND:The ciliate Paramecium bursaria harbors several hundred cells of the green-alga Chlorella sp. in their cytoplasm. Irrespective of the mutual relation between P. bursaria and the symbiotic algae, both cells retain the ability to grow without the partner. They can easily reestablish endosymbiosis when put in c...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-183

    authors: Kodama Y,Suzuki H,Dohra H,Sugii M,Kitazume T,Yamaguchi K,Shigenobu S,Fujishima M

    update_date:2014-03-10 00:00:00

  • Genome-wide transcriptomic profiling of Anopheles gambiae hemocytes reveals pathogen-specific signatures upon bacterial challenge and Plasmodium berghei infection.

    abstract:BACKGROUND:The mosquito Anopheles gambiae is a major vector of human malaria. Increasing evidence indicates that blood cells (hemocytes) comprise an essential arm of the mosquito innate immune response against both bacteria and malaria parasites. To further characterize the role of hemocytes in mosquito immunity, we un...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-10-257

    authors: Baton LA,Robertson A,Warr E,Strand MR,Dimopoulos G

    update_date:2009-06-05 00:00:00

  • Transcriptional response to sulfide in the Echiuran Worm Urechis unicinctus by digital gene expression analysis.

    abstract:BACKGROUND:Urechis unicinctus, an echiuran worm inhabiting the U-shaped burrows in the coastal mud flats, is an important commercial and ecological invertebrate in Northeast Asian countries, which has potential applications in the study of animal evolution, coastal sediment improvement and marine drug development. Furt...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-2094-z

    authors: Liu X,Zhang L,Zhang Z,Ma X,Liu J

    update_date:2015-10-21 00:00:00

  • Multivariate genome wide association and network analysis of subcortical imaging phenotypes in Alzheimer's disease.

    abstract:BACKGROUND:Genome-wide association studies (GWAS) have identified many individual genes associated with brain imaging quantitative traits (QTs) in Alzheimer's disease (AD). However single marker level association discovery may not be able to address the underlying biological interactions with disease mechanism. RESULT...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-07282-7

    authors: Meng X,Li J,Zhang Q,Chen F,Bian C,Yao X,Yan J,Xu Z,Risacher SL,Saykin AJ,Liang H,Shen L,Alzheimer’s Disease Neuroimaging Initiative.

    update_date:2020-12-29 00:00:00

  • Transcriptome sequencing of a keystone aquatic herbivore yields insights on the temperature-dependent metabolism of essential lipids.

    abstract:BACKGROUND:Nutritional quality of phytoplankton is a major determinant of the trophic transfer efficiency at the plant-herbivore interface in freshwater food webs. In particular, the phytoplankton's content of the essential polyunsaturated omega-3 fatty acid eicosapentaenoic acid (EPA) has been repeatedly shown to limi...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-6268-y

    authors: Windisch HS,Fink P

    update_date:2019-11-21 00:00:00

  • Comparative transcriptomics of Gymnosporangium spp. teliospores reveals a conserved genetic program at this specific stage of the rust fungal life cycle.

    abstract:BACKGROUND:Gymnosporangium spp. are fungal plant pathogens causing rust disease and most of them are known to infect two different host plants (heteroecious) with four spore stages (demicyclic). In the present study, we sequenced the transcriptome of G. japonicum teliospores on its host plant Juniperus chinensis and we...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-6099-x

    authors: Tao SQ,Cao B,Morin E,Liang YM,Duplessis S

    update_date:2019-10-09 00:00:00

  • Identification of novel aspartic proteases from Strongyloides ratti and characterisation of their evolutionary relationships, stage-specific expression and molecular structure.

    abstract:BACKGROUND:Aspartic proteases are known to play an important role in the biology of nematode parasitism. This role is best characterised in blood-feeding nematodes, where they digest haemoglobin, but they are also likely to play important roles in the biology of nematode parasites that do not feed on blood. In the pres...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-10-611

    authors: Mello LV,O'Meara H,Rigden DJ,Paterson S

    update_date:2009-12-16 00:00:00

  • LtrDetector: A tool-suite for detecting long terminal repeat retrotransposons de-novo.

    abstract:BACKGROUND:Long terminal repeat retrotransposons are the most abundant transposons in plants. They play important roles in alternative splicing, recombination, gene regulation, and defense mechanisms. Large-scale sequencing projects for plant genomes are currently underway. Software tools are important for annotating l...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-5796-9

    authors: Valencia JD,Girgis HZ

    update_date:2019-06-03 00:00:00

  • An unbiased approach to identify genes involved in development in a turtle with temperature-dependent sex determination.

    abstract:BACKGROUND:Many reptiles exhibit temperature-dependent sex determination (TSD). The initial cue in TSD is incubation temperature, unlike genotypic sex determination (GSD) where it is determined by the presence of specific alleles (or genetic loci). We used patterns of gene expression to identify candidates for genes wi...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-308

    authors: Chojnowski JL,Braun EL

    update_date:2012-07-15 00:00:00

  • A hybrid expectation maximisation and MCMC sampling algorithm to implement Bayesian mixture model based genomic prediction and QTL mapping.

    abstract:BACKGROUND:Bayesian mixture models in which the effects of SNP are assumed to come from normal distributions with different variances are attractive for simultaneous genomic prediction and QTL mapping. These models are usually implemented with Monte Carlo Markov Chain (MCMC) sampling, which requires long compute times ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-3082-7

    authors: Wang T,Chen YP,Bowman PJ,Goddard ME,Hayes BJ

    update_date:2016-09-21 00:00:00

  • Transcriptome analysis reveals key roles of AtLBR-2 in LPS-induced defense responses in plants.

    abstract:BACKGROUND:Lipopolysaccharide (LPS) from Gram-negative bacteria cause innate immune responses in animals and plants. The molecules involved in LPS signaling in animals are well studied, whereas those in plants are not yet as well documented. Recently, we identified Arabidopsis AtLBR-2, which binds to LPS from Pseudomon...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-4372-4

    authors: Iizasa S,Iizasa E,Watanabe K,Nagano Y

    update_date:2017-12-29 00:00:00

  • Outlier analysis of functional genomic profiles enriches for oncology targets and enables precision medicine.

    abstract:BACKGROUND:Genome-scale functional genomic screens across large cell line panels provide a rich resource for discovering tumor vulnerabilities that can lead to the next generation of targeted therapies. Their data analysis typically has focused on identifying genes whose knockdown enhances response in various pre-defin...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-2807-y

    authors: Zhu Z,Ihle NT,Rejto PA,Zarrinkar PP

    update_date:2016-06-13 00:00:00

  • OxyGene: an innovative platform for investigating oxidative-response genes in whole prokaryotic genomes.

    abstract:BACKGROUND:Oxidative stress is a common stress encountered by living organisms and is due to an imbalance between intracellular reactive oxygen and nitrogen species (ROS, RNS) and cellular antioxidant defence. To defend themselves against ROS/RNS, bacteria possess a subsystem of detoxification enzymes, which are classi...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-637

    authors: Thybert D,Avner S,Lucchetti-Miganeh C,Chéron A,Barloy-Hubler F

    update_date:2008-12-31 00:00:00

  • Whole transcriptome analyses of six thoroughbred horses before and after exercise using RNA-Seq.

    abstract:BACKGROUND:Thoroughbred horses are the most expensive domestic animals, and their running ability and knowledge about their muscle-related diseases are important in animal genetics. While the horse reference genome is available, there has been no large-scale functional annotation of the genome using expressed genes der...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-473

    authors: Park KD,Park J,Ko J,Kim BC,Kim HS,Ahn K,Do KT,Choi H,Kim HM,Song S,Lee S,Jho S,Kong HS,Yang YM,Jhun BH,Kim C,Kim TH,Hwang S,Bhak J,Lee HK,Cho BW

    update_date:2012-09-12 00:00:00

  • KSP: an integrated method for predicting catalyzing kinases of phosphorylation sites in proteins.

    abstract:BACKGROUND:Protein phosphorylation by kinases plays crucial roles in various biological processes including signal transduction and tumorigenesis, thus a better understanding of protein phosphorylation events in cells is fundamental for studying protein functions and designing drugs to treat diseases caused by the malf...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-06895-2

    authors: Ma H,Li G,Su Z

    update_date:2020-08-04 00:00:00

  • Genome-wide analysis of the R2R3-MYB transcription factor genes in Chinese cabbage (Brassica rapa ssp. pekinensis) reveals their stress and hormone responsive patterns.

    abstract:BACKGROUND:The MYB superfamily is one of the most abundant transcription factor (TF) families in plants. MYB proteins include highly conserved N-terminal MYB repeats (1R, R2R3, 3R, and atypical) and various C-terminal sequences that confer extensive functions. However, the functions of most MYB genes are unknown, and h...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1216-y

    authors: Wang Z,Tang J,Hu R,Wu P,Hou XL,Song XM,Xiong AS

    update_date:2015-01-23 00:00:00

  • Evidence for niche adaptation in the genome of the bovine pathogen Streptococcus uberis.

    abstract:BACKGROUND:Streptococcus uberis, a Gram positive bacterial pathogen responsible for a significant proportion of bovine mastitis in commercial dairy herds, colonises multiple body sites of the cow including the gut, genital tract and mammary gland. Comparative analysis of the complete genome sequence of S. uberis strain...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-10-54

    authors: Ward PN,Holden MT,Leigh JA,Lennard N,Bignell A,Barron A,Clark L,Quail MA,Woodward J,Barrell BG,Egan SA,Field TR,Maskell D,Kehoe M,Dowson CG,Chanter N,Whatmore AM,Bentley SD,Parkhill J

    update_date:2009-01-28 00:00:00

  • Genome-wide association study of prolactin levels in blood plasma and cerebrospinal fluid.

    abstract:BACKGROUND:Prolactin is a polypeptide hormone secreted by the anterior pituitary gland that plays an essential role in lactation, tissue growth, and suppressing apoptosis to increase cell survival. Prolactin serves as a key player in many life-critical processes, including immune system and reproduction. Prolactin is a...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-2785-0

    authors: Staley LA,Ebbert MT,Parker S,Bailey M,Alzheimer’s Disease Neuroimaging Initiative.,Ridge PG,Goate AM,Kauwe JS

    update_date:2016-06-29 00:00:00

  • Transcriptional analysis of the mammalian heart with special reference to its endocrine function.

    abstract:BACKGROUND:Pharmacological and gene ablation studies have demonstrated the crucial role of the endocrine function of the heart as mediated by the polypeptide hormones ANF and BNP in the maintenance of cardiovascular homeostasis. The importance of these studies lies on the fact that hypertension and chronic congestive h...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-10-254

    authors: McGrath MF,de Bold AJ

    update_date:2009-06-01 00:00:00

  • Construction of an integrated genetic linkage map for the A genome of Brassica napus using SSR markers derived from sequenced BACs in B. rapa.

    abstract:BACKGROUND:The Multinational Brassica rapa Genome Sequencing Project (BrGSP) has developed valuable genomic resources, including BAC libraries, BAC-end sequences, genetic and physical maps, and seed BAC sequences for Brassica rapa. An integrated linkage map between the amphidiploid B. napus and diploid B. rapa will fac...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-594

    authors: Xu J,Qian X,Wang X,Li R,Cheng X,Yang Y,Fu J,Zhang S,King GJ,Wu J,Liu K

    update_date:2010-10-22 00:00:00

  • Gene expression analyses in Atlantic salmon challenged with infectious salmon anemia virus reveal differences between individuals with early, intermediate and late mortality.

    abstract:BACKGROUND:Infectious salmon anemia virus (ISAV) causes a multisystemic disease responsible for severe losses in salmon aquaculture. Better understanding of factors that explain variations in resistance between individuals and families is essential for development of strategies for disease control. To approach this, we...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-179

    authors: Jørgensen SM,Afanasyev S,Krasnov A

    update_date:2008-04-18 00:00:00