Combining gene expression, demographic and clinical data in modeling disease: a case study of bipolar disorder and schizophrenia.

Abstract:

BACKGROUND: This paper presents a retrospective statistical study of the newly released Stanley Neuropathology Consortium data set on gene expression in bipolar disorder and schizophrenia. The data set contains gene expression data as well as limited demographic and clinical data for each subject. Previous studies using statistical classification or machine learning algorithms have focused on gene expression data only. The present paper investigates whether such techniques can benefit from including demographic and clinical data. RESULTS: We compare six classification algorithms: support vector machines (SVMs), nearest shrunken centroids, decision trees, ensemble of voters, naïve Bayes, and nearest neighbor. SVMs outperform the other algorithms. Using expression data only, they yield an area under the ROC curve of 0.92 for bipolar disorder versus control, and 0.91 for schizophrenia versus control. When demographic and clinical data are included, classification performance improves to 0.97 and 0.94, respectively. CONCLUSION: This paper demonstrates that SVMs can distinguish bipolar disorder and schizophrenia from normal controls with high accuracy. Moreover, it shows that classification performance improves when demographic and clinical data are included. We also found that some variables in this data set, such as alcohol and drug use, are strongly associated with the diseases. These variables may affect gene expression and make it more difficult to identify genes that are directly associated with the diseases. Stratification can correct for such variables, but we show that this reduces the power of the statistical methods.
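The performance measure used in the abstract, the area under the ROC curve (AUC), equals the probability that a randomly chosen case receives a higher classifier score than a randomly chosen control, with ties counted as one half. The sketch below computes it that way in plain Python. The function name `roc_auc` and the two toy score lists are illustrative assumptions; they are not the paper's data, code, or results.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a case (e.g. bipolar disorder), 0 for a control.
    scores: classifier outputs, higher meaning "more likely a case".
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0   # case correctly ranked above control
            elif c == k:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(cases) * len(controls))


# Hypothetical scores for six subjects (three cases, three controls).
labels = [1, 1, 1, 0, 0, 0]
expression_only = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]
with_clinical = [0.9, 0.7, 0.6, 0.5, 0.3, 0.2]

auc_expression = roc_auc(expression_only, labels)
auc_combined = roc_auc(with_clinical, labels)
print(f"expression only: {auc_expression:.3f}")
print(f"with clinical:   {auc_combined:.3f}")
```

The pairwise form is quadratic in the number of subjects, which is acceptable for data sets of the size typical in post-mortem expression studies; a rank-based formulation would be the usual choice for larger samples.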

journal_name

BMC Genomics

authors

Struyf J,Dobrin S,Page D

doi

10.1186/1471-2164-9-531

pub_date

2008-11-07 00:00:00

pages

531

issn

1471-2164

pii

1471-2164-9-531

journal_volume

9

pub_type

Journal Article
  • Synteny mapping between common bean and soybean reveals extensive blocks of shared loci.

    abstract:BACKGROUND:Understanding the syntenic relationship between two species is critical to assessing the potential for comparative genomic analysis. Common bean (Phaseolus vulgaris L.) and soybean (Glycine max L.), the two most important members of the Phaseoleae legumes, appear to have a diploid and polyploidy recent past, re...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-184

    authors: McClean PE,Mamidi S,McConnell M,Chikara S,Lee R

    update_date: 2010-03-18 00:00:00

  • Structured RNAs and synteny regions in the pig genome.

    abstract:BACKGROUND:Annotating mammalian genomes for noncoding RNAs (ncRNAs) is nontrivial since far from all ncRNAs are known and the computational models are resource demanding. Currently, the human genome holds the best mammalian ncRNA annotation, a result of numerous efforts by several groups. However, a more direct strateg...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-459

    authors: Anthon C,Tafer H,Havgaard JH,Thomsen B,Hedegaard J,Seemann SE,Pundhir S,Kehr S,Bartschat S,Nielsen M,Nielsen RO,Fredholm M,Stadler PF,Gorodkin J

    update_date: 2014-06-10 00:00:00

  • RNAseq expression analysis of resistant and susceptible mice after influenza A virus infection identifies novel genes associated with virus replication and important for host resistance to infection.

    abstract:BACKGROUND:The host response to influenza A infections is strongly influenced by host genetic factors. Animal models of genetically diverse mouse strains are well suited to identify host genes involved in severe pathology, viral replication and immune responses. Here, we have utilized a dual RNAseq approach that allowe...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1867-8

    authors: Wilk E,Pandey AK,Leist SR,Hatesuer B,Preusse M,Pommerenke C,Wang J,Schughart K

    update_date: 2015-09-02 00:00:00

  • Highly-multiplexed SNP genotyping for genetic mapping and germplasm diversity studies in pea.

    abstract:BACKGROUND:Single Nucleotide Polymorphisms (SNPs) can be used as genetic markers for applications such as genetic diversity studies or genetic mapping. New technologies now allow genotyping hundreds to thousands of SNPs in a single reaction. In order to evaluate the potential of these technologies in pea, we selected a ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-468

    authors: Deulvot C,Charrel H,Marty A,Jacquin F,Donnadieu C,Lejeune-Hénaut I,Burstin J,Aubert G

    update_date: 2010-08-11 00:00:00

  • Single-cell transcriptomics using spliced leader PCR: Evidence for multiple losses of photosynthesis in polykrikoid dinoflagellates.

    abstract:BACKGROUND:Most microbial eukaryotes are uncultivated and thus poorly suited to standard genomic techniques. This is the case for Polykrikos lebouriae, a dinoflagellate with ultrastructurally aberrant plastids. It has been suggested that these plastids stem from a novel symbiosis with either a diatom or haptophyte, but...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1636-8

    authors: Gavelis GS,White RA,Suttle CA,Keeling PJ,Leander BS

    update_date: 2015-07-17 00:00:00

  • High throughput sequencing in mice: a platform comparison identifies a preponderance of cryptic SNPs.

    abstract:BACKGROUND:Allelic variation is the cornerstone of genetically determined differences in gene expression, gene product structure, physiology, and behavior. However, allelic variation, particularly cryptic (unknown or not annotated) variation, is problematic for follow up analyses. Polymorphisms result in a high inciden...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-10-379

    authors: Walter NA,Bottomly D,Laderas T,Mooney MA,Darakjian P,Searles RP,Harrington CA,McWeeney SK,Hitzemann R,Buck KJ

    update_date: 2009-08-17 00:00:00

  • Comparative genomics of the family Vibrionaceae reveals the wide distribution of genes encoding virulence-associated proteins.

    abstract:BACKGROUND:Species of the family Vibrionaceae are ubiquitous in marine environments. Several of these species are important pathogens of humans and marine species. Evidence indicates that genetic exchange plays an important role in the emergence of new pathogenic strains within this family. Data from the sequenced geno...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-369

    authors: Lilburn TG,Gu J,Cai H,Wang Y

    update_date: 2010-06-10 00:00:00

  • Structural and sequence diversity of the transposon Galileo in the Drosophila willistoni genome.

    abstract:BACKGROUND:Galileo is one of three members of the P superfamily of DNA transposons. It was originally discovered in Drosophila buzzatii, in which three segregating chromosomal inversions were shown to have been generated by ectopic recombination between Galileo copies. Subsequently, Galileo was identified in six of 12 ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-792

    authors: Gonçalves JW,Valiati VH,Delprat A,Valente VL,Ruiz A

    update_date: 2014-09-13 00:00:00

  • Gene silencing pathways found in the green alga Volvox carteri reveal insights into evolution and origins of small RNA systems in plants.

    abstract:BACKGROUND:Volvox carteri (V. carteri) is a multicellular green alga used as a model system for the evolution of multicellularity. So far, the contribution of small RNA pathways to these phenomena is not understood. Thus, we have sequenced V. carteri Argonaute 3 (VcAGO3)-associated small RNAs from different developmental...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-3202-4

    authors: Dueck A,Evers M,Henz SR,Unger K,Eichner N,Merkl R,Berezikov E,Engelmann JC,Weigel D,Wenzl S,Meister G

    update_date: 2016-11-02 00:00:00

  • NovelFam3000--uncharacterized human protein domains conserved across model organisms.

    abstract:BACKGROUND:Despite significant efforts from the research community, an extensive portion of the proteins encoded by human genes lack an assigned cellular function. Most metazoan proteins are composed of structural and/or functional domains, of which many appear in multiple proteins. Once a domain is characterized in on...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-7-48

    authors: Kemmer D,Podowski RM,Arenillas D,Lim J,Hodges E,Roth P,Sonnhammer EL,Höög C,Wasserman WW

    update_date: 2006-03-13 00:00:00

  • Positively correlated miRNA-mRNA regulatory networks in mouse frontal cortex during early stages of alcohol dependence.

    abstract:BACKGROUND:Although the study of gene regulation via the action of specific microRNAs (miRNAs) has experienced a boom in recent years, the analysis of genome-wide interaction networks among miRNAs and respective targeted mRNAs has lagged behind. MicroRNAs simultaneously target many transcripts and fine-tune the express...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-725

    authors: Nunez YO,Truitt JM,Gorini G,Ponomareva ON,Blednov YA,Harris RA,Mayfield RD

    update_date: 2013-10-22 00:00:00

  • Identifying Mendelian disease genes with the variant effect scoring tool.

    abstract:BACKGROUND:Whole exome sequencing studies identify hundreds to thousands of rare protein coding variants of ambiguous significance for human health. Computational tools are needed to accelerate the identification of specific variants and genes that contribute to human disease. RESULTS:We have developed the Variant Eff...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-S3-S3

    authors: Carter H,Douville C,Stenson PD,Cooper DN,Karchin R

    update_date: 2013-01-01 00:00:00

  • Profiling and metaanalysis of epidermal keratinocytes responses to epidermal growth factor.

    abstract:BACKGROUND:One challenge of systems biology is the integration of new data into the preexisting, and then re-interpretation of the integrated data. Here we use readily available metaanalysis computational methods to integrate new data on the transcriptomic effects of EGF in primary human epidermal keratinocytes with pr...

    journal_title:BMC genomics

    pub_type: Journal Article, Meta-Analysis

    doi:10.1186/1471-2164-14-85

    authors: Blumenberg M

    update_date: 2013-02-08 00:00:00

  • Culture-independent genomic characterisation of Candidatus Chlamydia sanzinia, a novel uncultivated bacterium infecting snakes.

    abstract:BACKGROUND:Recent molecular studies have revealed considerably more diversity in the phylum Chlamydiae than was previously thought. Evidence is growing that many of these novel chlamydiae may be important pathogens in humans and animals. A significant barrier to characterising these novel chlamydiae is the requirement ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-3055-x

    authors: Taylor-Brown A,Bachmann NL,Borel N,Polkinghorne A

    update_date: 2016-09-05 00:00:00

  • Meta-analysis of muscle transcriptome data using the MADMuscle database reveals biologically relevant gene patterns.

    abstract:BACKGROUND:DNA microarray technology has had a great impact on muscle research and microarray gene expression data has been widely used to identify gene signatures characteristic of the studied conditions. With the rapid accumulation of muscle microarray data, it is of great interest to understand how to compare and co...

    journal_title:BMC genomics

    pub_type: Journal Article, Meta-Analysis

    doi:10.1186/1471-2164-12-113

    authors: Baron D,Dubois E,Bihouée A,Teusan R,Steenman M,Jourdon P,Magot A,Péréon Y,Veitia R,Savagner F,Ramstein G,Houlgatte R

    update_date: 2011-02-16 00:00:00

  • Metabolite and transcriptome analysis during fasting suggest a role for the p53-Ddit4 axis in major metabolic tissues.

    abstract:BACKGROUND:Fasting induces specific molecular and metabolic adaptions in most organisms. In biomedical research fasting is used in metabolic studies to synchronize nutritional states of study subjects. Because there is a lack of standardization for this procedure, we need a deeper understanding of the dynamics and the ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-758

    authors: Schupp M,Chen F,Briggs ER,Rao S,Pelzmann HJ,Pessentheiner AR,Bogner-Strauss JG,Lazar MA,Baldwin D,Prokesch A

    update_date: 2013-11-05 00:00:00

  • A graph-theoretic approach for classification and structure prediction of transmembrane β-barrel proteins.

    abstract:BACKGROUND:Transmembrane β-barrel proteins are a special class of transmembrane proteins which play several key roles in the human body and in disease. Due to experimental difficulties, the number of transmembrane β-barrel proteins with known structures is very small. Over the years, a number of learning-based methods have b...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-S2-S5

    authors: Tran Vdu T,Chassignet P,Sheikh S,Steyaert JM

    update_date: 2012-04-12 00:00:00

  • RNA-Seq analysis of soft rush (Juncus effusus): transcriptome sequencing, de novo assembly, annotation, and polymorphism identification.

    abstract:BACKGROUND:Juncus effusus L. (family: Juncaceae; order: Poales) is a helophytic rush growing in temperate damp or wet terrestrial habitats and is of almost cosmopolitan distribution. The species has been studied intensively with respect to its interaction with co-occurring plants as well as microbes being involved in m...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-5886-8

    authors: Arslan M,Devisetty UK,Porsch M,Große I,Müller JA,Michalski SG

    update_date: 2019-06-13 00:00:00

  • Autocorrelation analysis reveals widespread spatial biases in microarray experiments.

    abstract:BACKGROUND:DNA microarrays provide the ability to interrogate multiple genes in a single experiment and have revolutionized genomic research. However, the microarray technology suffers from various forms of biases and relatively low reproducibility. A particular source of false data has been described, in which non-ran...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-8-164

    authors: Koren A,Tirosh I,Barkai N

    update_date: 2007-06-12 00:00:00

  • Automatic B cell lymphoma detection using flow cytometry data.

    abstract:BACKGROUND:Flow cytometry has been widely used for the diagnosis of various hematopoietic diseases. Although there have been advances in the number of biomarkers that can be analyzed simultaneously and technologies that enable fast performance, the diagnostic data are still interpreted by a manual gating strategy. The ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-S7-S1

    authors: Shih MC,Huang SH,Donohue R,Chang CC,Zu Y

    update_date: 2013-01-01 00:00:00

  • SNP discovery by high-throughput sequencing in soybean.

    abstract:BACKGROUND:With the advance of new massively parallel genotyping technologies, quantitative trait loci (QTL) fine mapping and map-based cloning become more achievable in identifying genes for important and complex traits. Development of high-density genetic markers in the QTL regions of specific mapping populations is ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-469

    authors: Wu X,Ren C,Joshi T,Vuong T,Xu D,Nguyen HT

    update_date: 2010-08-11 00:00:00

  • Genetic structure of the Spanish population.

    abstract:BACKGROUND:Genetic admixture is a common caveat for genetic association analysis. Therefore, it is important to characterize the genetic structure of the population under study to control for this kind of potential bias. RESULTS:In this study we have sampled over 800 unrelated individuals from the population of Spain,...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-326

    authors: Gayán J,Galan JJ,González-Pérez A,Sáez ME,Martínez-Larrad MT,Zabena C,Rivero MC,Salinas A,Ramírez-Lorca R,Morón FJ,Royo JL,Moreno-Rey C,Velasco J,Carrasco JM,Molero E,Ochoa C,Ochoa MD,Gutiérrez M,Reina M,Pascual R,

    update_date: 2010-05-25 00:00:00

  • Single-cell RNA sequencing reveals dynamic changes in A-to-I RNA editome during early human embryogenesis.

    abstract:BACKGROUND:A-to-I RNA-editing mediated by ADAR (adenosine deaminase acting on RNA) enzymes that converts adenosine to inosine in RNA sequence can generate mutations and alter gene regulation in metazoans. Previous studies have shown that A-to-I RNA-editing plays vital roles in mouse embryogenesis. However, the RNA-edit...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-3115-2

    authors: Qiu S,Li W,Xiong H,Liu D,Bai Y,Wu K,Zhang X,Yang H,Ma K,Hou Y,Li B

    update_date: 2016-09-29 00:00:00

  • Comparative genomics of the wheat fungal pathogen Pyrenophora tritici-repentis reveals chromosomal variations and genome plasticity.

    abstract:BACKGROUND:Pyrenophora tritici-repentis (Ptr) is a necrotrophic fungal pathogen that causes the major wheat disease, tan spot. We set out to provide essential genomics-based resources in order to better understand the pathogenicity mechanisms of this important pathogen. RESULTS:Here, we present eight new Ptr isolate g...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-4680-3

    authors: Moolhuijzen P,See PT,Hane JK,Shi G,Liu Z,Oliver RP,Moffat CS

    update_date: 2018-04-23 00:00:00

  • Profiling expression changes caused by a segmental aneuploid in maize.

    abstract:BACKGROUND:While changes in chromosome number that result in aneuploidy are associated with phenotypic consequences such as Down syndrome and cancer, the molecular causes of specific phenotypes and genome-wide expression changes that occur in aneuploids are still being elucidated. RESULTS:We employed a segmental aneup...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-7

    authors: Makarevitch I,Phillips RL,Springer NM

    update_date: 2008-01-10 00:00:00

  • Genome-wide association and transcriptional studies reveal novel genes for unsaturated fatty acid synthesis in a panel of soybean accessions.

    abstract:BACKGROUND:The nutritional value of soybean oil is largely influenced by the proportions of unsaturated fatty acids (FAs), including oleic acid (OA, 18:1), linoleic acid (LLA, 18:2), and linolenic acid (LNA, 18:3). Genome-wide association (GWAS) studies along with gene expression studies in soybean [Glycine max (L.) Me...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-5449-z

    authors: Zhao X,Jiang H,Feng L,Qu Y,Teng W,Qiu L,Zheng H,Han Y,Li W

    update_date: 2019-01-21 00:00:00

  • Overlap between eQTL and QTL associated with production traits and fertility in dairy cattle.

    abstract:BACKGROUND:Identifying causative mutations or genes through which quantitative trait loci (QTL) act has proven very difficult. Using information such as gene expression may help to identify genes and mutations underlying QTL. Our objective was to identify regions associated both with production traits or fertility and ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-5656-7

    authors: van den Berg I,Hayes BJ,Chamberlain AJ,Goddard ME

    update_date: 2019-04-15 00:00:00

  • Diversity and structure of PIF/Harbinger-like elements in the genome of Medicago truncatula.

    abstract:BACKGROUND:Transposable elements constitute a significant fraction of plant genomes. The PIF/Harbinger superfamily includes DNA transposons (class II elements) carrying terminal inverted repeats and producing a 3 bp target site duplication upon insertion. The presence of an ORF coding for the DDE/DDD transposase, requi...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-8-409

    authors: Grzebelus D,Lasota S,Gambin T,Kucherov G,Gambin A

    update_date: 2007-11-09 00:00:00

  • Next-generation sequencing identifies equine cartilage and subchondral bone miRNAs and suggests their involvement in osteochondrosis physiopathology.

    abstract:BACKGROUND:MicroRNAs (miRNAs) are an abundant class of small single-stranded non-coding RNA molecules ranging from 18 to 24 nucleotides. They negatively regulate gene expression at the post-transcriptional level and play key roles in many biological processes, including skeletal development and cartilage maturation. In...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-798

    authors: Desjardin C,Vaiman A,Mata X,Legendre R,Laubier J,Kennedy SP,Laloe D,Barrey E,Jacques C,Cribiu EP,Schibler L

    update_date: 2014-09-17 00:00:00

  • Genomic sequencing of Troides aeacus nucleopolyhedrovirus (TraeNPV) from golden birdwing larvae (Troides aeacus formosanus) to reveal defective Autographa californica NPV genomic features.

    abstract:BACKGROUND:The golden birdwing butterfly (Troides aeacus formosanus) is a rarely observed species in Taiwan. Recently, a typical symptom of nuclear polyhedrosis was found in reared T. aeacus larvae. From the previous Kimura-2 parameter (K-2-P) analysis based on the nucleotide sequence of three genes in this isolate, po...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-5713-2

    authors: Huang YF,Chen TH,Chang ZT,Wang TC,Lee SJ,Kim JC,Kim JS,Chiu KP,Nai YS

    update_date: 2019-05-27 00:00:00