Gene2vec: distributed representation of genes based on co-expression.

Abstract:

BACKGROUND: Existing functional descriptions of genes are categorical and discrete, and are mostly produced through manual processes. In this work, we explore the idea of gene embedding, a distributed representation of genes, in the spirit of word embedding. RESULTS: In a purely data-driven fashion, we trained 200-dimensional vector representations of all human genes, using gene co-expression patterns in 984 data sets from the GEO database. These vectors capture the functional relatedness of genes in terms of recovering known pathways: the average inner product (similarity) of genes within a pathway is 1.52 times greater than that of random genes. Using t-SNE, we produced a gene co-expression map that shows local concentrations of tissue-specific genes. We also illustrated the usefulness of the embedded gene vectors, laden with rich information on gene co-expression patterns, in tasks such as gene-gene interaction prediction. CONCLUSIONS: We proposed a machine learning method that utilizes transcriptome-wide gene co-expression to generate a distributed representation of genes. We further demonstrated the utility of this distributed representation by predicting gene-gene interactions based solely on gene names. The distributed representation of genes could be useful for further bioinformatics applications.
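The pathway-recovery result compares the average inner product of genes within a known pathway against that of random genes. The Python sketch below shows one way such a comparison could be computed from a table of gene vectors. It is not the authors' code: the choice of background (randomly sampled gene pairs from the whole vocabulary), the sample size, the function names, and the toy gene names and 3-dimensional vectors are all illustrative assumptions.

```python
import itertools
import random
from statistics import mean


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_pairwise_similarity(genes, emb):
    """Average inner product over all unordered pairs of distinct genes.

    Genes missing from the embedding vocabulary are skipped.
    """
    members = [g for g in genes if g in emb]
    if len(members) < 2:
        raise ValueError("need at least two embedded genes")
    return mean(dot(emb[a], emb[b]) for a, b in itertools.combinations(members, 2))


def pathway_similarity_ratio(pathway, emb, n_random=1000, seed=0):
    """Within-pathway mean similarity divided by the mean similarity of
    randomly drawn gene pairs.

    Assumption: the background is n_random gene pairs sampled uniformly
    from the whole vocabulary; the paper's exact baseline may differ.
    """
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    vocab = list(emb)
    within = mean_pairwise_similarity(pathway, emb)
    background = mean(
        dot(emb[a], emb[b])
        for a, b in (rng.sample(vocab, 2) for _ in range(n_random))
    )
    return within / background


# Hypothetical 3-dimensional toy embeddings (the trained vectors are 200-dimensional).
toy_emb = {
    "GENE_A": [1.0, 0.9, 0.0],
    "GENE_B": [0.9, 1.0, 0.0],
    "GENE_C": [1.0, 1.0, 0.1],
    "GENE_D": [0.0, 0.0, 1.0],
    "GENE_E": [0.0, 0.1, 1.0],
    "GENE_F": [0.1, 0.0, 1.0],
}
toy_pathway = ["GENE_A", "GENE_B", "GENE_C"]  # hypothetical pathway gene set

print(mean_pairwise_similarity(toy_pathway, toy_emb))
print(pathway_similarity_ratio(toy_pathway, toy_emb))
```

With real data, `toy_emb` would be replaced by the trained gene vectors and `toy_pathway` by a curated pathway gene set; a ratio above 1 indicates that pathway members lie closer together in embedding space than random gene pairs. Because the ratio divides by a sampled mean, it is only meaningful when that background mean is positive.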

journal_name

BMC Genomics

journal_title

BMC genomics

authors

Du J,Jia P,Dai Y,Tao C,Zhao Z,Zhi D

doi

10.1186/s12864-018-5370-x

subject

Has Abstract

pub_date

2019-02-04

pages

82

issue

Suppl 1

issn

1471-2164

pii

10.1186/s12864-018-5370-x

journal_volume

20

pub_type

Journal Article

  • Getting insight into the pan-genome structure with PangTree.

    abstract:BACKGROUND:The term pan-genome was proposed to denominate collections of genomic sequences jointly analyzed or used as a reference. The constant growth of genomic data intensifies development of data structures and algorithms to investigate pan-genomes efficiently. RESULTS:This work focuses on providing a tool for dis...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-6610-4

    authors: Dziadkiewicz P,Dojer N

    update_date: 2020-04-16

  • The comparison of four mitochondrial genomes reveals cytoplasmic male sterility candidate genes in cotton.

    abstract:BACKGROUND:The mitochondrial genomes of higher plants vary remarkably in size, structure and sequence content, as demonstrated by the accumulation and activity of repetitive DNA sequences. Incompatibility between mitochondrial genome and nuclear genome leads to non-functional male reproductive organs and results in cyt...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-5122-y

    authors: Li S,Chen Z,Zhao N,Wang Y,Nie H,Hua J

    update_date: 2018-10-26

  • Genomic characterization of JG068, a novel virulent podovirus active against Burkholderia cenocepacia.

    abstract:BACKGROUND:As is true for many other antibiotic-resistant Gram-negative pathogens, members of the Burkholderia cepacia complex (BCC) are currently being assessed for their susceptibility to phage therapy as an antimicrobial treatment. The objective of this study was to perform genomic and limited functional characteriz...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-574

    authors: Lynch KH,Abdu AH,Schobert M,Dennis JJ

    update_date: 2013-08-27

  • Polygenic and sex specific architecture for two maturation traits in farmed Atlantic salmon.

    abstract:BACKGROUND:A key developmental transformation in the life of all vertebrates is the transition to sexual maturity, whereby individuals are capable of reproducing for the first time. In the farming of Atlantic salmon, early maturation prior to harvest size has serious negative production impacts. RESULTS:We report geno...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-5525-4

    authors: Mohamed AR,Verbyla KL,Al-Mamun HA,McWilliam S,Evans B,King H,Kube P,Kijas JW

    update_date: 2019-02-15

  • Prediction of HIV drug resistance from genotype with encoded three-dimensional protein structure.

    abstract:BACKGROUND:Drug resistance has become a severe challenge for treatment of HIV infections. Mutations accumulate in the HIV genome and make certain drugs ineffective. Prediction of resistance from genotype data is a valuable guide in choice of drugs for effective therapy. RESULTS:In order to improve the computational pr...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-S5-S1

    authors: Yu X,Weber IT,Harrison RW

    update_date: 2014-01-01

  • In silico discovery of transcription regulatory elements in Plasmodium falciparum.

    abstract:BACKGROUND:With the sequence of the Plasmodium falciparum genome and several global mRNA and protein life cycle expression profiling projects now completed, elucidating the underlying networks of transcriptional control important for the progression of the parasite life cycle is highly pertinent to the development of n...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-70

    authors: Young JA,Johnson JR,Benner C,Yan SF,Chen K,Le Roch KG,Zhou Y,Winzeler EA

    update_date: 2008-02-07

  • Endogenous siRNAs and piRNAs derived from transposable elements and genes in the malaria vector mosquito Anopheles gambiae.

    abstract:BACKGROUND:The siRNA and piRNA pathways have been shown in insects to be essential for regulation of gene expression and defence against exogenous and endogenous genetic elements (viruses and transposable elements). The vast majority of endogenous small RNAs produced by the siRNA and piRNA pathways originate from repet...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1436-1

    authors: Biryukova I,Ye T

    update_date: 2015-04-10

  • Widespread Alu repeat-driven expansion of consensus DR2 retinoic acid response elements during primate evolution.

    abstract:BACKGROUND:Nuclear receptors are hormone-regulated transcription factors whose signaling controls numerous aspects of development and physiology. Many receptors recognize DNA hormone response elements formed by direct repeats of RGKTCA motifs separated by 1 to 5 bp (DR1-DR5). Although many known such response elements ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-8-23

    authors: Laperriere D,Wang TT,White JH,Mader S

    update_date: 2007-01-19

  • Differential control of Zap1-regulated genes in response to zinc deficiency in Saccharomyces cerevisiae.

    abstract:BACKGROUND:The Zap1 transcription factor is a central player in the response of yeast to changes in zinc status. We previously used transcriptome profiling with DNA microarrays to identify 46 potential Zap1 target genes in the yeast genome. In this new study, we used complementary methods to identify additional Zap1 ta...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-370

    authors: Wu CY,Bird AJ,Chung LM,Newton MA,Winge DR,Eide DJ

    update_date: 2008-08-01

  • A gene sets approach for identifying prognostic gene signatures for outcome prediction.

    abstract:BACKGROUND:Gene expression profiling is a promising approach to better estimate patient prognosis; however, there are still unresolved problems, including little overlap among similarly developed gene sets and poor performance of a developed gene set in other datasets. RESULTS:We applied a gene sets approach to develo...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-177

    authors: Kim SY,Kim YS

    update_date: 2008-04-16

  • Identification of dysfunctional modules and disease genes in congenital heart disease by a network-based approach.

    abstract:BACKGROUND:The incidence of congenital heart disease (CHD) is continuously increasing among infants born alive nowadays, making it one of the leading causes of infant morbidity worldwide. Various studies suggest that both genetic and environmental factors lead to CHD, and therefore identifying its candidate genes and d...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-12-592

    authors: He D,Liu ZP,Chen L

    update_date: 2011-12-02

  • Parallel progressive multiple sequence alignment on reconfigurable meshes.

    abstract:BACKGROUND:One of the most fundamental and challenging tasks in bio-informatics is to identify related sequences and their hidden biological significance. The most popular and proven best practice method to accomplish this task is aligning multiple sequences together. However, multiple sequence alignment is a computing...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-12-S5-S4

    authors: Nguyen KD,Pan Y,Nong G

    update_date: 2011-12-23

  • A general pipeline for the development of anchor markers for comparative genomics in plants.

    abstract:BACKGROUND:Complete or near-complete genomic sequence information is presently only available for a few plant species representing a large phylogenetic diversity among plants. In order to effectively transfer this information to species lacking sequence information, comparative genomic tools need to be developed. Molec...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-7-207

    authors: Fredslund J,Madsen LH,Hougaard BK,Nielsen AM,Bertioli D,Sandal N,Stougaard J,Schauser L

    update_date: 2006-08-14

  • Biclustering of transcriptome sequencing data reveals human tissue-specific circular RNAs.

    abstract:BACKGROUND:Emerging evidence has experimentally confirmed the tissue-specific expression of circular RNAs (circRNAs). Global identification of human tissue-specific circRNAs is crucial for functional studies, which facilitate the discovery of circRNAs as potential diagnostic biomarkers. RESULTS:In this study, c...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-4335-9

    authors: Liu YC,Chiu YJ,Li JR,Sun CH,Liu CC,Huang HD

    update_date: 2018-01-19

  • Global analyses of Ceratocystis cacaofunesta mitochondria: from genome to proteome.

    abstract:BACKGROUND:The ascomycete fungus Ceratocystis cacaofunesta is the causal agent of wilt disease in cacao, which results in significant economic losses in the affected producing areas. Despite the economic importance of the Ceratocystis complex of species, no genomic data are available for any of its members. Given that ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-91

    authors: Ambrosio AB,do Nascimento LC,Oliveira BV,Teixeira PJ,Tiburcio RA,Toledo Thomazella DP,Leme AF,Carazzolle MF,Vidal RO,Mieczkowski P,Meinhardt LW,Pereira GA,Cabrera OG

    update_date: 2013-02-11

  • Conserved alternative and antisense transcripts at the programmed cell death 2 locus.

    abstract:BACKGROUND:The programmed cell death 2 (Pdcd2) gene on mouse chromosome 17 was evaluated as a member of a highly conserved synteny, a candidate for an imprinted locus, and a candidate for the Hybrid sterility 1 (Hst1) gene. RESULTS:New mouse transcripts were identified at this locus: an alternative Pdcd2 mRNA skipping...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-8-20

    authors: Mihola O,Forejt J,Trachtulec Z

    update_date: 2007-01-18

  • Genome analysis reveals three genomospecies in Mycobacterium abscessus.

    abstract:BACKGROUND:Mycobacterium abscessus complex, the third most frequent mycobacterial complex responsible for community- and health care-associated infections in developed countries, comprises M. abscessus subsp. abscessus and M. abscessus subsp. bolletii, previously referred to as Mycobacterium bolletii and Mycobacterium ma...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-359

    authors: Sassi M,Drancourt M

    update_date: 2014-05-12

  • Comprehensive analysis of genetic and evolutionary features of the hepatitis E virus.

    abstract:BACKGROUND:The hepatitis E virus (HEV) is the causative pathogen of hepatitis E, a global public health concern. HEV comprises 8 genotypes with a wide host range and geographic distribution. This study aims to determine the genetic factors influencing the molecular adaptive changes of HEV open reading frames (ORFs) and...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-6100-8

    authors: Baha S,Behloul N,Liu Z,Wei W,Shi R,Meng J

    update_date: 2019-10-29

  • A phenomics-based approach for the detection and interpretation of shared genetic influences on 29 biochemical indices in southern Chinese men.

    abstract:BACKGROUND:Phenomics provides new technologies and platforms as a systematic phenome-genome approach. However, few studies have reported on the systematic mining of shared genetics among clinical biochemical indices based on phenomics methods, especially in China. This study aimed to apply phenomics to systematically e...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-6363-0

    authors: Hu Y,Tan A,Yu L,Hou C,Kuang H,Wu Q,Su J,Zhou Q,Zhu Y,Zhang C,Wei W,Li L,Li W,Huang Y,Huang H,Xie X,Lu T,Zhang H,Yang X,Gao Y,Li T,Jiang Y,Mo Z

    update_date: 2019-12-16

  • A dynamic degradome landscape on miRNAs and their predicted targets in sugarcane caused by Sporisorium scitamineum stress.

    abstract:BACKGROUND:Sugarcane smut is a fungal disease caused by Sporisorium scitamineum. Cultivation of smut-resistant sugarcane varieties is the most effective way to control this disease. The interaction between sugarcane and S. scitamineum is a complex network system. However, to date, there is no report on the identificati...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-5400-8

    authors: Su Y,Xiao X,Ling H,Huang N,Liu F,Su W,Zhang Y,Xu L,Muhammad K,Que Y

    update_date: 2019-01-18

  • Microarray analysis of response of Salmonella during infection of HLA-B27- transfected human macrophage-like U937 cells.

    abstract:BACKGROUND:Human leukocyte antigen (HLA)-B27 is strongly associated with the development of reactive arthritis (ReA) in humans after salmonellosis. Human monocytic U937 cells transfected with HLA-B27 are less able to eliminate intracellular Salmonella enterica serovar Enteritidis than those transfected with control HLA...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-456

    authors: Ge S,Danino V,He Q,Hinton JC,Granfors K

    update_date: 2010-07-30

  • TSCC: Two-Stage Combinatorial Clustering for virtual screening using protein-ligand interactions and physicochemical features.

    abstract:BACKGROUND:The increasing numbers of 3D compounds and protein complexes stored in databases contribute greatly to current advances in biotechnology, being employed in several pharmaceutical and industrial applications. However, screening and retrieving appropriate candidates as well as handling false positives presents...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-S4-S26

    authors: Clinciu DL,Chen YF,Ko CN,Lo CC,Yang JM

    update_date: 2010-12-02

  • Analysis of 4,664 high-quality sequence-finished poplar full-length cDNA clones and their utility for the discovery of genes responding to insect feeding.

    abstract:BACKGROUND:The genus Populus includes poplars, aspens and cottonwoods, which will be collectively referred to as poplars hereafter unless otherwise specified. Poplars are the dominant tree species in many forest ecosystems in the Northern Hemisphere and are of substantial economic value in plantation forestry. Poplar h...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-57

    authors: Ralph SG,Chun HJ,Cooper D,Kirkpatrick R,Kolosova N,Gunter L,Tuskan GA,Douglas CJ,Holt RA,Jones SJ,Marra MA,Bohlmann J

    update_date: 2008-01-29

  • A multi-treatment experimental system to examine photosynthetic differentiation in the maize leaf.

    abstract:BACKGROUND:The establishment of C4 photosynthesis in maize is associated with differential accumulation of gene transcripts and proteins between bundle sheath and mesophyll photosynthetic cell types. We have physically separated photosynthetic cell types in the leaf blade to characterize differences in gene expression ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-8-12

    authors: Sawers RJ,Liu P,Anufrikova K,Hwang JT,Brutnell TP

    update_date: 2007-01-09

  • Identification of three extra-chromosomal replicons in Leptospira pathogenic strain and development of new shuttle vectors.

    abstract:BACKGROUND:The genome of pathogenic Leptospira interrogans contains two chromosomes. Plasmids and prophages are known to play specific roles in gene transfer in bacteria and can potentially serve as efficient genetic tools in these organisms. Although plasmids and prophage remnants have recently been reported in Leptos...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1321-y

    authors: Zhu W,Wang J,Zhu Y,Tang B,Zhang Y,He P,Zhang Y,Liu B,Guo X,Zhao G,Qin J

    update_date: 2015-02-15

  • Markedly different genome arrangements between serotype a strains and serotypes b or c strains of Aggregatibacter actinomycetemcomitans.

    abstract:BACKGROUND:Bacterial phenotype may be profoundly affected by the physical arrangement of their genes in the genome. The Gram-negative species Aggregatibacter actinomycetemcomitans is a major etiologic agent of human periodontitis. Individual clonal types of A. actinomycetemcomitans may exhibit variable virulence and di...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-489

    authors: Kittichotirat W,Bumgarner R,Chen C

    update_date: 2010-09-08

  • Highly-multiplexed SNP genotyping for genetic mapping and germplasm diversity studies in pea.

    abstract:BACKGROUND:Single Nucleotide Polymorphisms (SNPs) can be used as genetic markers for applications such as genetic diversity studies or genetic mapping. New technologies now allow genotyping hundreds to thousands of SNPs in a single reaction. In order to evaluate the potential of these technologies in pea, we selected a ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-468

    authors: Deulvot C,Charrel H,Marty A,Jacquin F,Donnadieu C,Lejeune-Hénaut I,Burstin J,Aubert G

    update_date: 2010-08-11

  • High throughput RNA sequencing of a hybrid maize and its parents shows different mechanisms responsive to nitrogen limitation.

    abstract:BACKGROUND:Development of crop varieties with high nitrogen use efficiency (NUE) is crucial for minimizing N loss, reducing environmental pollution and decreasing input cost. Maize is one of the most important crops cultivated worldwide and its productivity is closely linked to the amount of fertilizer used. A survey o...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-77

    authors: Bi YM,Meyer A,Downs GS,Shi X,El-Kereamy A,Lukens L,Rothstein SJ

    update_date: 2014-01-28

  • MHC2SKpan: a novel kernel based approach for pan-specific MHC class II peptide binding prediction.

    abstract:BACKGROUND:Computational methods for the prediction of Major Histocompatibility Complex (MHC) class II binding peptides play an important role in facilitating the understanding of immune recognition and the process of epitope discovery. To develop an effective computational method, we need to consider two important cha...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-S5-S11

    authors: Guo L,Luo C,Zhu S

    update_date: 2013-01-01

  • High-throughput cis-regulatory element discovery in the vector mosquito Aedes aegypti.

    abstract:BACKGROUND:Despite substantial progress in mosquito genomic and genetic research, few cis-regulatory elements (CREs), DNA sequences that control gene expression, have been identified in mosquitoes or other non-model insects. Formaldehyde-assisted isolation of regulatory elements paired with DNA sequencing, FAIRE-seq, i...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-2468-x

    authors: Behura SK,Sarro J,Li P,Mysore K,Severson DW,Emrich SJ,Duman-Scheel M

    update_date: 2016-05-10