Abstract:
BACKGROUND: Existing functional descriptions of genes are categorical, discrete, and mostly produced through manual processes. In this work, we explore the idea of gene embedding, a distributed representation of genes in the spirit of word embedding. RESULTS: In a purely data-driven fashion, we trained a 200-dimensional vector representation of all human genes, using gene co-expression patterns in 984 data sets from the GEO databases. These vectors capture the functional relatedness of genes in that they recover known pathways: the average inner product (similarity) of genes within a pathway is 1.52 times greater than that of random genes. Using t-SNE, we produced a gene co-expression map that shows local concentrations of tissue-specific genes. We also illustrated the usefulness of the embedded gene vectors, laden with rich information on gene co-expression patterns, in tasks such as gene-gene interaction prediction. CONCLUSIONS: We proposed a machine learning method that utilizes transcriptome-wide gene co-expression to generate a distributed representation of genes. We further demonstrated the utility of our distributed representation by predicting gene-gene interactions based solely on gene names. The distributed representation of genes could be useful for more bioinformatics applications.
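The pathway comparison reported in the abstract (average inner product within a pathway versus random genes) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the function names, the number of random sets, and the synthetic embeddings are all assumptions made for the example, and the ratio is only meaningful when the random baseline is positive.

```python
import numpy as np


def mean_pairwise_inner_product(vectors):
    """Average inner product over all distinct pairs of rows."""
    gram = vectors @ vectors.T                       # all pairwise inner products
    rows, cols = np.triu_indices(len(vectors), k=1)  # distinct pairs only (i < j)
    return float(gram[rows, cols].mean())


def pathway_vs_random_ratio(embeddings, pathway_idx, n_random_sets=200, seed=0):
    """Within-pathway mean similarity divided by the mean similarity of
    random gene sets of the same size (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    within = mean_pairwise_inner_product(embeddings[pathway_idx])
    baseline = np.mean([
        mean_pairwise_inner_product(
            embeddings[rng.choice(len(embeddings), size=len(pathway_idx), replace=False)]
        )
        for _ in range(n_random_sets)
    ])
    return within / float(baseline)


def make_toy_embeddings(n_genes=500, dim=200, pathway_size=20, seed=1):
    """Synthetic stand-in for trained gene vectors (NOT real data)."""
    rng = np.random.default_rng(seed)
    emb = 0.1 + 0.1 * rng.standard_normal((n_genes, dim))  # shared offset + noise
    direction = rng.standard_normal(dim)
    direction /= np.linalg.norm(direction)
    pathway_idx = np.arange(pathway_size)
    emb[pathway_idx] += direction  # toy pathway genes share one extra direction
    return emb, pathway_idx


emb, pathway_idx = make_toy_embeddings()
print(pathway_vs_random_ratio(emb, pathway_idx))
```

With real data, `embeddings` would be the trained 200-dimensional gene vectors and `pathway_idx` the row indices of genes annotated to one known pathway; a ratio above 1 indicates that pathway members are more similar to each other than random genes are.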
journal_name: BMC Genomics
journal_title: BMC genomics
authors: Du J, Jia P, Dai Y, Tao C, Zhao Z, Zhi D
doi: 10.1186/s12864-018-5370-x
subject: Has Abstract
pub_date: 2019-02-04 00:00:00
pages: 82
issue: Suppl 1
issn: 1471-2164
pii: 10.1186/s12864-018-5370-x
journal_volume: 20
pub_type: Journal article
Related literature
BMC GENOMICS literature collection
abstract:BACKGROUND:The term pan-genome was proposed to denominate collections of genomic sequences jointly analyzed or used as a reference. The constant growth of genomic data intensifies development of data structures and algorithms to investigate pan-genomes efficiently. RESULTS:This work focuses on providing a tool for dis...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-020-6610-4
Update date:2020-04-16 00:00:00
abstract:BACKGROUND:The mitochondrial genomes of higher plants vary remarkably in size, structure and sequence content, as demonstrated by the accumulation and activity of repetitive DNA sequences. Incompatibility between mitochondrial genome and nuclear genome leads to non-functional male reproductive organs and results in cyt...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-018-5122-y
Update date:2018-10-26 00:00:00
abstract:BACKGROUND:As is true for many other antibiotic-resistant Gram-negative pathogens, members of the Burkholderia cepacia complex (BCC) are currently being assessed for their susceptibility to phage therapy as an antimicrobial treatment. The objective of this study was to perform genomic and limited functional characteriz...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-14-574
Update date:2013-08-27 00:00:00
abstract:BACKGROUND:A key developmental transformation in the life of all vertebrates is the transition to sexual maturity, whereby individuals are capable of reproducing for the first time. In the farming of Atlantic salmon, early maturation prior to harvest size has serious negative production impacts. RESULTS:We report geno...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-019-5525-4
Update date:2019-02-15 00:00:00
abstract:BACKGROUND:Drug resistance has become a severe challenge for treatment of HIV infections. Mutations accumulate in the HIV genome and make certain drugs ineffective. Prediction of resistance from genotype data is a valuable guide in choice of drugs for effective therapy. RESULTS:In order to improve the computational pr...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-15-S5-S1
Update date:2014-01-01 00:00:00
abstract:BACKGROUND:With the sequence of the Plasmodium falciparum genome and several global mRNA and protein life cycle expression profiling projects now completed, elucidating the underlying networks of transcriptional control important for the progression of the parasite life cycle is highly pertinent to the development of n...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-9-70
Update date:2008-02-07 00:00:00
abstract:BACKGROUND:The siRNA and piRNA pathways have been shown in insects to be essential for regulation of gene expression and defence against exogenous and endogenous genetic elements (viruses and transposable elements). The vast majority of endogenous small RNAs produced by the siRNA and piRNA pathways originate from repet...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-015-1436-1
Update date:2015-04-10 00:00:00
abstract:BACKGROUND:Nuclear receptors are hormone-regulated transcription factors whose signaling controls numerous aspects of development and physiology. Many receptors recognize DNA hormone response elements formed by direct repeats of RGKTCA motifs separated by 1 to 5 bp (DR1-DR5). Although many known such response elements ...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-8-23
Update date:2007-01-19 00:00:00
abstract:BACKGROUND:The Zap1 transcription factor is a central player in the response of yeast to changes in zinc status. We previously used transcriptome profiling with DNA microarrays to identify 46 potential Zap1 target genes in the yeast genome. In this new study, we used complementary methods to identify additional Zap1 ta...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-9-370
Update date:2008-08-01 00:00:00
abstract:BACKGROUND:Gene expression profiling is a promising approach to better estimate patient prognosis; however, there are still unresolved problems, including little overlap among similarly developed gene sets and poor performance of a developed gene set in other datasets. RESULTS:We applied a gene sets approach to develo...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-9-177
Update date:2008-04-16 00:00:00
abstract:BACKGROUND:The incidence of congenital heart disease (CHD) is continuously increasing among infants born alive nowadays, making it one of the leading causes of infant morbidity worldwide. Various studies suggest that both genetic and environmental factors lead to CHD, and therefore identifying its candidate genes and d...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-12-592
Update date:2011-12-02 00:00:00
abstract:BACKGROUND:One of the most fundamental and challenging tasks in bio-informatics is to identify related sequences and their hidden biological significance. The most popular and proven best practice method to accomplish this task is aligning multiple sequences together. However, multiple sequence alignment is a computing...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-12-S5-S4
Update date:2011-12-23 00:00:00
abstract:BACKGROUND:Complete or near-complete genomic sequence information is presently only available for a few plant species representing a large phylogenetic diversity among plants. In order to effectively transfer this information to species lacking sequence information, comparative genomic tools need to be developed. Molec...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-7-207
Update date:2006-08-14 00:00:00
abstract:BACKGROUND:Emerging evidence has experimentally confirmed the tissue-specific expression of circular RNAs (circRNAs). Global identification of human tissue-specific circRNAs is crucial for the functionality study, which facilitates the discovery of circRNAs for potential diagnostic biomarkers. RESULTS:In this study, c...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-017-4335-9
Update date:2018-01-19 00:00:00
abstract:BACKGROUND:The ascomycete fungus Ceratocystis cacaofunesta is the causal agent of wilt disease in cacao, which results in significant economic losses in the affected producing areas. Despite the economic importance of the Ceratocystis complex of species, no genomic data are available for any of its members. Given that ...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-14-91
Update date:2013-02-11 00:00:00
abstract:BACKGROUND:The programmed cell death 2 (Pdcd2) gene on mouse chromosome 17 was evaluated as a member of a highly conserved synteny, a candidate for an imprinted locus, and a candidate for the Hybrid sterility 1 (Hst1) gene. RESULTS:New mouse transcripts were identified at this locus: an alternative Pdcd2 mRNA skipping...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-8-20
Update date:2007-01-18 00:00:00
abstract:BACKGROUND:Mycobacterium abscessus complex, the third most frequent mycobacterial complex responsible for community- and health care-associated infections in developed countries, comprises M. abscessus subsp. abscessus and M. abscessus subsp. bolletii, previously referred to as Mycobacterium bolletii and Mycobacterium ma...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-15-359
Update date:2014-05-12 00:00:00
abstract:BACKGROUND:The hepatitis E virus (HEV) is the causative pathogen of hepatitis E, a global public health concern. HEV comprises 8 genotypes with a wide host range and geographic distribution. This study aims to determine the genetic factors influencing the molecular adaptive changes of HEV open reading frames (ORFs) and...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-019-6100-8
Update date:2019-10-29 00:00:00
abstract:BACKGROUND:Phenomics provides new technologies and platforms as a systematic phenome-genome approach. However, few studies have reported on the systematic mining of shared genetics among clinical biochemical indices based on phenomics methods, especially in China. This study aimed to apply phenomics to systematically e...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-019-6363-0
Update date:2019-12-16 00:00:00
abstract:BACKGROUND:Sugarcane smut is a fungal disease caused by Sporisorium scitamineum. Cultivation of smut-resistant sugarcane varieties is the most effective way to control this disease. The interaction between sugarcane and S. scitamineum is a complex network system. However, to date, there is no report on the identificati...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-018-5400-8
Update date:2019-01-18 00:00:00
abstract:BACKGROUND:Human leukocyte antigen (HLA)-B27 is strongly associated with the development of reactive arthritis (ReA) in humans after salmonellosis. Human monocytic U937 cells transfected with HLA-B27 are less able to eliminate intracellular Salmonella enterica serovar Enteritidis than those transfected with control HLA...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-11-456
Update date:2010-07-30 00:00:00
abstract:BACKGROUND:The increasing numbers of 3D compounds and protein complexes stored in databases contribute greatly to current advances in biotechnology, being employed in several pharmaceutical and industrial applications. However, screening and retrieving appropriate candidates as well as handling false positives presents...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-11-S4-S26
Update date:2010-12-02 00:00:00
abstract:BACKGROUND:The genus Populus includes poplars, aspens and cottonwoods, which will be collectively referred to as poplars hereafter unless otherwise specified. Poplars are the dominant tree species in many forest ecosystems in the Northern Hemisphere and are of substantial economic value in plantation forestry. Poplar h...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-9-57
Update date:2008-01-29 00:00:00
abstract:BACKGROUND:The establishment of C4 photosynthesis in maize is associated with differential accumulation of gene transcripts and proteins between bundle sheath and mesophyll photosynthetic cell types. We have physically separated photosynthetic cell types in the leaf blade to characterize differences in gene expression ...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-8-12
Update date:2007-01-09 00:00:00
abstract:BACKGROUND:The genome of pathogenic Leptospira interrogans contains two chromosomes. Plasmids and prophages are known to play specific roles in gene transfer in bacteria and can potentially serve as efficient genetic tools in these organisms. Although plasmids and prophage remnants have recently been reported in Leptos...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-015-1321-y
Update date:2015-02-15 00:00:00
abstract:BACKGROUND:Bacterial phenotype may be profoundly affected by the physical arrangement of their genes in the genome. The Gram-negative species Aggregatibacter actinomycetemcomitans is a major etiologic agent of human periodontitis. Individual clonal types of A. actinomycetemcomitans may exhibit variable virulence and di...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-11-489
Update date:2010-09-08 00:00:00
abstract:BACKGROUND:Single Nucleotide Polymorphisms (SNPs) can be used as genetic markers for applications such as genetic diversity studies or genetic mapping. New technologies now allow genotyping hundreds to thousands of SNPs in a single reaction. In order to evaluate the potential of these technologies in pea, we selected a ...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-11-468
Update date:2010-08-11 00:00:00
abstract:BACKGROUND:Development of crop varieties with high nitrogen use efficiency (NUE) is crucial for minimizing N loss, reducing environmental pollution and decreasing input cost. Maize is one of the most important crops cultivated worldwide and its productivity is closely linked to the amount of fertilizer used. A survey o...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-15-77
Update date:2014-01-28 00:00:00
abstract:BACKGROUND:Computational methods for the prediction of Major Histocompatibility Complex (MHC) class II binding peptides play an important role in facilitating the understanding of immune recognition and the process of epitope discovery. To develop an effective computational method, we need to consider two important cha...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-14-S5-S11
Update date:2013-01-01 00:00:00
abstract:BACKGROUND:Despite substantial progress in mosquito genomic and genetic research, few cis-regulatory elements (CREs), DNA sequences that control gene expression, have been identified in mosquitoes or other non-model insects. Formaldehyde-assisted isolation of regulatory elements paired with DNA sequencing, FAIRE-seq, i...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-016-2468-x
Update date:2016-05-10 00:00:00