Prediction of enhancer-promoter interactions via natural language processing.

Abstract:

BACKGROUND: Precise identification of three-dimensional genome organization, especially enhancer-promoter interactions (EPIs), is important for deciphering gene regulation, cell differentiation and disease mechanisms. Currently, it is challenging to distinguish true interactions from nearby non-interacting pairs, since traditional experimental methods are limited by low resolution or low throughput. RESULTS: We propose EP2vec, a novel computational framework to assay three-dimensional genomic interactions. We first extract sequence embedding features, defined as fixed-length vector representations learned from variable-length sequences using an unsupervised deep learning method from natural language processing. Then, we train a classifier to predict EPIs from the learned representations in a supervised way. Experimental results demonstrate that EP2vec obtains F1 scores ranging from 0.841 to 0.933 on different datasets, outperforming existing methods. We demonstrate the robustness of the sequence embedding features through sensitivity analysis. In addition, we identify motifs that represent cell line-specific information by analyzing the learned sequence embedding features with an attention mechanism. Finally, we show that even better performance, with F1 scores of 0.889 to 0.940, can be achieved by combining sequence embedding features with experimental features. CONCLUSIONS: EP2vec sheds light on feature extraction for DNA sequences of arbitrary length and provides a powerful approach for EPI identification.
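To make two ideas from the abstract concrete, the following minimal Python sketch shows (1) one plausible way to turn a variable-length DNA sequence into "words" that a natural language embedding method could consume, and (2) how an F1 score of the kind reported above is computed. The abstract does not state the tokenization, so the overlapping k-mer scheme, the function names `kmer_words` and `f1_score`, and the default values `k=6` and `stride=1` are illustrative assumptions, not the authors' confirmed implementation. The embedding model and the classifier themselves are not reproduced here.

```python
from typing import List, Sequence


def kmer_words(seq: str, k: int = 6, stride: int = 1) -> List[str]:
    """Split a DNA sequence into overlapping k-mer "words".

    k and stride are assumed values chosen for illustration only.
    Sequences shorter than k yield an empty list.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1: harmonic mean of precision and recall (1 = interacting pair)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # A toy 8-bp sequence becomes a three-word "sentence".
    print(kmer_words("ACGTACGT", k=6))
    # One true positive, one false positive, one false negative.
    print(f1_score([1, 1, 0, 0], [1, 0, 0, 1]))
```

In a full pipeline of the kind the abstract describes, the k-mer "sentences" for an enhancer and a promoter would each be mapped to a fixed-length vector by the unsupervised embedding model, and the resulting features would be passed to the supervised classifier whose predictions are scored with F1.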

journal_name

BMC Genomics

journal_title

BMC genomics

authors

Zeng W,Wu M,Jiang R

doi

10.1186/s12864-018-4459-6

subject

Has Abstract

pub_date

2018-05-09 00:00:00

pages

84

issue

Suppl 2

issn

1471-2164

pii

10.1186/s12864-018-4459-6

journal_volume

19

pub_type

Journal Article
  • Proteome-wide analysis of Anopheles culicifacies mosquito midgut: new insights into the mechanism of refractoriness.

    abstract:BACKGROUND:Midgut invasion, a major bottleneck for malaria parasites transmission is considered as a potential target for vector-parasite interaction studies. New intervention strategies are required to explore the midgut proteins and their potential role in refractoriness for malaria control in Anopheles mosquitoes. T...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-4729-3

    authors: Vijay S,Rawal R,Kadian K,Singh J,Adak T,Sharma A

    update_date:2018-05-08 00:00:00

  • Genome-wide expression profiling of leaves and roots of watermelon in response to low nitrogen.

    abstract:BACKGROUND:Nitrogen (N) is a key macronutrient required for plant growth and development. In this study, watermelon plants were grown under hydroponic conditions at 0.2 mM N, 4.5 mM N, and 9 mM N for 14 days. RESULTS:Dry weight and photosynthetic assimilation at low N (0.2 mM) was reduced by 29 and 74% compared with h...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-4856-x

    authors: Nawaz MA,Chen C,Shireen F,Zheng Z,Sohail H,Afzal M,Ali MA,Bie Z,Huang Y

    update_date:2018-06-13 00:00:00

  • High-throughput sequencing of circRNAs reveals novel insights into mechanisms of nigericin in pancreatic cancer.

    abstract:BACKGROUND:Our previous study had proved that nigericin could reduce colorectal cancer cell proliferation in dose- and time-dependent manners by targeting Wnt/β-catenin signaling. To better elucidate its potential anti-cancer mechanism, two pancreatic cancer (PC) cell lines were exposed to increasing concentrations of ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-6032-3

    authors: Xu Z,Shen J,Hua S,Wan D,Chen Q,Han Y,Ren R,Liu F,Du Z,Guo X,Shi J,Zhi Q

    update_date:2019-09-18 00:00:00

  • Delineation of condition specific Cis- and Trans-acting elements in plant promoters under various Endo- and exogenous stimuli.

    abstract:BACKGROUND:Transcription factors (TFs) play essential roles during plant development and response to environmental stresses. However, the relationships among transcription factors, cis-acting elements and target gene expression under endo- and exogenous stimuli have not been systematically characterized. RESULTS:Here,...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-4469-4

    authors: Chow CN,Chiang-Hsieh YF,Chien CH,Zheng HQ,Lee TY,Wu NY,Tseng KC,Hou PF,Chang WC

    update_date:2018-05-09 00:00:00

  • Analysis of qPCR reference gene stability determination methods and a practical approach for efficiency calculation on a turbot (Scophthalmus maximus) gonad dataset.

    abstract:BACKGROUND:Gene expression analysis by reverse transcription quantitative PCR (qPCR) is the most widely used method for analyzing the expression of a moderate number of genes and also for the validation of microarray results. Several issues are crucial for a successful qPCR study, particularly the selection of internal...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-648

    authors: Robledo D,Hernández-Urcera J,Cal RM,Pardo BG,Sánchez L,Martínez P,Viñas A

    update_date:2014-08-04 00:00:00

  • RNA-Seq quantification of the human small airway epithelium transcriptome.

    abstract:BACKGROUND:The small airway epithelium (SAE), the cell population that covers the human airway surface from the 6th generation of airway branching to the alveoli, is the major site of lung disease caused by smoking. The focus of this study is to provide quantitative assessment of the SAE transcriptome in the resting st...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-82

    authors: Hackett NR,Butler MW,Shaykhiev R,Salit J,Omberg L,Rodriguez-Flores JL,Mezey JG,Strulovici-Barel Y,Wang G,Didon L,Crystal RG

    update_date:2012-02-29 00:00:00

  • The sulfur/sulfonates transport systems in Xanthomonas citri pv. citri.

    abstract:BACKGROUND:The Xanthomonas citri pv. citri (X. citri) is a phytopathogenic bacterium that infects different species of citrus plants where it causes canker disease. The adaptation to different habitats is related to the ability of the cells to metabolize and to assimilate diverse compounds, including sulfur, an essenti...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1736-5

    authors: Pereira CT,Moutran A,Fessel M,Balan A

    update_date:2015-07-14 00:00:00

  • Transcriptomic profiling revealed key signaling pathways for cold tolerance and acclimation of two carp species.

    abstract:BACKGROUND:Closely related species of the carp family (Cyprinidae) have evolved distinctive abilities to survive under cold stress, but molecular mechanisms underlying the generation of cold resistance remain largely unknown. In this study, we compared transcriptomic profiles of two carp species to identify key factors...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-06946-8

    authors: Ge G,Long Y,Shi L,Ren J,Yan J,Li C,Li Q,Cui Z

    update_date:2020-08-05 00:00:00

  • Diversity in domain architectures of Ser/Thr kinases and their homologues in prokaryotes.

    abstract:BACKGROUND:Ser/Thr/Tyr kinases (STYKs) commonly found in eukaryotes have been recently reported in many bacterial species. Recent studies elucidating their cellular functions have established their roles in bacterial growth and development. However functions of a large number of bacterial STYKs still remain elusive. Th...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-6-129

    authors: Krupa A,Srinivasan N

    update_date:2005-09-19 00:00:00

  • Phylogeny, Divergent Evolution, and Speciation of Sulfur-Oxidizing Acidithiobacillus Populations.

    abstract:BACKGROUND:Habitats colonized by acidophiles as an ideal physical barrier may induce genetic exchange of microbial members within the common communities, but little is known about how species in extremely acidic environments diverge and evolve. RESULTS:Using the acidophilic sulfur-oxidizer Acidithiobacillus as a case ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-5827-6

    authors: Zhang X,Liu X,Li L,Wei G,Zhang D,Liang Y,Miao B

    update_date:2019-05-30 00:00:00

  • Rapid single cell evaluation of human disease and disorder targets using REVEAL: SingleCell™.

    abstract:BACKGROUND:Single-cell (sc) sequencing performs unbiased profiling of individual cells and enables evaluation of less prevalent cellular populations, often missed using bulk sequencing. However, the scale and the complexity of the sc datasets poses a great challenge in its utility and this problem is further exacerbate...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-07300-8

    authors: Kumar N,Golhar R,Sharma KS,Holloway JL,Sarangi S,Neuhaus I,Walsh AM,Pitluk ZW

    update_date:2021-01-06 00:00:00

  • Genomic comparison of multi-drug resistant invasive and colonizing Acinetobacter baumannii isolated from diverse human body sites reveals genomic plasticity.

    abstract:BACKGROUND:Acinetobacter baumannii has recently emerged as a significant global pathogen, with a surprisingly rapid acquisition of antibiotic resistance and spread within hospitals and health care institutions. This study examines the genomic content of three A. baumannii strains isolated from distinct body sites. Isol...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-12-291

    authors: Sahl JW,Johnson JK,Harris AD,Phillippy AM,Hsiao WW,Thom KA,Rasko DA

    update_date:2011-06-04 00:00:00

  • Effects of Alu elements on global nucleosome positioning in the human genome.

    abstract:BACKGROUND:Understanding the genome sequence-specific positioning of nucleosomes is essential to understand various cellular processes, such as transcriptional regulation and replication. As a typical example, the 10-bp periodicity of AA/TT and GC dinucleotides has been reported in several species, but it is still uncl...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-309

    authors: Tanaka Y,Yamashita R,Suzuki Y,Nakai K

    update_date:2010-05-17 00:00:00

  • Metabolic analysis of the soil microbe Dechloromonas aromatica str. RCB: indications of a surprisingly complex life-style and cryptic anaerobic pathways for aromatic degradation.

    abstract:BACKGROUND:Initial interest in Dechloromonas aromatica strain RCB arose from its ability to anaerobically degrade benzene. It is also able to reduce perchlorate and oxidize chlorobenzoate, toluene, and xylene, creating interest in using this organism for bioremediation. Little physiological data has been published for ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-10-351

    authors: Salinero KK,Keller K,Feil WS,Feil H,Trong S,Di Bartolo G,Lapidus A

    update_date:2009-08-03 00:00:00

  • The repetitive component of the sunflower genome as shown by different procedures for assembling next generation sequencing reads.

    abstract:BACKGROUND:Next generation sequencing provides a powerful tool to study genome structure in species whose genomes are far from being completely sequenced. In this work we describe and compare different computational approaches to evaluate the repetitive component of the genome of sunflower, by using medium/low coverage...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-686

    authors: Natali L,Cossu RM,Barghini E,Giordani T,Buti M,Mascagni F,Morgante M,Gill N,Kane NC,Rieseberg L,Cavallini A

    update_date:2013-10-06 00:00:00

  • Transcriptomic profiling of Burkholderia phymatum STM815, Cupriavidus taiwanensis LMG19424 and Rhizobium mesoamericanum STM3625 in response to Mimosa pudica root exudates illuminates the molecular basis of their nodulation competitiveness and symbiotic ev

    abstract:BACKGROUND:Rhizobial symbionts belong to the classes Alphaproteobacteria and Betaproteobacteria (called "alpha" and "beta"-rhizobia). Most knowledge on the genetic basis of symbiosis is based on model strains belonging to alpha-rhizobia. Mimosa pudica is a legume that offers an excellent opportunity to study the adapta...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-4487-2

    authors: Klonowska A,Melkonian R,Miché L,Tisseyre P,Moulin L

    update_date:2018-01-30 00:00:00

  • Multifunctional polyketide synthase genes identified by genomic survey of the symbiotic dinoflagellate, Symbiodinium minutum.

    abstract:BACKGROUND:Dinoflagellates are unicellular marine and freshwater eukaryotes. They possess large nuclear genomes (1.5-245 gigabases) and produce structurally unique and biologically active polyketide secondary metabolites. Although polyketide biosynthesis is well studied in terrestrial and freshwater organisms, only rec...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-2195-8

    authors: Beedessee G,Hisata K,Roy MC,Satoh N,Shoguchi E

    update_date:2015-11-14 00:00:00

  • Acute systemic inflammatory response to lipopolysaccharide stimulation in pigs divergently selected for residual feed intake.

    abstract:BACKGROUND:It is unclear whether improving feed efficiency by selection for low residual feed intake (RFI) compromises pigs' immunocompetence. Here, we aimed at investigating whether pig lines divergently selected for RFI had different inflammatory responses to lipopolysaccharide (LPS) exposure, regarding to clinical p...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-6127-x

    authors: Liu H,Feye KM,Nguyen YT,Rakhshandeh A,Loving CL,Dekkers JCM,Gabler NK,Tuggle CK

    update_date:2019-10-11 00:00:00

  • Disease gene identification by random walk on multigraphs merging heterogeneous genomic and phenotype data.

    abstract:BACKGROUND:High throughput experiments resulted in many genomic datasets and hundreds of candidate disease genes. To discover the real disease genes from a set of candidate genes, computational methods have been proposed and worked on various types of genomic data sources. As a single source of genomic data is prone of...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-S7-S27

    authors: Li Y,Li J

    update_date:2012-01-01 00:00:00

  • An integrated approach to characterize transcription factor and microRNA regulatory networks involved in Schwann cell response to peripheral nerve injury.

    abstract:BACKGROUND:The regenerative response of Schwann cells after peripheral nerve injury is a critical process directly related to the pathophysiology of a number of neurodegenerative diseases. This SC injury response is dependent on an intricate gene regulatory program coordinated by a number of transcription factors and m...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-84

    authors: Chang LW,Viader A,Varghese N,Payton JE,Milbrandt J,Nagarajan R

    update_date:2013-02-06 00:00:00

  • DNA methylation regulates discrimination of enhancers from promoters through a H3K4me1-H3K4me3 seesaw mechanism.

    abstract:BACKGROUND:DNA methylation at promoters is largely correlated with inhibition of gene expression. However, the role of DNA methylation at enhancers is not fully understood, although a crosstalk with chromatin marks is expected. Actually, there exist contradictory reports about positive and negative correlations between...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-4353-7

    authors: Sharifi-Zarchi A,Gerovska D,Adachi K,Totonchi M,Pezeshk H,Taft RJ,Schöler HR,Chitsaz H,Sadeghi M,Baharvand H,Araúzo-Bravo MJ

    update_date:2017-12-12 00:00:00

  • CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences.

    abstract:BACKGROUND:The complete sequences of chloroplast genomes provide wealthy information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, powerful computational tools annota...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-715

    authors: Liu C,Shi L,Zhu Y,Chen H,Zhang J,Lin X,Guan X

    update_date:2012-12-20 00:00:00

  • Stringent comparative sequence analysis reveals SOX10 as a putative inhibitor of glial cell differentiation.

    abstract:BACKGROUND:The transcription factor SOX10 is essential for all stages of Schwann cell development including myelination. SOX10 cooperates with other transcription factors to activate the expression of key myelin genes in Schwann cells and is therefore a context-dependent, pro-myelination transcription factor. As such, ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-3167-3

    authors: Gopinath C,Law WD,Rodríguez-Molina JF,Prasad AB,Song L,Crawford GE,Mullikin JC,Svaren J,Antonellis A

    update_date:2016-11-07 00:00:00

  • LPS-treatment of bovine endometrial epithelial cells causes differential DNA methylation of genes associated with inflammation and endometrial function.

    abstract:BACKGROUND:Lipopolysaccharide (LPS) endotoxin stimulates pro-inflammatory pathways and is a key player in the pathological mechanisms involved in the development of endometritis. This study aimed to investigate LPS-induced DNA methylation changes in bovine endometrial epithelial cells (bEECs), which may affect endometr...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-06777-7

    authors: Jhamat N,Niazi A,Guo Y,Chanrot M,Ivanova E,Kelsey G,Bongcam-Rudloff E,Andersson G,Humblot P

    update_date:2020-06-03 00:00:00

  • The Babesia bovis gene and promoter model: an update from full-length EST analysis.

    abstract:BACKGROUND:Babesia bovis is an apicomplexan parasite that causes babesiosis in infected cattle. Genomes of pathogens contain promising information that can facilitate the development of methods for controlling infections. Although the genome of B. bovis is publically available, annotated gene models are not highly reli...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-678

    authors: Yamagishi J,Wakaguri H,Yokoyama N,Yamashita R,Suzuki Y,Xuan X,Igarashi I

    update_date:2014-08-13 00:00:00

  • Alternative splicing enriched cDNA libraries identify breast cancer-associated transcripts.

    abstract:BACKGROUND:Alternative splicing (AS) is a central mechanism in the generation of genomic complexity and is a major contributor to transcriptome and proteome diversity. Alterations of the splicing process can lead to deregulation of crucial cellular processes and have been associated with a large spectrum of human disea...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-S5-S4

    authors: Ferreira EN,Rangel MC,Galante PF,de Souza JE,Molina GC,de Souza SJ,Carraro DM

    update_date:2010-12-22 00:00:00

  • Expansion of CORE-SINEs in the genome of the Tasmanian devil.

    abstract:BACKGROUND:The genome of the carnivorous marsupial, the Tasmanian devil (Sarcophilus harrisii, Order: Dasyuromorphia), was sequenced in the hopes of finding a cure for or gaining a better understanding of the contagious devil facial tumor disease that is threatening the species' survival. To better understand the Tasma...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-172

    authors: Nilsson MA,Janke A,Murchison EP,Ning Z,Hallström BM

    update_date:2012-05-06 00:00:00

  • Correction to: An Arabidopsis introgression zone studied at high spatio-temporal resolution: interglacial and multiple genetic contact exemplified using whole nuclear and plastid genomes.

    abstract:Upon publication of the original article [1], the authors had flagged that there was an error in Fig. 1c, as the key in this figure was displaying incorrectly. The colours had not displayed in the key in the final published article, and instead appear as plain white. ...

    journal_title:BMC genomics

    pub_type: Journal Article, Published Erratum

    doi:10.1186/s12864-018-4614-0

    authors: Hohmann N,Koch MA

    update_date:2018-04-11 00:00:00

  • Stoichiometric gene-to-reaction associations enhance model-driven analysis performance: Metabolic response to chronic exposure to Aldrin in prostate cancer.

    abstract:BACKGROUND:Genome-scale metabolic models (GSMM) integrating transcriptomics have been widely used to study cancer metabolism. This integration is achieved through logical rules that describe the association between genes, proteins, and reactions (GPRs). However, current gene-to-reaction formulation lacks the stoichiome...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-5979-4

    authors: Marín de Mas I,Torrents L,Bedia C,Nielsen LK,Cascante M,Tauler R

    update_date:2019-08-15 00:00:00

  • Identification of genomic aberrations in hemangioblastoma by droplet digital PCR and SNP microarray highlights novel candidate genes and pathways for pathogenesis.

    abstract:BACKGROUND:The genetic mechanisms underlying hemangioblastoma development are still largely unknown. We used high-resolution single nucleotide polymorphism microarrays and droplet digital PCR analysis to detect copy number variations (CNVs) in total of 45 hemangioblastoma tumors. RESULTS:We identified 94 CNVs with a m...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-2370-6

    authors: Mehrian-Shai R,Yalon M,Moshe I,Barshack I,Nass D,Jacob J,Dor C,Reichardt JK,Constantini S,Toren A

    update_date:2016-01-14 00:00:00