GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences.

Abstract:

BACKGROUND: As crucial markers for identifying biological elements and processes in mammalian genomes, CpG islands (CGIs) play important roles in DNA methylation, gene regulation, epigenetic inheritance, gene mutation, chromosome inactivation and nucleosome retention. The generally accepted criteria for a CGI are: (a) G+C content ≥ 50%, (b) a ratio of observed to expected CpG content (CpG obs/CpG exp) ≥ 0.6, and (c) a length greater than 200 nucleotides. Most existing computational methods for CpG island prediction are built on these rules. However, many experimentally verified CpG islands deviate from these artificial criteria. Experiments indicate that in many cases G+C content is < 50%, CpG obs/CpG exp varies, and CGI length ranges from eight nucleotides to a few thousand nucleotides. This implies that CGI detection is not a purely statistical task and that some as yet unrevealed rules are probably hidden.

RESULTS: A novel Gaussian model, GaussianCpG, is developed for the detection of CpG islands in the human genome. We analyze the energy distribution over the genomic primary structure for each CpG site and adopt parameters derived from statistics of the human genome. The evaluation results show that the new model can predict CpG islands efficiently by balancing both sensitivity and specificity over known human CGI data sets. Compared with other models, GaussianCpG can achieve better performance in CGI detection.

CONCLUSIONS: Our Gaussian model aims to simplify the complex interaction between nucleotides. The model is computed not by a linear statistical method but by Gaussian energy distribution and accumulation. The parameters of the Gaussian function are not arbitrarily designated but deliberately chosen by optimizing the biological statistics. Using pseudopotential analysis of CpG islands, the novel model is validated on both real and artificial data sets.
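For orientation, the three classical criteria named in the BACKGROUND can be checked on a single candidate sequence with a few lines of Python. This is only a minimal sketch of the rule-based baseline, not of the GaussianCpG model itself. The function names are hypothetical, and the observed/expected ratio is assumed to follow the commonly used definition (number of CpG × length) / (number of C × number of G), which the abstract does not spell out.

```python
def cgi_stats(seq):
    """Return (G+C fraction, CpG observed/expected ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides; "CG" cannot overlap itself
    gc_fraction = (c + g) / n
    # Assumed definition: expected CpG count = (#C * #G) / length
    expected = c * g / n
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_fraction, obs_exp


def meets_classical_criteria(seq, min_gc=0.5, min_obs_exp=0.6, min_len=200):
    """Check the three rule-based criteria quoted in the abstract."""
    gc_fraction, obs_exp = cgi_stats(seq)
    return len(seq) > min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp


if __name__ == "__main__":
    candidate = "CG" * 101  # 202 nt, artificial CpG-rich sequence
    print(cgi_stats(candidate))
    print(meets_classical_criteria(candidate))
```

A fixed-threshold check like this rejects any region shorter than 201 nucleotides or with G+C content below 50%, which is the limitation the abstract raises for experimentally verified islands that fall outside these bounds.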

journal_name

BMC Genomics

authors

Yu N,Guo X,Zelikovsky A,Pan Y

doi

10.1186/s12864-017-3731-5

pub_date

2017-05-24 00:00:00

pages

392

issue

Suppl 4

issn

1471-2164

pii

10.1186/s12864-017-3731-5

journal_volume

18

pub_type

Journal Article
  • Rapid single cell evaluation of human disease and disorder targets using REVEAL: SingleCell™.

    abstract:BACKGROUND:Single-cell (sc) sequencing performs unbiased profiling of individual cells and enables evaluation of less prevalent cellular populations, often missed using bulk sequencing. However, the scale and the complexity of the sc datasets poses a great challenge in its utility and this problem is further exacerbate...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-07300-8

    authors: Kumar N,Golhar R,Sharma KS,Holloway JL,Sarangi S,Neuhaus I,Walsh AM,Pitluk ZW

    update_date: 2021-01-06 00:00:00

  • Visualizing spatiotemporal dynamics of apoptosis after G1 arrest by human T cell leukemia virus type 1 Tax and insights into gene expression changes using microarray-based gene expression analysis.

    abstract:BACKGROUND:Human T cell leukemia virus type 1 (HTLV-1) Tax is a potent activator of viral and cellular gene expression that interacts with a number of cellular proteins. Many reports show that Tax is capable of regulating cell cycle progression and apoptosis both positively and negatively. However, it still remains to ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-275

    authors: Arainga M,Murakami H,Aida Y

    update_date: 2012-06-22 00:00:00

  • Clostridium sticklandii, a specialist in amino acid degradation:revisiting its metabolism through its genome sequence.

    abstract:BACKGROUND:Clostridium sticklandii belongs to a cluster of non-pathogenic proteolytic clostridia which utilize amino acids as carbon and energy sources. Isolated by T.C. Stadtman in 1954, it has been generally regarded as a "gold mine" for novel biochemical reactions and is used as a model organism for studying metabol...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-555

    authors: Fonknechten N,Chaussonnerie S,Tricot S,Lajus A,Andreesen JR,Perchat N,Pelletier E,Gouyvenoux M,Barbe V,Salanoubat M,Le Paslier D,Weissenbach J,Cohen GN,Kreimeyer A

    update_date: 2010-10-11 00:00:00

  • Establishing an adjusted p-value threshold to control the family-wide type 1 error in genome wide association studies.

    abstract:BACKGROUND:By assaying hundreds of thousands of single nucleotide polymorphisms, genome wide association studies (GWAS) allow for a powerful, unbiased review of the entire genome to localize common genetic variants that influence health and disease. Although it is widely recognized that some correction for multiple tes...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-516

    authors: Duggal P,Gillanders EM,Holmes TN,Bailey-Wilson JE

    update_date: 2008-10-31 00:00:00

  • Cluster analysis of replicated alternative polyadenylation data using canonical correlation analysis.

    abstract:BACKGROUND:Alternative polyadenylation (APA) has emerged as a pervasive mechanism that contributes to the transcriptome complexity and dynamics of gene regulation. The current tsunami of whole genome poly(A) site data from various conditions generated by 3' end sequencing provides a valuable data source for the study o...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-5433-7

    authors: Ye W,Long Y,Ji G,Su Y,Ye P,Fu H,Wu X

    update_date: 2019-01-22 00:00:00

  • Small RNAs from plants, bacteria and fungi within the order Hypocreales are ubiquitous in human plasma.

    abstract:BACKGROUND:The human microbiome plays a significant role in maintaining normal physiology. Changes in its composition have been associated with bowel disease, metabolic disorders and atherosclerosis. Sequences of microbial origin have been observed within small RNA sequencing data obtained from blood samples. The aim o...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-933

    authors: Beatty M,Guduric-Fuchs J,Brown E,Bridgett S,Chakravarthy U,Hogg RE,Simpson DA

    update_date: 2014-10-25 00:00:00

  • A deconvolution method and its application in analyzing the cellular fractions in acute myeloid leukemia samples.

    abstract:BACKGROUND:The identification of cell type-specific genes (markers) is an essential step for the deconvolution of the cellular fractions, primarily, from the gene expression data of a bulk sample. However, the genes with significant changes identified by pair-wise comparisons cannot indeed represent the specificity of ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-06888-1

    authors: Li H,Sharma A,Ming W,Sun X,Liu H

    update_date: 2020-09-23 00:00:00

  • H2B ubiquitylation is part of chromatin architecture that marks exon-intron structure in budding yeast.

    abstract:BACKGROUND:The packaging of DNA into chromatin regulates transcription from initiation through 3' end processing. One aspect of transcription in which chromatin plays a poorly understood role is the co-transcriptional splicing of pre-mRNA. RESULTS:Here we provide evidence that H2B monoubiquitylation (H2BK123ub1) marks...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-12-627

    authors: Shieh GS,Pan CH,Wu JH,Sun YJ,Wang CC,Hsiao WC,Lin CY,Tung L,Chang TH,Fleming AB,Hillyer C,Lo YC,Berger SL,Osley MA,Kao CF

    update_date: 2011-12-22 00:00:00

  • RBM6-RBM5 transcription-induced chimeras are differentially expressed in tumours.

    abstract:UNLABELLED:Transcription-induced chimerism, a mechanism involving the transcription and intergenic splicing of two consecutive genes, has recently been estimated to account for approximately 5% of the human transcriptome. Despite this prevalence, the regulation and function of these fused transcripts remains largely un...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-8-348

    authors: Wang K,Ubriaco G,Sutherland LC

    update_date: 2007-10-01 00:00:00

  • Spread of avian pathogenic Escherichia coli ST117 O78:H4 in Nordic broiler production.

    abstract:BACKGROUND:Escherichia coli infections known as colibacillosis constitute a considerable challenge to poultry farmers worldwide, in terms of decreased animal welfare and production economy. Colibacillosis is caused by avian pathogenic E. coli (APEC). APEC strains are extraintestinal pathogenic E. coli and have in gener...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-3415-6

    authors: Ronco T,Stegger M,Olsen RH,Sekse C,Nordstoga AB,Pohjanvirta T,Lilje B,Lyhs U,Andersen PS,Pedersen K

    update_date: 2017-01-03 00:00:00

  • Signal pathways JNK and NF-kappaB, identified by global gene expression profiling, are involved in regulation of TNFalpha-induced mPGES-1 and COX-2 expression in gingival fibroblasts.

    abstract:BACKGROUND:Prostaglandin E2 (PGE2) is involved in several chronic inflammatory diseases including periodontitis, which causes loss of the gingival tissue and alveolar bone supporting the teeth. We have previously shown that tumor necrosis factor alpha (TNFalpha) induces PGE2 synthesis in gingival fibroblasts. In this s...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-241

    authors: Båge T,Lindberg J,Lundeberg J,Modéer T,Yucel-Lindberg T

    update_date: 2010-04-15 00:00:00

  • Multiple genome alignment for identifying the core structure among moderately related microbial genomes.

    abstract:BACKGROUND:Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes where horizontal gene transfers are common. Although core genome identification appears to be obvious among very closely related genomes, it becomes more difficult...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-515

    authors: Uchiyama I

    update_date: 2008-10-31 00:00:00

  • Transcriptomic analysis of flower induction for long-day pitaya by supplementary lighting in short-day winter season.

    abstract:BACKGROUND:Pitayas are currently attracting considerable interest as a tropical fruit with numerous health benefits. However, as a long-day plant, pitaya plants cannot flower in the winter season from November to April in Hainan, China. To harvest pitayas with high economic value in the winter season, it is necessary t...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-6726-6

    authors: Xiong R,Liu C,Xu M,Wei SS,Huang JQ,Tang H

    update_date: 2020-04-29 00:00:00

  • Complete chloroplast genome sequence of Betula platyphylla: gene organization, RNA editing, and comparative and phylogenetic analyses.

    abstract:BACKGROUND:Betula platyphylla is a common tree species in northern China that has high economic and medicinal value. Our laboratory has been devoted to genome research on B. platyphylla for approximately 10 years. As primary organelle genomes, the complete genome sequences of chloroplasts are important to study the div...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-5346-x

    authors: Wang S,Yang C,Zhao X,Chen S,Qu GZ

    update_date: 2018-12-20 00:00:00

  • Flatfish monophyly refereed by the relationship of Psettodes in Carangimorphariae.

    abstract:BACKGROUND:The monophyly of flatfishes has not been supported in many molecular phylogenetic studies. The monophyly of Pleuronectoidei, which comprises all but one family of flatfishes, is broadly supported. However, the Psettodoidei, comprising the single family Psettodidae, is often found to be most closely related t...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-4788-5

    authors: Shi W,Chen S,Kong X,Si L,Gong L,Zhang Y,Yu H

    update_date: 2018-05-25 00:00:00

  • Identification of QTL with large effect on seed weight in a selective population of soybean with genome-wide association and fixation index analyses.

    abstract:BACKGROUND:Soybean seed weight is not only a yield component, but also a critical trait for various soybean food products such as sprouts, edamame, soy nuts, natto and miso. Linkage analysis and genome-wide association study (GWAS) are two complementary and powerful tools to connect phenotypic differences to the underl...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-3922-0

    authors: Yan L,Hofmann N,Li S,Ferreira ME,Song B,Jiang G,Ren S,Quigley C,Fickus E,Cregan P,Song Q

    update_date: 2017-07-12 00:00:00

  • Tandem repeats derived from centromeric retrotransposons.

    abstract:BACKGROUND:Tandem repeats are ubiquitous and abundant in higher eukaryotic genomes and constitute, along with transposable elements, much of DNA underlying centromeres and other heterochromatic domains. In maize, centromeric satellite repeat (CentC) and centromeric retrotransposons (CR), a class of Ty3/gypsy retrotrans...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-142

    authors: Sharma A,Wolfgruber TK,Presting GG

    update_date: 2013-03-04 00:00:00

  • High-throughput genome sequencing of lichenizing fungi to assess gene loss in the ammonium transporter/ammonia permease gene family.

    abstract:BACKGROUND:Horizontal gene transfer has shaped the evolution of the ammonium transporter/ammonia permease gene family. Horizontal transfers of ammonium transporter/ammonia permease genes into the fungi include one transfer from archaea to the filamentous ascomycetes associated with the adaptive radiation of the leotiom...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-225

    authors: McDonald TR,Mueller O,Dietrich FS,Lutzoni F

    update_date: 2013-04-04 00:00:00

  • Genome and epigenome analysis of monozygotic twins discordant for congenital heart disease.

    abstract:BACKGROUND:Congenital heart disease (CHD) is the leading non-infectious cause of death in infants. Monozygotic (MZ) twins share nearly all of their genetic variants before and after birth. Nevertheless, MZ twins are sometimes discordant for common complex diseases. The goal of this study is to identify genomic and epig...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-4814-7

    authors: Lyu G,Zhang C,Ling T,Liu R,Zong L,Guan Y,Huang X,Sun L,Zhang L,Li C,Nie Y,Tao W

    update_date: 2018-06-04 00:00:00

  • SNP discovery and genetic mapping using genotyping by sequencing of whole genome genomic DNA from a pea RIL population.

    abstract:BACKGROUND:Progress in genetics and breeding in pea still suffers from the limited availability of molecular resources. SNP markers that can be identified through affordable sequencing processes, without the need for prior genome reduction or a reference genome to assemble sequencing data would allow the discovery and ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-2447-2

    authors: Boutet G,Alves Carvalho S,Falque M,Peterlongo P,Lhuillier E,Bouchez O,Lavaud C,Pilet-Nayel ML,Rivière N,Baranger A

    update_date: 2016-02-18 00:00:00

  • Whole-genome phylogenies of the family Bacillaceae and expansion of the sigma factor gene family in the Bacillus cereus species-group.

    abstract:BACKGROUND:The Bacillus cereus sensu lato group consists of six species (B. anthracis, B. cereus, B. mycoides, B. pseudomycoides, B. thuringiensis, and B. weihenstephanensis). While classical microbial taxonomy proposed these organisms as distinct species, newer molecular phylogenies and comparative genome sequencing s...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-12-430

    authors: Schmidt TR,Scott EJ 2nd,Dyer DW

    update_date: 2011-08-24 00:00:00

  • Transcriptional ontogeny of the developing liver.

    abstract:BACKGROUND:During embryogenesis the liver is derived from endodermal cells lining the digestive tract. These endodermal progenitor cells contribute to forming the parenchyma of a number of organs including the liver and pancreas. Early in organogenesis the fetal liver is populated by hematopoietic stem cells, the sourc...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-33

    authors: Lee JS,Ward WO,Knapp G,Ren H,Vallanat B,Abbott B,Ho K,Karp SJ,Corton JC

    update_date: 2012-01-19 00:00:00

  • Pangenome analysis of Bifidobacterium longum and site-directed mutagenesis through by-pass of restriction-modification systems.

    abstract:BACKGROUND:Bifidobacterial genome analysis has provided insights as to how these gut commensals adapt to and persist in the human GIT, while also revealing genetic diversity among members of a given bifidobacterial (sub)species. Bifidobacteria are notoriously recalcitrant to genetic modification, which prevents explora...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1968-4

    authors: O'Callaghan A,Bottacini F,O'Connell Motherway M,van Sinderen D

    update_date: 2015-10-21 00:00:00

  • Genome-wide analysis of hepatic LRH-1 reveals a promoter binding preference and suggests a role in regulating genes of lipid metabolism in concert with FXR.

    abstract:BACKGROUND:In a previous genome-wide analysis of FXR binding to hepatic chromatin, we noticed that an extra nuclear receptor (NR) half-site was co-enriched close to the FXR binding IR-1 elements and we provided limited support that the monomeric LRH-1 receptor that binds to NR half-sites might function together with FX...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-51

    authors: Chong HK,Biesinger J,Seo YK,Xie X,Osborne TF

    update_date: 2012-02-01 00:00:00

  • The role of retinoic acid in hepatic lipid homeostasis defined by genomic binding and transcriptome profiling.

    abstract:BACKGROUND:The eyes and skin are obvious retinoid target organs. Vitamin A deficiency causes night blindness and retinoids are widely used to treat acne and psoriasis. However, more than 90% of total body retinol is stored in liver stellate cells. In addition, hepatocytes produce the largest amount of retinol binding p...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-575

    authors: He Y,Gong L,Fang Y,Zhan Q,Liu HX,Lu Y,Guo GL,Lehman-McKeeman L,Fang J,Wan YJ

    update_date: 2013-08-28 00:00:00

  • Characterization of the bovine pregnancy-associated glycoprotein gene family--analysis of gene sequences, regulatory regions within the promoter and expression of selected genes.

    abstract:BACKGROUND:The Pregnancy-associated glycoproteins (PAGs) belong to a large family of aspartic peptidases expressed exclusively in the placenta of species in the Artiodactyla order. In cattle, the PAG gene family is comprised of at least 22 transcribed genes, as well as some variants. Phylogenetic analyses have shown th...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-10-185

    authors: Telugu BP,Walker AM,Green JA

    update_date: 2009-04-24 00:00:00

  • Multiple displacement amplification for complex mixtures of DNA fragments.

    abstract:BACKGROUND:A fundamental requirement for genomic studies is the availability of genetic material of good quality and quantity. The desired quantity and quality are often hard to obtain when target DNA is composed of complex mixtures of relatively short DNA fragments. Here, we sought to develop a method to representativ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-415

    authors: Shoaib M,Baconnais S,Mechold U,Le Cam E,Lipinski M,Ogryzko V

    update_date: 2008-09-15 00:00:00

  • In silico and in vivo splicing analysis of MLH1 and MSH2 missense mutations shows exon- and tissue-specific effects.

    abstract:BACKGROUND:Abnormalities of pre-mRNA splicing are increasingly recognized as an important mechanism through which gene mutations cause disease. However, apart from the mutations in the donor and acceptor sites, the effects on splicing of other sequence variations are difficult to predict. Loosely defined exonic and int...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-7-243

    authors: Lastella P,Surdo NC,Resta N,Guanti G,Stella A

    update_date: 2006-09-22 00:00:00

  • Identification of a quantitative trait loci (QTL) associated with ammonia tolerance in the Pacific white shrimp (Litopenaeus vannamei).

    abstract:BACKGROUND:Ammonia is one of the most common toxicological environment factors affecting shrimp health. Although ammonia tolerance in shrimp is closely related to successful industrial production, few genetic studies of this trait are available. RESULTS:In this study, we constructed a high-density genetic map of the P...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-07254-x

    authors: Zeng D,Yang C,Li Q,Zhu W,Chen X,Peng M,Chen X,Lin Y,Wang H,Liu H,Liang J,Liu Q,Zhao Y

    update_date: 2020-12-02 00:00:00

  • New insights into the evolution and functional divergence of the CIPK gene family in Saccharum.

    abstract:BACKGROUND:Calcineurin B-like protein (CBL)-interacting protein kinases (CIPKs) are the primary components of calcium sensors, and play crucial roles in plant developmental processes, hormone signaling transduction, and in the response to exogenous stresses. RESULTS:In this study, 48 CIPK genes (SsCIPKs) were identifi...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-07264-9

    authors: Su W,Ren Y,Wang D,Huang L,Fu X,Ling H,Su Y,Huang N,Tang H,Xu L,Que Y

    update_date: 2020-12-07 00:00:00