A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction.

Abstract:

The representation and discovery of transcription factor (TF) sequence binding specificities is critical for understanding gene regulatory networks and interpreting the impact of disease-associated noncoding genetic variants. We present a novel TF binding motif representation, the k-mer set memory (KSM), which consists of a set of aligned k-mers that are overrepresented at TF binding sites, and a new method called KMAC for de novo discovery of KSMs. We find that KSMs more accurately predict in vivo binding sites than position weight matrix (PWM) models and other more complex motif models across a large set of ChIP-seq experiments. Furthermore, KSMs outperform PWMs and more complex motif models in predicting in vitro binding sites. KMAC also identifies correct motifs in more experiments than five state-of-the-art motif discovery methods. In addition, KSM-derived features outperform both PWM and deep learning model derived sequence features in predicting differential regulatory activities of expression quantitative trait loci (eQTL) alleles. Finally, we have applied KMAC to 1600 ENCODE TF ChIP-seq data sets and created a public resource of KSM and PWM motifs. We expect that the KSM representation and KMAC method will be valuable in characterizing TF binding specificities and in interpreting the effects of noncoding genetic variations.

journal_name

Genome Res

journal_title

Genome research

authors

Guo Y,Tian K,Zeng H,Guo X,Gifford DK

doi

10.1101/gr.226852.117

pub_date

2018-06-01 00:00:00

pages

891-900

issue

6

eissn

1549-5469

issn

1088-9051

pii

gr.226852.117

journal_volume

28

pub_type

Journal Article
  • Massive reshaping of genome-nuclear lamina interactions during oncogene-induced senescence.

    abstract::Cellular senescence is a mechanism that virtually irreversibly suppresses the proliferative capacity of cells in response to various stress signals. This includes the expression of activated oncogenes, which causes Oncogene-Induced Senescence (OIS). A body of evidence points to the involvement in OIS of chromatin reor...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.225763.117

    authors: Lenain C,de Graaf CA,Pagie L,Visser NL,de Haas M,de Vries SS,Peric-Hupkes D,van Steensel B,Peeper DS

    update_date:2017-10-01 00:00:00

  • A linkage map of the rat genome derived from three F2 crosses.

    abstract::We report the construction of a dense linkage map of the rat genome integrating 767 simple sequence length polymorphism markers, combined over three crosses with high rates of polymorphism. F2 populations from WKY x S (n = 159), BN x S (n = 91), and BN x GK (n = 139) were selected and genotyped for combinations of mic...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.7.5.434

    authors: Bihoreau MT,Gauguier D,Kato N,Hyne G,Lindpaintner K,Rapp JP,James MR,Lathrop GM

    update_date:1997-05-01 00:00:00

  • Perspectives: sequence data base searching in the era of large-scale genomic sequencing.

    abstract::Large-scale sequencing of human and model organism genomes will have a profound impact on our ability to use sequence data base searching to predict the biochemical functions of sequences of interest. Despite the great value of more sequences in the data bases, a huge increase in data base size will also have adverse ...

    journal_title:Genome research

    pub_type: Journal Article, Review

    doi:10.1101/gr.6.8.653

    authors: Smith RF

    update_date:1996-08-01 00:00:00

  • A unified model for yeast transcript definition.

    abstract::Identifying genes in the genomic context is central to a cell's ability to interpret the genome. Yet, in general, the signals used to define eukaryotic genes are poorly described. Here, we derived simple classifiers that identify where transcription will initiate and terminate using nucleic acid sequence features dete...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.164327.113

    authors: de Boer CG,van Bakel H,Tsui K,Li J,Morris QD,Nislow C,Greenblatt JF,Hughes TR

    update_date:2014-01-01 00:00:00

  • Models of human core transcriptional regulatory circuitries.

    abstract::A small set of core transcription factors (TFs) dominates control of the gene expression program in embryonic stem cells and other well-studied cellular models. These core TFs collectively regulate their own gene expression, thus forming an interconnected auto-regulatory loop that can be considered the core transcript...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.197590.115

    authors: Saint-André V,Federation AJ,Lin CY,Abraham BJ,Reddy J,Lee TI,Bradner JE,Young RA

    update_date:2016-03-01 00:00:00

  • High-resolution landmark framework for the sequence-ready mapping of Xq23-q26.1.

    abstract::We have established a landmark framework map over 20-25 Mb of the long arm of the human X chromosome using yeast artificial chromosome (YAC) clones. The map has approximately one landmark per 45 kb of DNA and stretches from DXS7531 in proximal Xq23 to DXS895 in proximal Xq26, connecting to published framework maps on ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:

    authors: Steingruber HE,Dunham A,Coffey AJ,Clegg SM,Howell GR,Maslen GL,Scott CE,Gwilliam R,Hunt PJ,Sotheran EC,Huckle EJ,Hunt SE,Dhami P,Soderlund C,Leversha MA,Bentley DR,Ross MT

    update_date:1999-08-01 00:00:00

  • A pooling-based approach to mapping genetic variants associated with DNA methylation.

    abstract::DNA methylation is an epigenetic modification that plays a key role in gene regulation. Previous studies have investigated its genetic basis by mapping genetic variants that are associated with DNA methylation at specific sites, but these have been limited to microarrays that cover <2% of the genome and cannot account...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.183749.114

    authors: Kaplow IM,MacIsaac JL,Mah SM,McEwen LM,Kobor MS,Fraser HB

    update_date:2015-06-01 00:00:00

  • Long-read single-molecule maps of the functional methylome.

    abstract::We report on the development of a methylation analysis workflow for optical detection of fluorescent methylation profiles along chromosomal DNA molecules. In combination with Bionano Genomics genome mapping technology, these profiles provide a hybrid genetic/epigenetic genome-wide map composed of DNA molecules spannin...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.240739.118

    authors: Sharim H,Grunwald A,Gabrieli T,Michaeli Y,Margalit S,Torchinsky D,Arielly R,Nifker G,Juhasz M,Gularek F,Almalvez M,Dufault B,Chandra SS,Liu A,Bhattacharya S,Chen YW,Vilain E,Wagner KR,Pevsner J,Reifenberger J,Lam

    update_date:2019-04-01 00:00:00

  • Patterns of meiotic recombination on the long arm of human chromosome 21.

    abstract::In this study we quantify the features of meiotic recombination on the long arm of human chromosome 21. We constructed a 67.3-centimorgan (cM) high-resolution, comprehensive, and accurate genetic linkage map of chromosome 21q using 187 highly polymorphic markers covering almost the entire long arm; 46 loci, consistin...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.138100

    authors: Lynn A,Kashuk C,Petersen MB,Bailey JA,Cox DR,Antonarakis SE,Chakravarti A

    update_date:2000-09-01 00:00:00

  • Natural genetic variation in yeast longevity.

    abstract::The genetics of aging in the yeast Saccharomyces cerevisiae has involved the manipulation of individual genes in laboratory strains. We have instituted a quantitative genetic analysis of the yeast replicative lifespan by sampling the natural genetic variation in a wild yeast isolate. Haploid segregants from a cross be...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.136549.111

    authors: Stumpferl SW,Brand SE,Jiang JC,Korona B,Tiwari A,Dai J,Seo JG,Jazwinski SM

    update_date:2012-10-01 00:00:00

  • Copy number and targeted mutational analysis reveals novel somatic events in metastatic prostate tumors.

    abstract::Advanced prostate cancer can progress to systemic metastatic tumors, which are generally androgen insensitive and ultimately lethal. Here, we report a comprehensive genomic survey for somatic events in systemic metastatic prostate tumors using both high-resolution copy number analysis and targeted mutational survey of...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.107961.110

    authors: Robbins CM,Tembe WA,Baker A,Sinari S,Moses TY,Beckstrom-Sternberg S,Beckstrom-Sternberg J,Barrett M,Long J,Chinnaiyan A,Lowey J,Suh E,Pearson JV,Craig DW,Agus DB,Pienta KJ,Carpten JD

    update_date:2011-01-01 00:00:00

  • The human protein coevolution network.

    abstract::Coevolution maintains interactions between phenotypic traits through the process of reciprocal natural selection. Detecting molecular coevolution can expose functional interactions between molecules in the cell, generating insights into biological processes, pathways, and the networks of interactions important for cel...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.092452.109

    authors: Tillier ER,Charlebois RL

    update_date:2009-10-01 00:00:00

  • A dynamic H3K27ac signature identifies VEGFA-stimulated endothelial enhancers and requires EP300 activity.

    abstract::Histone modifications are now well-established mediators of transcriptional programs that distinguish cell states. However, the kinetics of histone modification and their role in mediating rapid, signal-responsive gene expression changes has been little studied on a genome-wide scale. Vascular endothelial growth facto...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.149674.112

    authors: Zhang B,Day DS,Ho JW,Song L,Cao J,Christodoulou D,Seidman JG,Crawford GE,Park PJ,Pu WT

    update_date:2013-06-01 00:00:00

  • Whole genome shotgun sequencing of Brassica oleracea and its application to gene discovery and annotation in Arabidopsis.

    abstract::Through comparative studies of the model organism Arabidopsis thaliana and its close relative Brassica oleracea, we have identified conserved regions that represent potentially functional sequences overlooked by previous Arabidopsis genome annotation methods. A total of 454,274 whole genome shotgun sequences covering ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.3176505

    authors: Ayele M,Haas BJ,Kumar N,Wu H,Xiao Y,Van Aken S,Utterback TR,Wortman JR,White OR,Town CD

    update_date:2005-04-01 00:00:00

  • Rapid molecular assays to study human centromere genomics.

    abstract::The centromere is the structural unit responsible for the faithful segregation of chromosomes. Although regulation of centromeric function by epigenetic factors has been well-studied, the contributions of the underlying DNA sequences have been much less well defined, and existing methodologies for studying centromere ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.219709.116

    authors: Contreras-Galindo R,Fischer S,Saha AK,Lundy JD,Cervantes PW,Mourad M,Wang C,Qian B,Dai M,Meng F,Chinnaiyan A,Omenn GS,Kaplan MH,Markovitz DM

    update_date:2017-12-01 00:00:00

  • A complexity reduction algorithm for analysis and annotation of large genomic sequences.

    abstract::DNA is a universal language encrypted with biological instruction for life. In higher organisms, the genetic information is preserved predominantly in an organized exon/intron structure. When a gene is expressed, the exons are spliced together to form the transcript for protein synthesis. We have developed a complexit...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.313703

    authors: Chuang TJ,Lin WC,Lee HC,Wang CW,Hsiao KL,Wang ZH,Shieh D,Lin SC,Ch'ang LY

    update_date:2003-02-01 00:00:00

  • Evolutionary features of the 4-Mb Xq21.3 XY homology region revealed by a map at 60-kb resolution.

    abstract::Forty-three yeast artificial chromosomes (YACs) from the X chromosome have been overlapped across the 4-Mb Xq21.3 region, which is homologous to a segment in Yp11.1. The region is formatted to 60-kb resolution with 57 STSs and is merged at its edges with contigs specific for X. This allows a direct comparison of marke...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.7.4.307

    authors: Mumm S,Molini B,Terrell J,Srivastava A,Schlessinger D

    update_date:1997-04-01 00:00:00

  • Thermophilic bacteria strictly obey Szybalski's transcription direction rule and politely purine-load RNAs with both adenine and guanine.

    abstract::When transcription is to the right of the promoter, the "top," mRNA-synonymous strand of DNA tends to be purine-rich. When transcription is to the left of the promoter, the top, mRNA-template strand tends to be pyrimidine-rich. This transcription-direction rule suggests that there has been an evolutionary selection pr...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.10.2.228

    authors: Lao PJ,Forsdyke DR

    update_date:2000-02-01 00:00:00

  • Uncovering cis-regulatory sequence requirements for context-specific transcription factor binding.

    abstract::The regulation of gene expression is mediated at the transcriptional level by enhancer regions that are bound by sequence-specific transcription factors (TFs). Recent studies have shown that the in vivo binding sites of single TFs differ between developmental or cellular contexts. How this context-specific binding is ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.132811.111

    authors: Yáñez-Cuna JO,Dinh HQ,Kvon EZ,Shlyueva D,Stark A

    update_date:2012-10-01 00:00:00

  • The human homolog T of the mouse T(Brachyury) gene; gene structure, cDNA sequence, and assignment to chromosome 6q27.

    abstract::We have cloned the human gene encoding the transcription factor T. T protein is vital for the formation of posterior mesoderm and axial development in all vertebrates. Brachyury mutant mice, which lack T protein, die in utero with abnormal notochord, posterior somites, and allantois. We have identified human T genomic...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.6.3.226

    authors: Edwards YH,Putt W,Lekoape KM,Stott D,Fox M,Hopkinson DA,Sowden J

    update_date:1996-03-01 00:00:00

  • Comparative gene mapping: a fine-scale survey of chromosome rearrangements between ruminants and humans.

    abstract::A total of 202 genes were cytogenetically mapped to goat chromosomes, multiplying by five the total number of regional gene localizations in domestic ruminants (255). This map encompasses 249 and 173 common anchor loci regularly spaced along human and murine chromosomes, respectively, which makes it possible to perfor...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.8.9.901

    authors: Schibler L,Vaiman D,Oustry A,Giraud-Delville C,Cribiu EP

    update_date:1998-09-01 00:00:00

  • Distribution of hammerhead and hammerhead-like RNA motifs through the GenBank.

    abstract::Hammerhead ribozymes previously were found in satellite RNAs from plant viroids and in repetitive DNA from certain species of newts and schistosomes. To determine if this catalytic RNA motif has a wider distribution, we decided to scrutinize the GenBank database for RNAs that contain hammerhead or hammerhead-like moti...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.10.7.1011

    authors: Ferbeyre G,Bourdeau V,Pageau M,Miramontes P,Cedergren R

    update_date:2000-07-01 00:00:00

  • Genome dynamics in aging mice.

    abstract::Random spontaneous genome rearrangements are difficult to detect in vivo, especially in postmitotic tissues. Using a lacZ-plasmid reporter mouse model, we have previously presented evidence for the accumulation of large genome rearrangements in various tissues, including postmitotic tissues, during aging. These rearra...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.125502

    authors: Dollé ME,Vijg J

    update_date:2002-11-01 00:00:00

  • Molecular genetic maps in wild emmer wheat, Triticum dicoccoides: genome-wide coverage, massive negative interference, and putative quasi-linkage.

    abstract::The main objectives of the study reported here were to construct a molecular map of wild emmer wheat, Triticum dicoccoides, to characterize the marker-related anatomy of the genome, and to evaluate segregation and recombination patterns upon crossing T. dicoccoides with its domesticated descendant Triticum durum (cult...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.150300

    authors: Peng J,Korol AB,Fahima T,Röder MS,Ronin YI,Li YC,Nevo E

    update_date:2000-10-01 00:00:00

  • Evolutionary conservation of Y Chromosome ampliconic gene families despite extensive structural variation.

    abstract::Despite claims that the mammalian Y Chromosome is on a path to extinction, comparative sequence analysis of primate Y Chromosomes has shown the decay of the ancestral single-copy genes has all but ceased in this eutherian lineage. The suite of single-copy Y-linked genes is highly conserved among the majority of euther...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.237586.118

    authors: Brashear WA,Raudsepp T,Murphy WJ

    update_date:2018-12-01 00:00:00

  • Systematic interrogation of human promoters.

    abstract::Despite much research, our understanding of the architecture and cis-regulatory elements of human promoters is still lacking. Here, we devised a high-throughput assay to quantify the activity of approximately 15,000 fully designed sequences that we integrated and expressed from a fixed location within the human genome...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.236075.118

    authors: Weingarten-Gabbay S,Nir R,Lubliner S,Sharon E,Kalma Y,Weinberger A,Segal E

    update_date:2019-02-01 00:00:00

  • A method for detecting IBD regions simultaneously in multiple individuals--with applications to disease genetics.

    abstract::All individuals in a finite population are related if traced back long enough and will, therefore, share regions of their genomes identical by descent (IBD). Detection of such regions has several important applications-from answering questions about human evolution to locating regions in the human genome containing di...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.115360.110

    authors: Moltke I,Albrechtsen A,Hansen TV,Nielsen FC,Nielsen R

    update_date:2011-07-01 00:00:00

  • Molecular cloning and RARE cleavage mapping of human 2p, 6q, 8q, 12q, and 18q telomeres.

    abstract::Large terminal fragments of human chromosomes 2p, 6q, 8q, 12q, and 18q were cloned using yeast artificial chromosomes (YACs). RecA-assisted restriction endonuclease (RARE) cleavage analysis of genomic DNA samples from 11 unrelated individuals using YAC-derived probes confirmed the telomeric localizations of the half-Y...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.5.3.225

    authors: Macina RA,Morii K,Hu XL,Negorev DG,Spais C,Ruthig LA,Riethman HC

    update_date:1995-10-01 00:00:00

  • A virome-wide clonal integration analysis platform for discovering cancer viral etiology.

    abstract::Oncoviral infection is responsible for 12%-15% of cancer in humans. Convergent evidence from epidemiology, pathology, and oncology suggests that new viral etiologies for cancers remain to be discovered. Oncoviral profiles can be obtained from cancer genome sequencing data; however, widespread viral sequence contaminat...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.242529.118

    authors: Chen X,Kost J,Sulovari A,Wong N,Liang WS,Cao J,Li D

    update_date:2019-05-01 00:00:00

  • De novo rates and selection of large copy number variation.

    abstract::While copy number variation (CNV) is an active area of research, de novo mutation rates within human populations are not well characterized. By focusing on large (>100 kbp) events, we estimate the rate of de novo CNV formation in humans by analyzing 4394 transmissions from human pedigrees with and without neurocogniti...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.107680.110

    authors: Itsara A,Wu H,Smith JD,Nickerson DA,Romieu I,London SJ,Eichler EE

    update_date:2010-11-01 00:00:00