The origin, evolution, and functional impact of short insertion-deletion variants identified in 179 human genomes.

Abstract:

Short insertions and deletions (indels) are the second most abundant form of human genetic variation, but our understanding of their origins and functional effects lags behind that of other types of variants. Using population-scale sequencing, we have identified a high-quality set of 1.6 million indels from 179 individuals representing three diverse human populations. We show that rates of indel mutagenesis are highly heterogeneous, with 43%-48% of indels occurring in 4.03% of the genome, whereas in the remaining 96% their prevalence is 16 times lower than SNPs. Polymerase slippage can explain upwards of three-fourths of all indels, with the remainder being mostly simple deletions in complex sequence. However, insertions do occur and are significantly associated with pseudo-palindromic sequence features compatible with the fork stalling and template switching (FoSTeS) mechanism more commonly associated with large structural variations. We introduce a quantitative model of polymerase slippage, which enables us to identify indel-hypermutagenic protein-coding genes, some of which are associated with recurrent mutations leading to disease. Accounting for mutational rate heterogeneity due to sequence context, we find that indels across functional sequence are generally subject to stronger purifying selection than SNPs. We find that indel length modulates selection strength, and that indels affecting multiple functionally constrained nucleotides undergo stronger purifying selection. We further find that indels are enriched in associations with gene expression and find evidence for a contribution of nonsense-mediated decay. Finally, we show that indels can be integrated in existing genome-wide association studies (GWAS); although we do not find direct evidence that potentially causal protein-coding indels are enriched with associations to known disease-associated SNPs, our findings suggest that the causal variant underlying some of these associations may be indels.

journal_name

Genome Res

journal_title

Genome research

authors

Montgomery SB,Goode DL,Kvikstad E,Albers CA,Zhang ZD,Mu XJ,Ananda G,Howie B,Karczewski KJ,Smith KS,Anaya V,Richardson R,Davis J,1000 Genomes Project Consortium.,MacArthur DG,Sidow A,Duret L,Gerstein M,Makova KD,Marc

doi

10.1101/gr.148718.112

subject

Has Abstract

pub_date

2013-05-01 00:00:00

pages

749-61

issue

5

eissn

1549-5469

issn

1088-9051

pii

gr.148718.112

journal_volume

23

pub_type

Journal Article
  • Rapid molecular assays to study human centromere genomics.

    abstract::The centromere is the structural unit responsible for the faithful segregation of chromosomes. Although regulation of centromeric function by epigenetic factors has been well-studied, the contributions of the underlying DNA sequences have been much less well defined, and existing methodologies for studying centromere ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.219709.116

    authors: Contreras-Galindo R,Fischer S,Saha AK,Lundy JD,Cervantes PW,Mourad M,Wang C,Qian B,Dai M,Meng F,Chinnaiyan A,Omenn GS,Kaplan MH,Markovitz DM

    update_date:2017-12-01 00:00:00

  • Comparative analysis of mammalian Y chromosomes illuminates ancestral structure and lineage-specific evolution.

    abstract::Although more than thirty mammalian genomes have been sequenced to draft quality, very few of these include the Y chromosome. This has limited our understanding of the evolutionary dynamics of gene persistence and loss, our ability to identify conserved regulatory elements, as well our knowledge of the extent to which...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.154286.112

    authors: Li G,Davis BW,Raudsepp T,Pearks Wilkerson AJ,Mason VC,Ferguson-Smith M,O'Brien PC,Waters PD,Murphy WJ

    update_date:2013-09-01 00:00:00

  • Coding and noncoding variants in HFM1, MLH3, MSH4, MSH5, RNF212, and RNF212B affect recombination rate in cattle.

    abstract::We herein study genetic recombination in three cattle populations from France, New Zealand, and the Netherlands. We identify 2,395,177 crossover (CO) events in 94,516 male gametes, and 579,996 CO events in 25,332 female gametes. The average number of COs was found to be larger in males (23.3) than in females (21.4). T...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.204214.116

    authors: Kadri NK,Harland C,Faux P,Cambisano N,Karim L,Coppieters W,Fritz S,Mullaart E,Baurain D,Boichard D,Spelman R,Charlier C,Georges M,Druet T

    update_date:2016-10-01 00:00:00

  • The landscape of histone modifications across 1% of the human genome in five human cell lines.

    abstract::We generated high-resolution maps of histone H3 lysine 9/14 acetylation (H3ac), histone H4 lysine 5/8/12/16 acetylation (H4ac), and histone H3 at lysine 4 mono-, di-, and trimethylation (H3K4me1, H3K4me2, H3K4me3, respectively) across the ENCODE regions. Studying each modification in five human cell lines including th...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.5704207

    authors: Koch CM,Andrews RM,Flicek P,Dillon SC,Karaöz U,Clelland GK,Wilcox S,Beare DM,Fowler JC,Couttet P,James KD,Lefebvre GC,Bruce AW,Dovey OM,Ellis PD,Dhami P,Langford CF,Weng Z,Birney E,Carter NP,Vetrie D,Dunham I

    update_date:2007-06-01 00:00:00

  • A-to-I RNA editing promotes developmental stage-specific gene and lncRNA expression.

    abstract::A-to-I RNA editing is a conserved widespread phenomenon in which adenosine (A) is converted to inosine (I) by adenosine deaminases (ADARs) in double-stranded RNA regions, mainly noncoding. Mutations in ADAR enzymes in Caenorhabditis elegans cause defects in normal development but are not lethal as in human and mouse. ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.211169.116

    authors: Goldstein B,Agranat-Tamir L,Light D,Ben-Naim Zgayer O,Fishman A,Lamm AT

    update_date:2017-03-01 00:00:00

  • A matter of life or death: how microsatellites emerge in and vanish from the human genome.

    abstract::Microsatellites--tandem repeats of short DNA motifs--are abundant in the human genome and have high mutation rates. While microsatellite instability is implicated in numerous genetic diseases, the molecular processes involved in their emergence and disappearance are still not well understood. Microsatellites are hypot...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.122937.111

    authors: Kelkar YD,Eckert KA,Chiaromonte F,Makova KD

    update_date:2011-12-01 00:00:00

  • Selfish mutations dysregulating RAS-MAPK signaling are pervasive in aged human testes.

    abstract::Mosaic mutations present in the germline have important implications for reproductive risk and disease transmission. We previously demonstrated a phenomenon occurring in the male germline, whereby specific mutations arising spontaneously in stem cells (spermatogonia) lead to clonal expansion, resulting in elevated mut...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.239186.118

    authors: Maher GJ,Ralph HK,Ding Z,Koelling N,Mlcochova H,Giannoulatou E,Dhami P,Paul DS,Stricker SH,Beck S,McVean G,Wilkie AOM,Goriely A

    update_date:2018-12-01 00:00:00

  • Dynamic effects of interacting genes underlying rice flowering-time phenotypic plasticity and global adaptation.

    abstract::The phenotypic variation of living organisms is shaped by genetics, environment, and their interaction. Understanding phenotypic plasticity under natural conditions is hindered by the apparently complex environment and the interacting genes and pathways. Herein, we report findings from the dissection of rice flowering...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.255703.119

    authors: Guo T,Mu Q,Wang J,Vanous AE,Onogi A,Iwata H,Li X,Yu J

    update_date:2020-05-01 00:00:00

  • Time course regulatory analysis based on paired expression and chromatin accessibility data.

    abstract::A time course experiment is a widely used design in the study of cellular processes such as differentiation or response to stimuli. In this paper, we propose time course regulatory analysis (TimeReg) as a method for the analysis of gene regulatory networks based on paired gene expression and chromatin accessibility da...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.257063.119

    authors: Duren Z,Chen X,Xin J,Wang Y,Wong WH

    update_date:2020-04-01 00:00:00

  • Transposon expression in the Drosophila brain is driven by neighboring genes and diversifies the neural transcriptome.

    abstract::Somatic transposon expression in neural tissue is commonly considered as a measure of mobilization and has therefore been linked to neuropathology and organismal individuality. We combined genome sequencing data with single-cell mRNA sequencing of the same inbred fly strain to map transposon expression in the Drosophi...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.259200.119

    authors: Treiber CD,Waddell S

    update_date:2020-11-01 00:00:00

  • The pig X and Y Chromosomes: structure, sequence, and evolution.

    abstract::We have generated an improved assembly and gene annotation of the pig X Chromosome, and a first draft assembly of the pig Y Chromosome, by sequencing BAC and fosmid clones from Duroc animals and incorporating information from optical mapping and fiber-FISH. The X Chromosome carries 1033 annotated genes, 690 of which a...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.188839.114

    authors: Skinner BM,Sargent CA,Churcher C,Hunt T,Herrero J,Loveland JE,Dunn M,Louzada S,Fu B,Chow W,Gilbert J,Austin-Guest S,Beal K,Carvalho-Silva D,Cheng W,Gordon D,Grafham D,Hardy M,Harley J,Hauser H,Howden P,Howe K,

    update_date:2016-01-01 00:00:00

  • Integrated annotations and analyses of small RNA-producing loci from 47 diverse plants.

    abstract::Plant endogenous small RNAs (sRNAs) are important regulators of gene expression. There are two broad categories of plant sRNAs: microRNAs (miRNAs) and endogenous short interfering RNAs (siRNAs). MicroRNA loci are relatively well-annotated but compose only a small minority of the total sRNA pool; siRNA locus annotation...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.256750.119

    authors: Lunardon A,Johnson NR,Hagerott E,Phifer T,Polydore S,Coruh C,Axtell MJ

    update_date:2020-03-01 00:00:00

  • Genome-wide identification of conserved regulatory function in diverged sequences.

    abstract::Plasticity of gene regulatory encryption can permit DNA sequence divergence without loss of function. Functional information is preserved through conservation of the composition of transcription factor binding sites (TFBS) in a regulatory element. We have developed a method that can accurately identify pairs of functi...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.119016.110

    authors: Taher L,McGaughey DM,Maragh S,Aneas I,Bessling SL,Miller W,Nobrega MA,McCallion AS,Ovcharenko I

    update_date:2011-07-01 00:00:00

  • Ribosome profiling reveals post-transcriptional buffering of divergent gene expression in yeast.

    abstract::Understanding the patterns and causes of phenotypic divergence is a central goal in evolutionary biology. Much work has shown that mRNA abundance is highly variable between closely related species. However, the extent and mechanisms of post-transcriptional gene regulatory evolution are largely unknown. Here we used ri...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.164996.113

    authors: McManus CJ,May GE,Spealman P,Shteyman A

    update_date:2014-03-01 00:00:00

  • Polymorphic centromere locations in the pathogenic yeast Candida parapsilosis.

    abstract::Centromeres pose an evolutionary paradox: strongly conserved in function but rapidly changing in sequence and structure. However, in the absence of damage, centromere locations are usually conserved within a species. We report here that isolates of the pathogenic yeast species Candida parapsilosis show within-species ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.257816.119

    authors: Ola M,O'Brien CE,Coughlan AY,Ma Q,Donovan PD,Wolfe KH,Butler G

    update_date:2020-05-01 00:00:00

  • Natural genetic variation in C. elegans identified genomic loci controlling metabolite levels.

    abstract::Metabolic homeostasis is sustained by complex biological networks that respond to nutrient availability. Genetic and environmental factors may disrupt this equilibrium, leading to metabolic disorders, including obesity and type 2 diabetes. To identify the genetic factors controlling metabolism, we performed quantitati...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.232322.117

    authors: Gao AW,Sterken MG,Uit de Bos J,van Creij J,Kamble R,Snoek BL,Kammenga JE,Houtkooper RH

    update_date:2018-09-01 00:00:00

  • Two large families of chemoreceptor genes in the nematodes Caenorhabditis elegans and Caenorhabditis briggsae reveal extensive gene duplication, diversification, movement, and intron loss.

    abstract::The str family of genes encoding seven-transmembrane G-protein-coupled or serpentine receptors related to the ODR-10 diacetyl chemoreceptor is very large, with at least 197 members in the Caenorhabditis elegans genome. The closely related stl family has 43 genes, and both families are distantly related to the srd fami...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.8.5.449

    authors: Robertson HM

    update_date:1998-05-01 00:00:00

  • Translation initiation downstream from annotated start codons in human mRNAs coevolves with the Kozak context.

    abstract::Eukaryotic translation initiation involves preinitiation ribosomal complex 5'-to-3' directional probing of mRNA for codons suitable for starting protein synthesis. The recognition of codons as starts depends on the codon identity and on its immediate nucleotide context known as Kozak context. When the context is weak ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.257352.119

    authors: Benitez-Cantos MS,Yordanova MM,O'Connor PBF,Zhdanov AV,Kovalchuk SI,Papkovsky DB,Andreev DE,Baranov PV

    update_date:2020-07-01 00:00:00

  • The first five years of single-cell cancer genomics and beyond.

    abstract::Single-cell sequencing (SCS) is a powerful new tool for investigating evolution and diversity in cancer and understanding the role of rare cells in tumor progression. These methods have begun to unravel key questions in cancer biology that have been difficult to address with bulk tumor measurements. Over the past five...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.191098.115

    authors: Navin NE

    update_date:2015-10-01 00:00:00

  • An extraordinary retrotransposon family encoding dual endonucleases.

    abstract::Retrotransposons commonly encode a reverse transcriptase (RT), but other functional domains are variable. The acquisition of new domains is the dominant evolutionary force that brings structural variety to retrotransposons. Non-long-terminal-repeat (non-LTR) retrotransposons are classified into two groups by their str...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.3271405

    authors: Kojima KK,Fujiwara H

    update_date:2005-08-01 00:00:00

  • A GC-rich sequence feature in the 3' UTR directs UPF1-dependent mRNA decay in mammalian cells.

    abstract::Up-frameshift protein 1 (UPF1) is an ATP-dependent RNA helicase that has essential roles in RNA surveillance and in post-transcriptional gene regulation by promoting the degradation of mRNAs. Previous studies revealed that UPF1 is associated with the 3' untranslated region (UTR) of target mRNAs via as-yet-unknown sequ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.206060.116

    authors: Imamachi N,Salam KA,Suzuki Y,Akimitsu N

    update_date:2017-03-01 00:00:00

  • A bioinformatics-based strategy identifies c-Myc and Cdc25A as candidates for the Apmt mammary tumor latency modifiers.

    abstract::The epistatically interacting modifier loci (Apmt1 and Apmt2) accelerate the polyoma Middle-T (PyVT)-induced mammary tumor. To identify potential candidate genes loci, a combined bioinformatics and genomics strategy was used. On the basis of the assumption that the loci were functioning in the same or intersecting pat...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.210502

    authors: Cozma D,Lukes L,Rouse J,Qiu TH,Liu ET,Hunter KW

    update_date:2002-06-01 00:00:00

  • Theories and applications for sequencing randomly selected clones.

    abstract::Theory is developed for the process of sequencing randomly selected large-insert clones. Genome size, library depth, clone size, and clone distribution are considered relevant properties and perfect overlap detection for contig assembly is assumed. Genome-specific and nonrandom effects are neglected. Order of magnitud...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.gr-1339r

    authors: Wendl MC,Marra MA,Hillier LW,Chinwalla AT,Wilson RK,Waterston RH

    update_date:2001-02-01 00:00:00

  • An assessment of gene prediction accuracy in large DNA sequences.

    abstract::One of the first useful products from the human genome will be a set of predicted genes. Besides its intrinsic scientific interest, the accuracy and completeness of this data set is of considerable importance for human health and medicine. Though progress has been made on computational gene identification in terms of ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.122800

    authors: Guigó R,Agarwal P,Abril JF,Burset M,Fickett JW

    update_date:2000-10-01 00:00:00

  • Parallel radiation hybrid mapping: a powerful tool for high-resolution genomic comparison.

    abstract::Comparative gene mapping in mammals typically involves identification of segments of conserved synteny in diverse genomes. The development of maps that permit comparison of gene order within conserved synteny has not advanced beyond the mouse map that takes advantage of linkage analysis in interspecific backcrosses. R...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.8.7.731

    authors: Yang YP,Womack JE

    update_date:1998-07-01 00:00:00

  • Systematic insertional mutagenesis of a streptomycete genome: a link between osmoadaptation and antibiotic production.

    abstract::The model organism Streptomyces coelicolor represents a genus that produces a vast range of bioactive secondary metabolites. We describe a versatile procedure for systematic and comprehensive mutagenesis of the S. coelicolor genome. The high-throughput process relies on in vitro transposon mutagenesis of an ordered co...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.1710304

    authors: Bishop A,Fielding S,Dyson P,Herron P

    update_date:2004-05-01 00:00:00

  • Capture of a functionally active methyl-CpG binding domain by an arthropod retrotransposon family.

    abstract::The repressive capacity of cytosine DNA methylation is mediated by recruitment of silencing complexes by methyl-CpG binding domain (MBD) proteins. Despite MBD proteins being associated with silencing, we discovered that a family of arthropod Copia retrotransposons have incorporated a host-derived MBD. We functionally ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.243774.118

    authors: de Mendoza A,Pflueger J,Lister R

    update_date:2019-08-01 00:00:00

  • Introgression maintains the genetic integrity of the mating-type determining chromosome of the fungus Neurospora tetrasperma.

    abstract::Genome evolution is driven by a complex interplay of factors, including selection, recombination, and introgression. The regions determining sexual identity are particularly dynamic parts of eukaryotic genomes that are prone to molecular degeneration associated with suppressed recombination. In the fungus Neurospora t...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.197244.115

    authors: Corcoran P,Anderson JL,Jacobson DJ,Sun Y,Ni P,Lascoux M,Johannesson H

    update_date:2016-04-01 00:00:00

  • Transcription factor binding and modified histones in human bidirectional promoters.

    abstract::Bidirectional promoters have received considerable attention because of their ability to regulate two downstream genes (divergent genes). They are also highly abundant, directing the transcription of approximately 11% of genes in the human genome. We categorized the presence of DNA sequence motifs, binding of transcri...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.5623407

    authors: Lin JM,Collins PJ,Trinklein ND,Fu Y,Xi H,Myers RM,Weng Z

    update_date:2007-06-01 00:00:00

  • Genome-scale cloning and expression of individual open reading frames using topoisomerase I-mediated ligation.

    abstract::The in vitro cloning of DNA molecules traditionally uses PCR amplification or site-specific restriction endonucleases to generate linear DNA inserts with defined termini and requires DNA ligase to covalently join those inserts to vectors with the corresponding ends. We have used the properties of Vaccinia DNA topoisom...

    journal_title:Genome research

    pub_type: Journal Article

    doi:

    authors: Heyman JA,Cornthwaite J,Foncerrada L,Gilmore JR,Gontang E,Hartman KJ,Hernandez CL,Hood R,Hull HM,Lee WY,Marcil R,Marsh EJ,Mudd KM,Patino MJ,Purcell TJ,Rowland JJ,Sindici ML,Hoeffler JP

    update_date:1999-04-01 00:00:00