Accurate typing of short tandem repeats from genome-wide sequencing data and its applications.

Abstract:

Short tandem repeats (STRs) are implicated in dozens of human genetic diseases and contribute significantly to genome variation and instability. Yet profiling STRs from short-read sequencing data is challenging because of their high sequencing error rates. Here, we developed STR-FM, short tandem repeat profiling using flank-based mapping, a computational pipeline that can detect the full spectrum of STR alleles from short-read data, can adapt to emerging read-mapping algorithms, and can be applied to heterogeneous genetic samples (e.g., tumors, viruses, and genomes of organelles). We used STR-FM to study STR error rates and patterns in publicly available human and in-house generated ultradeep plasmid sequencing data sets. We discovered that STRs sequenced with a PCR-free protocol have up to ninefold fewer errors than those sequenced with a PCR-containing protocol. We constructed an error correction model for genotyping STRs that can distinguish heterozygous alleles containing STRs with consecutive repeat numbers. Applying our model and pipeline to Illumina sequencing data with 100-bp reads, we could confidently genotype several disease-related long trinucleotide STRs. Utilizing this pipeline, for the first time we determined the genome-wide STR germline mutation rate from a deeply sequenced human pedigree. Additionally, we built a tool that recommends minimal sequencing depth for accurate STR genotyping, depending on repeat length and sequencing read length. The required read depth increases with STR length and is lower for a PCR-free protocol. This suite of tools addresses the pressing challenges surrounding STR genotyping, and thus is of wide interest to researchers investigating disease-related STRs and STR evolution.
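The abstract names the core idea, flank-based mapping, without spelling out the steps. The sketch below is not the STR-FM code; it only illustrates one plausible first step under stated assumptions: find an STR that lies entirely inside a read and split the read into a left flank, the repeat, and a right flank, so that the flanks (rather than the error-prone repeat) could later be handed to a read mapper. The function name `find_strs` and the parameters `motif_len`, `min_repeats`, and `min_flank` are hypothetical and do not come from the paper.

```python
import re

def find_strs(read, motif_len=2, min_repeats=5, min_flank=5):
    """Return (left_flank, motif, copies, right_flank) for each STR fully inside the read.

    Hypothetical helper for illustration only; thresholds are arbitrary assumptions.
    """
    # One motif of motif_len bases followed by at least (min_repeats - 1) more copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
    hits = []
    for m in pattern.finditer(read):
        motif = m.group(1)
        # Skip motifs that are themselves repeats of a shorter unit (e.g. "AA", "ACAC").
        if motif in (motif + motif)[1:-1]:
            continue
        left, right = read[:m.start()], read[m.end():]
        # Keep only STRs with enough flanking sequence on both sides.
        if len(left) >= min_flank and len(right) >= min_flank:
            copies = len(m.group(0)) // motif_len
            hits.append((left, motif, copies, right))
    return hits

read = "GGCTTAGCTG" + "CA" * 6 + "TTGACCGTAG"
print(find_strs(read))
```

Requiring both flanks to be present means the whole repeat is observed within the read, which is why read length limits the STR alleles that can be typed this way.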

journal_name

Genome Res

journal_title

Genome research

authors

Fungtammasan A,Ananda G,Hile SE,Su MS,Sun C,Harris R,Medvedev P,Eckert K,Makova KD

doi

10.1101/gr.185892.114

subject

Has Abstract

pub_date

2015-05-01 00:00:00

pages

736-49

issue

5

eissn

1549-5469

issn

1088-9051

pii

gr.185892.114

journal_volume

25

pub_type

Journal Article
  • Coding and noncoding variants in HFM1, MLH3, MSH4, MSH5, RNF212, and RNF212B affect recombination rate in cattle.

    abstract::We herein study genetic recombination in three cattle populations from France, New Zealand, and the Netherlands. We identify 2,395,177 crossover (CO) events in 94,516 male gametes, and 579,996 CO events in 25,332 female gametes. The average number of COs was found to be larger in males (23.3) than in females (21.4). T...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.204214.116

    authors: Kadri NK,Harland C,Faux P,Cambisano N,Karim L,Coppieters W,Fritz S,Mullaart E,Baurain D,Boichard D,Spelman R,Charlier C,Georges M,Druet T

    update_date:2016-10-01 00:00:00

  • Bacillus subtilis during feast and famine: visualization of the overall regulation of protein synthesis during glucose starvation by proteome analysis.

    abstract::Dual channel imaging and warping of two-dimensional (2D) protein gels were used to visualize global changes of the gene expression patterns in growing Bacillus subtilis cells during entry into the stationary phase as triggered by glucose exhaustion. The 2D gels only depict single moments during the cells' growth cycle...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.905003

    authors: Bernhardt J,Weibezahn J,Scharf C,Hecker M

    update_date:2003-02-01 00:00:00

  • High-resolution quantification of specific mRNA levels in human brain autopsies and biopsies.

    abstract::Quantification of mRNA levels in human cortical brain biopsies and autopsies was performed using a fluorogenic 5' nuclease assay. The reproducibility of the assay using replica plates was 97%-99%. Relative quantities of mRNA from 16 different genes were evaluated using a statistical approach based on ANCOVA analysis. ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.10.8.1219

    authors: Castensson A,Emilsson L,Preece P,Jazin EE

    update_date:2000-08-01 00:00:00

  • Phylogeny-wide analysis of social amoeba genomes highlights ancient origins for complex intercellular communication.

    abstract::Dictyostelium discoideum (DD), an extensively studied model organism for cell and developmental biology, belongs to the most derived group 4 of social amoebas, a clade of altruistic multicellular organisms. To understand genome evolution over long time periods and the genetic basis of social evolution, we sequenced th...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.121137.111

    authors: Heidel AJ,Lawal HM,Felder M,Schilde C,Helps NR,Tunggal B,Rivero F,John U,Schleicher M,Eichinger L,Platzer M,Noegel AA,Schaap P,Glöckner G

    update_date:2011-11-01 00:00:00

  • Genomic organization of the sex-determining and adjacent regions of the sex chromosomes of medaka.

    abstract::Sequencing of the human Y chromosome has uncovered the peculiarities of the genomic organization of a heterogametic sex chromosome of old evolutionary age, and has led to many insights into the evolutionary changes that occurred during its long history. We have studied the genomic organization of the medaka fish Y chr...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.5016106

    authors: Kondo M,Hornung U,Nanda I,Imai S,Sasaki T,Shimizu A,Asakawa S,Hori H,Schmid M,Shimizu N,Schartl M

    update_date:2006-07-01 00:00:00

  • Multiple waves of recent DNA transposon activity in the bat, Myotis lucifugus.

    abstract::DNA transposons, or class 2 transposable elements, have successfully propagated in a wide variety of genomes. However, it is widely believed that DNA transposon activity has ceased in mammalian genomes for at least the last 40 million years. We recently reported evidence for the relatively recent activity of hAT and H...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.071886.107

    authors: Ray DA,Feschotte C,Pagan HJ,Smith JD,Pritham EJ,Arensburger P,Atkinson PW,Craig NL

    update_date:2008-05-01 00:00:00

  • Gene and alternative splicing annotation with AIR.

    abstract::Designing effective and accurate tools for identifying the functional and structural elements in a genome remains at the frontier of genome annotation owing to incompleteness and inaccuracy of the data, limitations in the computational models, and shifting paradigms in genomics, such as alternative splicing. We presen...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.2889405

    authors: Florea L,Di Francesco V,Miller J,Turner R,Yao A,Harris M,Walenz B,Mobarry C,Merkulov GV,Charlab R,Dew I,Deng Z,Istrail S,Li P,Sutton G

    update_date:2005-01-01 00:00:00

  • Localization of a long-range cis-regulatory element of IL13 by allelic transcript ratio mapping.

    abstract::It appears that, for many genes, the two alleles possessed by an individual may produce different amounts of transcript. When such allelic differences in transcription are observed for some individuals but not others, a plausible explanation is genetic variation in the cis-acting elements that regulate the gene in que...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.5663007

    authors: Forton JT,Udalova IA,Campino S,Rockett KA,Hull J,Kwiatkowski DP

    update_date:2007-01-01 00:00:00

  • The TAGteam motif facilitates binding of 21 sequence-specific transcription factors in the Drosophila embryo.

    abstract::Highly overlapping patterns of genome-wide binding of many distinct transcription factors have been observed in worms, insects, and mammals, but the origins and consequences of this overlapping binding remain unclear. While analyzing chromatin immunoprecipitation data sets from 21 sequence-specific transcription facto...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.130682.111

    authors: Satija R,Bradley RK

    update_date:2012-04-01 00:00:00

  • Distal CpG islands can serve as alternative promoters to transcribe genes with silenced proximal promoters.

    abstract::DNA methylation at the promoter of a gene is presumed to render it silent, yet a sizable fraction of genes with methylated proximal promoters exhibit elevated expression. Here, we show, through extensive analysis of the methylome and transcriptome in 34 tissues, that in many such cases, transcription is initiated by a...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.212050.116

    authors: Sarda S,Das A,Vinson C,Hannenhalli S

    update_date:2017-04-01 00:00:00

  • Gene loss and movement in the maize genome.

    abstract::Maize (Zea mays L. ssp. mays), one of the most important agricultural crops in the world, originated by hybridization of two closely related progenitors. To investigate the fate of its genes after tetraploidization, we analyzed the sequence of five duplicated regions from different chromosomal locations. We also compa...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.2701104

    authors: Lai J,Ma J,Swigonová Z,Ramakrishna W,Linton E,Llaca V,Tanyolac B,Park YJ,Jeong OY,Bennetzen JL,Messing J

    update_date:2004-10-01 00:00:00

  • Integrated mapping, chromosomal sequencing and sequence analysis of Cryptosporidium parvum.

    abstract::The apicomplexan Cryptosporidium parvum is one of the most prevalent protozoan parasites of humans. We report the physical mapping of the genome of the Iowa isolate, sequencing and analysis of chromosome 6, and approximately 0.9 Mbp of sequence sampled from the remainder of the genome. To construct a robust physical m...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.1555203

    authors: Bankier AT,Spriggs HF,Fartmann B,Konfortov BA,Madera M,Vogel C,Teichmann SA,Ivens A,Dear PH

    update_date:2003-08-01 00:00:00

  • Reconstructing complex regions of genomes using long-read sequencing technology.

    abstract::Obtaining high-quality sequence continuity of complex regions of recent segmental duplication remains one of the major challenges of finishing genome assemblies. In the human and mouse genomes, this was achieved by targeting large-insert clones using costly and laborious capillary-based sequencing approaches. Sanger s...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.168450.113

    authors: Huddleston J,Ranade S,Malig M,Antonacci F,Chaisson M,Hon L,Sudmant PH,Graves TA,Alkan C,Dennis MY,Wilson RK,Turner SW,Korlach J,Eichler EE

    update_date:2014-04-01 00:00:00

  • Unique DNA methylome profiles in CpG island methylator phenotype colon cancers.

    abstract::A subset of colorectal cancers was postulated to have the CpG island methylator phenotype (CIMP), a higher propensity for CpG island DNA methylation. The validity of CIMP, its molecular basis, and its prognostic value remain highly controversial. Using MBD-isolated genome sequencing, we mapped and compared genome-wide...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.122788.111

    authors: Xu Y,Hu B,Choi AJ,Gopalan B,Lee BH,Kalady MF,Church JM,Ting AH

    update_date:2012-02-01 00:00:00

  • Bacterial genomes as new gene homes: the genealogy of ORFans in E. coli.

    abstract::Differences in gene repertoire among bacterial genomes are usually ascribed to gene loss or to lateral gene transfer from unrelated cellular organisms. However, most bacteria contain large numbers of ORFans, that is, annotated genes that are restricted to a particular genome and that possess no known homologs. The uni...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.2231904

    authors: Daubin V,Ochman H

    update_date:2004-06-01 00:00:00

  • Sequencing of cDNA clones from the genetic map of tomato (Lycopersicon esculentum).

    abstract::The dense RFLP linkage map of tomato (Lycopersicon esculentum) contains >300 anonymous cDNA clones. Of those clones, 272 were partially or completely sequenced. The sequences were compared at the DNA and protein level to known genes in databases. For 57% of the clones, a significant match to previously described genes...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.8.8.842

    authors: Ganal MW,Czihal R,Hannappel U,Kloos DU,Polley A,Ling HQ

    update_date:1998-08-01 00:00:00

  • Analysis of Arabidopsis genome-wide variations before and after meiosis and meiotic recombination by resequencing Landsberg erecta and all four products of a single meiosis.

    abstract::Meiotic recombination, including crossovers (COs) and gene conversions (GCs), impacts natural variation and is an important evolutionary force. COs increase genetic diversity by redistributing existing variation, whereas GCs can alter allelic frequency. Here, we sequenced Arabidopsis Landsberg erecta (Ler) and two set...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.127522.111

    authors: Lu P,Han X,Qi J,Yang J,Wijeratne AJ,Li T,Ma H

    update_date:2012-03-01 00:00:00

  • Detecting ancient positive selection in humans using extended lineage sorting.

    abstract::Natural selection that affected modern humans early in their evolution has likely shaped some of the traits that set present-day humans apart from their closest extinct and living relatives. The ability to detect ancient natural selection in the human genome could provide insights into the molecular basis for these hu...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.219493.116

    authors: Peyrégne S,Boyle MJ,Dannemann M,Prüfer K

    update_date:2017-09-01 00:00:00

  • Widespread plasticity in CTCF occupancy linked to DNA methylation.

    abstract::CTCF is a ubiquitously expressed regulator of fundamental genomic processes including transcription, intra- and interchromosomal interactions, and chromatin structure. Because of its critical role in genome function, CTCF binding patterns have long been assumed to be largely invariant across different cellular environ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.136101.111

    authors: Wang H,Maurano MT,Qu H,Varley KE,Gertz J,Pauli F,Lee K,Canfield T,Weaver M,Sandstrom R,Thurman RE,Kaul R,Myers RM,Stamatoyannopoulos JA

    update_date:2012-09-01 00:00:00

  • A positive but complex association between meiotic double-strand break hotspots and open chromatin in Saccharomyces cerevisiae.

    abstract::During meiosis, chromatin undergoes extensive changes to facilitate recombination, homolog pairing, and chromosome segregation. To investigate the relationship between chromatin organization and meiotic processes, we used formaldehyde-assisted isolation of regulatory elements (FAIRE) to map open chromatin during the t...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.096297.109

    authors: Berchowitz LE,Hanlon SE,Lieb JD,Copenhaver GP

    update_date:2009-12-01 00:00:00

  • Wolbachia genome integrated in an insect chromosome: evolution and fate of laterally transferred endosymbiont genes.

    abstract::Recent accumulation of microbial genome data has demonstrated that lateral gene transfers constitute an important and universal evolutionary process in prokaryotes, while those in multicellular eukaryotes are still regarded as unusual, except for endosymbiotic gene transfers from mitochondria and plastids. Here we tho...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.7144908

    authors: Nikoh N,Tanaka K,Shibata F,Kondo N,Hizume M,Shimada M,Fukatsu T

    update_date:2008-02-01 00:00:00

  • Pathway Processor: a tool for integrating whole-genome expression results into metabolic networks.

    abstract::We have developed a new tool to visualize expression data on metabolic pathways and to evaluate which metabolic pathways are most affected by transcriptional changes in whole-genome expression experiments. Using the Fisher Exact Test, the method scores biochemical pathways according to the probability that as many or ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.226602

    authors: Grosu P,Townsend JP,Hartl DL,Cavalieri D

    update_date:2002-07-01 00:00:00

  • Preference of DNA methyltransferases for CpG islands in mouse embryonic stem cells.

    abstract::Many CpG islands have tissue-dependent and differentially methylated regions (T-DMRs) in normal cells and tissues. To elucidate how DNA methyltransferases (Dnmts) participate in methylation of the genomic components, we investigated the genome-wide DNA methylation pattern of the T-DMRs with Dnmt1-, Dnmt3a-, and/or Dnm...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.2431504

    authors: Hattori N,Abe T,Hattori N,Suzuki M,Matsuyama T,Yoshida S,Li E,Shiota K

    update_date:2004-09-01 00:00:00

  • A nuclear matrix attachment site in the 4q35 locus has an enhancer-blocking activity in vivo: implications for the facio-scapulo-humeral dystrophy.

    abstract::Facio-scapulo-humeral dystrophy (FSHD), a muscular hereditary disease with a prevalence of 1 in 20,000, is caused by a partial deletion of a subtelomeric repeat array on chromosome 4q. Earlier, we demonstrated the existence in the vicinity of the D4Z4 repeat of a nuclear matrix attachment site, FR-MAR, efficient in no...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.6620908

    authors: Petrov A,Allinne J,Pirozhkova I,Laoudj D,Lipinski M,Vassetzky YS

    update_date:2008-01-01 00:00:00

  • Polycomb preferentially targets stalled promoters of coding and noncoding transcripts.

    abstract::The Polycomb group (PcG) and Trithorax group (TrxG) of proteins are required for stable and heritable maintenance of repressed and active gene expression states. Their antagonistic function on gene control, repression for PcG and activity for TrxG, is mediated by binding to chromatin and subsequent epigenetic modifica...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.114348.110

    authors: Enderle D,Beisel C,Stadler MB,Gerstung M,Athri P,Paro R

    update_date:2011-02-01 00:00:00

  • Genetically indistinguishable SNPs and their influence on inferring the location of disease-associated variants.

    abstract::As part of a recent high-density linkage disequilibrium (LD) study of chromosome 20, we obtained genotypes for approximately 30,000 SNPs at a density of 1 SNP/2 kb on four different population samples (47 CEPH founders; 91 UK unrelateds [unrelated white individuals of western European ancestry]; 97 African Americans; ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.4217605

    authors: Lawrence R,Evans DM,Morris AP,Ke X,Hunt S,Paolucci M,Ragoussis J,Deloukas P,Bentley D,Cardon LR

    update_date:2005-11-01 00:00:00

  • Two breakpoint clusters at fragile site FRA3B form phased nucleosomes.

    abstract::Fragile sites are gaps and breaks in metaphase chromosomes generated by specific culture conditions. Fragile site FRA3B is the most unstable site and is directly involved in the breakpoints of deletion and translocation in a wide spectrum of cancers. To learn about the general characteristics of common fragile sites, ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.2304404

    authors: Mulvihill DJ,Wang YH

    update_date:2004-07-01 00:00:00

  • Evolutionary conservation of Y Chromosome ampliconic gene families despite extensive structural variation.

    abstract::Despite claims that the mammalian Y Chromosome is on a path to extinction, comparative sequence analysis of primate Y Chromosomes has shown the decay of the ancestral single-copy genes has all but ceased in this eutherian lineage. The suite of single-copy Y-linked genes is highly conserved among the majority of euther...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.237586.118

    authors: Brashear WA,Raudsepp T,Murphy WJ

    update_date:2018-12-01 00:00:00

  • A biometrical genome search in rats reveals the multigenic basis of blood pressure variation.

    abstract::A genome-wide search for multiple loci influencing salt-loaded systolic blood pressure (NaSBP) variation among 188 F2 progeny from a cross between the Brown-Norway and spontaneously hypertensive rat strains was pursued in an effort to gain insight into the polygenic basis of blood pressure regulation. The results sugg...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.5.2.164

    authors: Schork NJ,Krieger JE,Trolliet MR,Franchini KG,Koike G,Krieger EM,Lander ES,Dzau VJ,Jacob HJ

    update_date:1995-09-01 00:00:00

  • A simplified procedure for developing multiplex PCRs.

    abstract::We have developed a simplified method for multiplex PCR based on the use of chimeric primers. Each primer contains a 3' region complementary to sequence-specific recognition sites and a 5' region made up of an unrelated 20-nucleotide sequence. Identical reaction conditions, cycling times, and annealing temperatures ha...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.5.5.488

    authors: Shuber AP,Grondin VJ,Klinger KW

    update_date:1995-12-01 00:00:00