Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm.

Abstract:

Long sequencing reads generated by single-molecule sequencing technology offer the possibility of dramatically improving the contiguity of genome assemblies. The biggest challenge today is that long reads have relatively high error rates, currently around 15%. The high error rates make it difficult to use this data alone, particularly with highly repetitive plant genomes. Errors in the raw data can lead to insertion or deletion errors (indels) in the consensus genome sequence, which in turn create significant problems for downstream analysis; for example, a single indel may shift the reading frame and incorrectly truncate a protein sequence. Here, we describe an algorithm that solves the high error rate problem by combining long, high-error reads with shorter but much more accurate Illumina sequencing reads, whose error rates average <1%. Our hybrid assembly algorithm combines these two types of reads to construct mega-reads, which are both long and accurate, and then assembles the mega-reads using the CABOG assembler, which was designed for long reads. We apply this technique to a large data set of Illumina and PacBio sequences from the species Aegilops tauschii, a large and extremely repetitive plant genome that has resisted previous attempts at assembly. We show that the resulting assembled contigs are far larger than in any previous assembly, with an N50 contig size of 486,807 nucleotides. We compare the contigs to independently produced optical maps to evaluate their large-scale accuracy, and to a set of high-quality bacterial artificial chromosome (BAC)-based assemblies to evaluate base-level accuracy.
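Two quantities in the abstract can be made concrete in a few lines of code. The first is the N50 statistic used to report contig size: the largest length L such that contigs of length L or longer together hold at least half of all assembled bases. The second is the way deleting a single base can bring a stop codon into the reading frame and shorten the encoded protein. The Python sketch below illustrates both. The function names `n50` and `codons_before_stop` and the toy sequences are hypothetical and are not code or data from the paper, MaSuRCA, or CABOG.

```python
from typing import Iterable, Optional

# Hypothetical illustration only; not part of MaSuRCA or CABOG.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("assembly contains no bases")
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # cumulative length has reached half the assembly
            return length
    raise AssertionError("unreachable when total > 0")


def codons_before_stop(seq: str) -> Optional[int]:
    """Count codons read in frame 0 before the first stop codon.

    Returns None if no in-frame stop codon is found.
    """
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


if __name__ == "__main__":
    # Total is 54 bases; 10 + 9 + 8 = 27 reaches half, so N50 is 8.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8

    reference = "ATGCTAAGCTGGTAA"           # ATG CTA AGC TGG TAA
    mutant = reference[:3] + reference[4:]  # delete one base (a 1-bp indel)
    print(codons_before_stop(reference))    # 4
    print(codons_before_stop(mutant))       # 1 (ATG TAA: premature stop)
```

The toy reference reads four codons before its stop codon. Removing the C at position 4 shifts the frame so that TAA is read immediately after the start codon. This is the kind of protein truncation that the abstract attributes to indels in a consensus sequence.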

journal_name

Genome Res

journal_title

Genome research

authors

Zimin AV,Puiu D,Luo MC,Zhu T,Koren S,Marçais G,Yorke JA,Dvořák J,Salzberg SL

doi

10.1101/gr.213405.116

subject

Has Abstract

pub_date

2017-05-01 00:00:00

pages

787-792

issue

5

eissn

1549-5469

issn

1088-9051

pii

gr.213405.116

journal_volume

27

pub_type

Journal Article
  • Species-specific class I gene expansions formed the telomeric 1 mb of the mouse major histocompatibility complex.

    abstract::We have determined the complete sequence of 951,695 bp from the class I region of H2, the mouse major histocompatibility complex (Mhc) from strain 129/Sv (haplotype bc). The sequence contains 26 genes. The sequence spans from the last 50 kb of the H2-T region, including 2 class I genes and 3 class I pseudogenes, and i...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.975303

    authors: Takada T,Kumánovics A,Amadou C,Yoshino M,Jones EP,Athanasiou M,Evans GA,Fischer Lindahl K

    update_date: 2003-04-01 00:00:00

  • Gene amplification as double minutes or homogeneously staining regions in solid tumors: origin and structure.

    abstract::Double minutes (dmin) and homogeneously staining regions (hsr) are the cytogenetic hallmarks of genomic amplification in cancer. Different mechanisms have been proposed to explain their genesis. Recently, our group showed that the MYC-containing dmin in leukemia cases arise by excision and amplification (episome model...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.106252.110

    authors: Storlazzi CT,Lonoce A,Guastadisegni MC,Trombetta D,D'Addabbo P,Daniele G,L'Abbate A,Macchia G,Surace C,Kok K,Ullmann R,Purgato S,Palumbo O,Carella M,Ambros PF,Rocchi M

    update_date: 2010-09-01 00:00:00

  • Systematic identification of novel protein domain families associated with nuclear functions.

    abstract::A systematic computational analysis of protein sequences containing known nuclear domains led to the identification of 28 novel domain families. This represents a 26% increase in the starting set of 107 known nuclear domain families used for the analysis. Most of the novel domains are present in all major eukaryotic l...

    journal_title:Genome research

    pub_type: Letter

    doi:10.1101/gr.203201

    authors: Doerks T,Copley RR,Schultz J,Ponting CP,Bork P

    update_date: 2002-01-01 00:00:00

  • Complex genomic rearrangements lead to novel primate gene function.

    abstract::Orthologous genes that maintain a single-copy status in a broad range of species may indicate a selection against gene duplication. If this is the case, then duplicates of such genes that do survive may have escaped the dosage control by rapid and sizable changes in their function. To test this hypothesis and to devel...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.3266405

    authors: Ciccarelli FD,von Mering C,Suyama M,Harrington ED,Izaurralde E,Bork P

    update_date: 2005-03-01 00:00:00

  • Pattern of sequence variation across 213 environmental response genes.

    abstract::To promote the clinical and epidemiological studies that improve our understanding of human genetic susceptibility to environmental exposure, the Environmental Genome Project (EGP) has scanned 213 environmental response genes involved in DNA repair, cell cycle regulation, apoptosis, and metabolism for single nucleotid...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.2730004

    authors: Livingston RJ,von Niederhausern A,Jegga AG,Crawford DC,Carlson CS,Rieder MJ,Gowrisankar S,Aronow BJ,Weiss RB,Nickerson DA

    update_date: 2004-10-01 00:00:00

  • Exploring the human genome with functional maps.

    abstract::Human genomic data of many types are readily available, but the complexity and scale of human molecular biology make it difficult to integrate this body of data, understand it from a systems level, and apply it to the study of specific pathways or genetic disorders. An investigator could best explore a particular prot...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.082214.108

    authors: Huttenhower C,Haley EM,Hibbs MA,Dumeaux V,Barrett DR,Coller HA,Troyanskaya OG

    update_date: 2009-06-01 00:00:00

  • Introgression maintains the genetic integrity of the mating-type determining chromosome of the fungus Neurospora tetrasperma.

    abstract::Genome evolution is driven by a complex interplay of factors, including selection, recombination, and introgression. The regions determining sexual identity are particularly dynamic parts of eukaryotic genomes that are prone to molecular degeneration associated with suppressed recombination. In the fungus Neurospora t...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.197244.115

    authors: Corcoran P,Anderson JL,Jacobson DJ,Sun Y,Ni P,Lascoux M,Johannesson H

    update_date: 2016-04-01 00:00:00

  • The marine bacterium Pseudoalteromonas haloplanktis has a complex genome structure composed of two separate genetic units.

    abstract::The genome size of Pseudoalteromonas haloplanktis, a ubiquitous and easily cultured marine bacterium, was measured as a step toward estimating the genome complexity of marine bacterioplankton. To determine total genome size, we digested P. haloplanktis DNA with the restriction endonucleases NotI and SfiI, separated th...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.6.12.1160

    authors: Lanoil BD,Ciuffetti LM,Giovannoni SJ

    update_date: 1996-12-01 00:00:00

  • Evaluation of predicted network modules in yeast metabolism using NMR-based metabolite profiling.

    abstract::Genome-scale metabolic models promise important insights into cell function. However, the definition of pathways and functional network modules within these models, and in the biochemical literature in general, is often based on intuitive reasoning. Although mathematical methods have been proposed to identify modules,...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.5662207

    authors: Bundy JG,Papp B,Harmston R,Browne RA,Clayson EM,Burton N,Reece RJ,Oliver SG,Brindle KM

    update_date: 2007-04-01 00:00:00

  • Functional DNA methylation differences between tissues, cell types, and across individuals discovered using the M&M algorithm.

    abstract::DNA methylation plays key roles in diverse biological processes such as X chromosome inactivation, transposable element repression, genomic imprinting, and tissue-specific gene expression. Sequencing-based DNA methylation profiling provides an unprecedented opportunity to map and compare complete DNA methylomes. This ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.156539.113

    authors: Zhang B,Zhou Y,Lin N,Lowdon RF,Hong C,Nagarajan RP,Cheng JB,Li D,Stevens M,Lee HJ,Xing X,Zhou J,Sundaram V,Elliott G,Gu J,Shi T,Gascard P,Sigaroudinia M,Tlsty TD,Kadlecek T,Weiss A,O'Geen H,Farnham PJ,Maire

    update_date: 2013-09-01 00:00:00

  • Genome-wide analyses of alternative splicing in plants: opportunities and challenges.

    abstract::Alternative splicing (AS) creates multiple mRNA transcripts from a single gene. While AS is known to contribute to gene regulation and proteome diversity in animals, the study of its importance in plants is in its early stages. However, recently available plant genome and transcript sequence data sets are enabling a g...

    journal_title:Genome research

    pub_type: Journal Article, Review

    doi:10.1101/gr.053678.106

    authors: Barbazuk WB,Fu Y,McGinnis KM

    update_date: 2008-09-01 00:00:00

  • Identification of complex genomic rearrangements in cancers using CouGaR.

    abstract::The genomic alterations associated with cancers are numerous and varied, involving both isolated and large-scale complex genomic rearrangements (CGRs). Although the underlying mechanisms are not well understood, CGRs have been implicated in tumorigenesis. Here, we introduce CouGaR, a novel method for characterizing th...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.211201.116

    authors: Dzamba M,Ramani AK,Buczkowicz P,Jiang Y,Yu M,Hawkins C,Brudno M

    update_date: 2017-01-01 00:00:00

  • Long noncoding RNAs in C. elegans.

    abstract::Thousands of long noncoding RNAs (lncRNAs) have been found in vertebrate animals, a few of which have known biological roles. To better understand the genomics and features of lncRNAs in invertebrates, we used available RNA-seq, poly(A)-site, and ribosome-mapping data to identify lncRNAs of Caenorhabditis elegans. We ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.140475.112

    authors: Nam JW,Bartel DP

    update_date: 2012-12-01 00:00:00

  • Retroelement distributions in the human genome: variations associated with age and proximity to genes.

    abstract::Remnants of more than 3 million transposable elements, primarily retroelements, comprise nearly half of the human genome and have generated much speculation concerning their evolutionary significance. We have exploited the draft human genome sequence to examine the distributions of retroelements on a genome-wide scale...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.388902

    authors: Medstrand P,van de Lagemaat LN,Mager DL

    update_date: 2002-10-01 00:00:00

  • Retrotransposon Ty1 integration targets specifically positioned asymmetric nucleosomal DNA segments in tRNA hotspots.

    abstract::The Saccharomyces cerevisiae genome contains about 35 copies of dispersed retrotransposons called Ty1 elements. Ty1 elements target regions upstream of tRNA genes and other Pol III-transcribed genes when retrotransposing to new sites. We used deep sequencing of Ty1-flanking sequence amplicons to characterize Ty1 integ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.129460.111

    authors: Mularoni L,Zhou Y,Bowen T,Gangadharan S,Wheelan SJ,Boeke JD

    update_date: 2012-04-01 00:00:00

  • Construction of a linkage map of the medaka (Oryzias latipes) and mapping of the Da mutant locus defective in dorsoventral patterning.

    abstract::Double anal fin (Da) is a medaka with an autosomal semidominant mutation that causes mirror image duplication of the ventral region concentrating on the caudal region. The chromosomal location of the Da gene and its sequence have remained unknown. We constructed a medaka linkage map as a first step to approach positio...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.9.12.1277

    authors: Ohtsuka M,Makino S,Yoda K,Wada H,Naruse K,Mitani H,Shima A,Ozato K,Kimura M,Inoko H

    update_date: 1999-12-01 00:00:00

  • DNA enrichment by allele-specific hybridization (DEASH): a novel method for haplotyping and for detecting low-frequency base substitutional variants and recombinant DNA molecules.

    abstract::Detecting rare sequence variants in genomic DNA is central to the analysis of de novo mutation and recombination events and the detection of rare pathological mutations in mixed cell populations. Current PCR techniques suffer from noise that limits detection to variants present at a frequency of at least 10(-4)-10(-5)...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.1214603

    authors: Jeffreys AJ,May CA

    update_date: 2003-10-01 00:00:00

  • Genome dynamics in aging mice.

    abstract::Random spontaneous genome rearrangements are difficult to detect in vivo, especially in postmitotic tissues. Using a lacZ-plasmid reporter mouse model, we have previously presented evidence for the accumulation of large genome rearrangements in various tissues, including postmitotic tissues, during aging. These rearra...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.125502

    authors: Dollé ME,Vijg J

    update_date: 2002-11-01 00:00:00

  • Short-insert libraries as a method of problem solving in genome sequencing.

    abstract::As the Human Genome Project moves into its sequencing phase, a serious problem has arisen. The same problem has been increasingly vexing in the closing phase of the Caenorhabditis elegans project. The difficulty lies in sequencing efficiently through certain regions in which the templates (DNA substrates for the seque...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.8.5.562

    authors: McMurray AA,Sulston JE,Quail MA

    update_date: 1998-05-01 00:00:00

  • An integrative variant analysis pipeline for accurate genotype/haplotype inference in population NGS data.

    abstract::Next-generation sequencing is a powerful approach for discovering genetic variation. Sensitive variant calling and haplotype inference from population sequencing data remain challenging. We describe methods for high-quality discovery, genotyping, and phasing of SNPs for low-coverage (approximately 5×) sequencing of po...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.146084.112

    authors: Wang Y,Lu J,Yu J,Gibbs RA,Yu F

    update_date: 2013-05-01 00:00:00

  • Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context.

    abstract::Gene order in prokaryotes is conserved to a much lesser extent than protein sequences. Only several operons, primarily those that code for physically interacting proteins, are conserved in all or most of the bacterial and archaeal genomes. Nevertheless, even the limited conservation of operon organization that is obse...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.gr-1619r

    authors: Wolf YI,Rogozin IB,Kondrashov AS,Koonin EV

    update_date: 2001-03-01 00:00:00

  • The identification and functional annotation of RNA structures conserved in vertebrates.

    abstract::Structured elements of RNA molecules are essential in, e.g., RNA stabilization, localization, and protein interaction, and their conservation across species suggests a common functional role. We computationally screened vertebrate genomes for conserved RNA structures (CRSs), leveraging structure-based, rather than seq...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.208652.116

    authors: Seemann SE,Mirza AH,Hansen C,Bang-Berthelsen CH,Garde C,Christensen-Dalsgaard M,Torarinsson E,Yao Z,Workman CT,Pociot F,Nielsen H,Tommerup N,Ruzzo WL,Gorodkin J

    update_date: 2017-08-01 00:00:00

  • Extensive variation and low heritability of DNA methylation identified in a twin study.

    abstract::Disturbance of DNA methylation leading to aberrant gene expression has been implicated in the etiology of many diseases. Whereas variation at the genetic level has been studied extensively, less is known about the extent and function of epigenetic variation. To explore variation and heritability of DNA methylation, we...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.119685.110

    authors: Gervin K,Hammerø M,Akselsen HE,Moe R,Nygård H,Brandt I,Gjessing HK,Harris JR,Undlien DE,Lyle R

    update_date: 2011-11-01 00:00:00

  • The genome sequence of Mycoplasma mycoides subsp. mycoides SC type strain PG1T, the causative agent of contagious bovine pleuropneumonia (CBPP).

    abstract::Mycoplasma mycoides subsp. mycoides SC (MmymySC) is the etiological agent of contagious bovine pleuropneumonia (CBPP), a highly contagious respiratory disease in cattle. The genome of MmymySC type strain PG1(T) has been sequenced to map all the genes and to facilitate further studies regarding the cell function of the ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.1673304

    authors: Westberg J,Persson A,Holmberg A,Goesmann A,Lundeberg J,Johansson KE,Pettersson B,Uhlén M

    update_date: 2004-02-01 00:00:00

  • Interaction between the X chromosome and an autosome regulates size sexual dimorphism in Portuguese Water Dogs.

    abstract::Size sexual dimorphism occurs in almost all mammals. In Portuguese Water Dogs, much of the difference in skeletal size between females and males is due to the interaction between a Quantitative Trait Locus (QTL) on the X-chromosome and a QTL linked to Insulin-like Growth Factor 1 (IGF-1) on the CFA 15 autosome. In fem...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.3712705

    authors: Chase K,Carrier DR,Adler FR,Ostrander EA,Lark KG

    update_date: 2005-12-01 00:00:00

  • Widespread somatic L1 retrotransposition occurs early during gastrointestinal cancer evolution.

    abstract::Somatic L1 retrotransposition events have been shown to occur in epithelial cancers. Here, we attempted to determine how early somatic L1 insertions occurred during the development of gastrointestinal (GI) cancers. Using L1-targeted resequencing (L1-seq), we studied different stages of four colorectal cancers arising ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.196238.115

    authors: Ewing AD,Gacita A,Wood LD,Ma F,Xing D,Kim MS,Manda SS,Abril G,Pereira G,Makohon-Moore A,Looijenga LH,Gillis AJ,Hruban RH,Anders RA,Romans KE,Pandey A,Iacobuzio-Donahue CA,Vogelstein B,Kinzler KW,Kazazian HH Jr,Sol

    update_date: 2015-10-01 00:00:00

  • Thermophilic bacteria strictly obey Szybalski's transcription direction rule and politely purine-load RNAs with both adenine and guanine.

    abstract::When transcription is to the right of the promoter, the "top," mRNA-synonymous strand of DNA tends to be purine-rich. When transcription is to the left of the promoter, the top, mRNA-template strand tends to be pyrimidine-rich. This transcription-direction rule suggests that there has been an evolutionary selection pr...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.10.2.228

    authors: Lao PJ,Forsdyke DR

    update_date: 2000-02-01 00:00:00

  • CADLIVE dynamic simulator: direct link of biochemical networks to dynamic models.

    abstract::We have developed the CADLIVE (Computer-Aided Design of LIVing systEms) Simulator that provided a rule-based automatic way to convert biochemical network maps into dynamic models, which enables simulating their dynamics without going through all of the reactions down to the details of exact kinetic parameters. The sim...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.3463705

    authors: Kurata H,Masaki K,Sumida Y,Iwasaki R

    update_date: 2005-04-01 00:00:00

  • Coding and noncoding variants in HFM1, MLH3, MSH4, MSH5, RNF212, and RNF212B affect recombination rate in cattle.

    abstract::We herein study genetic recombination in three cattle populations from France, New Zealand, and the Netherlands. We identify 2,395,177 crossover (CO) events in 94,516 male gametes, and 579,996 CO events in 25,332 female gametes. The average number of COs was found to be larger in males (23.3) than in females (21.4). T...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.204214.116

    authors: Kadri NK,Harland C,Faux P,Cambisano N,Karim L,Coppieters W,Fritz S,Mullaart E,Baurain D,Boichard D,Spelman R,Charlier C,Georges M,Druet T

    update_date: 2016-10-01 00:00:00

  • Pathway Processor: a tool for integrating whole-genome expression results into metabolic networks.

    abstract::We have developed a new tool to visualize expression data on metabolic pathways and to evaluate which metabolic pathways are most affected by transcriptional changes in whole-genome expression experiments. Using the Fisher Exact Test, the method scores biochemical pathways according to the probability that as many or ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.226602

    authors: Grosu P,Townsend JP,Hartl DL,Cavalieri D

    update_date: 2002-07-01 00:00:00