Abstract:
Long sequencing reads generated by single-molecule sequencing technology offer the possibility of dramatically improving the contiguity of genome assemblies. The biggest challenge today is that long reads have relatively high error rates, currently around 15%. The high error rates make it difficult to use this data alone, particularly with highly repetitive plant genomes. Errors in the raw data can lead to insertion or deletion errors (indels) in the consensus genome sequence, which in turn create significant problems for downstream analysis; for example, a single indel may shift the reading frame and incorrectly truncate a protein sequence. Here, we describe an algorithm that solves the high error rate problem by combining long, high-error reads with shorter but much more accurate Illumina sequencing reads, whose error rates average <1%. Our hybrid assembly algorithm combines these two types of reads to construct mega-reads, which are both long and accurate, and then assembles the mega-reads using the CABOG assembler, which was designed for long reads. We apply this technique to a large data set of Illumina and PacBio sequences from the species Aegilops tauschii, a large and extremely repetitive plant genome that has resisted previous attempts at assembly. We show that the resulting assembled contigs are far larger than in any previous assembly, with an N50 contig size of 486,807 nucleotides. We compare the contigs to independently produced optical maps to evaluate their large-scale accuracy, and to a set of high-quality bacterial artificial chromosome (BAC)-based assemblies to evaluate base-level accuracy.
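The abstract reports contiguity as an N50 contig size. As a minimal sketch of how that statistic is conventionally computed (not the authors' code; the function name `n50` and the example lengths are hypothetical), N50 is the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the total assembly length:

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)  # longest first
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    # Only reached when the input is empty.
    raise ValueError("contig_lengths must not be empty")


# Hypothetical toy assembly, not data from the paper.
example_lengths = [100, 80, 60, 40, 20]
print(n50(example_lengths))
```

Because N50 depends on the total assembly length, values are most comparable between assemblies of the same genome.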
journal_name: Genome Res
journal_title: Genome research
authors: Zimin AV, Puiu D, Luo MC, Zhu T, Koren S, Marçais G, Yorke JA, Dvořák J, Salzberg SL
doi: 10.1101/gr.213405.116
subject: Has Abstract
pub_date: 2017-05-01 00:00:00
pages: 787-792
issue: 5
eissn: 1088-9051
issn: 1549-5469
pii: gr.213405.116
journal_volume: 27
pub_type: Journal article
Related literature
GENOME RESEARCH literature collection
abstract::We have determined the complete sequence of 951,695 bp from the class I region of H2, the mouse major histocompatibility complex (Mhc) from strain 129/Sv (haplotype bc). The sequence contains 26 genes. The sequence spans from the last 50 kb of the H2-T region, including 2 class I genes and 3 class I pseudogenes, and i...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.975303
update_date:2003-04-01 00:00:00
abstract::Double minutes (dmin) and homogeneously staining regions (hsr) are the cytogenetic hallmarks of genomic amplification in cancer. Different mechanisms have been proposed to explain their genesis. Recently, our group showed that the MYC-containing dmin in leukemia cases arise by excision and amplification (episome model...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.106252.110
update_date:2010-09-01 00:00:00
abstract::A systematic computational analysis of protein sequences containing known nuclear domains led to the identification of 28 novel domain families. This represents a 26% increase in the starting set of 107 known nuclear domain families used for the analysis. Most of the novel domains are present in all major eukaryotic l...
journal_title:Genome research
pub_type: Letter
doi:10.1101/gr.203201
update_date:2002-01-01 00:00:00
abstract::Orthologous genes that maintain a single-copy status in a broad range of species may indicate a selection against gene duplication. If this is the case, then duplicates of such genes that do survive may have escaped the dosage control by rapid and sizable changes in their function. To test this hypothesis and to devel...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.3266405
update_date:2005-03-01 00:00:00
abstract::To promote the clinical and epidemiological studies that improve our understanding of human genetic susceptibility to environmental exposure, the Environmental Genome Project (EGP) has scanned 213 environmental response genes involved in DNA repair, cell cycle regulation, apoptosis, and metabolism for single nucleotid...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.2730004
update_date:2004-10-01 00:00:00
abstract::Human genomic data of many types are readily available, but the complexity and scale of human molecular biology make it difficult to integrate this body of data, understand it from a systems level, and apply it to the study of specific pathways or genetic disorders. An investigator could best explore a particular prot...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.082214.108
update_date:2009-06-01 00:00:00
abstract::Genome evolution is driven by a complex interplay of factors, including selection, recombination, and introgression. The regions determining sexual identity are particularly dynamic parts of eukaryotic genomes that are prone to molecular degeneration associated with suppressed recombination. In the fungus Neurospora t...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.197244.115
update_date:2016-04-01 00:00:00
abstract::The genome size of Pseudoalteromonas haloplanktis, a ubiquitous and easily cultured marine bacterium, was measured as a step toward estimating the genome complexity of marine bacterioplankton. To determine total genome size, we digested P. haloplanktis DNA with the restriction endonucleases NotI and SfiI, separated th...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.6.12.1160
update_date:1996-12-01 00:00:00
abstract::Genome-scale metabolic models promise important insights into cell function. However, the definition of pathways and functional network modules within these models, and in the biochemical literature in general, is often based on intuitive reasoning. Although mathematical methods have been proposed to identify modules,...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.5662207
update_date:2007-04-01 00:00:00
abstract::DNA methylation plays key roles in diverse biological processes such as X chromosome inactivation, transposable element repression, genomic imprinting, and tissue-specific gene expression. Sequencing-based DNA methylation profiling provides an unprecedented opportunity to map and compare complete DNA methylomes. This ...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.156539.113
update_date:2013-09-01 00:00:00
abstract::Alternative splicing (AS) creates multiple mRNA transcripts from a single gene. While AS is known to contribute to gene regulation and proteome diversity in animals, the study of its importance in plants is in its early stages. However, recently available plant genome and transcript sequence data sets are enabling a g...
journal_title:Genome research
pub_type: Journal article, Review
doi:10.1101/gr.053678.106
update_date:2008-09-01 00:00:00
abstract::The genomic alterations associated with cancers are numerous and varied, involving both isolated and large-scale complex genomic rearrangements (CGRs). Although the underlying mechanisms are not well understood, CGRs have been implicated in tumorigenesis. Here, we introduce CouGaR, a novel method for characterizing th...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.211201.116
update_date:2017-01-01 00:00:00
abstract::Thousands of long noncoding RNAs (lncRNAs) have been found in vertebrate animals, a few of which have known biological roles. To better understand the genomics and features of lncRNAs in invertebrates, we used available RNA-seq, poly(A)-site, and ribosome-mapping data to identify lncRNAs of Caenorhabditis elegans. We ...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.140475.112
update_date:2012-12-01 00:00:00
abstract::Remnants of more than 3 million transposable elements, primarily retroelements, comprise nearly half of the human genome and have generated much speculation concerning their evolutionary significance. We have exploited the draft human genome sequence to examine the distributions of retroelements on a genome-wide scale...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.388902
update_date:2002-10-01 00:00:00
abstract::The Saccharomyces cerevisiae genome contains about 35 copies of dispersed retrotransposons called Ty1 elements. Ty1 elements target regions upstream of tRNA genes and other Pol III-transcribed genes when retrotransposing to new sites. We used deep sequencing of Ty1-flanking sequence amplicons to characterize Ty1 integ...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.129460.111
update_date:2012-04-01 00:00:00
abstract::Double anal fin (Da) is a medaka with an autosomal semidominant mutation that causes mirror image duplication of the ventral region concentrating on the caudal region. The chromosomal location of the Da gene and its sequence have remained unknown. We constructed a medaka linkage map as a first step to approach positio...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.9.12.1277
update_date:1999-12-01 00:00:00
abstract::Detecting rare sequence variants in genomic DNA is central to the analysis of de novo mutation and recombination events and the detection of rare pathological mutations in mixed cell populations. Current PCR techniques suffer from noise that limits detection to variants present at a frequency of at least 10^-4 to 10^-5...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.1214603
update_date:2003-10-01 00:00:00
abstract::Random spontaneous genome rearrangements are difficult to detect in vivo, especially in postmitotic tissues. Using a lacZ-plasmid reporter mouse model, we have previously presented evidence for the accumulation of large genome rearrangements in various tissues, including postmitotic tissues, during aging. These rearra...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.125502
update_date:2002-11-01 00:00:00
abstract::As the Human Genome Project moves into its sequencing phase, a serious problem has arisen. The same problem has been increasingly vexing in the closing phase of the Caenorhabditis elegans project. The difficulty lies in sequencing efficiently through certain regions in which the templates (DNA substrates for the seque...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.8.5.562
update_date:1998-05-01 00:00:00
abstract::Next-generation sequencing is a powerful approach for discovering genetic variation. Sensitive variant calling and haplotype inference from population sequencing data remain challenging. We describe methods for high-quality discovery, genotyping, and phasing of SNPs for low-coverage (approximately 5×) sequencing of po...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.146084.112
update_date:2013-05-01 00:00:00
abstract::Gene order in prokaryotes is conserved to a much lesser extent than protein sequences. Only several operons, primarily those that code for physically interacting proteins, are conserved in all or most of the bacterial and archaeal genomes. Nevertheless, even the limited conservation of operon organization that is obse...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.gr-1619r
update_date:2001-03-01 00:00:00
abstract::Structured elements of RNA molecules are essential in, e.g., RNA stabilization, localization, and protein interaction, and their conservation across species suggests a common functional role. We computationally screened vertebrate genomes for conserved RNA structures (CRSs), leveraging structure-based, rather than seq...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.208652.116
update_date:2017-08-01 00:00:00
abstract::Disturbance of DNA methylation leading to aberrant gene expression has been implicated in the etiology of many diseases. Whereas variation at the genetic level has been studied extensively, less is known about the extent and function of epigenetic variation. To explore variation and heritability of DNA methylation, we...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.119685.110
update_date:2011-11-01 00:00:00
abstract::Mycoplasma mycoides subsp. mycoides SC (MmymySC) is the etiological agent of contagious bovine pleuropneumonia (CBPP), a highly contagious respiratory disease in cattle. The genome of MmymySC type strain PG1(T) has been sequenced to map all the genes and to facilitate further studies regarding the cell function of the ...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.1673304
update_date:2004-02-01 00:00:00
abstract::Size sexual dimorphism occurs in almost all mammals. In Portuguese Water Dogs, much of the difference in skeletal size between females and males is due to the interaction between a Quantitative Trait Locus (QTL) on the X-chromosome and a QTL linked to Insulin-like Growth Factor 1 (IGF-1) on the CFA 15 autosome. In fem...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.3712705
update_date:2005-12-01 00:00:00
abstract::Somatic L1 retrotransposition events have been shown to occur in epithelial cancers. Here, we attempted to determine how early somatic L1 insertions occurred during the development of gastrointestinal (GI) cancers. Using L1-targeted resequencing (L1-seq), we studied different stages of four colorectal cancers arising ...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.196238.115
update_date:2015-10-01 00:00:00
abstract::When transcription is to the right of the promoter, the "top," mRNA-synonymous strand of DNA tends to be purine-rich. When transcription is to the left of the promoter, the top, mRNA-template strand tends to be pyrimidine-rich. This transcription-direction rule suggests that there has been an evolutionary selection pr...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.10.2.228
update_date:2000-02-01 00:00:00
abstract::We have developed the CADLIVE (Computer-Aided Design of LIVing systEms) Simulator that provided a rule-based automatic way to convert biochemical network maps into dynamic models, which enables simulating their dynamics without going through all of the reactions down to the details of exact kinetic parameters. The sim...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.3463705
update_date:2005-04-01 00:00:00
abstract::We herein study genetic recombination in three cattle populations from France, New Zealand, and the Netherlands. We identify 2,395,177 crossover (CO) events in 94,516 male gametes, and 579,996 CO events in 25,332 female gametes. The average number of COs was found to be larger in males (23.3) than in females (21.4). T...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.204214.116
update_date:2016-10-01 00:00:00
abstract::We have developed a new tool to visualize expression data on metabolic pathways and to evaluate which metabolic pathways are most affected by transcriptional changes in whole-genome expression experiments. Using the Fisher Exact Test, the method scores biochemical pathways according to the probability that as many or ...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.226602
update_date:2002-07-01 00:00:00