Whole-genome sequence assembly for mammalian genomes: Arachne 2.

Abstract:

We previously described the whole-genome assembly program Arachne, presenting assemblies of simulated data for small to mid-sized genomes. Here we describe algorithmic adaptations to the program, allowing for assembly of mammalian-size genomes, and also improving the assembly of smaller genomes. Three principal changes were simultaneously made and applied to the assembly of the mouse genome, during a six-month period of development: (1) Supercontigs (scaffolds) were iteratively broken and rejoined using several criteria, yielding a 64-fold increase in length (N50), and apparent elimination of all global misjoins; (2) gaps between contigs in supercontigs were filled (partially or completely) by insertion of reads, as suggested by pairing within the supercontig, increasing the N50 contig length by 50%; (3) memory usage was reduced fourfold. The outcome of this mouse assembly and its analysis are described in (Mouse Genome Sequencing Consortium 2002).

journal_name

Genome Res

journal_title

Genome research

authors

Jaffe DB,Butler J,Gnerre S,Mauceli E,Lindblad-Toh K,Mesirov JP,Zody MC,Lander ES

doi

10.1101/gr.828403

subject

Has Abstract

pub_date

2003-01-01 00:00:00

pages

91-6

issue

1

eissn

1549-5469

issn

1088-9051

journal_volume

13

pub_type

Journal Article
  • A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction.

    abstract::The representation and discovery of transcription factor (TF) sequence binding specificities is critical for understanding gene regulatory networks and interpreting the impact of disease-associated noncoding genetic variants. We present a novel TF binding motif representation, the k-mer set memory (KSM), which consist...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.226852.117

    authors: Guo Y,Tian K,Zeng H,Guo X,Gifford DK

    update_date:2018-06-01 00:00:00

  • Computational and experimental identification of mirtrons in Drosophila melanogaster and Caenorhabditis elegans.

    abstract::Mirtrons are intronic hairpin substrates of the dicing machinery that generate functional microRNAs. In this study, we describe experimental assays that defined the essential requirements for entry of introns into the mirtron pathway. These data informed a bioinformatic screen that effectively identified functional mi...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.113050.110

    authors: Chung WJ,Agius P,Westholm JO,Chen M,Okamura K,Robine N,Leslie CS,Lai EC

    update_date:2011-02-01 00:00:00

  • A burst of protein sequence evolution and a prolonged period of asymmetric evolution follow gene duplication in yeast.

    abstract::It is widely accepted that newly arisen duplicate gene pairs experience an altered selective regime that is often manifested as an increase in the rate of protein sequence evolution. Many details about the nature of the rate acceleration remain unknown, however, including its typical magnitude and duration, and whethe...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.6341207

    authors: Scannell DR,Wolfe KH

    update_date:2008-01-01 00:00:00

  • Eukaryotic regulatory element conservation analysis and identification using comparative genomics.

    abstract::Comparative genomics is a promising approach to the challenging problem of eukaryotic regulatory element identification, because functional noncoding sequences may be conserved across species from evolutionary constraints. We systematically analyzed known human and Saccharomyces cerevisiae regulatory elements and disc...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.1327604

    authors: Liu Y,Liu XS,Wei L,Altman RB,Batzoglou S

    update_date:2004-03-01 00:00:00

  • Integration of the rat recombination and EST maps in the rat genomic sequence and comparative mapping analysis with the mouse genome.

    abstract::Inbred strains of the laboratory rat are widely used for identifying genetic regions involved in the control of complex quantitative phenotypes of biomedical importance. The draft genomic sequence of the rat now provides essential information for annotating rat quantitative trait locus (QTL) maps. Following the survey...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.2001604

    authors: Wilder SP,Bihoreau MT,Argoud K,Watanabe TK,Lathrop M,Gauguier D

    update_date:2004-04-01 00:00:00

  • Annotated expressed sequence tags and cDNA microarrays for studies of brain and behavior in the honey bee.

    abstract::To accelerate the molecular analysis of behavior in the honey bee (Apis mellifera), we created expressed sequence tag (EST) and cDNA microarray resources for the bee brain. Over 20,000 cDNA clones were partially sequenced from a normalized (and subsequently subtracted) library generated from adult A. mellifera brains....

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.5302

    authors: Whitfield CW,Band MR,Bonaldo MF,Kumar CG,Liu L,Pardinas JR,Robertson HM,Soares MB,Robinson GE

    update_date:2002-04-01 00:00:00

  • The TAGteam motif facilitates binding of 21 sequence-specific transcription factors in the Drosophila embryo.

    abstract::Highly overlapping patterns of genome-wide binding of many distinct transcription factors have been observed in worms, insects, and mammals, but the origins and consequences of this overlapping binding remain unclear. While analyzing chromatin immunoprecipitation data sets from 21 sequence-specific transcription facto...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.130682.111

    authors: Satija R,Bradley RK

    update_date:2012-04-01 00:00:00

  • Efficient approach to unique single-nucleotide polymorphism discovery.

    abstract::Single-nucleotide polymorphisms (SNPs) are the most frequently found DNA sequence variations in the human genome. It has been argued that a dense set of SNP markers can be used to identify genetic factors associated with complex disease traits. Because all high-throughput genotyping methods require precise sequence kn...

    journal_title:Genome research

    pub_type: Journal Article

    doi:

    authors: Taillon-Miller P,Piernot EE,Kwok PY

    update_date:1999-05-01 00:00:00

  • Broad-spectrum respiratory tract pathogen identification using resequencing DNA microarrays.

    abstract::The exponential growth of pathogen nucleic acid sequences available in public domain databases has invited their direct use in pathogen detection, identification, and surveillance strategies. DNA microarray technology has offered the potential for the direct DNA sequence analysis of a broad spectrum of pathogens of in...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.4337206

    authors: Lin B,Wang Z,Vora GJ,Thornton JA,Schnur JM,Thach DC,Blaney KM,Ligler AG,Malanoski AP,Santiago J,Walter EA,Agan BK,Metzgar D,Seto D,Daum LT,Kruzelock R,Rowley RK,Hanson EH,Tibbetts C,Stenger DA

    update_date:2006-04-01 00:00:00

  • HD-Marker: a highly multiplexed and flexible approach for targeted genotyping of more than 10,000 genes in a single-tube assay.

    abstract::Targeted genotyping of transcriptome-scale genetic markers is highly attractive for genetic, ecological, and evolutionary studies, but achieving this goal in a cost-effective manner remains a major challenge, especially for laboratories working on nonmodel organisms. Here, we develop a high-throughput, sequencing-base...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.235820.118

    authors: Lv J,Jiao W,Guo H,Liu P,Wang R,Zhang L,Zeng Q,Hu X,Bao Z,Wang S

    update_date:2018-12-01 00:00:00

  • Estimating population genetic parameters and comparing model goodness-of-fit using DNA sequences with error.

    abstract::It is known that sequencing error can bias estimation of evolutionary or population genetic parameters. This problem is more prominent in deep resequencing studies because of their large sample size n, and a higher probability of error at each nucleotide site. We propose a new method based on the composite likelihood ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.097543.109

    authors: Liu X,Fu YX,Maxwell TJ,Boerwinkle E

    update_date:2010-01-01 00:00:00

  • Rate of elongation by RNA polymerase II is associated with specific gene features and epigenetic modifications.

    abstract::The rate of transcription elongation plays an important role in the timing of expression of full-length transcripts as well as in the regulation of alternative splicing. In this study, we coupled Bru-seq technology with 5,6-dichlorobenzimidazole 1-β-D-ribofuranoside (DRB) to estimate the elongation rates of over 2000 ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.171405.113

    authors: Veloso A,Kirkconnell KS,Magnuson B,Biewen B,Paulsen MT,Wilson TE,Ljungman M

    update_date:2014-06-01 00:00:00

  • Mutation scanning by meltMADGE: validations using BRCA1 and LDLR, and demonstration of the potential to identify severe, moderate, silent, rare, and paucimorphic mutations in the general population.

    abstract::We have developed a mutation-scanning approach suitable for whole population screening for unknown mutations. The method, meltMADGE, combines thermal ramp electrophoresis with MADGE to achieve suitable cost efficiency and throughput. The sensitivity was tested in blind trials using 54 amplicons representing the BRCA1 ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.3313405

    authors: Alharbi KK,Aldahmesh MA,Spanakis E,Haddad L,Whittall RA,Chen XH,Rassoulian H,Smith MJ,Sillibourne J,Ball NJ,Graham NJ,Briggs PJ,Simpson IA,Phillips DI,Lawlor DA,Ye S,Humphries SE,Cooper C,Smith GD,Ebrahim S,Eccles

    update_date:2005-07-01 00:00:00

  • A dynamic H3K27ac signature identifies VEGFA-stimulated endothelial enhancers and requires EP300 activity.

    abstract::Histone modifications are now well-established mediators of transcriptional programs that distinguish cell states. However, the kinetics of histone modification and their role in mediating rapid, signal-responsive gene expression changes has been little studied on a genome-wide scale. Vascular endothelial growth facto...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.149674.112

    authors: Zhang B,Day DS,Ho JW,Song L,Cao J,Christodoulou D,Seidman JG,Crawford GE,Park PJ,Pu WT

    update_date:2013-06-01 00:00:00

  • Genome-wide map of regulatory interactions in the human genome.

    abstract::Increasing evidence suggests that interactions between regulatory genomic elements play an important role in regulating gene expression. We generated a genome-wide interaction map of regulatory elements in human cells (ENCODE tier 1 cells, K562, GM12878) using Chromatin Interaction Analysis by Paired-End Tag sequencin...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.176586.114

    authors: Heidari N,Phanstiel DH,He C,Grubert F,Jahanbani F,Kasowski M,Zhang MQ,Snyder MP

    update_date:2014-12-01 00:00:00

  • The origins and evolution of chromosomes, dosage compensation, and mechanisms underlying venom regulation in snakes.

    abstract::Here we use a chromosome-level genome assembly of a prairie rattlesnake (Crotalus viridis), together with Hi-C, RNA-seq, and whole-genome resequencing data, to study key features of genome biology and evolution in reptiles. We identify the rattlesnake Z Chromosome, including the recombining pseudoautosomal region, and...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.240952.118

    authors: Schield DR,Card DC,Hales NR,Perry BW,Pasquesi GM,Blackmon H,Adams RH,Corbin AB,Smith CF,Ramesh B,Demuth JP,Betrán E,Tollis M,Meik JM,Mackessy SP,Castoe TA

    update_date:2019-04-01 00:00:00

  • Parente2: a fast and accurate method for detecting identity by descent.

    abstract::Identity-by-descent (IBD) inference is the problem of establishing a genetic connection between two individuals through a genomic segment that is inherited by both individuals from a recent common ancestor. IBD inference is an important preceding step in a variety of population genomic studies, ranging from demographi...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.173641.114

    authors: Rodriguez JM,Bercovici S,Huang L,Frostig R,Batzoglou S

    update_date:2015-02-01 00:00:00

  • Assessment of genome-wide protein function classification for Drosophila melanogaster.

    abstract::The functional classification of genes on a genome-wide scale is now in its infancy, and we make a first attempt to assess existing methods and identify sources of error. To this end, we compared two independent efforts for associating proteins with functions, one implemented by FlyBase and the other by PANTHER at Cel...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.771603

    authors: Mi H,Vandergriff J,Campbell M,Narechania A,Majoros W,Lewis S,Thomas PD,Ashburner M

    update_date:2003-09-01 00:00:00

  • Software for automated analysis of DNA fingerprinting gels.

    abstract::Here we describe software tools for the automated detection of DNA restriction fragments resolved on agarose fingerprinting gels. We present a mathematical model for the location and shape of the restriction fragments as a function of fragment size, with model parameters determined empirically from "marker" lanes cont...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.904303

    authors: Fuhrmann DR,Krzywinski MI,Chiu R,Saeedi P,Schein JE,Bosdet IE,Chinwalla A,Hillier LW,Waterston RH,McPherson JD,Jones SJ,Marra MA

    update_date:2003-05-01 00:00:00

  • Phenotypically distinct female castes in honey bees are defined by alternative chromatin states during larval development.

    abstract::The capacity of the honey bee to produce three phenotypically distinct organisms (two female castes; queens and sterile workers, and haploid male drones) from one genotype represents one of the most remarkable examples of developmental plasticity in any phylum. The queen-worker morphological and reproductive divide is...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.236497.118

    authors: Wojciechowski M,Lowe R,Maleszka J,Conn D,Maleszka R,Hurd PJ

    update_date:2018-10-01 00:00:00

  • Mouse population-guided resequencing reveals that variants in CD44 contribute to acetaminophen-induced liver injury in humans.

    abstract::Interindividual variability in response to chemicals and drugs is a common regulatory concern. It is assumed that xenobiotic-induced adverse reactions have a strong genetic basis, but many mechanism-based investigations have not been successful in identifying susceptible individuals. While recent advances in pharmacog...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.090241.108

    authors: Harrill AH,Watkins PB,Su S,Ross PK,Harbourt DE,Stylianou IM,Boorman GA,Russo MW,Sackler RS,Harris SC,Smith PC,Tennant R,Bogue M,Paigen K,Harris C,Contractor T,Wiltshire T,Rusyn I,Threadgill DW

    update_date:2009-09-01 00:00:00

  • The nonessentiality of essential genes in yeast provides therapeutic insights into a human disease.

    abstract::Essential genes refer to those whose null mutation leads to lethality or sterility. Theoretical reasoning and empirical data both suggest that the fatal effect of inactivating an essential gene can be attributed to either the loss of indispensable core cellular function (Type I), or the gain of fatal side effects afte...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.205955.116

    authors: Chen P,Wang D,Chen H,Zhou Z,He X

    update_date:2016-10-01 00:00:00

  • Shuffling of genes within low-copy repeats on 22q11 (LCR22) by Alu-mediated recombination events during evolution.

    abstract::Low-copy repeats, or segmental duplications, are highly dynamic regions in the genome. The low-copy repeats on chromosome 22q11.2 (LCR22) are a complex mosaic of genes and pseudogenes formed by duplication processes; they mediate chromosome rearrangements associated with velo-cardio-facial syndrome/DiGeorge syndrome, ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.1549503

    authors: Babcock M,Pavlicek A,Spiteri E,Kashork CD,Ioshikhes I,Shaffer LG,Jurka J,Morrow BE

    update_date:2003-12-01 00:00:00

  • Selective enrichment of damaged DNA molecules for ancient genome sequencing.

    abstract::Contamination by present-day human and microbial DNA is one of the major hindrances for large-scale genomic studies using ancient biological material. We describe a new molecular method, U selection, which exploits one of the most distinctive features of ancient DNA--the presence of deoxyuracils--for selective enrichm...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.174201.114

    authors: Gansauge MT,Meyer M

    update_date:2014-09-01 00:00:00

  • Recompleting the Caenorhabditis elegans genome.

    abstract::Caenorhabditis elegans was the first multicellular eukaryotic genome sequenced to apparent completion. Although this assembly employed a standard C. elegans strain (N2), it used sequence data from several laboratories, with DNA propagated in bacteria and yeast. Thus, the N2 assembly has many differences from any C. el...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.244830.118

    authors: Yoshimura J,Ichikawa K,Shoura MJ,Artiles KL,Gabdank I,Wahba L,Smith CL,Edgley ML,Rougvie AE,Fire AZ,Morishita S,Schwarz EM

    update_date:2019-06-01 00:00:00

  • Estimating coarse gene network structure from large-scale gene perturbation data.

    abstract::Large scale gene perturbation experiments generate information about the number of genes whose activity is directly or indirectly affected by a gene perturbation. From this information, one can numerically estimate coarse structural network features such as the total number of direct regulatory interactions and the nu...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.193902

    authors: Wagner A

    update_date:2002-02-01 00:00:00

  • Extensive variation and low heritability of DNA methylation identified in a twin study.

    abstract::Disturbance of DNA methylation leading to aberrant gene expression has been implicated in the etiology of many diseases. Whereas variation at the genetic level has been studied extensively, less is known about the extent and function of epigenetic variation. To explore variation and heritability of DNA methylation, we...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.119685.110

    authors: Gervin K,Hammerø M,Akselsen HE,Moe R,Nygård H,Brandt I,Gjessing HK,Harris JR,Undlien DE,Lyle R

    update_date:2011-11-01 00:00:00

  • HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient.

    abstract::Hi-C is a powerful technology for studying genome-wide chromatin interactions. However, current methods for assessing Hi-C data reproducibility can produce misleading results because they ignore spatial features in Hi-C data, such as domain structure and distance dependence. We present HiCRep, a framework for assessin...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.220640.117

    authors: Yang T,Zhang F,Yardımcı GG,Song F,Hardison RC,Noble WS,Yue F,Li Q

    update_date:2017-11-01 00:00:00

  • Schizosaccharomyces pombe essential genes: a pilot study.

    abstract::After completion of the Schizosaccharomyces pombe genome sequence, we have carried out a pilot gene deletion project to assess the feasibility of a genome-wide deletion project and to estimate the percentage of essential genes. Using a PCR-based gene deletion procedure, we investigated 100 genes within a 253-kb region...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.636103

    authors: Decottignies A,Sanchez-Perez I,Nurse P

    update_date:2003-03-01 00:00:00

  • Comparative sequence analyses reveal rapid and divergent evolutionary changes of the WFDC locus in the primate lineage.

    abstract::The initial comparison of the human and chimpanzee genome sequences revealed 16 genomic regions with an unusually high density of rapidly evolving genes. One such region is the whey acidic protein (WAP) four-disulfide core domain locus (or WFDC locus), which contains 14 WFDC genes organized in two subloci on human chr...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.6004607

    authors: Hurle B,Swanson W,NISC Comparative Sequencing Program.,Green ED

    update_date:2007-03-01 00:00:00