Detecting copy number variation with mated short reads.

Abstract:

The development of high-throughput sequencing (HTS) technologies has opened the door to novel methods for detecting copy number variants (CNVs) in the human genome. While in the past CNVs have been detected based on array CGH data, recent studies have shown that depth-of-coverage information from HTS technologies can also be used for the reliable identification of large copy-variable regions. Such methods, however, are hindered by sequencing biases that lead certain regions of the genome to be over- or undersampled, lowering their resolution and ability to accurately identify the exact breakpoints of the variants. In this work, we develop a method for CNV detection that supplements the depth-of-coverage with paired-end mapping information, where mate pairs mapping discordantly to the reference serve to indicate the presence of variation. Our algorithm, called CNVer, combines this information within a unified computational framework called the donor graph, allowing us to better mitigate the sequencing biases that cause uneven local coverage and accurately predict CNVs. We use CNVer to detect 4879 CNVs in the recently described genome of a Yoruban individual. Most of the calls (77%) coincide with previously known variants within the Database of Genomic Variants, while 81% of deletion copy number variants previously known for this individual coincide with one of our loss calls. Furthermore, we demonstrate that CNVer can reconstruct the absolute copy counts of segments of the donor genome and evaluate the feasibility of using CNVer with low coverage datasets.
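The abstract names two raw signals that CNVer combines: discordantly mapping mate pairs and depth of coverage. The sketch below illustrates each signal on its own, under stated assumptions. It does not implement CNVer's donor graph or its correction for sequencing bias. The `MatePair` record, the forward-reverse (+/-) library orientation, the insert-size bounds and both function names are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class MatePair:
    """Hypothetical minimal record for one mapped mate pair."""
    chrom1: str
    pos1: int      # mapped start of mate 1 on the reference
    strand1: str   # "+" or "-"
    chrom2: str
    pos2: int      # mapped start of mate 2 on the reference
    strand2: str


def is_discordant(pair: MatePair, min_insert: int, max_insert: int) -> bool:
    """Return True if the pair maps discordantly to the reference.

    Assumes a forward-reverse library: in a concordant pair the leftmost
    mate is on "+", the rightmost on "-", and their start-to-start distance
    lies within [min_insert, max_insert].
    """
    if pair.chrom1 != pair.chrom2:
        return True  # mates on different chromosomes
    # Order mates left to right so the check does not depend on mate order.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if (left_strand, right_strand) != ("+", "-"):
        return True  # orientation as left by inversions or tandem duplications
    distance = right_pos - left_pos
    # Too far apart suggests a deletion in the donor; too close, an insertion.
    return not (min_insert <= distance <= max_insert)


def estimate_copy_count(reads_in_window: int, expected_reads_per_copy: float) -> int:
    """Naive depth-of-coverage estimate: observed reads divided by the reads
    one copy is expected to contribute, rounded to the nearest integer.
    No correction for GC content, mappability or other biases is applied."""
    if expected_reads_per_copy <= 0:
        raise ValueError("expected_reads_per_copy must be positive")
    return round(reads_in_window / expected_reads_per_copy)
```

Consider bounds of 200 to 500 bp. Under those bounds, a +/- pair whose mates start 5,000 bp apart on the same chromosome is flagged as discordant, which is the kind of evidence a deletion in the donor would leave. Separately, `estimate_copy_count(60, 15.0)` returns 4. A plain ratio like this is exactly what the sequencing biases described above distort. That limitation is the abstract's motivation for combining depth with mate-pair evidence.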

journal_name

Genome Res

journal_title

Genome research

authors

Medvedev P,Fiume M,Dzamba M,Smith T,Brudno M

doi

10.1101/gr.106344.110

subject

Has Abstract

pub_date

2010-11-01

pages

1613-22

issue

11

eissn

1549-5469

issn

1088-9051

pii

gr.106344.110

journal_volume

20

pub_type

Journal Article
  • Impact of genomic structural variation in Drosophila melanogaster based on population-scale sequencing.

    abstract::Genomic structural variation (SV) is a major determinant for phenotypic variation. Although it has been extensively studied in humans, the nucleotide resolution structure of SVs within the widely used model organism Drosophila remains unknown. We report a highly accurate, densely validated map of unbalanced SVs compri...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.142646.112

    authors: Zichner T,Garfield DA,Rausch T,Stütz AM,Cannavó E,Braun M,Furlong EE,Korbel JO

    update_date: 2013-03-01

  • Genomic organization of the dog dystroglycan gene DAG1 locus on chromosome 20q15.1-q15.2.

    abstract::Dystroglycan is a laminin binding protein, which provides a structural link between the subsarcolemmal cytoskeleton and the extracellular matrix. It is also involved in the organization of basement membranes. So far the genomic organization of the dystroglycan gene DAG1 has not been completely investigated. Here we re...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.10.3.295

    authors: Leeb T,Neumann S,Deppe A,Breen M,Brenig B

    update_date: 2000-03-01

  • Large-scale sequencing in human chromosome 12p13: experimental and computational gene structure determination.

    abstract::The detailed genomic organization of a gene-dense region at human chromosome 12p13, spanning 223 kb of contiguous sequence, was determined. This region is composed of 20 genes and several other expressed sequences. Experimental tools including RT-PCR and cDNA sequencing, combined with gene prediction programs, were ut...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.7.3.268

    authors: Ansari-Lari MA,Shen Y,Muzny DM,Lee W,Gibbs RA

    update_date: 1997-03-01

  • General gene movement off the X chromosome in the Drosophila genus.

    abstract::In Drosophila melanogaster, there is an excess of genes duplicated by retroposition from the X chromosome to the autosomes. Most of those retrogenes that originated on the X chromosome have testis expression pattern. These observations could be explained by natural selection favoring genes that avoided spermatogenesis...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.088609.108

    authors: Vibranovski MD,Zhang Y,Long M

    update_date: 2009-05-01

  • Nurture trumps nature in a longitudinal survey of salivary bacterial communities in twins from early adolescence to early adulthood.

    abstract::Variation in the composition of the human oral microbiome in health and disease has been observed. We have characterized inter- and intra-individual variation of microbial communities of 107 individuals in one of the largest cohorts to date (264 saliva samples), using culture-independent 16S rRNA pyrosequencing. We ex...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.140608.112

    authors: Stahringer SS,Clemente JC,Corley RP,Hewitt J,Knights D,Walters WA,Knight R,Krauter KS

    update_date: 2012-11-01

  • Whole population, genome-wide mapping of hidden relatedness.

    abstract::We present GERMLINE, a robust algorithm for identifying segmental sharing indicative of recent common ancestry between pairs of individuals. Unlike methods with comparable objectives, GERMLINE scales linearly with the number of samples, enabling analysis of whole-genome data in large cohorts. Our approach is based on ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.081398.108

    authors: Gusev A,Lowe JK,Stoffel M,Daly MJ,Altshuler D,Breslow JL,Friedman JM,Pe'er I

    update_date: 2009-02-01

  • HD-Marker: a highly multiplexed and flexible approach for targeted genotyping of more than 10,000 genes in a single-tube assay.

    abstract::Targeted genotyping of transcriptome-scale genetic markers is highly attractive for genetic, ecological, and evolutionary studies, but achieving this goal in a cost-effective manner remains a major challenge, especially for laboratories working on nonmodel organisms. Here, we develop a high-throughput, sequencing-base...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.235820.118

    authors: Lv J,Jiao W,Guo H,Liu P,Wang R,Zhang L,Zeng Q,Hu X,Bao Z,Wang S

    update_date: 2018-12-01

  • Gene3D: structural assignment for whole genes and genomes using the CATH domain structure database.

    abstract::We present a novel web-based resource, Gene3D, of precalculated structural assignments to gene sequences and whole genomes. This resource assigns structural domains from the CATH database to whole genes and links these to their curated functional and structural annotations within the CATH domain structure database, th...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.213802

    authors: Buchan DW,Shepherd AJ,Lee D,Pearl FM,Rison SC,Thornton JM,Orengo CA

    update_date: 2002-03-01

  • Closing the gaps on human chromosome 19 revealed genes with a high density of repetitive tandemly arrayed elements.

    abstract::The reported human genome sequence includes about 400 gaps of unknown sequence that were not found in the bacterial artificial chromosome (BAC) and cosmid libraries used for sequencing of the genome. These missing sequences correspond to approximately 1% of euchromatic regions of the human genome. Gap filling is a lab...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.1929904

    authors: Leem SH,Kouprina N,Grimwood J,Kim JH,Mullokandov M,Yoon YH,Chae JY,Morgan J,Lucas S,Richardson P,Detter C,Glavina T,Rubin E,Barrett JC,Larionov V

    update_date: 2004-02-01

  • Genome-wide mapping of human DNA-replication origins: levels of transcription at ORC1 sites regulate origin selection and replication timing.

    abstract::We report the genome-wide mapping of ORC1 binding sites in mammals, by chromatin immunoprecipitation and parallel sequencing (ChIP-seq). ORC1 binding sites in HeLa cells were validated as active DNA replication origins (ORIs) using Repli-seq, a method that allows identification of ORI-containing regions by parallel se...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.142331.112

    authors: Dellino GI,Cittaro D,Piccioni R,Luzi L,Banfi S,Segalla S,Cesaroni M,Mendoza-Maldonado R,Giacca M,Pelicci PG

    update_date: 2013-01-01

  • The genome-wide determinants of human and chimpanzee microsatellite evolution.

    abstract::Mutation rates of microsatellites vary greatly among loci. The causes of this heterogeneity remain largely enigmatic yet are crucial for understanding numerous human neurological diseases and genetic instability in cancer. In this first genome-wide study, the relative contributions of intrinsic features and regional g...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.7113408

    authors: Kelkar YD,Tyekucheva S,Chiaromonte F,Makova KD

    update_date: 2008-01-01

  • Capture of a functionally active methyl-CpG binding domain by an arthropod retrotransposon family.

    abstract::The repressive capacity of cytosine DNA methylation is mediated by recruitment of silencing complexes by methyl-CpG binding domain (MBD) proteins. Despite MBD proteins being associated with silencing, we discovered that a family of arthropod Copia retrotransposons have incorporated a host-derived MBD. We functionally ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.243774.118

    authors: de Mendoza A,Pflueger J,Lister R

    update_date: 2019-08-01

  • Shuffling of genes within low-copy repeats on 22q11 (LCR22) by Alu-mediated recombination events during evolution.

    abstract::Low-copy repeats, or segmental duplications, are highly dynamic regions in the genome. The low-copy repeats on chromosome 22q11.2 (LCR22) are a complex mosaic of genes and pseudogenes formed by duplication processes; they mediate chromosome rearrangements associated with velo-cardio-facial syndrome/DiGeorge syndrome, ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.1549503

    authors: Babcock M,Pavlicek A,Spiteri E,Kashork CD,Ioshikhes I,Shaffer LG,Jurka J,Morrow BE

    update_date: 2003-12-01

  • Population genetic inference from genomic sequence variation.

    abstract::Population genetics has evolved from a theory-driven field with little empirical data into a data-driven discipline in which genome-scale data sets test the limits of available models and computational analysis methods. In humans and a few model organisms, analyses of whole-genome sequence polymorphism data are curren...

    journal_title:Genome research

    pub_type: Journal Article, Review

    doi:10.1101/gr.079509.108

    authors: Pool JE,Hellmann I,Jensen JD,Nielsen R

    update_date: 2010-03-01

  • Genome-wide A-to-I RNA editing in fungi independent of ADAR enzymes.

    abstract::Yeasts and filamentous fungi do not have adenosine deaminase acting on RNA (ADAR) orthologs and are believed to lack A-to-I RNA editing, which is the most prevalent editing of mRNA in animals. However, during this study with the PUK1(FGRRES_01058) pseudokinase gene important for sexual reproduction in Fusarium gramine...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.199877.115

    authors: Liu H,Wang Q,He Y,Chen L,Hao C,Jiang C,Li Y,Dai Y,Kang Z,Xu JR

    update_date: 2016-04-01

  • Optical mapping of BAC clones from the human Y chromosome DAZ locus.

    abstract::The accurate mapping of clones derived from genomic regions containing complex arrangements of repeated elements presents special problems for DNA sequencers. Recent advances in the automation of optical mapping have enabled us to map a set of 16 BAC clones derived from the DAZ locus of the human Y chromosome long arm...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.112100

    authors: Giacalone J,Delobette S,Gibaja V,Ni L,Skiadas Y,Qi R,Edington J,Lai Z,Gebauer D,Zhao H,Anantharaman T,Mishra B,Brown LG,Saxena R,Page DC,Schwartz DC

    update_date: 2000-09-01

  • Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm.

    abstract::Long sequencing reads generated by single-molecule sequencing technology offer the possibility of dramatically improving the contiguity of genome assemblies. The biggest challenge today is that long reads have relatively high error rates, currently around 15%. The high error rates make it difficult to use this data al...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.213405.116

    authors: Zimin AV,Puiu D,Luo MC,Zhu T,Koren S,Marçais G,Yorke JA,Dvořák J,Salzberg SL

    update_date: 2017-05-01

  • CBX3 regulates efficient RNA processing genome-wide.

    abstract::CBX5, CBX1, and CBX3 (HP1α, β, and γ, respectively) play an evolutionarily conserved role in the formation and maintenance of heterochromatin. In addition, CBX5, CBX1, and CBX3 may also participate in transcriptional regulation of genes. Recently, CBX3 binding to the bodies of a subset of genes has been observed in hu...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.124818.111

    authors: Smallwood A,Hon GC,Jin F,Henry RE,Espinosa JM,Ren B

    update_date: 2012-08-01

  • Centromere repositioning.

    abstract::Primate pericentromeric regions recently have been shown to exhibit extraordinary evolutionary plasticity. In this paper we report an additional peculiar feature of these regions that we discovered while analyzing, by FISH, the evolutionary conservation of primate phylogenetic chromosome IX. If the position of the cen...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.9.12.1184

    authors: Montefalcone G,Tempesta S,Rocchi M,Archidiacono N

    update_date: 1999-12-01

  • Inference and analysis of haplotypes from combined genotyping studies deposited in dbSNP.

    abstract::In the attempt to understand human variation and the genetic basis of complex disease, a tremendous number of single nucleotide polymorphisms (SNPs) have been discovered and deposited into NCBI's dbSNP public database. More than 2.7 million SNPs in the database have genotype information. This data provides an invaluab...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.4297805

    authors: Zaitlen NA,Kang HM,Feolo ML,Sherry ST,Halperin E,Eskin E

    update_date: 2005-11-01

  • Genomics and hearing impairment.

    abstract::Hearing impairment is clinically and genetically heterogeneous. There are >400 disorders in which hearing impairment is a characteristic of the syndrome, and family studies demonstrate that there are at least 30 autosomal loci for nonsyndromic hearing impairment. The genes that have been identified encode diaphanous (...

    journal_title:Genome research

    pub_type: Historical Article, Journal Article, Review

    doi:

    authors: Keats BJ,Berlin CI

    update_date: 1999-01-01

  • The effect of genotype and in utero environment on interindividual variation in neonate DNA methylomes.

    abstract::Integrating the genotype with epigenetic marks holds the promise of better understanding the biology that underlies the complex interactions of inherited and environmental components that define the developmental origins of a range of disorders. The quality of the in utero environment significantly influences health o...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.171439.113

    authors: Teh AL,Pan H,Chen L,Ong ML,Dogra S,Wong J,MacIsaac JL,Mah SM,McEwen LM,Saw SM,Godfrey KM,Chong YS,Kwek K,Kwoh CK,Soh SE,Chong MF,Barton S,Karnani N,Cheong CY,Buschdorf JP,Stünkel W,Kobor MS,Meaney MJ,Gluckma

    update_date: 2014-07-01

  • rVista for comparative sequence-based discovery of functional transcription factor binding sites.

    abstract::Identifying transcriptional regulatory elements represents a significant challenge in annotating the genomes of higher vertebrates. We have developed a computational tool, rVista, for high-throughput discovery of cis-regulatory elements that combines clustering of predicted transcription factor binding sites (TFBSs) a...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.225502

    authors: Loots GG,Ovcharenko I,Pachter L,Dubchak I,Rubin EM

    update_date: 2002-05-01

  • End Sequence Analysis Toolkit (ESAT) expands the extractable information from single-cell RNA-seq data.

    abstract::RNA-seq protocols that focus on transcript termini are well suited for applications in which template quantity is limiting. Here we show that, when applied to end-sequencing data, analytical methods designed for global RNA-seq produce computational artifacts. To remedy this, we created the End Sequence Analysis Toolki...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.207902.116

    authors: Derr A,Yang C,Zilionis R,Sergushichev A,Blodgett DM,Redick S,Bortell R,Luban J,Harlan DM,Kadener S,Greiner DL,Klein A,Artyomov MN,Garber M

    update_date: 2016-10-01

  • Species-specific class I gene expansions formed the telomeric 1 mb of the mouse major histocompatibility complex.

    abstract::We have determined the complete sequence of 951,695 bp from the class I region of H2, the mouse major histocompatibility complex (Mhc) from strain 129/Sv (haplotype bc). The sequence contains 26 genes. The sequence spans from the last 50 kb of the H2-T region, including 2 class I genes and 3 class I pseudogenes, and i...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.975303

    authors: Takada T,Kumánovics A,Amadou C,Yoshino M,Jones EP,Athanasiou M,Evans GA,Fischer Lindahl K

    update_date: 2003-04-01

  • Conservation, regulation, synteny, and introns in a large-scale C. briggsae-C. elegans genomic alignment.

    abstract::A new algorithm, WABA, was developed for doing large-scale alignments between genomic DNA of different species. WABA was used to align 8 million bases of Caenorhabditis briggsae genomic DNA against the entire 97-million-base Caenorhabditis elegans genome. The alignment, including C. briggsae homologs of 154 geneticall...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.10.8.1115

    authors: Kent WJ,Zahler AM

    update_date: 2000-08-01

  • High-salt-recovered sequences are associated with the active chromosomal compartment and with large ribonucleoprotein complexes including nuclear bodies.

    abstract::The mammalian cell nucleus contains numerous discrete suborganelles named nuclear bodies. While recruitment of specific genomic regions into these large ribonucleoprotein (RNP) complexes critically contributes to higher-order functional chromatin organization, such regions remain ill-defined. We have developed the hig...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.237073.118

    authors: Baudement MO,Cournac A,Court F,Seveno M,Parrinello H,Reynes C,Sabatier R,Bouschet T,Yi Z,Sallis S,Tancelin M,Rebouissou C,Cathala G,Lesne A,Mozziconacci J,Journot L,Forné T

    update_date: 2018-11-01

  • Evolution of transcript modification by N6-methyladenosine in primates.

    abstract::Phenotypic differences within populations and between closely related species are often driven by variation and evolution of gene expression. However, most analyses have focused on the effects of genomic variation at cis-regulatory elements such as promoters and enhancers that control transcriptional activity, and lit...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.212563.116

    authors: Ma L,Zhao B,Chen K,Thomas A,Tuteja JH,He X,He C,White KP

    update_date: 2017-03-01

  • Dynamic changes in replication timing and gene expression during lineage specification of human pluripotent stem cells.

    abstract::Duplication of the genome in mammalian cells occurs in a defined temporal order referred to as its replication-timing (RT) program. RT changes dynamically during development, regulated in units of 400-800 kb referred to as replication domains (RDs). Changes in RT are generally coordinated with transcriptional competen...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.187989.114

    authors: Rivera-Mulia JC,Buckley Q,Sasaki T,Zimmerman J,Didier RA,Nazor K,Loring JF,Lian Z,Weissman S,Robins AJ,Schulz TC,Menendez L,Kulik MJ,Dalton S,Gabr H,Kahveci T,Gilbert DM

    update_date: 2015-08-01

  • An abundance of bidirectional promoters in the human genome.

    abstract::The alignment of full-length human cDNA sequences to the finished sequence of the human genome provides a unique opportunity to study the distribution of genes throughout the genome. By analyzing the distances between 23,752 genes, we identified a class of divergently transcribed gene pairs, representing more than 10%...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.1982804

    authors: Trinklein ND,Aldred SF,Hartman SJ,Schroeder DI,Otillar RP,Myers RM

    update_date: 2004-01-01