Abstract:
The development of high-throughput sequencing (HTS) technologies has opened the door to novel methods for detecting copy number variants (CNVs) in the human genome. While in the past CNVs have been detected based on array CGH data, recent studies have shown that depth-of-coverage information from HTS technologies can also be used for the reliable identification of large copy-variable regions. Such methods, however, are hindered by sequencing biases that lead certain regions of the genome to be over- or undersampled, lowering their resolution and ability to accurately identify the exact breakpoints of the variants. In this work, we develop a method for CNV detection that supplements the depth-of-coverage with paired-end mapping information, where mate pairs mapping discordantly to the reference serve to indicate the presence of variation. Our algorithm, called CNVer, combines this information within a unified computational framework called the donor graph, allowing us to better mitigate the sequencing biases that cause uneven local coverage and accurately predict CNVs. We use CNVer to detect 4879 CNVs in the recently described genome of a Yoruban individual. Most of the calls (77%) coincide with previously known variants within the Database of Genomic Variants, while 81% of deletion copy number variants previously known for this individual coincide with one of our loss calls. Furthermore, we demonstrate that CNVer can reconstruct the absolute copy counts of segments of the donor genome and evaluate the feasibility of using CNVer with low coverage datasets.
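The abstract describes two signals that CNVer combines. The first is read depth, which scales with copy number. The second is mate pairs that map discordantly to the reference. As a rough illustration only, the Python sketch below treats each signal separately. It is not CNVer's donor-graph method. It assumes a forward-reverse paired-end library with a known range of mate distances, fixed-size windows with a known expected depth, and a diploid baseline. The names `MatePair`, `is_discordant`, `estimate_copy_counts`, and `call_windows` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MatePair:
    """One read pair as mapped to the reference (hypothetical minimal record)."""
    chrom1: str
    pos1: int     # mapped start of mate 1
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int     # mapped start of mate 2
    strand2: str


def is_discordant(pair: MatePair, min_dist: int, max_dist: int) -> bool:
    """Flag a pair whose mapping disagrees with the assumed library layout.

    Assumes a forward-reverse library: a concordant pair lies on one
    chromosome, with the leftmost mate on "+" and the other on "-", and
    the distance between mate starts inside [min_dist, max_dist].
    """
    if pair.chrom1 != pair.chrom2:
        return True  # mates on different chromosomes
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if (left_strand, right_strand) != ("+", "-"):
        return True  # unexpected orientation (can point to inversions or tandem duplications)
    dist = right_pos - left_pos
    # Too far apart can indicate a deletion in the donor; too close, an insertion.
    return not (min_dist <= dist <= max_dist)


def estimate_copy_counts(window_depths: List[float], expected_depth: float,
                         ploidy: int = 2) -> List[int]:
    """Naive depth-of-coverage estimate: round(ploidy * observed / expected)."""
    if expected_depth <= 0:
        raise ValueError("expected_depth must be positive")
    return [round(ploidy * d / expected_depth) for d in window_depths]


def call_windows(copy_counts: List[int], ploidy: int = 2) -> List[str]:
    """Label each window as a loss, gain, or normal relative to the ploidy."""
    calls = []
    for c in copy_counts:
        if c < ploidy:
            calls.append("loss")
        elif c > ploidy:
            calls.append("gain")
        else:
            calls.append("normal")
    return calls


if __name__ == "__main__":
    pairs = [
        MatePair("chr1", 1000, "+", "chr1", 1350, "-"),  # concordant
        MatePair("chr1", 5000, "+", "chr1", 9000, "-"),  # mates too far apart
        MatePair("chr1", 2000, "-", "chr1", 2300, "+"),  # unexpected orientation
        MatePair("chr1", 100, "+", "chr2", 400, "-"),    # different chromosomes
    ]
    print([is_discordant(p, 200, 500) for p in pairs])
    # → [False, True, True, True]
    counts = estimate_copy_counts([30, 15, 0, 45], expected_depth=30)
    print(counts, call_windows(counts))
    # → [2, 1, 0, 3] ['normal', 'loss', 'loss', 'gain']
```

Window-level depth ratios are vulnerable to the sequencing biases the abstract mentions when used alone. An undersampled region, for example, looks like a loss. Discordant pairs supply independent evidence, which is why combining the two signals is attractive.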

journal_name: Genome Res
journal_title: Genome research
authors: Medvedev P, Fiume M, Dzamba M, Smith T, Brudno M
doi: 10.1101/gr.106344.110
subject: Has Abstract
pub_date: 2010-11-01 00:00:00
pages: 1613-22
issue: 11
eissn: 1088-9051
issn: 1549-5469
pii: gr.106344.110
journal_volume: 20
pub_type: Journal Article

Related literature:
abstract: Genomic structural variation (SV) is a major determinant for phenotypic variation. Although it has been extensively studied in humans, the nucleotide resolution structure of SVs within the widely used model organism Drosophila remains unknown. We report a highly accurate, densely validated map of unbalanced SVs compri...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.142646.112
update_date: 2013-03-01 00:00:00
abstract: Dystroglycan is a laminin binding protein, which provides a structural link between the subsarcolemmal cytoskeleton and the extracellular matrix. It is also involved in the organization of basement membranes. So far the genomic organization of the dystroglycan gene DAG1 has not been completely investigated. Here we re...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.10.3.295
update_date: 2000-03-01 00:00:00
abstract: The detailed genomic organization of a gene-dense region at human chromosome 12p13, spanning 223 kb of contiguous sequence, was determined. This region is composed of 20 genes and several other expressed sequences. Experimental tools including RT-PCR and cDNA sequencing, combined with gene prediction programs, were ut...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.7.3.268
update_date: 1997-03-01 00:00:00
abstract: In Drosophila melanogaster, there is an excess of genes duplicated by retroposition from the X chromosome to the autosomes. Most of those retrogenes that originated on the X chromosome have testis expression pattern. These observations could be explained by natural selection favoring genes that avoided spermatogenesis...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.088609.108
update_date: 2009-05-01 00:00:00
abstract: Variation in the composition of the human oral microbiome in health and disease has been observed. We have characterized inter- and intra-individual variation of microbial communities of 107 individuals in one of the largest cohorts to date (264 saliva samples), using culture-independent 16S rRNA pyrosequencing. We ex...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.140608.112
update_date: 2012-11-01 00:00:00
abstract: We present GERMLINE, a robust algorithm for identifying segmental sharing indicative of recent common ancestry between pairs of individuals. Unlike methods with comparable objectives, GERMLINE scales linearly with the number of samples, enabling analysis of whole-genome data in large cohorts. Our approach is based on ...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.081398.108
update_date: 2009-02-01 00:00:00
abstract: Targeted genotyping of transcriptome-scale genetic markers is highly attractive for genetic, ecological, and evolutionary studies, but achieving this goal in a cost-effective manner remains a major challenge, especially for laboratories working on nonmodel organisms. Here, we develop a high-throughput, sequencing-base...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.235820.118
update_date: 2018-12-01 00:00:00
abstract: We present a novel web-based resource, Gene3D, of precalculated structural assignments to gene sequences and whole genomes. This resource assigns structural domains from the CATH database to whole genes and links these to their curated functional and structural annotations within the CATH domain structure database, th...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.213802
update_date: 2002-03-01 00:00:00
abstract: The reported human genome sequence includes about 400 gaps of unknown sequence that were not found in the bacterial artificial chromosome (BAC) and cosmid libraries used for sequencing of the genome. These missing sequences correspond to approximately 1% of euchromatic regions of the human genome. Gap filling is a lab...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.1929904
update_date: 2004-02-01 00:00:00
abstract: We report the genome-wide mapping of ORC1 binding sites in mammals, by chromatin immunoprecipitation and parallel sequencing (ChIP-seq). ORC1 binding sites in HeLa cells were validated as active DNA replication origins (ORIs) using Repli-seq, a method that allows identification of ORI-containing regions by parallel se...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.142331.112
update_date: 2013-01-01 00:00:00
abstract: Mutation rates of microsatellites vary greatly among loci. The causes of this heterogeneity remain largely enigmatic yet are crucial for understanding numerous human neurological diseases and genetic instability in cancer. In this first genome-wide study, the relative contributions of intrinsic features and regional g...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.7113408
update_date: 2008-01-01 00:00:00
abstract: The repressive capacity of cytosine DNA methylation is mediated by recruitment of silencing complexes by methyl-CpG binding domain (MBD) proteins. Despite MBD proteins being associated with silencing, we discovered that a family of arthropod Copia retrotransposons have incorporated a host-derived MBD. We functionally ...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.243774.118
update_date: 2019-08-01 00:00:00
abstract: Low-copy repeats, or segmental duplications, are highly dynamic regions in the genome. The low-copy repeats on chromosome 22q11.2 (LCR22) are a complex mosaic of genes and pseudogenes formed by duplication processes; they mediate chromosome rearrangements associated with velo-cardio-facial syndrome/DiGeorge syndrome, ...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.1549503
update_date: 2003-12-01 00:00:00
abstract: Population genetics has evolved from a theory-driven field with little empirical data into a data-driven discipline in which genome-scale data sets test the limits of available models and computational analysis methods. In humans and a few model organisms, analyses of whole-genome sequence polymorphism data are curren...
journal_title: Genome research
pub_type: Journal Article, Review
doi: 10.1101/gr.079509.108
update_date: 2010-03-01 00:00:00
abstract: Yeasts and filamentous fungi do not have adenosine deaminase acting on RNA (ADAR) orthologs and are believed to lack A-to-I RNA editing, which is the most prevalent editing of mRNA in animals. However, during this study with the PUK1(FGRRES_01058) pseudokinase gene important for sexual reproduction in Fusarium gramine...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.199877.115
update_date: 2016-04-01 00:00:00
abstract: The accurate mapping of clones derived from genomic regions containing complex arrangements of repeated elements presents special problems for DNA sequencers. Recent advances in the automation of optical mapping have enabled us to map a set of 16 BAC clones derived from the DAZ locus of the human Y chromosome long arm...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.112100
update_date: 2000-09-01 00:00:00
abstract: Long sequencing reads generated by single-molecule sequencing technology offer the possibility of dramatically improving the contiguity of genome assemblies. The biggest challenge today is that long reads have relatively high error rates, currently around 15%. The high error rates make it difficult to use this data al...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.213405.116
update_date: 2017-05-01 00:00:00
abstract: CBX5, CBX1, and CBX3 (HP1α, β, and γ, respectively) play an evolutionarily conserved role in the formation and maintenance of heterochromatin. In addition, CBX5, CBX1, and CBX3 may also participate in transcriptional regulation of genes. Recently, CBX3 binding to the bodies of a subset of genes has been observed in hu...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.124818.111
update_date: 2012-08-01 00:00:00
abstract: Primate pericentromeric regions recently have been shown to exhibit extraordinary evolutionary plasticity. In this paper we report an additional peculiar feature of these regions that we discovered while analyzing, by FISH, the evolutionary conservation of primate phylogenetic chromosome IX. If the position of the cen...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.9.12.1184
update_date: 1999-12-01 00:00:00
abstract: In the attempt to understand human variation and the genetic basis of complex disease, a tremendous number of single nucleotide polymorphisms (SNPs) have been discovered and deposited into NCBI's dbSNP public database. More than 2.7 million SNPs in the database have genotype information. This data provides an invaluab...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.4297805
update_date: 2005-11-01 00:00:00
abstract: Hearing impairment is clinically and genetically heterogeneous. There are >400 disorders in which hearing impairment is a characteristic of the syndrome, and family studies demonstrate that there are at least 30 autosomal loci for nonsyndromic hearing impairment. The genes that have been identified encode diaphanous (...
journal_title: Genome research
pub_type: Historical Article, Journal Article, Review
doi:
update_date: 1999-01-01 00:00:00
abstract: Integrating the genotype with epigenetic marks holds the promise of better understanding the biology that underlies the complex interactions of inherited and environmental components that define the developmental origins of a range of disorders. The quality of the in utero environment significantly influences health o...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.171439.113
update_date: 2014-07-01 00:00:00
abstract: Identifying transcriptional regulatory elements represents a significant challenge in annotating the genomes of higher vertebrates. We have developed a computational tool, rVista, for high-throughput discovery of cis-regulatory elements that combines clustering of predicted transcription factor binding sites (TFBSs) a...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.225502
update_date: 2002-05-01 00:00:00
abstract: RNA-seq protocols that focus on transcript termini are well suited for applications in which template quantity is limiting. Here we show that, when applied to end-sequencing data, analytical methods designed for global RNA-seq produce computational artifacts. To remedy this, we created the End Sequence Analysis Toolki...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.207902.116
update_date: 2016-10-01 00:00:00
abstract: We have determined the complete sequence of 951,695 bp from the class I region of H2, the mouse major histocompatibility complex (Mhc) from strain 129/Sv (haplotype bc). The sequence contains 26 genes. The sequence spans from the last 50 kb of the H2-T region, including 2 class I genes and 3 class I pseudogenes, and i...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.975303
update_date: 2003-04-01 00:00:00
abstract: A new algorithm, WABA, was developed for doing large-scale alignments between genomic DNA of different species. WABA was used to align 8 million bases of Caenorhabditis briggsae genomic DNA against the entire 97-million-base Caenorhabditis elegans genome. The alignment, including C. briggsae homologs of 154 geneticall...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.10.8.1115
update_date: 2000-08-01 00:00:00
abstract: The mammalian cell nucleus contains numerous discrete suborganelles named nuclear bodies. While recruitment of specific genomic regions into these large ribonucleoprotein (RNP) complexes critically contributes to higher-order functional chromatin organization, such regions remain ill-defined. We have developed the hig...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.237073.118
update_date: 2018-11-01 00:00:00
abstract: Phenotypic differences within populations and between closely related species are often driven by variation and evolution of gene expression. However, most analyses have focused on the effects of genomic variation at cis-regulatory elements such as promoters and enhancers that control transcriptional activity, and lit...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.212563.116
update_date: 2017-03-01 00:00:00
abstract: Duplication of the genome in mammalian cells occurs in a defined temporal order referred to as its replication-timing (RT) program. RT changes dynamically during development, regulated in units of 400-800 kb referred to as replication domains (RDs). Changes in RT are generally coordinated with transcriptional competen...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.187989.114
update_date: 2015-08-01 00:00:00
abstract: The alignment of full-length human cDNA sequences to the finished sequence of the human genome provides a unique opportunity to study the distribution of genes throughout the genome. By analyzing the distances between 23,752 genes, we identified a class of divergently transcribed gene pairs, representing more than 10%...
journal_title: Genome research
pub_type: Journal Article
doi: 10.1101/gr.1982804
update_date: 2004-01-01 00:00:00