Turning Vice into Virtue: Using Batch-Effects to Detect Errors in Large Genomic Data Sets.

Abstract:

It is often unavoidable to combine data from different sequencing centers or sequencing platforms when compiling data sets with a large number of individuals. However, the different data are likely to contain specific systematic errors that will appear as SNPs. Here, we devise a method to detect systematic errors in combined data sets. To measure quality differences between individual genomes, we study pairs of variants that reside on different chromosomes and co-occur in individuals. The abundance of these pairs of variants in different genomes is then used to detect systematic errors due to batch effects. Applying our method to the 1000 Genomes data set, we find that coding regions are enriched for errors, where ∼1% of the higher frequency variants are predicted to be erroneous, whereas errors outside of coding regions are much rarer (<0.001%). As expected, predicted errors are found less often than other variants in a data set that was generated with a different sequencing technology, indicating that many of the candidates are indeed errors. However, predicted 1000 Genomes errors are also found in other large data sets; our observation is thus not specific to the 1000 Genomes data set. Our results show that batch effects can be turned into a virtue by using the resulting variation in large scale data sets to detect systematic errors.
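
The core idea of the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or their statistic: the input layout (a mapping from each individual to the set of variant sites it carries), the function names, and the `min_shared` threshold are all hypothetical, chosen only for illustration. The sketch counts, for each individual, how many variant pairs on different chromosomes it carries that also co-occur in at least `min_shared` individuals overall.

```python
from collections import Counter
from itertools import combinations


def count_cross_chromosome_pairs(genotypes):
    """Count, for every pair of variants on different chromosomes,
    how many individuals carry both variants.

    genotypes: dict mapping individual id -> set of (chromosome, position)
    tuples (a hypothetical input format, not taken from the paper).
    """
    pair_counts = Counter()
    for variants in genotypes.values():
        # Sorting gives each unordered pair one canonical key.
        for a, b in combinations(sorted(variants), 2):
            if a[0] != b[0]:  # keep only pairs on different chromosomes
                pair_counts[(a, b)] += 1
    return pair_counts


def pairs_per_individual(genotypes, min_shared=2):
    """For each individual, count the cross-chromosome pairs it carries
    that co-occur in at least `min_shared` individuals (assumed threshold)."""
    pair_counts = count_cross_chromosome_pairs(genotypes)
    shared = {pair for pair, n in pair_counts.items() if n >= min_shared}
    return {
        ind: sum(1 for a, b in shared if a in variants and b in variants)
        for ind, variants in genotypes.items()
    }


# Toy data, invented purely for illustration.
toy_genotypes = {
    "ind1": {("chr1", 100), ("chr2", 200)},
    "ind2": {("chr1", 100), ("chr1", 150), ("chr2", 200), ("chr3", 300)},
    "ind3": {("chr3", 300)},
}

toy_counts = count_cross_chromosome_pairs(toy_genotypes)
toy_abundance = pairs_per_individual(toy_genotypes)
```

Following the abstract's premise, genomes with an unusual abundance of such pairs could then be inspected for batch-specific errors; how that abundance is turned into an error prediction is left to the paper itself.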

journal_name

Genome Biol Evol

authors

Mafessoni F,Prasad RB,Groop L,Hansson O,Prüfer K

doi

10.1093/gbe/evy199

subject

Has Abstract

pub_date

2018-10-01

pages

2697-2708

issue

10

issn

1759-6653

pii

5094764

journal_volume

10

pub_type

journal article

  • Molecular phylogeny of sequenced Saccharomycetes reveals polyphyly of the alternative yeast codon usage.

    abstract: The universal genetic code defines the translation of nucleotide triplets, called codons, into amino acids. In many Saccharomycetes a unique alteration of this code affects the translation of the CUG codon, which is normally translated as leucine. Most of the species encoding CUG alternatively as serine belong to the ...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evu152

    authors: Mühlhausen S,Kollmar M

    update_date: 2014-07-22

  • Expansion and Functional Divergence of the SHORT VEGETATIVE PHASE (SVP) Genes in Eudicots.

    abstract: SHORT VEGETATIVE PHASE (SVP) genes are members of the well-known MADS-box gene family that regulates vital developmental processes in plants. In Arabidopsis, there are two SVP paralogs, SVP/AGAMOUS-LIKE22 (SVP/AGL22) and AGL24. SVP protein suppresses the flowering process, whereas AGL24 acts as a flowering activator. ...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evy235

    authors: Liu X,Sun Z,Dong W,Wang Z,Zhang L

    update_date: 2018-11-01

  • Evolutionary Remodeling of the Cell Envelope in Bacteria of the Planctomycetes Phylum.

    abstract: Bacteria of the Planctomycetes phylum have many unique cellular features, such as extensive membrane invaginations and the ability to import macromolecules. These features raise intriguing questions about the composition of their cell envelopes. In this study, we have used microscopy, phylogenomics, and proteomics to ...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evaa159

    authors: Mahajan M,Seeger C,Yee B,Andersson SGE

    update_date: 2020-09-01

  • Early Metazoan Origin and Multiple Losses of a Novel Clade of RIM Presynaptic Calcium Channel Scaffolding Protein Homologs.

    abstract: The precise localization of CaV2 voltage-gated calcium channels at the synapse active zone requires various interacting proteins, of which, Rab3-interacting molecule or RIM is considered particularly important. In vertebrates, RIM interacts with CaV2 channels in vitro via a PDZ domain that binds to the extreme C-termi...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evaa097

    authors: Piekut T,Wong YY,Walker SE,Smith CL,Gauberg J,Harracksingh AN,Lowden C,Novogradac BB,Cheng HM,Spencer GE,Senatore A

    update_date: 2020-08-01

  • Defense Response in Brazilian Honey Bees (Apis mellifera scutellata × spp.) Is Underpinned by Complex Patterns of Admixture.

    abstract: In 1957, an invasive and highly defensive honey bee began to spread across Brazil. In the previous year, Brazilian researchers hoped to produce a subtropical-adapted honey bee by crossing local commercial honey bees (of European origin) with a South African honey bee subspecies (Apis mellifera scutellata; an A-lineage...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evaa128

    authors: Harpur BA,Kadri SM,Orsi RO,Whitfield CW,Zayed A

    update_date: 2020-08-01

  • Toward genome-wide identification of Bateson-Dobzhansky-Muller incompatibilities in yeast: a simulation study.

    abstract: The Bateson-Dobzhansky-Muller (BDM) model of reproductive isolation by genetic incompatibility is a widely accepted model of speciation. Because of the exceptionally rich biological information about the budding yeast Saccharomyces cerevisiae, the identification of BDM incompatibilities in yeast would greatly deepen o...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evt091

    authors: Li C,Wang Z,Zhang J

    update_date: 2013-01-01

  • Comparative Genomics of Pathogenic and Nonpathogenic Beetle-Vectored Fungi in the Genus Geosmithia.

    abstract: Geosmithia morbida is an emerging fungal pathogen which serves as a model for examining the evolutionary processes behind pathogenicity because it is one of two known pathogens within a genus of mostly saprophytic, beetle-associated, fungi. This pathogen causes thousand cankers disease in black walnut trees and is vec...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evx242

    authors: Schuelke TA,Wu G,Westbrook A,Woeste K,Plachetzki DC,Broders K,MacManes MD

    update_date: 2017-12-01

  • Genome-wide analysis of adaptive molecular evolution in the carnivorous plant Utricularia gibba.

    abstract: The genome of the bladderwort Utricularia gibba provides an unparalleled opportunity to uncover the adaptive landscape of an aquatic carnivorous plant with unique phenotypic features such as absence of roots, development of water-filled suction bladders, and a highly ramified branching pattern. Despite its tiny size, ...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evu288

    authors: Carretero-Paulet L,Chang TH,Librado P,Ibarra-Laclette E,Herrera-Estrella L,Rozas J,Albert VA

    update_date: 2015-01-09

  • Functional shifts in insect microRNA evolution.

    abstract: MicroRNAs (miRNAs) are short endogenous RNA molecules that regulate gene expression at the posttranscriptional level and have been shown to play critical roles during animal development. The identification and comparison of miRNAs in metazoan species are therefore paramount for our understanding of the evolution of bo...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evq053

    authors: Marco A,Hui JH,Ronshaugen M,Griffiths-Jones S

    update_date: 2010-01-01

  • Draft Genome Sequences of Two Closely Related Aflatoxigenic Aspergillus Species Obtained from the Ivory Coast.

    abstract: Aspergillus ochraceoroseus and Aspergillus rambellii were isolated from soil detritus in Taï National Park, Ivory Coast, Africa. The Type strain for each species happens to be the only representative ever sampled. Both species secrete copious amounts of aflatoxin B1 and sterigmatocystin, because each of their genomes ...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evv246

    authors: Moore GG,Mack BM,Beltz SB

    update_date: 2015-12-03

  • Developmental Progression in the Coral Acropora digitifera Is Controlled by Differential Expression of Distinct Regulatory Gene Networks.

    abstract: Corals belong to the most basal class of the Phylum Cnidaria, which is considered the sister group of bilaterian animals, and thus have become an emerging model to study the evolution of developmental mechanisms. Although cell renewal, differentiation, and maintenance of pluripotency are cellular events shared by mult...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evw042

    authors: Reyes-Bermudez A,Villar-Briones A,Ramirez-Portilla C,Hidaka M,Mikheyev AS

    update_date: 2016-03-23

  • Lineage-Specific Expression Divergence in Grasses Is Associated with Male Reproduction, Host-Pathogen Defense, and Domestication.

    abstract: Poaceae (grasses) is an agriculturally important and widely distributed family of plants with extraordinary phenotypic diversity, much of which was generated under recent lineage-specific evolution. Yet, little is known about the genes and functional modules involved in the lineage-specific divergence of grasses. Here...

    journal_title: Genome biology and evolution

    pub_type: letter

    doi: 10.1093/gbe/evy245

    authors: Assis R

    update_date: 2019-01-01

  • A Novel Terminal-Repeat Retrotransposon in Miniature (TRIM) Is Massively Expressed in Echinococcus multilocularis Stem Cells.

    abstract: Taeniid cestodes (including the human parasites Echinococcus spp. and Taenia solium) have very few mobile genetic elements (MGEs) in their genome, despite lacking a canonical PIWI pathway. The MGEs of these parasites are virtually unexplored, and nothing is known about their expression and silencing. In this work, we ...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evv126

    authors: Koziol U,Radio S,Smircich P,Zarowiecki M,Fernández C,Brehm K

    update_date: 2015-07-01

  • The Site-Specific Amino Acid Preferences of Homologous Proteins Depend on Sequence Divergence.

    abstract: The propensity of protein sites to be occupied by any of the 20 amino acids is known as site-specific amino acid preferences (SSAP). Under the assumption that SSAP are conserved among homologs, they can be used to parameterize evolutionary models for the reconstruction of accurate phylogenetic trees. However, simulati...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evy261

    authors: Ferrada E

    update_date: 2019-01-01

  • Massive genomic decay in Serratia symbiotica, a recently evolved symbiont of aphids.

    abstract: All vertically transmitted bacterial symbionts undergo a process of genome reduction over time, resulting in tiny, gene-dense genomes. Comparison of genomes of ancient bacterial symbionts gives only limited information about the early stages in the transition from a free-living to symbiotic lifestyle because many chan...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evr002

    authors: Burke GR,Moran NA

    update_date: 2011-01-01

  • Evolution of proteasome regulators in eukaryotes.

    abstract: All living organisms require protein degradation to terminate biological processes and remove damaged proteins. One such machine is the 20S proteasome, a specialized barrel-shaped and compartmentalized multicatalytic protease. The activity of the 20S proteasome generally requires the binding of regulators/proteasome a...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evv068

    authors: Fort P,Kajava AV,Delsuc F,Coux O

    update_date: 2015-05-04

  • Evolutionary Rate Heterogeneity of Primary and Secondary Metabolic Pathway Genes in Arabidopsis thaliana.

    abstract: Primary metabolism is essential to plants for growth and development, and secondary metabolism helps plants to interact with the environment. Many plant metabolites are industrially important. These metabolites are produced by plants through complex metabolic pathways. Lack of knowledge about these pathways is hinderi...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evv217

    authors: Mukherjee D,Mukherjee A,Ghosh TC

    update_date: 2015-11-10

  • Studying genome heterogeneity within the arbuscular mycorrhizal fungal cytoplasm.

    abstract: Although heterokaryons have been reported in nature, multicellular organisms are generally assumed genetically homogeneous. Here, we investigate the case of arbuscular mycorrhizal fungi (AMF) that form symbiosis with plant roots. The growth advantages they confer to their hosts are of great potential benefit to sustai...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evv002

    authors: Boon E,Halary S,Bapteste E,Hijri M

    update_date: 2015-01-07

  • Comparative Phylogenomic Assessment of Mitochondrial Introgression among Several Species of Chipmunks (Tamias).

    abstract: Many species are not completely reproductively isolated, resulting in hybridization and genetic introgression. Organellar genomes, such as those derived from mitochondria (mtDNA) and chloroplasts, introgress frequently in natural systems; however, the forces shaping patterns of introgression are not always clear. Here...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evw254

    authors: Sarver BA,Demboski JR,Good JM,Forshee N,Hunter SS,Sullivan J

    update_date: 2017-01-01

  • Genetic Competence Drives Genome Diversity in Bacillus subtilis.

    abstract: Prokaryote genomes are the result of a dynamic flux of genes, with increases achieved via horizontal gene transfer and reductions occurring through gene loss. The ecological and selective forces that drive this genomic flexibility vary across species. Bacillus subtilis is a naturally competent bacterium that occupies ...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evx270

    authors: Brito PH,Chevreux B,Serra CR,Schyns G,Henriques AO,Pereira-Leal JB

    update_date: 2018-01-01

  • CRISPR System Acquisition and Evolution of an Obligate Intracellular Chlamydia-Related Bacterium.

    abstract: Recently, a new Chlamydia-related organism, Protochlamydia naegleriophila KNic, was discovered within a Naegleria amoeba. To decipher the mechanisms at play in the modeling of genomes from the Protochlamydia genus, we sequenced the full genome of Pr. naegleriophila, which includes a 2,885,090 bp chromosome and a 145,2...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evw138

    authors: Bertelli C,Cissé OH,Rusconi B,Kebbi-Beghdadi C,Croxatto A,Goesmann A,Collyn F,Greub G

    update_date: 2016-08-25

  • Functional convergence in reduced genomes of bacterial symbionts spanning 200 My of evolution.

    abstract: The main genomic changes in the evolution of host-restricted microbial symbionts are ongoing inactivation and loss of genes combined with rapid sequence evolution and extreme structural stability; these changes reflect high levels of genetic drift due to small population sizes and strict clonality. This genomic erosio...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evq055

    authors: McCutcheon JP,Moran NA

    update_date: 2010-01-01

  • Sequence-level mechanisms of human epigenome evolution.

    abstract: DNA methylation and chromatin states play key roles in development and disease. However, the extent of recent evolutionary divergence in the human epigenome and the influential factors that have shaped it are poorly understood. To determine the links between genome sequence and human epigenome evolution, we examined t...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evu142

    authors: Prendergast JG,Chambers EV,Semple CA

    update_date: 2014-06-24

  • Contrasting Patterns of Evolutionary Diversification in the Olfactory Repertoires of Reptile and Bird Genomes.

    abstract: Olfactory receptors (ORs) are membrane proteins that mediate the detection of odorants in the environment, and are the largest vertebrate gene family. Comparative studies of mammalian genomes indicate that OR repertoires vary widely, even between closely related lineages, as a consequence of frequent OR gains and loss...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evw013

    authors: Vandewege MW,Mangum SF,Gabaldón T,Castoe TA,Ray DA,Hoffmann FG

    update_date: 2016-02-09

  • Complete Genome Sequence of the Biocontrol Agent Bacillus velezensis UFLA258 and Its Comparison with Related Species: Diversity within the Commons.

    abstract: In this study, the full genome sequence of Bacillus velezensis strain UFLA258, a biological control agent of plant pathogens was obtained, assembled, and annotated. With a comparative genomics approach, in silico analyses of all complete genomes of B. velezensis and closely related species available in the database we...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evz208

    authors: Silva FJ,Ferreira LC,Campos VP,Cruz-Magalhães V,Barros AF,Andrade JP,Roberts DP,de Souza JT

    update_date: 2019-10-01

  • Transcriptome Differences between Alternative Sex Determining Genotypes in the House Fly, Musca domestica.

    abstract: Sex determination evolves rapidly, often because of turnover of the genes at the top of the pathway. The house fly, Musca domestica, has a multifactorial sex determination system, allowing us to identify the selective forces responsible for the evolutionary turnover of sex determination in action. There is a male dete...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evv128

    authors: Meisel RP,Scott JG,Clark AG

    update_date: 2015-07-02

  • Reconstructing the Phylogeny of Corynebacteriales while Accounting for Horizontal Gene Transfer.

    abstract: Horizontal gene transfer is a common mechanism in Bacteria that has contributed to the genomic content of existing organisms. Traditional methods for estimating bacterial phylogeny, however, assume only vertical inheritance in the evolution of homologous genes, which may result in errors in the estimated phylogenies. ...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evaa058

    authors: Coimbra NDR,Goes-Neto A,Azevedo V,Ouangraoua A

    update_date: 2020-04-01

  • Transition to an Aquatic Habitat Permitted the Repeated Loss of the Pleiotropic KLK8 Gene in Mammals.

    abstract: Kallikrein related peptidase 8 (KLK8; also called neuropsin) is a serine protease that plays distinct roles in the skin and hippocampus. In the skin, KLK8 influences keratinocyte proliferation and desquamation, and activates antimicrobial peptides in sweat. In the hippocampus, KLK8 affects memory acquisition. Here, we...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evx239

    authors: Hecker N,Sharma V,Hiller M

    update_date: 2017-11-01

  • Adaptive Prediction Emerges Over Short Evolutionary Time Scales.

    abstract: Adaptive prediction is a capability of diverse organisms, including microbes, to sense a cue and prepare in advance to deal with a future environmental challenge. Here, we investigated the timeframe over which adaptive prediction emerges when an organism encounters an environment with novel structure. We subjected yea...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evx116

    authors: López García de Lomana A,Kaur A,Turkarslan S,Beer KD,Mast FD,Smith JJ,Aitchison JD,Baliga NS

    update_date: 2017-06-01

  • Shared Signature of Recent Positive Selection on the TSBP1-BTNL2-HLA-DRA Genes in Five Native Populations from North Borneo.

    abstract: North Borneo (NB) is home to more than 40 native populations. These natives are believed to have undergone local adaptation in response to environmental challenges such as the mosquito-abundant tropical rainforest. We attempted to trace the footprints of natural selection from the genomic data of NB native populations...

    journal_title: Genome biology and evolution

    pub_type: journal article

    doi: 10.1093/gbe/evaa207

    authors: Hoh BP,Zhang X,Deng L,Yuan K,Yew CW,Saw WY,Hoque MZ,Aghakhanian F,Phipps ME,Teo YY,Subbiah VK,Xu S

    update_date: 2020-12-06