Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations.


BACKGROUND: Estimation of individual ancestry from genetic data is useful for the analysis of disease association studies, for understanding human population history, and for interpreting personal genomic variation. New, computationally efficient methods for ancestry inference are needed that can make effective use of existing information about allele frequencies in different human populations and can work directly with DNA sequence reads.

RESULTS: We describe a fast method for estimating the relative contribution of known reference populations to an individual's genetic ancestry. The method uses allele frequencies from the reference populations, together with individual genotype or sequence data, to obtain a maximum likelihood estimate of the global admixture proportions using the BFGS optimization algorithm. It accounts for the genotype uncertainty present in sequence data by using genotype likelihoods, and it does not require individual genotype data from external reference panels. Simulation studies and application to real datasets demonstrate that the method is significantly faster than previous methods and has comparable accuracy. Using data from the 1000 Genomes Project, we show that estimates of genome-wide average ancestry for admixed individuals are consistent between exome sequence data and low-coverage whole-genome sequence data. Finally, we demonstrate that the method can estimate admixture proportions from pooled sequence data, making it a valuable tool for controlling for population stratification in sequencing-based association studies that use DNA pooling.

CONCLUSIONS: Our method is an efficient and versatile tool for estimating ancestry from DNA sequence data and is available online.
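The estimation problem summarized in RESULTS can be illustrated with a small sketch. It assumes (the abstract does not spell this out) that the individual's alternate-allele frequency at each SNP is the mixture p_j = sum_k q_k f_jk of the reference frequencies, and that genotypes follow Hardy-Weinberg proportions given p_j. The abstract reports BFGS as the optimizer; to stay dependency-free, this sketch swaps in a plain EM update, which maximizes the same likelihood but is not the authors' implementation. The names `estimate_admixture` and `simulate_individual` and all parameters are hypothetical.

```python
import random


def estimate_admixture(genotype_likelihoods, allele_freqs, n_iter=200, eps=1e-6):
    """Illustrative EM estimate of admixture proportions q (not the paper's BFGS code).

    genotype_likelihoods: per-SNP [L0, L1, L2], i.e. P(reads | g alternate alleles).
    allele_freqs: per-SNP list of alternate-allele frequencies, one per reference population.
    """
    k = len(allele_freqs[0])
    # Clamp frequencies away from 0 and 1 so the mixture frequency stays inside (0, 1).
    freqs = [[min(max(f, eps), 1.0 - eps) for f in row] for row in allele_freqs]
    q = [1.0 / k] * k  # start from equal contributions
    for _ in range(n_iter):
        counts = [0.0] * k
        for gl, f in zip(genotype_likelihoods, freqs):
            p = sum(qi * fi for qi, fi in zip(q, f))  # mixture allele frequency
            prior = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]  # Hardy-Weinberg (assumed)
            post = [l * pr for l, pr in zip(gl, prior)]
            total = sum(post)
            if total <= 0.0:
                continue  # uninformative SNP, skip it
            e_alt = (post[1] + 2 * post[2]) / total  # expected alternate-allele count
            for i in range(k):
                # Expected number of the two allele copies inherited from population i.
                counts[i] += (e_alt * q[i] * f[i] / p
                              + (2 - e_alt) * q[i] * (1 - f[i]) / (1 - p))
        norm = sum(counts)
        if norm == 0.0:
            break
        q = [c / norm for c in counts]
    return q


def simulate_individual(q_true, allele_freqs, rng, error=0.01):
    """Draw genotypes under the assumed model and return simple genotype likelihoods."""
    gls = []
    for f in allele_freqs:
        p = sum(qi * fi for qi, fi in zip(q_true, f))
        g = (rng.random() < p) + (rng.random() < p)  # two independent allele draws
        gls.append([1.0 - error if x == g else error / 2 for x in range(3)])
    return gls


rng = random.Random(0)
freqs = [[rng.uniform(0.05, 0.95), rng.uniform(0.05, 0.95)] for _ in range(2000)]
gls = simulate_individual([0.7, 0.3], freqs, rng)
print(estimate_admixture(gls, freqs))
```

With hard genotype calls, each likelihood triple simply puts all its weight on the called genotype; with low-coverage reads the three values would instead come from the read data, which is how genotype uncertainty enters the estimate.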


BMC Bioinformatics




Bansal V,Libiger O






2015-01-16 00:00:00










  • Assessing stationary distributions derived from chromatin contact maps.

    abstract:BACKGROUND:The spatial configuration of chromosomes is essential to various cellular processes, notably gene regulation, while architecture related alterations, such as translocations and gene fusions, are often cancer drivers. Thus, eliciting chromatin conformation is important, yet challenging due to compaction, dyna...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Segal MR,Fletez-Brant K

    updated: 2020-02-24 00:00:00

  • Components of the antigen processing and presentation pathway revealed by gene expression microarray analysis following B cell antigen receptor (BCR) stimulation.

    abstract:BACKGROUND:Activation of naïve B lymphocytes by extracellular ligands, e.g. antigen, lipopolysaccharide (LPS) and CD40 ligand, induces a combination of common and ligand-specific phenotypic changes through complex signal transduction pathways. For example, although all three of these ligands induce proliferation, only ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Lee JA,Sinkovits RS,Mock D,Rab EL,Cai J,Yang P,Saunders B,Hsueh RC,Choi S,Subramaniam S,Scheuermann RH,Alliance for Cellular Signaling.

    updated: 2006-05-02 00:00:00

  • Maximum expected accuracy structural neighbors of an RNA secondary structure.

    abstract:BACKGROUND:Since RNA molecules regulate genes and control alternative splicing by allostery, it is important to develop algorithms to predict RNA conformational switches. Some tools, such as paRNAss, RNAshapes and RNAbor, can be used to predict potential conformational switches; nevertheless, no existent tool can detec...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Clote P,Lou F,Lorenz WA

    updated: 2012-04-12 00:00:00

  • Effects of Mecp2 loss of function in embryonic cortical neurons: a bioinformatics strategy to sort out non-neuronal cells variability from transcriptome profiling.

    abstract:BACKGROUND:Mecp2 null mice model Rett syndrome (RTT) a human neurological disorder affecting females after apparent normal pre- and peri-natal developmental periods. Neuroanatomical studies in cerebral cortex of RTT mouse models revealed delayed maturation of neuronal morphology and autonomous as well as non-cell auton...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Vacca M,Tripathi KP,Speranza L,Aiese Cigliano R,Scalabrì F,Marracino F,Madonna M,Sanseverino W,Perrone-Capano C,Guarracino MR,D'Esposito M

    updated: 2016-01-20 00:00:00

  • Application of protein structure alignments to iterated hidden Markov model protocols for structure prediction.

    abstract:BACKGROUND:One of the most powerful methods for the prediction of protein structure from sequence information alone is the iterative construction of profile-type models. Because profiles are built from sequence alignments, the sequences included in the alignment and the method used to align them will be important to th...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Scheeff ED,Bourne PE

    updated: 2006-09-14 00:00:00

  • Integrating biological knowledge into variable selection: an empirical Bayes approach with an application in cancer biology.

    abstract:BACKGROUND:An important question in the analysis of biochemical data is that of identifying subsets of molecular variables that may jointly influence a biological response. Statistical variable selection methods have been widely used for this purpose. In many settings, it may be important to incorporate ancillary biolo...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Hill SM,Neve RM,Bayani N,Kuo WL,Ziyad S,Spellman PT,Gray JW,Mukherjee S

    updated: 2012-05-11 00:00:00

  • Simple binary segmentation frameworks for identifying variation in DNA copy number.

    abstract:BACKGROUND:Variation in DNA copy number, due to gains and losses of chromosome segments, is common. A first step for analyzing DNA copy number data is to identify amplified or deleted regions in individuals. To locate such regions, we propose a circular binary segmentation procedure, which is based on a sequence of nes...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Yang TY

    updated: 2012-10-30 00:00:00

  • TAMEE: data management and analysis for tissue microarrays.

    abstract:BACKGROUND:With the introduction of tissue microarrays (TMAs) researchers can investigate gene and protein expression in tissues on a high-throughput scale. TMAs generate a wealth of data calling for extended, high level data management. Enhanced data analysis and systematic data management are required for traceabilit...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Thallinger GG,Baumgartner K,Pirklbauer M,Uray M,Pauritsch E,Mehes G,Buck CR,Zatloukal K,Trajanoski Z

    updated: 2007-03-07 00:00:00

  • The scoring of poses in protein-protein docking: current capabilities and future directions.

    abstract:BACKGROUND:Protein-protein docking, which aims to predict the structure of a protein-protein complex from its unbound components, remains an unresolved challenge in structural bioinformatics. An important step is the ranking of docked poses using a scoring function, for which many methods have been developed. There is ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Moal IH,Torchala M,Bates PA,Fernández-Recio J

    updated: 2013-10-01 00:00:00

  • Roar: detecting alternative polyadenylation with standard mRNA sequencing libraries.

    abstract:BACKGROUND:Post-transcriptional regulation is a complex mechanism that plays a central role in defining multiple cellular identities starting from a common genome. Modifications in the length of 3'UTRs have been found to play an important role in this context, since alternative 3' UTRs could lead to differences for exa...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Grassi E,Mariella E,Lembo A,Molineris I,Provero P

    updated: 2016-10-18 00:00:00

  • Integration of shot-gun proteomics and bioinformatics analysis to explore plant hormone responses.

    abstract:BACKGROUND:Multidimensional protein identification technology (MudPIT)-based shot-gun proteomics has been proven to be an effective platform for functional proteomics. In particular, the various sample preparation methods and bioinformatics tools can be integrated to improve the proteomics platform for applications lik...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Zhang Y,Liu S,Dai SY,Yuan JS

    updated: 2012-01-01 00:00:00

  • A novel parametric approach to mine gene regulatory relationship from microarray datasets.

    abstract:BACKGROUND:Microarray has been widely used to measure the gene expression level on the genome scale in the current decade. Many algorithms have been developed to reconstruct gene regulatory networks based on microarray data. Unfortunately, most of these models and algorithms focus on global properties of the expression...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Liu W,Li D,Liu Q,Zhu Y,He F

    updated: 2010-12-14 00:00:00

  • Taking U out, with two nucleases?

    abstract:BACKGROUND:REX1 and REX2 are protein components of the RNA editing complex (the editosome) and function as exouridylylases. The exact roles of REX1 and REX2 in the editosome are unclear and the consequences of the presence of two related proteins are not fully understood. Here, a variety of computational studies were p...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Mian IS,Worthey EA,Salavati R

    updated: 2006-06-16 00:00:00

  • Quantitative prediction of the effect of genetic variation using hidden Markov models.

    abstract:BACKGROUND:With the development of sequencing technologies, more and more sequence variants are available for investigation. Different classes of variants in the human genome have been identified, including single nucleotide substitutions, insertion and deletion, and large structural variations such as duplications and...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Liu M,Watson LT,Zhang L

    updated: 2014-01-09 00:00:00

  • Genotype calling in tetraploid species from bi-allelic marker data using mixture models.

    abstract:BACKGROUND:Automated genotype calling in tetraploid species was until recently not possible, which hampered genetic analysis. Modern genotyping assays often produce two signals, one for each allele of a bi-allelic marker. While ample software is available to obtain genotypes (homozygous for either allele, or heterozygo...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Voorrips RE,Gort G,Vosman B

    updated: 2011-05-19 00:00:00

  • INBIA: a boosting methodology for proteomic network inference.

    abstract:BACKGROUND:The analysis of tissue-specific protein interaction networks and their functional enrichment in pathological and normal tissues provides insights on the etiology of diseases. The Pan-cancer proteomic project, in The Cancer Genome Atlas, collects protein expressions in human cancers and it is a reference reso...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Sardina DS,Micale G,Ferro A,Pulvirenti A,Giugno R

    updated: 2018-07-09 00:00:00

  • The reactive metabolite target protein database (TPDB)--a web-accessible resource.

    abstract:BACKGROUND:The toxic effects of many simple organic compounds stem from their biotransformation to chemically reactive metabolites which bind covalently to cellular proteins. To understand the mechanisms of cytotoxic responses it may be important to know which proteins become adducted and whether some may be common tar...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Hanzlik RP,Koen YM,Theertham B,Dong Y,Fang J

    updated: 2007-03-16 00:00:00

  • Multi-label literature classification based on the Gene Ontology graph.

    abstract:BACKGROUND:The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of t...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Jin B,Muller B,Zhai C,Lu X

    updated: 2008-12-08 00:00:00

  • Membrane protein orientation and refinement using a knowledge-based statistical potential.

    abstract:BACKGROUND:Recent increases in the number of deposited membrane protein crystal structures necessitate the use of automated computational tools to position them within the lipid bilayer. Identifying the correct orientation allows us to study the complex relationship between sequence, structure and the lipid environment...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Nugent T,Jones DT

    updated: 2013-09-18 00:00:00

  • A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies.

    abstract:BACKGROUND:Comparative analysis of whole genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While there are number of excellent tools developed for performing this task, ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Thakur S,Guttman DS

    updated: 2016-06-30 00:00:00

  • Computational approaches to protein inference in shotgun proteomics.

    abstract:Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high throughp...

    journal_title:BMC bioinformatics

    pub_type: Journal Article, Review


    authors: Li YF,Radivojac P

    updated: 2012-01-01 00:00:00

  • Cloning, analysis and functional annotation of expressed sequence tags from the Earthworm Eisenia fetida.

    abstract:BACKGROUND:Eisenia fetida, commonly known as red wiggler or compost worm, belongs to the Lumbricidae family of the Annelida phylum. Little is known about its genome sequence although it has been extensively used as a test organism in terrestrial ecotoxicology. In order to understand its gene expression response to envi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Pirooznia M,Gong P,Guan X,Inouye LS,Yang K,Perkins EJ,Deng Y

    updated: 2007-11-01 00:00:00

  • methCancer-gen: a DNA methylome dataset generator for user-specified cancer type based on conditional variational autoencoder.

    abstract:BACKGROUND:Recently, DNA methylation has drawn great attention due to its strong correlation with abnormal gene activities and informative representation of the cancer status. As a number of studies focus on DNA methylation signatures in cancer, demand for utilizing publicly available methylome dataset has been increas...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Choi J,Chae H

    updated: 2020-05-11 00:00:00

  • The development of a comparison approach for Illumina bead chips unravels unexpected challenges applying newest generation microarrays.

    abstract:BACKGROUND:The MAQC project demonstrated that microarrays with comparable content show inter- and intra-platform reproducibility. However, since the content of gene databases still increases, the development of new generations of microarrays covering new content is mandatory. To better understand the potential challeng...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Eggle D,Debey-Pascher S,Beyer M,Schultze JL

    updated: 2009-06-18 00:00:00

  • Algorithm-driven artifacts in median polish summarization of microarray data.

    abstract:BACKGROUND:High-throughput measurement of transcript intensities using Affymetrix type oligonucleotide microarrays has produced a massive quantity of data during the last decade. Different preprocessing techniques exist to convert the raw signal intensities measured by these chips into gene expression estimates. Althou...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Giorgi FM,Bolger AM,Lohse M,Usadel B

    updated: 2010-11-11 00:00:00

  • Intestinal microbiota domination under extreme selective pressures characterized by metagenomic read cloud sequencing and assembly.

    abstract:BACKGROUND:Low diversity of the gut microbiome, often progressing to the point of intestinal domination by a single species, has been linked to poor outcomes in patients undergoing hematopoietic cell transplantation (HCT). Our ability to understand how certain organisms attain intestinal domination over others has been...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Kang JB,Siranosian BA,Moss EL,Banaei N,Andermann TM,Bhatt AS

    updated: 2019-12-02 00:00:00

  • Statistical shape analysis of tap roots: a methodological case study on laser scanned sugar beets.

    abstract:BACKGROUND:The efficient and robust statistical analysis of the shape of plant organs of different cultivars is an important investigation issue in plant breeding and enables a robust cultivar description within the breeding progress. Laserscanning is a highly accurate and high resolution technique to acquire the 3D sh...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Heeren B,Paulus S,Goldbach H,Kuhlmann H,Mahlein AK,Rumpf M,Wirth B

    updated: 2020-07-29 00:00:00

  • Protein network prediction and topological analysis in Leishmania major as a tool for drug target selection.

    abstract:BACKGROUND:Leishmaniasis is a virulent parasitic infection that causes a worldwide disease burden. Most treatments have toxic side-effects and efficacy has decreased due to the emergence of resistant strains. The outlook is worsened by the absence of promising drug targets for this disease. We have taken a computationa...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Flórez AF,Park D,Bhak J,Kim BC,Kuchinsky A,Morris JH,Espinosa J,Muskus C

    updated: 2010-09-27 00:00:00

  • Low degree metabolites explain essential reactions and enhance modularity in biological networks.

    abstract:BACKGROUND:Recently there has been a lot of interest in identifying modules at the level of genetic and metabolic networks of organisms, as well as in identifying single genes and reactions that are essential for the organism. A goal of computational and systems biology is to go beyond identification towards an explana...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Samal A,Singh S,Giri V,Krishna S,Raghuram N,Jain S

    updated: 2006-03-08 00:00:00

  • Unsupervised fuzzy pattern discovery in gene expression data.

    abstract:BACKGROUND:Discovering patterns from gene expression levels is regarded as a classification problem when tissue classes of the samples are given and solved as a discrete-data problem by discretizing the expression levels of each gene into intervals maximizing the interdependence between that gene and the class labels. ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article


    authors: Wu GP,Chan KC,Wong AK

    updated: 2011-01-01 00:00:00