A cautionary note on the use of unsupervised machine learning algorithms to characterise malaria parasite population structure from genetic distance matrices.

Abstract:

Genetic surveillance of malaria parasites supports malaria control programmes, treatment guidelines and elimination strategies. Surveillance studies often pose questions about malaria parasite ancestry (e.g. how antimalarial resistance has spread) and employ statistical methods that characterise parasite population structure. Many of the methods used to characterise structure are unsupervised machine learning algorithms which depend on a genetic distance matrix, notably principal coordinates analysis (PCoA) and hierarchical agglomerative clustering (HAC). PCoA and HAC are sensitive to both the definition of genetic distance and algorithmic specification. Importantly, neither algorithm infers malaria parasite ancestry. As such, PCoA and HAC can inform (e.g. via exploratory data visualisation and hypothesis generation), but not answer comprehensively, key questions about malaria parasite ancestry. We illustrate the sensitivity of PCoA and HAC using 393 Plasmodium falciparum whole genome sequences collected from Cambodia and neighbouring regions (where antimalarial resistance has emerged and spread recently) and we provide tentative guidance for the use and interpretation of PCoA and HAC in malaria parasite genetic epidemiology. This guidance includes a call for fully transparent and reproducible analysis pipelines that feature (i) a clearly outlined scientific question; (ii) a clear justification of analytical methods used to answer the scientific question along with discussion of any inferential limitations; (iii) publicly available genetic distance matrices when downstream analyses depend on them; and (iv) sensitivity analyses. To bridge the inferential disconnect between the output of non-inferential unsupervised learning algorithms and the scientific questions of interest, tailor-made statistical models are needed to infer malaria parasite ancestry. In the absence of such models, speculative reasoning should feature only as discussion and not as results.
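The abstract's point that HAC depends on algorithmic specification can be illustrated with a small sketch. It assumes six hypothetical samples with toy pairwise distances (not data from the study), a deliberately naive implementation written for readability rather than speed, and an arbitrarily chosen target of two clusters. The only thing that changes between the two runs is the linkage rule, yet the same distance matrix yields different partitions.

```python
from itertools import combinations


def hac(dist, k, linkage):
    """Naive hierarchical agglomerative clustering on a symmetric distance matrix.

    Repeatedly merges the two closest clusters until k clusters remain.
    linkage: "single" (cluster distance = minimum pairwise distance) or
             "complete" (cluster distance = maximum pairwise distance).
    Returns the clusters as sorted lists of sample indices.
    """
    agg = {"single": min, "complete": max}[linkage]
    clusters = [frozenset([i]) for i in range(len(dist))]
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: agg(dist[i][j] for i in pair[0] for j in pair[1]),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return sorted(sorted(c) for c in clusters)


# Toy one-dimensional "positions" for six hypothetical samples; pairwise
# distances are |x_i - x_j|. Illustrative numbers only, not real genetic data.
positions = [0, 10, 21, 33, 46, 60]
D = [[abs(a - b) for b in positions] for a in positions]

for method in ("single", "complete"):
    print(method, hac(D, 2, method))
```

Single linkage chains samples 0 to 4 together through a run of small successive gaps, whereas complete linkage favours compact groups and splits the same samples as {0, 1, 2, 3} and {4, 5}.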

journal_name

PLoS Genet

journal_title

PLoS genetics

authors

Watson JA,Taylor AR,Ashley EA,Dondorp A,Buckee CO,White NJ,Holmes CC

doi

10.1371/journal.pgen.1009037

subject

Has Abstract

pub_date

2020-10-09

pages

e1009037

issue

10

eissn

1553-7404

issn

1553-7390

pii

PGENETICS-D-20-00579

journal_volume

16

pub_type

Journal Article
  • Dissemination of cephalosporin resistance genes between Escherichia coli strains from farm animals and humans by specific plasmid lineages.

    abstract::Third-generation cephalosporins are a class of β-lactam antibiotics that are often used for the treatment of human infections caused by Gram-negative bacteria, especially Escherichia coli. Worryingly, the incidence of human infections caused by third-generation cephalosporin-resistant E. coli is increasing worldwide. ...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1004776

    authors: de Been M,Lanza VF,de Toro M,Scharringa J,Dohmen W,Du Y,Hu J,Lei Y,Li N,Tooming-Klunderud A,Heederik DJ,Fluit AC,Bonten MJ,Willems RJ,de la Cruz F,van Schaik W

    update_date: 2014-12-18

  • Proteins in the nutrient-sensing and DNA damage checkpoint pathways cooperate to restrain mitotic progression following DNA damage.

    abstract::Checkpoint pathways regulate genomic integrity in part by blocking anaphase until all chromosomes have been completely replicated, repaired, and correctly aligned on the spindle. In Saccharomyces cerevisiae, DNA damage and mono-oriented or unattached kinetochores trigger checkpoint pathways that bifurcate to regulate ...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1002176

    authors: Searle JS,Wood MD,Kaur M,Tobin DV,Sanchez Y

    update_date: 2011-07-01

  • Coevolution of interacting fertilization proteins.

    abstract::Reproductive proteins are among the fastest evolving in the proteome, often due to the consequences of positive selection, and their rapid evolution is frequently attributed to a coevolutionary process between interacting female and male proteins. Such a process could leave characteristic signatures at coevolving gene...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1000570

    authors: Clark NL,Gasper J,Sekino M,Springer SA,Aquadro CF,Swanson WJ

    update_date: 2009-07-01

  • Pathways of distinction analysis: a new technique for multi-SNP analysis of GWAS data.

    abstract::Genome-wide association studies (GWAS) have become increasingly common due to advances in technology and have permitted the identification of differences in single nucleotide polymorphism (SNP) alleles that are associated with diseases. However, while typical GWAS analysis techniques treat markers individually, comple...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1002101

    authors: Braun R,Buetow K

    update_date: 2011-06-01

  • Selection on a Subunit of the NURF Chromatin Remodeler Modifies Life History Traits in a Domesticated Strain of Caenorhabditis elegans.

    abstract::Evolutionary life history theory seeks to explain how reproductive and survival traits are shaped by selection through allocations of an individual's resources to competing life functions. Although life-history traits evolve rapidly, little is known about the genetic and cellular mechanisms that control and couple the...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1006219

    authors: Large EE,Xu W,Zhao Y,Brady SC,Long L,Butcher RA,Andersen EC,McGrath PT

    update_date: 2016-07-28

  • Different roles of eukaryotic MutS and MutL complexes in repair of small insertion and deletion loops in yeast.

    abstract::DNA mismatch repair greatly increases genome fidelity by recognizing and removing replication errors. In order to understand how this fidelity is maintained, it is important to uncover the relative specificities of the different components of mismatch repair. There are two major mispair recognition complexes in eukary...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1003920

    authors: Romanova NV,Crouse GF

    update_date: 2013-10-01

  • Defects in the GINS complex increase the instability of repetitive sequences via a recombination-dependent mechanism.

    abstract::Faithful replication and repair of DNA lesions ensure genome maintenance. During replication in eukaryotic cells, DNA is unwound by the CMG helicase complex, which is composed of three major components: the Cdc45 protein, Mcm2-7, and the GINS complex. The CMG in complex with DNA polymerase epsilon (CMG-E) participates...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1008494

    authors: Jedrychowska M,Denkiewicz-Kruk M,Alabrudzinska M,Skoneczna A,Jonczyk P,Dmowski M,Fijalkowska IJ

    update_date: 2019-12-09

  • Rfx2 Stabilizes Foxj1 Binding at Chromatin Loops to Enable Multiciliated Cell Gene Expression.

    abstract::Cooperative transcription factor binding at cis-regulatory sites in the genome drives robust eukaryotic gene expression, and many such sites must be coordinated to produce coherent transcriptional programs. The transcriptional program leading to motile cilia formation requires members of the DNA-binding forkhead (Fox)...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1006538

    authors: Quigley IK,Kintner C

    update_date: 2017-01-19

  • Sex-specific genetic structure and social organization in Central Asia: insights from a multi-locus study.

    abstract::In the last two decades, mitochondrial DNA (mtDNA) and the non-recombining portion of the Y chromosome (NRY) have been extensively used in order to measure the maternally and paternally inherited genetic structure of human populations, and to infer sex-specific demography and history. Most studies converge towards the...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1000200

    authors: Ségurel L,Martínez-Cruz B,Quintana-Murci L,Balaresque P,Georges M,Hegay T,Aldashev A,Nasyrova F,Jobling MA,Heyer E,Vitalis R

    update_date: 2008-09-26

  • Bayesian multiple logistic regression for case-control GWAS.

    abstract::Genetic variants in genome-wide association studies (GWAS) are tested for disease association mostly using simple regression, one variant at a time. Standard approaches to improve power in detecting disease-associated SNPs use multiple regression with Bayesian variable selection in which a sparsity-enforcing prior on ...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1007856

    authors: Banerjee S,Zeng L,Schunkert H,Söding J

    update_date: 2018-12-31

  • Transcriptome and epigenome diversity and plasticity of muscle stem cells following transplantation.

    abstract::Adult skeletal muscles are maintained during homeostasis and regenerated upon injury by muscle stem cells (MuSCs). A heterogeneity in self-renewal, differentiation and regeneration properties has been reported for MuSCs based on their anatomical location. Although MuSCs derived from extraocular muscles (EOM) have a hi...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1009022

    authors: Evano B,Gill D,Hernando-Herraez I,Comai G,Stubbs TM,Commere PH,Reik W,Tajbakhsh S

    update_date: 2020-10-30

  • EVA-1 functions as an UNC-40 Co-receptor to enhance attraction to the MADD-4 guidance cue in Caenorhabditis elegans.

    abstract::We recently discovered a secreted and diffusible midline cue called MADD-4 (an ADAMTSL) that guides migrations along the dorsoventral axis of the nematode Caenorhabditis elegans. We showed that the transmembrane receptor, UNC-40 (DCC), whose canonical ligand is the UNC-6 (netrin) guidance cue, is required for extensio...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1004521

    authors: Chan KK,Seetharaman A,Bagg R,Selman G,Zhang Y,Kim J,Roy PJ

    update_date: 2014-08-14

  • A cis-regulatory signature for chordate anterior neuroectodermal genes.

    abstract::One of the striking findings of comparative developmental genetics was that expression patterns of core transcription factors are extraordinarily conserved in bilaterians. However, it remains unclear whether cis-regulatory elements of their target genes also exhibit common signatures associated with conserved embryoni...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1000912

    authors: Haeussler M,Jaszczyszyn Y,Christiaen L,Joly JS

    update_date: 2010-04-15

  • Three SRA-domain methylcytosine-binding proteins cooperate to maintain global CpG methylation and epigenetic silencing in Arabidopsis.

    abstract::Methylcytosine-binding proteins decipher the epigenetic information encoded by DNA methylation and provide a link between DNA methylation, modification of chromatin structure, and gene silencing. VARIANT IN METHYLATION 1 (VIM1) encodes an SRA (SET- and RING-associated) domain methylcytosine-binding protein in Arabidop...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1000156

    authors: Woo HR,Dittmer TA,Richards EJ

    update_date: 2008-08-15

  • Proofreading activity of DNA polymerase Pol2 mediates 3'-end processing during nonhomologous end joining in yeast.

    abstract::Genotoxic agents that cause double-strand breaks (DSBs) often generate damage at the break termini. Processing enzymes, including nucleases and polymerases, must remove damaged bases and/or add new bases before completion of repair. Artemis is a nuclease involved in mammalian nonhomologous end joining (NHEJ), but in S...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1000060

    authors: Tseng SF,Gabriel A,Teng SC

    update_date: 2008-04-25

  • Yeast pol4 promotes tel1-regulated chromosomal translocations.

    abstract::DNA double-strand breaks (DSBs) are one of the most dangerous DNA lesions, since their erroneous repair by nonhomologous end-joining (NHEJ) can generate harmful chromosomal rearrangements. PolX DNA polymerases are well suited to extend DSB ends that cannot be directly ligated due to their particular ability to bind to...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1003656

    authors: Ruiz JF,Pardo B,Sastre-Moreno G,Aguilera A,Blanco L

    update_date: 2013-01-01

  • Broad-specificity mRNA-rRNA complementarity in efficient protein translation.

    abstract::Studies of synthetic, well-defined biomolecular systems can elucidate inherent capabilities that may be difficult to uncover in a native biological context. Here, we used a minimal, reconstituted translation system from Escherichia coli to identify efficient ribosome binding sites (RBSs) in an unbiased, high-throughpu...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1002598

    authors: Barendt PA,Shah NA,Barendt GA,Sarkar CA

    update_date: 2012-01-01

  • BRCA1 and BRCA2 tumor suppressors in neural crest cells are essential for craniofacial bone development.

    abstract::Craniofacial abnormalities, including facial skeletal defects, comprise approximately one-third of all birth defects in humans. Since most bones in the face derive from cranial neural crest cells (CNCCs), which are multipotent stem cells, craniofacial bone disorders are largely attributed to defects in CNCCs. However,...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1007340

    authors: Kitami K,Kitami M,Kaku M,Wang B,Komatsu Y

    update_date: 2018-05-02

  • Two modes of transvection at the eyes absent gene of Drosophila demonstrate plasticity in transcriptional regulatory interactions in cis and in trans.

    abstract::For many genes, proper gene expression requires coordinated and dynamic interactions between multiple regulatory elements, each of which can either promote or silence transcription. In Drosophila, the complexity of the regulatory landscape is further complicated by the tight physical pairing of homologous chromosomes,...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1008152

    authors: Tian K,Henderson RE,Parker R,Brown A,Johnson JE,Bateman JR

    update_date: 2019-05-10

  • Mapping the fitness landscape of gene expression uncovers the cause of antagonism and sign epistasis between adaptive mutations.

    abstract::How do adapting populations navigate the tensions between the costs of gene expression and the benefits of gene products to optimize the levels of many genes at once? Here we combined independently-arising beneficial mutations that altered enzyme levels in the central metabolism of Methylobacterium extorquens to uncov...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1004149

    authors: Chou HH,Delaney NF,Draghi JA,Marx CJ

    update_date: 2014-02-27

  • The monothiol glutaredoxin GrxD is essential for sensing iron starvation in Aspergillus fumigatus.

    abstract::Efficient adaptation to iron starvation is an essential virulence determinant of the most common human mold pathogen, Aspergillus fumigatus. Here, we demonstrate that the cytosolic monothiol glutaredoxin GrxD plays an essential role in iron sensing in this fungus. Our studies revealed that (i) GrxD is essential for gr...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1008379

    authors: Misslinger M,Scheven MT,Hortschansky P,López-Berges MS,Heiss K,Beckmann N,Heigl T,Hermann M,Krüger T,Kniemeyer O,Brakhage AA,Haas H

    update_date: 2019-09-16

  • Polymorphisms in the yeast galactose sensor underlie a natural continuum of nutrient-decision phenotypes.

    abstract::In nature, microbes often need to "decide" which of several available nutrients to utilize, a choice that depends on a cell's inherent preference and external nutrient levels. While natural environments can have mixtures of different nutrients, phenotypic variation in microbes' decisions of which nutrient to utilize i...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1006766

    authors: Lee KB,Wang J,Palme J,Escalante-Chong R,Hua B,Springer M

    update_date: 2017-05-24

  • A drastic reduction in the life span of cystatin C L68Q carriers due to life-style changes during the last two centuries.

    abstract::Hereditary cystatin C amyloid angiopathy (HCCAA) is an autosomal dominant disease with high penetrance, manifest by brain hemorrhages in young normotensive adults. In Iceland, this condition is caused by the L68Q mutation in the cystatin C gene, with contemporary carriers reaching an average age of only 30 years. Here...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1000099

    authors: Palsdottir A,Helgason A,Palsson S,Bjornsson HT,Bragason BT,Gretarsdottir S,Thorsteinsdottir U,Olafsson E,Stefansson K

    update_date: 2008-06-20

  • Genetic Interactions Implicating Postreplicative Repair in Okazaki Fragment Processing.

    abstract::Ubiquitination of the replication clamp proliferating cell nuclear antigen (PCNA) at the conserved residue lysine (K)164 triggers postreplicative repair (PRR) to fill single-stranded gaps that result from stalled DNA polymerases. However, it has remained elusive as to whether cells engage PRR in response to replicatio...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1005659

    authors: Becker JR,Pons C,Nguyen HD,Costanzo M,Boone C,Myers CL,Bielinsky AK

    update_date: 2015-11-06

  • Parallel evolution of a type IV secretion system in radiating lineages of the host-restricted bacterial pathogen Bartonella.

    abstract::Adaptive radiation is the rapid origination of multiple species from a single ancestor as the result of concurrent adaptation to disparate environments. This fundamental evolutionary process is considered to be responsible for the genesis of a great portion of the diversity of life. Bacteria have evolved enormous biol...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1001296

    authors: Engel P,Salzburger W,Liesch M,Chang CC,Maruyama S,Lanz C,Calteau A,Lajus A,Médigue C,Schuster SC,Dehio C

    update_date: 2011-02-10

  • Improved statistics for genome-wide interaction analysis.

    abstract::Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under ...

    journal_title:PLoS genetics

    pub_type: Comment, Journal Article

    doi:10.1371/journal.pgen.1002625

    authors: Ueki M,Cordell HJ

    update_date: 2012-01-01

  • Regulation of Gap Junction Dynamics by UNC-44/ankyrin and UNC-33/CRMP through VAB-8 in C. elegans Neurons.

    abstract::Gap junctions are present in both vertebrates and invertebrates from nematodes to mammals. Although the importance of gap junctions has been documented in many biological processes, the molecular mechanisms underlying gap junction dynamics remain unclear. Here, using the C. elegans PLM neurons as a model, we show that...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1005948

    authors: Meng L,Chen CH,Yan D

    update_date: 2016-03-25

  • Developmental loss of neurofibromin across distributed neuronal circuits drives excessive grooming in Drosophila.

    abstract::Neurofibromatosis type 1 is a monogenetic disorder that predisposes individuals to tumor formation and cognitive and behavioral symptoms. The neuronal circuitry and developmental events underlying these neurological symptoms are unknown. To better understand how mutations of the underlying gene (NF1) drive behavioral ...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1008920

    authors: King LB,Boto T,Botero V,Aviles AM,Jomsky BM,Joseph C,Walker JA,Tomchik SM

    update_date: 2020-07-22

  • Retinoic acid activates two pathways required for meiosis in mice.

    abstract::In all sexually reproducing organisms, cells of the germ line must transition from mitosis to meiosis. In mice, retinoic acid (RA), the extrinsic signal for meiotic initiation, activates transcription of Stra8, which is required for meiotic DNA replication and the subsequent processes of meiotic prophase. Here we repo...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1004541

    authors: Koubova J,Hu YC,Bhattacharyya T,Soh YQ,Gill ME,Goodheart ML,Hogarth CA,Griswold MD,Page DC

    update_date: 2014-08-07

  • Bayesian multivariate reanalysis of large genetic studies identifies many new associations.

    abstract::Genome-wide association studies (GWAS) have now been conducted for hundreds of phenotypes of relevance to human health. Many such GWAS involve multiple closely-related phenotypes collected on the same samples. However, the vast majority of these GWAS have been analyzed using simple univariate analyses, which consider ...

    journal_title:PLoS genetics

    pub_type: Journal Article

    doi:10.1371/journal.pgen.1008431

    authors: Turchin MC,Stephens M

    update_date: 2019-10-09