Mining statistically-solid k-mers for accurate NGS error correction.

Abstract:

BACKGROUND: Next-generation sequencing (NGS) data contain many machine-induced errors. The most advanced error correction methods depend heavily on the selection of solid k-mers. A solid k-mer is a k-mer that occurs frequently in NGS reads; all other k-mers are called weak k-mers. A solid k-mer is unlikely to contain errors, while a weak k-mer most likely contains errors. An intensively investigated problem is to find a good frequency cutoff f0 that balances the numbers of solid and weak k-mers. Once the cutoff is determined, a more challenging but less-studied problem is to: (i) remove a small subset of solid k-mers that are likely to contain errors, and (ii) add a small subset of weak k-mers that are likely to contain no errors to the remaining set of solid k-mers. Identifying these two subsets of k-mers can improve correction performance. RESULTS: We propose to use a Gamma distribution to model the frequencies of erroneous k-mers and a mixture of Gaussian distributions to model the frequencies of correct k-mers, and to combine them to determine f0. To identify the two special subsets of k-mers, we use the z-score of a k-mer, which measures the number of standard deviations the k-mer's frequency lies from the mean. These statistically-solid k-mers are then used to construct a Bloom filter for error correction. In tests on both real and synthetic NGS data sets, our method markedly outperforms state-of-the-art methods. CONCLUSION: The z-score is adequate to distinguish solid k-mers from weak k-mers, and is particularly useful for pinpointing solid k-mers that have very low frequency. Applying the z-score to k-mers can markedly improve error correction accuracy.
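
The terms used in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the cutoff is assumed to be inclusive (frequency >= f0 counts as solid), and the z-score is assumed to be taken against the mean and standard deviation of all distinct k-mer frequencies, which the abstract does not specify. The Gamma/Gaussian mixture fit for f0 and the Bloom filter are left out.

```python
from collections import Counter
from statistics import mean, pstdev


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def split_by_cutoff(counts, f0):
    """Split k-mers into solid (frequency >= f0, an assumed convention) and weak."""
    solid = {kmer for kmer, c in counts.items() if c >= f0}
    weak = set(counts) - solid
    return solid, weak


def zscores(counts):
    """z-score of each k-mer frequency: (frequency - mean) / standard deviation.

    Assumption: mean and standard deviation are computed over the
    frequencies of all distinct k-mers.
    """
    freqs = list(counts.values())
    mu = mean(freqs)
    sigma = pstdev(freqs)
    if sigma == 0:
        # All frequencies are identical, so no k-mer deviates from the mean.
        return {kmer: 0.0 for kmer in counts}
    return {kmer: (c - mu) / sigma for kmer, c in counts.items()}


# Toy input: two reads that differ only in the last base.
reads = ["ACGTACGT", "ACGTACGA"]
counts = count_kmers(reads, k=4)
solid, weak = split_by_cutoff(counts, f0=2)
z = zscores(counts)
```

In this toy input the k-mer ending in the differing base occurs once and falls below the cutoff, while the k-mers shared by both reads are solid. A refinement step in the spirit of the abstract would then inspect z-scores near the cutoff to move individual k-mers between the two sets.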

journal_name

BMC Genomics

journal_title

BMC genomics

authors

Zhao L,Xie J,Bai L,Chen W,Wang M,Zhang Z,Wang Y,Zhao Z,Li J

doi

10.1186/s12864-018-5272-y

pub_date

2018-12-31 00:00:00

pages

912

issue

Suppl 10

issn

1471-2164

pii

10.1186/s12864-018-5272-y

journal_volume

19

pub_type

Journal Article
  • Functional elucidation of the non-coding RNAs of Kluyveromyces marxianus in the exponential growth phase.

    abstract:BACKGROUND:Non-coding RNAs (ncRNAs), which perform diverse regulatory roles, have been found in organisms from all superkingdoms of life. However, there have been limited numbers of studies on the functions of ncRNAs, especially in nonmodel organisms such as Kluyveromyces marxianus that is widely used in the field of i...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-2474-z

    authors: Cho YB,Lee EJ,Cho S,Kim TY,Park JH,Cho BK

    updated:2016-02-29 00:00:00

  • Development and validation of a high density SNP genotyping array for Atlantic salmon (Salmo salar).

    abstract:BACKGROUND:Dense single nucleotide polymorphism (SNP) genotyping arrays provide extensive information on polymorphic variation across the genome of species of interest. Such information can be used in studies of the genetic architecture of quantitative traits and to improve the accuracy of selection in breeding program...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-90

    authors: Houston RD,Taggart JB,Cézard T,Bekaert M,Lowe NR,Downing A,Talbot R,Bishop SC,Archibald AL,Bron JE,Penman DJ,Davassi A,Brew F,Tinch AE,Gharbi K,Hamilton A

    updated:2014-02-06 00:00:00

  • Genetic diversity of male and female Chinese bayberry (Myrica rubra) populations and identification of sex-associated markers.

    abstract:BACKGROUND:Chinese bayberry (Myrica rubra Sieb. & Zucc.) is an important subtropical evergreen fruit tree in southern China. Generally dioecious, the female plants are cultivated for fruit and have been studied extensively, but male plants have received very little attention. Knowledge of males may have a major impact ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1602-5

    authors: Jia HM,Jiao Y,Wang GY,Li YH,Jia HJ,Wu HX,Chai CY,Dong X,Guo Y,Zhang L,Gao QK,Chen W,Song LJ,van de Weg E,Gao ZS

    updated:2015-05-19 00:00:00

  • Analysis of the Pantoea ananatis pan-genome reveals factors underlying its ability to colonize and interact with plant, insect and vertebrate hosts.

    abstract:BACKGROUND:Pantoea ananatis is found in a wide range of natural environments, including water, soil, as part of the epi- and endophytic flora of various plant hosts, and in the insect gut. Some strains have proven effective as biological control agents and plant-growth promoters, while other strains have been implicate...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-404

    authors: De Maayer P,Chan WY,Rubagotti E,Venter SN,Toth IK,Birch PR,Coutinho TA

    updated:2014-05-27 00:00:00

  • Comparative transcript profiling of the fertile and sterile flower buds of pol CMS in B. napus.

    abstract:BACKGROUND:The Polima (pol) system of cytoplasmic male sterility (CMS) and its fertility restoration gene Rfp have been used in hybrid breeding in Brassica napus, which has greatly improved the yield of rapeseed. However, the mechanism of the male sterility transition in pol CMS remains to be determined. RESULTS:To in...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-258

    authors: An H,Yang Z,Yi B,Wen J,Shen J,Tu J,Ma C,Fu T

    updated:2014-04-03 00:00:00

  • Analysis of 4,664 high-quality sequence-finished poplar full-length cDNA clones and their utility for the discovery of genes responding to insect feeding.

    abstract:BACKGROUND:The genus Populus includes poplars, aspens and cottonwoods, which will be collectively referred to as poplars hereafter unless otherwise specified. Poplars are the dominant tree species in many forest ecosystems in the Northern Hemisphere and are of substantial economic value in plantation forestry. Poplar h...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-57

    authors: Ralph SG,Chun HJ,Cooper D,Kirkpatrick R,Kolosova N,Gunter L,Tuskan GA,Douglas CJ,Holt RA,Jones SJ,Marra MA,Bohlmann J

    updated:2008-01-29 00:00:00

  • Diversity and evolution of multiple orc/cdc6-adjacent replication origins in haloarchaea.

    abstract:BACKGROUND:While multiple replication origins have been observed in archaea, considerably less is known about their evolutionary processes. Here, we performed a comparative analysis of the predicted (proved in part) orc/cdc6-associated replication origins in 15 completely sequenced haloarchaeal genomes to investigate t...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-478

    authors: Wu Z,Liu H,Liu J,Liu X,Xiang H

    updated:2012-09-14 00:00:00

  • Comparative transcriptomics reveals suppressed expression of genes related to auxin and the cell cycle contributes to the resistance of cucumber against Meloidogyne incognita.

    abstract:BACKGROUND:Meloidogyne incognita is a devastating nematode that causes significant losses in cucumber production worldwide. Although numerous studies have emphasized on the susceptible response of plants after nematode infection, the exact regulation mechanism of M. incognita-resistance in cucumber remains elusive. Ver...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-4979-0

    authors: Wang X,Cheng C,Zhang K,Tian Z,Xu J,Yang S,Lou Q,Li J,Chen JF

    updated:2018-08-03 00:00:00

  • Development of the first marmoset-specific DNA microarray (EUMAMA): a new genetic tool for large-scale expression profiling in a non-human primate.

    abstract:BACKGROUND:The common marmoset monkey (Callithrix jacchus), a small non-endangered New World primate native to eastern Brazil, is becoming increasingly used as a non-human primate model in biomedical research, drug development and safety assessment. In contrast to the growing interest for the marmoset as an animal mode...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-8-190

    authors: Datson NA,Morsink MC,Atanasova S,Armstrong VW,Zischler H,Schlumbohm C,Dutilh BE,Huynen MA,Waegele B,Ruepp A,de Kloet ER,Fuchs E

    updated:2007-06-25 00:00:00

  • Genome-wide in silico screen for CCCH-type zinc finger proteins of Trypanosoma brucei, Trypanosoma cruzi and Leishmania major.

    abstract:BACKGROUND:CCCH type zinc finger proteins are RNA binding proteins with regulatory functions at all stages of mRNA metabolism. The best-characterized member, tristetraprolin (TTP), binds to AU rich elements in 3' UTRs of unstable mRNAs, mediating their degradation. In kinetoplastids, CCCH type zinc finger proteins have...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-283

    authors: Kramer S,Kimblin NC,Carrington M

    updated:2010-05-05 00:00:00

  • Quantitative analysis of chromatin interaction changes upon a 4.3 Mb deletion at mouse 4E2.

    abstract:BACKGROUND:Circular chromosome conformation capture (4C) has provided important insights into three dimensional (3D) genome organization and its critical impact on the regulation of gene expression. We developed a new quantitative framework based on polymer physics for the analysis of paired-end sequencing 4C (PE-4Cseq...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-2137-5

    authors: Zepeda-Mendoza CJ,Mukhopadhyay S,Wong ES,Harder N,Splinter E,de Wit E,Eckersley-Maslin MA,Ried T,Eils R,Rohr K,Mills A,de Laat W,Flicek P,Sengupta AM,Spector DL

    updated:2015-11-21 00:00:00

  • InCoB celebrates its tenth anniversary as first joint conference with ISCB-Asia.

    abstract:In 2009 the International Society for Computational Biology (ISCB) started to roll out regional bioinformatics conferences in Africa, Latin America and Asia. The open and competitive bid for the first meeting in Asia (ISCB-Asia) was awarded to Asia-Pacific Bioinformatics Network (APBioNet) which has been running the I...

    journal_title:BMC genomics

    pub_type:

    doi:10.1186/1471-2164-12-S3-S1

    authors: Schönbach C,Tan TW,Kelso J,Rost B,Nathan S,Ranganathan S

    updated:2011-11-30 00:00:00

  • Extensive structural variations between mitochondrial genomes of CMS and normal peppers (Capsicum annuum L.) revealed by complete nucleotide sequencing.

    abstract:BACKGROUND:Cytoplasmic male sterility (CMS) is an inability to produce functional pollen that is caused by mutation of the mitochondrial genome. Comparative analyses of mitochondrial genomes of lines with and without CMS in several species have revealed structural differences between genomes, including extensive rearra...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-561

    authors: Jo YD,Choi Y,Kim DH,Kim BD,Kang BC

    updated:2014-07-04 00:00:00

  • Transcriptomic time-series analysis of early development in olive from germinated embryos to juvenile tree.

    abstract:BACKGROUND:Despite its relevance, almost no studies account for the genetic control in the early stages of tree development, i.e. from germination on. This study seeks to make a quite complete transcriptome for olive development and to elucidate the dynamic regulation of the transcriptomic response during the early-juv...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-5232-6

    authors: Jiménez-Ruiz J,de la O Leyva-Pérez M,Vidoy-Mercado I,Barceló A,Luque F

    updated:2018-11-19 00:00:00

  • Oocyte-somatic cells interactions, lessons from evolution.

    abstract:BACKGROUND:Despite the known importance of somatic cells for oocyte developmental competence acquisition, the overall mechanisms underlying the acquisition of full developmental competence are far from being understood, especially in non-mammalian species. The present work aimed at identifying key molecular signals fro...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-560

    authors: Charlier C,Montfort J,Chabrol O,Brisard D,Nguyen T,Le Cam A,Richard-Parpaillon L,Moreews F,Pontarotti P,Uzbekova S,Chesnel F,Bobe J

    updated:2012-10-19 00:00:00

  • Comparative miRNAome analysis revealed different miRNA expression profiles in bovine sera and exosomes.

    abstract:BACKGROUND:Extra-cellular components, such as serum and exosome, have drawn great attention as a readily accessible source of biomarkers for mammalian health. However, the contribution of different blood components to the signature of respective microRNAs (miRNAs) remains unknown, especially in cattle. In this study we...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-2962-1

    authors: Zhao K,Liang G,Sun X,Guan le L

    updated:2016-08-12 00:00:00

  • Pseudomonas aeruginosa clinical and environmental isolates constitute a single population with high phenotypic diversity.

    abstract:BACKGROUND:Pseudomonas aeruginosa is an opportunistic pathogen with a high incidence of hospital infections that represents a threat to immune compromised patients. Genomic studies have shown that, in contrast to other pathogenic bacteria, clinical and environmental isolates do not show particular genomic differences. ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-318

    authors: Grosso-Becerra MV,Santos-Medellín C,González-Valdez A,Méndez JL,Delgado G,Morales-Espinosa R,Servín-González L,Alcaraz LD,Soberón-Chávez G

    updated:2014-04-28 00:00:00

  • Suppression of genetic recombination in the pseudoautosomal region and at subtelomeres in mice with a hypomorphic Spo11 allele.

    abstract:BACKGROUND:Homologous recombination is the key process that generates genetic diversity and drives evolution. SPO11 protein triggers recombination by introducing DNA double stranded breaks at discrete areas of the genome called recombination hotspots. The hotspot locations are largely determined by the DNA binding spec...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-493

    authors: Smagulova F,Brick K,Pu Y,Sengupta U,Camerini-Otero RD,Petukhova GV

    updated:2013-07-22 00:00:00

  • Separating the effects of mutation and selection in producing DNA skew in bacterial chromosomes.

    abstract:BACKGROUND:Many bacterial chromosomes display nucleotide asymmetry, or skew, between the leading and lagging strands of replication. Mutational differences between these strands result in an overall pattern of skew that is centered about the origin of replication. Such a pattern could also arise from selection coupled ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-8-369

    authors: Morton RA,Morton BR

    updated:2007-10-12 00:00:00

  • Pan genome and CRISPR analyses of the bacterial fish pathogen Moritella viscosa.

    abstract:BACKGROUND:Winter-ulcer Moritella viscosa infections continue to be a significant burden in Atlantic salmon (Salmo salar L.) farming. M. viscosa comprises two main clusters that differ in genetic variation and phenotypes including virulence. Horizontal gene transfer through acquisition and loss of mobile genetic elemen...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-3693-7

    authors: Karlsen C,Hjerde E,Klemetsen T,Willassen NP

    updated:2017-04-20 00:00:00

  • RNA-seq analysis of galaninergic neurons from ventrolateral preoptic nucleus identifies expression changes between sleep and wake.

    abstract:BACKGROUND:Previous studies show that galanin neurons in ventrolateral preoptic nucleus (VLPO-Gal) are essential for sleep regulation. Here, we explored the transcriptional regulation of the VLPO-Gal neurons in sleep by comparing their transcriptional responses between sleeping mice and those kept awake, sacrificed at ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-07050-7

    authors: Guo X,Gao X,Keenan BT,Zhu J,Sarantopoulou D,Lian J,Galante RJ,Grant GR,Pack AI

    updated:2020-09-14 00:00:00

  • A genomics-based systems approach towards drug repositioning for rheumatoid arthritis.

    abstract:BACKGROUND:Rheumatoid arthritis (RA) is a chronic autoimmune disease characterized by inflammation and destruction of synovial joints. RA affects up to 1 % of the population worldwide. Currently, there are no drugs that can cure RA or achieve sustained remission. The unknown cause of the disease represents a significan...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-2910-0

    authors: Xu R,Wang Q

    updated:2016-08-22 00:00:00

  • Regulatory network changes between cell lines and their tissues of origin.

    abstract:BACKGROUND:Cell lines are an indispensable tool in biomedical research and often used as surrogates for tissues. Although there are recognized important cellular and transcriptomic differences between cell lines and tissues, a systematic overview of the differences between the regulatory processes of a cell line and th...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-4111-x

    authors: Lopes-Ramos CM,Paulson JN,Chen CY,Kuijjer ML,Fagny M,Platig J,Sonawane AR,DeMeo DL,Quackenbush J,Glass K

    updated:2017-09-12 00:00:00

  • Stress-mediated convergence of splicing landscapes in male and female rock doves.

    abstract:BACKGROUND:The process of alternative splicing provides a unique mechanism by which eukaryotes are able to produce numerous protein products from the same gene. Heightened variability in the proteome has been thought to potentiate increased behavioral complexity and response flexibility to environmental stimuli, thus c...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-6600-6

    authors: Lang AS,Austin SH,Harris RM,Calisi RM,MacManes MD

    updated:2020-03-23 00:00:00

  • Analysis of gene expression in soybean (Glycine max) roots in response to the root knot nematode Meloidogyne incognita using microarrays and KEGG pathways.

    abstract:BACKGROUND:Root-knot nematodes are sedentary endoparasites that can infect more than 3000 plant species. Root-knot nematodes cause an estimated $100 billion annual loss worldwide. For successful establishment of the root-knot nematode in its host plant, it causes dramatic morphological and physiological changes in plan...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-12-220

    authors: Ibrahim HM,Hosseini P,Alkharouf NW,Hussein EH,Gamal El-Din Ael K,Aly MA,Matthews BF

    updated:2011-05-10 00:00:00

  • A consensus linkage map of the grass carp (Ctenopharyngodon idella) based on microsatellites and SNPs.

    abstract:BACKGROUND:Grass carp (Ctenopharyngodon idella) belongs to the family Cyprinidae which includes more than 2000 fish species. It is one of the most important freshwater food fish species in world aquaculture. A linkage map is an essential framework for mapping traits of interest and is often the first step towards under...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-135

    authors: Xia JH,Liu F,Zhu ZY,Fu J,Feng J,Li J,Yue GH

    updated:2010-02-24 00:00:00

  • Differential gene expression and alternative splicing in insect immune specificity.

    abstract:BACKGROUND:Ecological studies routinely show genotype-genotype interactions between insects and their parasites. The mechanisms behind these interactions are not clearly understood. Using the bumblebee Bombus terrestris/trypanosome Crithidia bombi model system (two bumblebee colonies by two Crithidia strains), we have ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-1031

    authors: Riddell CE,Lobaton Garces JD,Adams S,Barribeau SM,Twell D,Mallon EB

    updated:2014-11-27 00:00:00

  • Comparative performance of transcriptome assembly methods for non-model organisms.

    abstract:BACKGROUND:The technological revolution in next-generation sequencing has brought unprecedented opportunities to study any organism of interest at the genomic or transcriptomic level. Transcriptome assembly is a crucial first step for studying the molecular basis of phenotypes of interest using RNA-Sequencing (RNA-Seq)...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-2923-8

    authors: Huang X,Chen XG,Armbruster PA

    updated:2016-07-27 00:00:00

  • Identification and characterization of long non-coding RNAs in subcutaneous adipose tissue from castrated and intact full-sib pair Huainan male pigs.

    abstract:BACKGROUND:Long non-coding RNAs (lncRNAs) regulate adipose tissue metabolism, however, their function on testosterone deficiency related obesity in humans is less understood. For this research, intact and castrated male pigs are the best model animal because of their similar proportional organ sizes, cardiovascular sys...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-3907-z

    authors: Wang J,Hua L,Chen J,Zhang J,Bai X,Gao B,Li C,Shi Z,Sheng W,Gao Y,Xing B

    updated:2017-07-19 00:00:00

  • A comprehensive survey of integron-associated genes present in metagenomes.

    abstract:BACKGROUND:Integrons are genomic elements that mediate horizontal gene transfer by inserting and removing genetic material using site-specific recombination. Integrons are commonly found in bacterial genomes, where they maintain a large and diverse set of genes that plays an important role in adaptation and evolution. ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-06830-5

    authors: Buongermino Pereira M,Österlund T,Eriksson KM,Backhaus T,Axelson-Fisk M,Kristiansson E

    updated:2020-07-20 00:00:00