Abstract:
BACKGROUND: Next-generation sequencing (NGS) data contain many machine-induced errors. The most advanced error-correction methods depend heavily on the selection of solid k-mers. A solid k-mer is a k-mer that occurs frequently in NGS reads; all other k-mers are called weak k-mers. A solid k-mer is unlikely to contain errors, while a weak k-mer most likely does. An intensively investigated problem is to find a good frequency cutoff f0 that balances the numbers of solid and weak k-mers. Once the cutoff is determined, a more challenging but less-studied problem is to: (i) remove a small subset of solid k-mers that are likely to contain errors, and (ii) add a small subset of weak k-mers that are likely to contain no errors to the remaining set of solid k-mers. Identifying these two subsets of k-mers can improve correction performance.
RESULTS: We propose to use a Gamma distribution to model the frequencies of erroneous k-mers and a mixture of Gaussian distributions to model the frequencies of correct k-mers, and to combine the two to determine f0. To identify the two special subsets of k-mers, we use the z-score of a k-mer, which measures how many standard deviations the k-mer's frequency lies from the mean. These statistically solid k-mers are then used to construct a Bloom filter for error correction. Our method is markedly superior to the state-of-the-art methods when tested on both real and synthetic NGS data sets.
CONCLUSION: The z-score is adequate for distinguishing solid k-mers from weak k-mers, and is particularly useful for pinpointing solid k-mers that have very low frequency. Applying the z-score to k-mers can markedly improve error-correction accuracy.
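The z-score step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which mean and standard deviation the z-score is taken against, so this example assumes they are computed over a k-mer together with its observed Hamming-distance-1 neighbours. That choice, the function names, and the thresholds `z_low` and `z_high` are all hypothetical and are not taken from the paper; the cutoff `f0` is simply passed in rather than fitted from a Gamma and Gaussian mixture model.

```python
from collections import Counter
from statistics import mean, pstdev


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def neighbours(kmer, alphabet="ACGT"):
    """Yield all k-mers at Hamming distance 1 from kmer."""
    for i, base in enumerate(kmer):
        for b in alphabet:
            if b != base:
                yield kmer[:i] + b + kmer[i + 1:]


def z_score(kmer, counts):
    """z-score of a k-mer's frequency within its neighbourhood.

    ASSUMPTION: the reference group is the k-mer plus its observed
    Hamming-distance-1 neighbours; the abstract does not specify this.
    """
    group = [counts[kmer]] + [counts[n] for n in neighbours(kmer) if n in counts]
    sigma = pstdev(group)
    if sigma == 0:
        return 0.0  # no spread (e.g. no observed neighbours)
    return (counts[kmer] - mean(group)) / sigma


def statistically_solid(counts, f0, z_low, z_high):
    """Hypothetical selection rule: start from the frequency cutoff f0,
    drop solid k-mers with z < z_low, add weak k-mers with z > z_high."""
    solid = set()
    for kmer, freq in counts.items():
        z = z_score(kmer, counts)
        if freq >= f0 and z >= z_low:
            solid.add(kmer)
        elif freq < f0 and z > z_high:
            solid.add(kmer)
    return solid


# Toy data: five copies of one read plus one read with a single substitution.
reads = ["ACGTACGT"] * 5 + ["ACGAACGT"]
counts = count_kmers(reads, k=4)
solid = statistically_solid(counts, f0=3, z_low=-0.5, z_high=0.5)
```

In this toy input the k-mers introduced by the substituted read each sit next to a much more frequent neighbour, so they receive negative z-scores and stay out of the solid set. In the workflow the abstract describes, the resulting set would then be loaded into a Bloom filter for the correction stage, which is not shown here.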
journal_name: BMC Genomics
journal_title: BMC genomics
authors: Zhao L, Xie J, Bai L, Chen W, Wang M, Zhang Z, Wang Y, Zhao Z, Li J
doi: 10.1186/s12864-018-5272-y
subject: Has Abstract
pub_date: 2018-12-31 00:00:00
pages: 912
issue: Suppl 10
issn: 1471-2164
pii: 10.1186/s12864-018-5272-y
journal_volume: 19
pub_type: Journal article

Related literature
BMC Genomics literature collection
abstract:BACKGROUND:Non-coding RNAs (ncRNAs), which perform diverse regulatory roles, have been found in organisms from all superkingdoms of life. However, there have been limited numbers of studies on the functions of ncRNAs, especially in nonmodel organisms such as Kluyveromyces marxianus that is widely used in the field of i...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-016-2474-z
updated:2016-02-29 00:00:00
abstract:BACKGROUND:Dense single nucleotide polymorphism (SNP) genotyping arrays provide extensive information on polymorphic variation across the genome of species of interest. Such information can be used in studies of the genetic architecture of quantitative traits and to improve the accuracy of selection in breeding program...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-15-90
updated:2014-02-06 00:00:00
abstract:BACKGROUND:Chinese bayberry (Myrica rubra Sieb. & Zucc.) is an important subtropical evergreen fruit tree in southern China. Generally dioecious, the female plants are cultivated for fruit and have been studied extensively, but male plants have received very little attention. Knowledge of males may have a major impact ...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-015-1602-5
updated:2015-05-19 00:00:00
abstract:BACKGROUND:Pantoea ananatis is found in a wide range of natural environments, including water, soil, as part of the epi- and endophytic flora of various plant hosts, and in the insect gut. Some strains have proven effective as biological control agents and plant-growth promoters, while other strains have been implicate...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-15-404
updated:2014-05-27 00:00:00
abstract:BACKGROUND:The Polima (pol) system of cytoplasmic male sterility (CMS) and its fertility restoration gene Rfp have been used in hybrid breeding in Brassica napus, which has greatly improved the yield of rapeseed. However, the mechanism of the male sterility transition in pol CMS remains to be determined. RESULTS:To in...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-15-258
updated:2014-04-03 00:00:00
abstract:BACKGROUND:The genus Populus includes poplars, aspens and cottonwoods, which will be collectively referred to as poplars hereafter unless otherwise specified. Poplars are the dominant tree species in many forest ecosystems in the Northern Hemisphere and are of substantial economic value in plantation forestry. Poplar h...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-9-57
updated:2008-01-29 00:00:00
abstract:BACKGROUND:While multiple replication origins have been observed in archaea, considerably less is known about their evolutionary processes. Here, we performed a comparative analysis of the predicted (proved in part) orc/cdc6-associated replication origins in 15 completely sequenced haloarchaeal genomes to investigate t...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-13-478
updated:2012-09-14 00:00:00
abstract:BACKGROUND:Meloidogyne incognita is a devastating nematode that causes significant losses in cucumber production worldwide. Although numerous studies have emphasized the susceptible response of plants after nematode infection, the exact regulation mechanism of M. incognita-resistance in cucumber remains elusive. Ver...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-018-4979-0
updated:2018-08-03 00:00:00
abstract:BACKGROUND:The common marmoset monkey (Callithrix jacchus), a small non-endangered New World primate native to eastern Brazil, is becoming increasingly used as a non-human primate model in biomedical research, drug development and safety assessment. In contrast to the growing interest for the marmoset as an animal mode...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-8-190
updated:2007-06-25 00:00:00
abstract:BACKGROUND:CCCH type zinc finger proteins are RNA binding proteins with regulatory functions at all stages of mRNA metabolism. The best-characterized member, tristetraprolin (TTP), binds to AU rich elements in 3' UTRs of unstable mRNAs, mediating their degradation. In kinetoplastids, CCCH type zinc finger proteins have...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-11-283
updated:2010-05-05 00:00:00
abstract:BACKGROUND:Circular chromosome conformation capture (4C) has provided important insights into three dimensional (3D) genome organization and its critical impact on the regulation of gene expression. We developed a new quantitative framework based on polymer physics for the analysis of paired-end sequencing 4C (PE-4Cseq...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-015-2137-5
updated:2015-11-21 00:00:00
abstract:In 2009 the International Society for Computational Biology (ISCB) started to roll out regional bioinformatics conferences in Africa, Latin America and Asia. The open and competitive bid for the first meeting in Asia (ISCB-Asia) was awarded to Asia-Pacific Bioinformatics Network (APBioNet) which has been running the I...
journal_title:BMC genomics
pub_type:
doi:10.1186/1471-2164-12-S3-S1
updated:2011-11-30 00:00:00
abstract:BACKGROUND:Cytoplasmic male sterility (CMS) is an inability to produce functional pollen that is caused by mutation of the mitochondrial genome. Comparative analyses of mitochondrial genomes of lines with and without CMS in several species have revealed structural differences between genomes, including extensive rearra...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-15-561
updated:2014-07-04 00:00:00
abstract:BACKGROUND:Despite its relevance, almost no studies account for the genetic control in the early stages of tree development, i.e. from germination on. This study seeks to make a quite complete transcriptome for olive development and to elucidate the dynamic regulation of the transcriptomic response during the early-juv...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-018-5232-6
updated:2018-11-19 00:00:00
abstract:BACKGROUND:Despite the known importance of somatic cells for oocyte developmental competence acquisition, the overall mechanisms underlying the acquisition of full developmental competence are far from being understood, especially in non-mammalian species. The present work aimed at identifying key molecular signals fro...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-13-560
updated:2012-10-19 00:00:00
abstract:BACKGROUND:Extra-cellular components, such as serum and exosome, have drawn great attention as a readily accessible source of biomarkers for mammalian health. However, the contribution of different blood components to the signature of respective microRNAs (miRNAs) remains unknown, especially in cattle. In this study we...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-016-2962-1
updated:2016-08-12 00:00:00
abstract:BACKGROUND:Pseudomonas aeruginosa is an opportunistic pathogen with a high incidence of hospital infections that represents a threat to immune compromised patients. Genomic studies have shown that, in contrast to other pathogenic bacteria, clinical and environmental isolates do not show particular genomic differences. ...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-15-318
updated:2014-04-28 00:00:00
abstract:BACKGROUND:Homologous recombination is the key process that generates genetic diversity and drives evolution. SPO11 protein triggers recombination by introducing DNA double stranded breaks at discrete areas of the genome called recombination hotspots. The hotspot locations are largely determined by the DNA binding spec...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-14-493
updated:2013-07-22 00:00:00
abstract:BACKGROUND:Many bacterial chromosomes display nucleotide asymmetry, or skew, between the leading and lagging strands of replication. Mutational differences between these strands result in an overall pattern of skew that is centered about the origin of replication. Such a pattern could also arise from selection coupled ...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-8-369
updated:2007-10-12 00:00:00
abstract:BACKGROUND:Winter-ulcer Moritella viscosa infections continue to be a significant burden in Atlantic salmon (Salmo salar L.) farming. M. viscosa comprises two main clusters that differ in genetic variation and phenotypes including virulence. Horizontal gene transfer through acquisition and loss of mobile genetic elemen...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-017-3693-7
updated:2017-04-20 00:00:00
abstract:BACKGROUND:Previous studies show that galanin neurons in ventrolateral preoptic nucleus (VLPO-Gal) are essential for sleep regulation. Here, we explored the transcriptional regulation of the VLPO-Gal neurons in sleep by comparing their transcriptional responses between sleeping mice and those kept awake, sacrificed at ...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-020-07050-7
updated:2020-09-14 00:00:00
abstract:BACKGROUND:Rheumatoid arthritis (RA) is a chronic autoimmune disease characterized by inflammation and destruction of synovial joints. RA affects up to 1 % of the population worldwide. Currently, there are no drugs that can cure RA or achieve sustained remission. The unknown cause of the disease represents a significan...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-016-2910-0
updated:2016-08-22 00:00:00
abstract:BACKGROUND:Cell lines are an indispensable tool in biomedical research and often used as surrogates for tissues. Although there are recognized important cellular and transcriptomic differences between cell lines and tissues, a systematic overview of the differences between the regulatory processes of a cell line and th...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-017-4111-x
updated:2017-09-12 00:00:00
abstract:BACKGROUND:The process of alternative splicing provides a unique mechanism by which eukaryotes are able to produce numerous protein products from the same gene. Heightened variability in the proteome has been thought to potentiate increased behavioral complexity and response flexibility to environmental stimuli, thus c...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-020-6600-6
updated:2020-03-23 00:00:00
abstract:BACKGROUND:Root-knot nematodes are sedentary endoparasites that can infect more than 3000 plant species. Root-knot nematodes cause an estimated $100 billion annual loss worldwide. For successful establishment of the root-knot nematode in its host plant, it causes dramatic morphological and physiological changes in plan...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-12-220
updated:2011-05-10 00:00:00
abstract:BACKGROUND:Grass carp (Ctenopharyngodon idella) belongs to the family Cyprinidae which includes more than 2000 fish species. It is one of the most important freshwater food fish species in world aquaculture. A linkage map is an essential framework for mapping traits of interest and is often the first step towards under...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-11-135
updated:2010-02-24 00:00:00
abstract:BACKGROUND:Ecological studies routinely show genotype-genotype interactions between insects and their parasites. The mechanisms behind these interactions are not clearly understood. Using the bumblebee Bombus terrestris/trypanosome Crithidia bombi model system (two bumblebee colonies by two Crithidia strains), we have ...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/1471-2164-15-1031
updated:2014-11-27 00:00:00
abstract:BACKGROUND:The technological revolution in next-generation sequencing has brought unprecedented opportunities to study any organism of interest at the genomic or transcriptomic level. Transcriptome assembly is a crucial first step for studying the molecular basis of phenotypes of interest using RNA-Sequencing (RNA-Seq)...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-016-2923-8
updated:2016-07-27 00:00:00
abstract:BACKGROUND:Long non-coding RNAs (lncRNAs) regulate adipose tissue metabolism, however, their function on testosterone deficiency related obesity in humans is less understood. For this research, intact and castrated male pigs are the best model animal because of their similar proportional organ sizes, cardiovascular sys...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-017-3907-z
updated:2017-07-19 00:00:00
abstract:BACKGROUND:Integrons are genomic elements that mediate horizontal gene transfer by inserting and removing genetic material using site-specific recombination. Integrons are commonly found in bacterial genomes, where they maintain a large and diverse set of genes that plays an important role in adaptation and evolution. ...
journal_title:BMC genomics
pub_type: Journal article
doi:10.1186/s12864-020-06830-5
updated:2020-07-20 00:00:00