Estimating the total genome length of a metagenomic sample using k-mers.

Abstract:

BACKGROUND: Metagenomic sequencing is a powerful technology for studying mixtures of microbes, or microbiomes, on humans and in the environment. A basic task in analyzing metagenomic data is to identify the component genomes in the community. This task is challenging because of the complexity of microbiome composition, the limited availability of reference genomes, and sequencing coverage that is often insufficient.

RESULTS: As an initial step toward understanding the complete composition of a metagenomic sample, we studied the problem of estimating the total length of all distinct component genomes in a metagenomic sample. We showed that this problem can be solved by estimating the total number of distinct k-mers in the metagenomic sequencing data. We proposed a method for this estimation based on the sequencing coverage distribution of the observed k-mers, and introduced a k-mer redundancy index (KRI) to bridge the gap between the count of distinct k-mers and the total genome length. We demonstrated the effectiveness of the proposed method on carefully designed simulated datasets representing multiple scenarios found in real metagenomic data. Results on real data indicate that the amount of genomic information not captured by sequencing can vary dramatically across metagenomic samples, with the potential to mislead downstream analyses.

CONCLUSIONS: We posed the question of the total genome length of all distinct species in a microbial community and introduced a method to answer it.
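The Python sketch below illustrates the starting point the abstract describes: counting the distinct k-mers in the sequencing data and tabulating the coverage distribution of the observed k-mers. It is not the authors' estimator. It only summarizes what was observed. It does not estimate unobserved k-mers or compute the k-mer redundancy index (KRI), because the abstract does not define either. The sketch makes a few assumptions of its own. It merges a k-mer and its reverse complement into one "canonical" k-mer, and it skips k-mers that contain non-ACGT bases. The function names are hypothetical.

```python
from collections import Counter

# Complement table for DNA; reversing the translated string gives the reverse complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID_BASES = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k: int) -> Counter:
    """Count canonical k-mers over all reads (hypothetical helper).

    Assumption of this sketch: k-mers containing bases other than
    A, C, G, T (for example N) are skipped.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= VALID_BASES:
                counts[canonical(kmer)] += 1
    return counts


def coverage_histogram(counts: Counter) -> dict:
    """Map each observed coverage (k-mer multiplicity) to the number of
    distinct k-mers seen exactly that many times, sorted by coverage."""
    return dict(sorted(Counter(counts.values()).items()))


if __name__ == "__main__":
    reads = ["ACGTACGGA", "CGTACGGAT", "TTTTACGTA"]  # toy reads, not real data
    counts = count_kmers(reads, k=5)
    print("distinct observed k-mers:", len(counts))
    print("coverage histogram:", coverage_histogram(counts))
```

Consider a single linear genome of length L in which no k-mer occurs twice, counting reverse-complement matches as repeats. That genome contains exactly L − k + 1 distinct k-mers, which is why a distinct k-mer count can stand in for genome length. Repeats push the count below the genome length. That shortfall is the gap the abstract's KRI is introduced to bridge. On real data, a dedicated k-mer counter such as Jellyfish or KMC would normally replace this pure-Python loop.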

journal_name: BMC Genomics
journal_title: BMC genomics
authors: Hua K,Zhang X
doi: 10.1186/s12864-019-5467-x
subject: Has Abstract
pub_date: 2019-04-04 00:00:00
pages: 183
issue: Suppl 2
issn: 1471-2164
pii: 10.1186/s12864-019-5467-x
journal_volume: 20
pub_type: Journal Article

  • A statistical framework for consolidating "sibling" probe sets for Affymetrix GeneChip data.

    abstract:BACKGROUND:Affymetrix GeneChip typically contains multiple probe sets per gene, defined as sibling probe sets in this study. These probe sets may or may not behave similarly across treatments. The most appropriate way of consolidating sibling probe sets suitable for analysis is an open problem. We propose the Analysis of...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-188

    authors: Li H,Zhu D,Cook M

    update_date: 2008-04-24 00:00:00

  • Core and accessory genome architecture in a group of Pseudomonas aeruginosa Mu-like phages.

    abstract:BACKGROUND:Bacteriophages that infect the opportunistic pathogen Pseudomonas aeruginosa have been classified into several groups. One of them, which includes temperate phage particles with icosahedral heads and long flexible tails, bears genomes whose architecture and replication mechanism, but not their nucleotide seq...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-1146

    authors: Cazares A,Mendoza-Hernández G,Guarneros G

    update_date: 2014-12-19 00:00:00

  • Global analysis of the ovarian microRNA transcriptome: implication for miR-2 and miR-133 regulation of oocyte meiosis in the Chinese mitten crab, Eriocheir sinensis (Crustacea:Decapoda).

    abstract:BACKGROUND:MicroRNAs (miRNAs) are small non-coding RNA molecules that downregulate gene expression by base pairing to the 3'-untranslated region (UTR) of target messenger RNAs (mRNAs). To date, little information on miRNAs is available for decapod crustaceans. Our previous studies showed that many miRNA-binding sit...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-547

    authors: Song YN,Shi LL,Liu ZQ,Qiu GF

    update_date: 2014-07-01 00:00:00

  • Genomic characterization of ribitol teichoic acid synthesis in Staphylococcus aureus: genes, genomic organization and gene duplication.

    abstract:BACKGROUND:Staphylococcus aureus or MRSA (Methicillin Resistant S. aureus), is an acquired pathogen and the primary cause of nosocomial infections worldwide. In S. aureus, teichoic acid is an essential component of the cell wall, and its biosynthesis is not yet well characterized. Studies in Bacillus subtilis have disc...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-7-74

    authors: Qian Z,Yin Y,Zhang Y,Lu L,Li Y,Jiang Y

    update_date: 2006-04-05 00:00:00

  • Simple sequence repeats and compositional bias in the bipartite Ralstonia solanacearum GMI1000 genome.

    abstract:BACKGROUND:Ralstonia solanacearum is an important plant pathogen. The genome of R. solanacearum GMI1000 is organised into two replicons (a 3.7-Mb chromosome and a 2.1-Mb megaplasmid) and this bipartite genome structure is characteristic of most R. solanacearum strains. To determine whether the megaplasmid was acquired...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-4-10

    authors: Coenye T,Vandamme P

    update_date: 2003-03-17 00:00:00

  • Yes, we can use it: a formal test on the accuracy of low-pass nanopore long-read sequencing for mitophylogenomics and barcoding research using the Caribbean spiny lobster Panulirus argus.

    abstract:BACKGROUND:Whole mitogenomes or short fragments (i.e., 300-700 bp of the cox1 gene) are the markers of choice for revealing within- and among-species genealogies. Protocols for sequencing and assembling mitogenomes include 'primer walking' or 'long PCR' followed by Sanger sequencing or Illumina short-read low-coverage ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-07292-5

    authors: Baeza JA

    update_date: 2020-12-09 00:00:00

  • LPS-treatment of bovine endometrial epithelial cells causes differential DNA methylation of genes associated with inflammation and endometrial function.

    abstract:BACKGROUND:Lipopolysaccharide (LPS) endotoxin stimulates pro-inflammatory pathways and is a key player in the pathological mechanisms involved in the development of endometritis. This study aimed to investigate LPS-induced DNA methylation changes in bovine endometrial epithelial cells (bEECs), which may affect endometr...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-06777-7

    authors: Jhamat N,Niazi A,Guo Y,Chanrot M,Ivanova E,Kelsey G,Bongcam-Rudloff E,Andersson G,Humblot P

    update_date: 2020-06-03 00:00:00

  • An unbiased approach to identify genes involved in development in a turtle with temperature-dependent sex determination.

    abstract:BACKGROUND:Many reptiles exhibit temperature-dependent sex determination (TSD). The initial cue in TSD is incubation temperature, unlike genotypic sex determination (GSD) where it is determined by the presence of specific alleles (or genetic loci). We used patterns of gene expression to identify candidates for genes wi...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-308

    authors: Chojnowski JL,Braun EL

    update_date: 2012-07-15 00:00:00

  • Gastrointestinal microbial populations can distinguish pediatric and adolescent Acute Lymphoblastic Leukemia (ALL) at the time of disease diagnosis.

    abstract:BACKGROUND:An estimated 15,000 children and adolescents under the age of 19 years are diagnosed with leukemia, lymphoma and other tumors in the USA every year. All children and adolescent acute leukemia patients will undergo chemotherapy as part of their treatment regimen. Fortunately, survival rates for most pediatric...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-2965-y

    authors: Rajagopala SV,Yooseph S,Harkins DM,Moncera KJ,Zabokrtsky KB,Torralba MG,Tovchigrechko A,Highlander SK,Pieper R,Sender L,Nelson KE

    update_date: 2016-08-15 00:00:00

  • Evolutionary origin of regulatory regions of retrogenes in Drosophila.

    abstract:BACKGROUND:Retrogenes are processed copies of other genes. This duplication mechanism produces a copy of the parental gene that should not contain introns, and usually does not contain cis-regulatory regions. Here, we computationally address the evolutionary origin of promoter and other cis-regulatory regions in retrog...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-241

    authors: Bai Y,Casola C,Betrán E

    update_date: 2008-05-22 00:00:00

  • Inference of gene interaction networks using conserved subsequential patterns from multiple time course gene expression datasets.

    abstract:MOTIVATION:Deciphering gene interaction networks (GINs) from time-course gene expression (TCGx) data is highly valuable to understand gene behaviors (e.g., activation, inhibition, time-lagged causality) at the system level. Existing methods usually use a global or local proximity measure to infer GINs from a single dat...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-16-S12-S4

    authors: Liu Q,Song R,Li J

    update_date: 2015-01-01 00:00:00

  • Genome-wide transcriptional analysis of T cell activation reveals differential gene expression associated with psoriasis.

    abstract:BACKGROUND:Psoriasis is a chronic autoimmune disease in which T cells have a predominant role in initiating and perpetuating the chronic inflammation in skin. However, the mechanisms that regulate T cell activation in psoriasis are still incompletely understood. The objective of the present study was to characterize th...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-825

    authors: Palau N,Julià A,Ferrándiz C,Puig L,Fonseca E,Fernández E,López-Lasanta M,Tortosa R,Marsal S

    update_date: 2013-11-23 00:00:00

  • Genetic architecture of kernel composition in global sorghum germplasm.

    abstract:BACKGROUND:Sorghum [Sorghum bicolor (L.) Moench] is an important cereal crop for dryland areas in the United States and for small-holder farmers in Africa. Natural variation of sorghum grain composition (protein, fat, and starch) between accessions can be used for crop improvement, but the genetic controls are still un...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-3403-x

    authors: Rhodes DH,Hoffmann L Jr,Rooney WL,Herald TJ,Bean S,Boyles R,Brenton ZW,Kresovich S

    update_date: 2017-01-05 00:00:00

  • Altered gene expression in the superior temporal gyrus in schizophrenia.

    abstract:BACKGROUND:The superior temporal gyrus (STG), which encompasses the primary auditory cortex, is believed to be a major anatomical substrate for speech, language and communication. The STG connects to the limbic system (hippocampus and amygdala), the thalamus and neocortical association areas in the prefrontal cortex, a...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-199

    authors: Bowden NA,Scott RJ,Tooney PA

    update_date: 2008-04-29 00:00:00

  • Networking in microbes: conjugative elements and plasmids in the genus Alteromonas.

    abstract:BACKGROUND:To develop evolutionary models for the free-living bacterium Alteromonas, the genome sequences of isolates of the genus have been extensively analyzed. However, the main genetic exchange drivers in these microbes, conjugative elements (CEs), have not been considered in detail thus far. In this work, CEs have ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-3461-0

    authors: López-Pérez M,Ramon-Marco N,Rodriguez-Valera F

    update_date: 2017-01-05 00:00:00

  • Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples.

    abstract:BACKGROUND:Genomes store information for building and maintaining organisms. Complete sequencing of many genomes provides the opportunity to study and compare global information properties of those genomes. RESULTS:We have analyzed aspects of the information content of Homo sapiens, Mus musculus, Drosophila melanogast...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-509

    authors: Liu Z,Venkatesh SS,Maley CC

    update_date: 2008-10-30 00:00:00

  • Prediction of HIV drug resistance from genotype with encoded three-dimensional protein structure.

    abstract:BACKGROUND:Drug resistance has become a severe challenge for treatment of HIV infections. Mutations accumulate in the HIV genome and make certain drugs ineffective. Prediction of resistance from genotype data is a valuable guide in choice of drugs for effective therapy. RESULTS:In order to improve the computational pr...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-S5-S1

    authors: Yu X,Weber IT,Harrison RW

    update_date: 2014-01-01 00:00:00

  • Glutamine rapidly induces the expression of key transcription factor genes involved in nitrogen and stress responses in rice roots.

    abstract:BACKGROUND:Glutamine is a major amino donor for the synthesis of amino acids, nucleotides, and other nitrogen-containing compounds in all organisms. In addition to its role in nutrition and metabolism, glutamine can also function as a signaling molecule in bacteria, yeast, and humans. By contrast, the functions of glut...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1892-7

    authors: Kan CC,Chung TY,Juo YA,Hsieh MH

    update_date: 2015-09-25 00:00:00

  • MetaTopics: an integration tool to analyze microbial community profile by topic model.

    abstract:BACKGROUND:Deciphering taxonomic structures from high-dimensional sequencing data remains challenging in metagenomic studies. Moreover, the common workflows used in this field fail to identify microbial communities and their effects on a specific disease status. Even the relationships and interactions between...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-3257-2

    authors: Yan J,Chuai G,Qi T,Shao F,Zhou C,Zhu C,Yang J,Yu Y,Shi C,Kang N,He Y,Liu Q

    update_date: 2017-01-25 00:00:00

  • Deep and comparative analysis of the mycelium and appressorium transcriptomes of Magnaporthe grisea using MPSS, RL-SAGE, and oligoarray methods.

    abstract:BACKGROUND:Rice blast, caused by the fungal pathogen Magnaporthe grisea, is a devastating disease causing tremendous yield loss in rice production. The public availability of the complete genome sequence of M. grisea provides ample opportunities to understand the molecular mechanism of its pathogenesis on rice plants a...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-7-310

    authors: Gowda M,Venu RC,Raghupathy MB,Nobuta K,Li H,Wing R,Stahlberg E,Couglan S,Haudenschild CD,Dean R,Nahm BH,Meyers BC,Wang GL

    update_date: 2006-12-08 00:00:00

  • Use of biological priors enhances understanding of genetic architecture and genomic prediction of complex traits within and between dairy cattle breeds.

    abstract:BACKGROUND:A better understanding of the genetic architecture underlying complex traits (e.g., the distribution of causal variants and their effects) may aid genomic prediction. Here, we hypothesized that the genomic variants of complex traits might be enriched in a subset of genomic regions defined by genes gro...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-4004-z

    authors: Fang L,Sahana G,Ma P,Su G,Yu Y,Zhang S,Lund MS,Sørensen P

    update_date: 2017-08-10 00:00:00

  • Identification of dysfunctional modules and disease genes in congenital heart disease by a network-based approach.

    abstract:BACKGROUND:The incidence of congenital heart disease (CHD) is steadily increasing among live-born infants, making it one of the leading causes of infant morbidity worldwide. Various studies suggest that both genetic and environmental factors lead to CHD, and therefore identifying its candidate genes and d...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-12-592

    authors: He D,Liu ZP,Chen L

    update_date: 2011-12-02 00:00:00

  • MS2CNN: predicting MS/MS spectrum based on protein sequence using deep convolutional neural networks.

    abstract:BACKGROUND:Tandem mass spectrometry allows biologists to identify and quantify protein samples in the form of digested peptide sequences. When performing peptide identification, spectral library search is more sensitive than traditional database search but is limited to peptides that have been previously identified. An...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-6297-6

    authors: Lin YM,Chen CT,Chang JM

    update_date: 2019-12-24 00:00:00

  • The complete and fully assembled genome sequence of Aeromonas salmonicida subsp. pectinolytica and its comparative analysis with other Aeromonas species: investigation of the mobilome in environmental and pathogenic strains.

    abstract:BACKGROUND:Due to the predominant use of short-read sequencing to date, most bacterial genome sequences reported in recent years remain at the draft level. This precludes certain types of analyses, such as the in-depth analysis of genome plasticity. RESULTS:Here we report the finalized genome sequence of the envir...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-4301-6

    authors: Pfeiffer F,Zamora-Lagos MA,Blettinger M,Yeroslaviz A,Dahl A,Gruber S,Habermann BH

    update_date: 2018-01-05 00:00:00

  • Hierarchical transcriptional control regulates Plasmodium falciparum sexual differentiation.

    abstract:BACKGROUND:Malaria pathogenesis relies on sexual gametocyte forms of the malaria parasite to be transmitted between the infected human and the mosquito host, but the molecular mechanisms controlling gametocytogenesis remain poorly understood. Here we provide a high-resolution transcriptome of Plasmodium falciparum as i...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-6322-9

    authors: van Biljon R,van Wyk R,Painter HJ,Orchard L,Reader J,Niemand J,Llinás M,Birkholtz LM

    update_date: 2019-12-03 00:00:00

  • RNA profiles of rat olfactory epithelia: individual and age related variations.

    abstract:BACKGROUND:Mammalian genomes contain a large number (approximately 1000) of olfactory receptor (OR) genes, many of which (20 to 50%) are pseudogenes. OR gene transcription is not restricted to the olfactory epithelium, but is found in numerous tissues. Using microarray hybridization and RTqPCR, we analyzed the mRNA pro...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-10-572

    authors: Rimbault M,Robin S,Vaysse A,Galibert F

    update_date: 2009-12-02 00:00:00

  • Widespread promoter methylation of synaptic plasticity genes in long-term potentiation in the adult brain in vivo.

    abstract:BACKGROUND:DNA methylation is a key modulator of gene expression in mammalian development and cellular differentiation, including neurons. To date, the role of DNA modifications in long-term potentiation (LTP) has not been explored. RESULTS:To investigate the occurrence of DNA methylation changes in LTP, we undertook ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-3621-x

    authors: Maag JL,Kaczorowski DC,Panja D,Peters TJ,Bramham CR,Wibrand K,Dinger ME

    update_date: 2017-03-23 00:00:00

  • Divergence in function and expression of the NOD26-like intrinsic proteins in plants.

    abstract:BACKGROUND:NOD26-like intrinsic proteins (NIPs), which belong to the aquaporin superfamily, are plant-specific and exhibit a similar three-dimensional structure. Experimental evidence, however, indicates that extensive functional divergence has occurred among NIP genes. It is therefore intriguing to further invest...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-10-313

    authors: Liu Q,Wang H,Zhang Z,Wu J,Feng Y,Zhu Z

    update_date: 2009-07-15 00:00:00

  • Hepatic transcriptomic profiling reveals early toxicological mechanisms of uranium in Atlantic salmon (Salmo salar).

    abstract:BACKGROUND:Uranium (U) is a naturally occurring radionuclide that has been found in the aquatic environment due to anthropogenic activities. Exposure to U may pose risk to aquatic organisms due to its radiological and chemical toxicity. The present study aimed to characterize the chemical toxicity of U in Atlantic salm...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-694

    authors: Song Y,Salbu B,Teien HC,Sørlie Heier L,Rosseland BO,Høgåsen T,Tollefsen KE

    update_date: 2014-08-20 00:00:00

  • Comparative transcriptomics provide insight into the morphogenesis and evolution of fistular leaves in Allium.

    abstract:BACKGROUND:Fistular leaves frequently appear in Allium species, and previous developmental studies have proposed that the process of fistular leaf formation involves programmed cell death. However, molecular evidence for the role of programmed cell death in the formation of fistular leaf cavities has yet to be reported...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-3474-8

    authors: Zhu S,Tang S,Tan Z,Yu Y,Dai Q,Liu T

    update_date: 2017-01-10 00:00:00