Impact of analytic provenance in genome analysis.

Abstract:

BACKGROUND: Many computational methods are available for assembly and annotation of newly sequenced microbial genomes. However, when new genomes are reported in the literature, there is frequently very little critical analysis of choices made during the sequence assembly and gene annotation stages. These choices have a direct impact on the biologically relevant products of a genomic analysis, for instance identification of common and differentiating regions among genomes in a comparison, or identification of enriched gene functional categories in a specific strain. Here, we examine the outcomes of different assembly and analysis steps in typical workflows in a comparison among strains of Vibrio vulnificus.

RESULTS: Using six recently sequenced strains of V. vulnificus, we demonstrate the "alternate realities" of comparative genomics, and how they depend on the choice of a robust assembly method and accurate ab initio annotation. We apply several popular assemblers for paired-end Illumina data, and three well-regarded ab initio genefinders. We demonstrate significant differences in detected gene overlap among comparative genomics workflows that depend on these two steps. The divergence between workflows, even those using widely adopted methods, is obvious both at the single genome level and when a comparison is performed. In a typical example where multiple workflows are applied to the strain V. vulnificus CECT 4606, a workflow that uses the Velvet assembler and Glimmer gene finder identifies 3275 gene features, while a workflow that uses the Velvet assembler and the RAST annotation system identifies 5011 gene features. Only 3171 genes are identical between both workflows. When we examine 9 assembly/annotation workflow scenarios as input to a three-way genome comparison, differentiating genes and even differentially represented functional categories change significantly from scenario to scenario.

CONCLUSIONS: Inconsistencies in genomic analysis can arise depending on the choices that are made during the assembly and annotation stages. These inconsistencies can have a significant impact on the interpretation of an individual genome's content. The impact is multiplied when comparison of content and function among multiple genomes is the goal. Tracking the analysis history of the data (its analytic provenance) is critical for reproducible analysis of genome data.
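
The CECT 4606 comparison reported in the abstract can be sketched as a set operation over gene features. The following Python sketch is illustrative only and is not the authors' pipeline: the `WorkflowResult` class, the `compare` function, and the integer gene IDs are hypothetical placeholders, sized so that the sets match the counts given in the abstract (3275 and 5011 features, 3171 shared). Each gene set is stored together with the assembler and gene finder that produced it, as a minimal form of analytic provenance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowResult:
    """Gene calls plus the provenance (tools) that produced them."""
    assembler: str
    gene_finder: str
    genes: frozenset  # placeholder gene IDs; real data might use coordinates


def compare(a: WorkflowResult, b: WorkflowResult) -> dict:
    """Summarize agreement between two workflows' gene sets."""
    shared = a.genes & b.genes
    return {
        "workflows": (f"{a.assembler}+{a.gene_finder}",
                      f"{b.assembler}+{b.gene_finder}"),
        "shared": len(shared),
        "only_a": len(a.genes - b.genes),
        "only_b": len(b.genes - a.genes),
        "jaccard": len(shared) / len(a.genes | b.genes),
    }


# Synthetic IDs sized to match the abstract's counts (not real gene calls).
shared_ids = set(range(3171))
glimmer = WorkflowResult("Velvet", "Glimmer",
                         frozenset(shared_ids | set(range(10_000, 10_104))))
rast = WorkflowResult("Velvet", "RAST",
                      frozenset(shared_ids | set(range(20_000, 21_840))))

summary = compare(glimmer, rast)
print(summary)
```

If the 3171 identical genes are treated as the intersection of the two sets, 104 features are unique to the Velvet/Glimmer workflow and 1840 to the Velvet/RAST workflow; both figures follow directly from the reported totals.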

journal_name

BMC Genomics

journal_title

BMC genomics

authors

Morrison SS,Pyzh R,Jeon MS,Amaro C,Roig FJ,Baker-Austin C,Oliver JD,Gibas CJ

doi

10.1186/1471-2164-15-S8-S1

subject

Has Abstract

pub_date

2014-01-01 00:00:00

pages

S1

issn

1471-2164

pii

1471-2164-15-S8-S1

journal_volume

15 Suppl 8

pub_type

Journal article
  • Microarray-based ultra-high resolution discovery of genomic deletion mutations.

    abstract:BACKGROUND:Oligonucleotide microarray-based comparative genomic hybridization (CGH) offers an attractive possible route for the rapid and cost-effective genome-wide discovery of deletion mutations. CGH typically involves comparison of the hybridization intensities of genomic DNA samples with microarray chip representat...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/1471-2164-15-224

    authors: Belfield EJ,Brown C,Gan X,Jiang C,Baban D,Mithani A,Mott R,Ragoussis J,Harberd NP

    update_date:2014-03-22 00:00:00

  • miR-27b shapes the presynaptic transcriptome and influences neurotransmission by silencing the polycomb group protein Bmi1.

    abstract:BACKGROUND:MicroRNAs (miRNAs) are short non-coding RNAs that are emerging as important post-transcriptional regulators of neuronal and synaptic development. The precise impact of miRNAs on presynaptic function and neurotransmission remains, however, poorly understood. RESULTS:Here, we identify miR-27b-an abundant neur...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/s12864-016-3139-7

    authors: Poon VY,Gu M,Ji F,VanDongen AM,Fivaz M

    update_date:2016-10-04 00:00:00

  • High-throughput sequencing of circRNAs reveals novel insights into mechanisms of nigericin in pancreatic cancer.

    abstract:BACKGROUND:Our previous study had proved that nigericin could reduce colorectal cancer cell proliferation in dose- and time-dependent manners by targeting Wnt/β-catenin signaling. To better elucidate its potential anti-cancer mechanism, two pancreatic cancer (PC) cell lines were exposed to increasing concentrations of ...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/s12864-019-6032-3

    authors: Xu Z,Shen J,Hua S,Wan D,Chen Q,Han Y,Ren R,Liu F,Du Z,Guo X,Shi J,Zhi Q

    update_date:2019-09-18 00:00:00

  • Mycoplasma non-coding RNA: identification of small RNAs and targets.

    abstract:BACKGROUND:Bacterial non-coding RNAs act by base-pairing as regulatory elements in crucial biological processes. We performed the identification of trans-encoded small RNAs (sRNA) from the genomes of Mycoplasma hyopneumoniae, Mycoplasma flocculare and Mycoplasma hyorhinis, which are Mycoplasma species that have been ide...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/s12864-016-3061-z

    authors: Siqueira FM,de Morais GL,Higashi S,Beier LS,Breyer GM,de Sá Godinho CP,Sagot MF,Schrank IS,Zaha A,de Vasconcelos AT

    update_date:2016-10-25 00:00:00

  • Full-length cDNA sequences from Rhesus monkey placenta tissue: analysis and utility for comparative mapping.

    abstract:BACKGROUND:Rhesus monkeys (Macaca mulatta) are widely-used as experimental animals in biomedical research and are closely related to other laboratory macaques, such as cynomolgus monkeys (Macaca fascicularis), and to humans, sharing a last common ancestor from about 25 million years ago. Although rhesus monkeys have be...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/1471-2164-11-427

    authors: Kim DS,Huh JW,Kim YH,Park SJ,Lee SR,Chang KT

    update_date:2010-07-12 00:00:00

  • Pangenome analysis of Bifidobacterium longum and site-directed mutagenesis through by-pass of restriction-modification systems.

    abstract:BACKGROUND:Bifidobacterial genome analysis has provided insights as to how these gut commensals adapt to and persist in the human GIT, while also revealing genetic diversity among members of a given bifidobacterial (sub)species. Bifidobacteria are notoriously recalcitrant to genetic modification, which prevents explora...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/s12864-015-1968-4

    authors: O'Callaghan A,Bottacini F,O'Connell Motherway M,van Sinderen D

    update_date:2015-10-21 00:00:00

  • Horizontal transfer of OC1 transposons in the Tasmanian devil.

    abstract:BACKGROUND:There is growing recognition that horizontal DNA transfer, a process known to be common in prokaryotes, is also a significant source of genomic variation in eukaryotes. Horizontal transfer of transposable elements (HTT) may be especially prevalent in eukaryotes given the inherent mobility, widespread occurre...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/1471-2164-14-134

    authors: Gilbert C,Waters P,Feschotte C,Schaack S

    update_date:2013-02-27 00:00:00

  • Overlapping genes in the human and mouse genomes.

    abstract:BACKGROUND:Increasing evidence suggests that overlapping genes are much more common in eukaryotic genomes than previously thought. In this study we identified and characterized the overlapping genes in a set of 13,484 pairs of human-mouse orthologous genes. RESULTS:About 10% of the genes under study are overlapping ge...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/1471-2164-9-169

    authors: Sanna CR,Li WH,Zhang L

    update_date:2008-04-14 00:00:00

  • Endogenous circadian time genes expressions in the liver of mice under constant darkness.

    abstract:BACKGROUND:The circadian rhythms regulate physiological functions and metabolism. Circadian Time (CT) is a unit to quantify the rhythm of endogenous circadian clock, independent of light influence. To understand the gene expression changes throughout CT, C57BL/6 J mice were maintained under constant darkness (DD) for 6...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/s12864-020-6639-4

    authors: Li H,Zhang S,Zhang W,Chen S,Rabearivony A,Shi Y,Liu J,Corton CJ,Liu C

    update_date:2020-03-12 00:00:00

  • De novo assembly and characterization of root transcriptome using Illumina paired-end sequencing and development of cSSR markers in sweet potato (Ipomoea batatas).

    abstract:BACKGROUND:The tuberous root of sweet potato is an important agricultural and biological organ. There are not sufficient transcriptomic and genomic data in public databases for understanding of the molecular mechanism underlying the tuberous root formation and development. Thus, high throughput transcriptome sequencing...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/1471-2164-11-726

    authors: Wang Z,Fang B,Chen J,Zhang X,Luo Z,Huang L,Chen X,Li Y

    update_date:2010-12-24 00:00:00

  • Genetic architecture and genomic selection of female reproduction traits in rainbow trout.

    abstract:BACKGROUND:Rainbow trout is a significant fish farming species under temperate climates. Female reproduction traits play an important role in the economy of breeding companies with the sale of fertilized eggs. The objectives of this study are threefold: to estimate the genetic parameters of female reproduction traits, ...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/s12864-020-06955-7

    authors: D'Ambrosio J,Morvezen R,Brard-Fudulea S,Bestin A,Acin Perez A,Guéméné D,Poncet C,Haffray P,Dupont-Nivet M,Phocas F

    update_date:2020-08-14 00:00:00

  • Comparative analyses of genotype dependent expressed sequence tags and stress-responsive transcriptome of chickpea wilt illustrate predicted and unexpected genes and novel regulators of plant immunity.

    abstract:BACKGROUND:The ultimate phenome of any organism is modulated by regulated transcription of many genes. Characterization of genetic makeup is thus crucial for understanding the molecular basis of phenotypic diversity, evolution and response to intra- and extra-cellular stimuli. Chickpea is the world's third most importa...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/1471-2164-10-415

    authors: Ashraf N,Ghai D,Barman P,Basu S,Gangisetty N,Mandal MK,Chakraborty N,Datta A,Chakraborty S

    update_date:2009-09-05 00:00:00

  • Genes associated with the cis-regulatory functions of intragenic LINE-1 elements.

    abstract:BACKGROUND:Thousands of intragenic long interspersed element 1 sequences (LINE-1 elements or L1s) reside within genes. These intragenic L1 sequences are conserved and regulate the expression of their host genes. When L1 methylation is decreased, either through chemical induction or in cancer, the intragenic L1 transcri...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/1471-2164-14-205

    authors: Wanichnopparat W,Suwanwongse K,Pin-On P,Aporntewan C,Mutirangura A

    update_date:2013-03-27 00:00:00

  • Gene2vec: distributed representation of genes based on co-expression.

    abstract:BACKGROUND:Existing functional description of genes are categorical, discrete, and mostly through manual process. In this work, we explore the idea of gene embedding, distributed representation of genes, in the spirit of word embedding. RESULTS:From a pure data-driven fashion, we trained a 200-dimension vector represe...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/s12864-018-5370-x

    authors: Du J,Jia P,Dai Y,Tao C,Zhao Z,Zhi D

    update_date:2019-02-04 00:00:00

  • Computational discovery and RT-PCR validation of novel Burkholderia conserved and Burkholderia pseudomallei unique sRNAs.

    abstract:BACKGROUND:The sRNAs of bacterial pathogens are known to be involved in various cellular roles including environmental adaptation as well as regulation of virulence and pathogenicity. It is expected that sRNAs may also have similar functions for Burkholderia pseudomallei, a soil bacterium that can adapt to diverse envi...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/1471-2164-13-S7-S13

    authors: Khoo JS,Chai SF,Mohamed R,Nathan S,Firdaus-Raih M

    update_date:2012-01-01 00:00:00

  • Comparative analysis of Cd-responsive maize and rice transcriptomes highlights Cd co-modulated orthologs.

    abstract:BACKGROUND:Metal tolerance is often an integrative result of metal uptake and distribution, which are fine-tuned by a network of signaling cascades and metal transporters. Thus, with the goal of advancing the molecular understanding of such metal homeostatic mechanisms, comparative RNAseq-based transcriptome analysis w...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/s12864-018-5109-8

    authors: Cheng D,Tan M,Yu H,Li L,Zhu D,Chen Y,Jiang M

    update_date:2018-09-26 00:00:00

  • Effect of sample stratification on dairy GWAS results.

    abstract:BACKGROUND:Artificial insemination and genetic selection are major factors contributing to population stratification in dairy cattle. In this study, we analyzed the effect of sample stratification and the effect of stratification correction on results of a dairy genome-wide association study (GWAS). Three methods for s...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/1471-2164-13-536

    authors: Ma L,Wiggans GR,Wang S,Sonstegard TS,Yang J,Crooker BA,Cole JB,Van Tassell CP,Lawlor TJ,Da Y

    update_date:2012-10-06 00:00:00

  • Copy number variation in the genomes of twelve natural isolates of Caenorhabditis elegans.

    abstract:BACKGROUND:Copy number variation is an important component of genetic variation in higher eukaryotes. The extent of natural copy number variation in C. elegans is unknown outside of 2 highly divergent wild isolates and the canonical N2 Bristol strain. RESULTS:We have used array comparative genomic hybridization (aCGH)...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/1471-2164-11-62

    authors: Maydan JS,Lorch A,Edgley ML,Flibotte S,Moerman DG

    update_date:2010-01-25 00:00:00

  • Investigation of regions impacting inbreeding depression and their association with the additive genetic effect for United States and Australia Jersey dairy cattle.

    abstract:BACKGROUND:Variation in environment, management practices, nutrition or selection objectives has led to a variety of different choices being made in the use of genetic material between countries. Differences in genome-level homozygosity between countries may give rise to regions that result in inbreeding depression to ...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/s12864-015-2001-7

    authors: Howard JT,Haile-Mariam M,Pryce JE,Maltecca C

    update_date:2015-10-19 00:00:00

  • ABO antigen and secretor statuses are not associated with gut microbiota composition in 1,500 twins.

    abstract:BACKGROUND:Host genetics is one of several factors known to shape human gut microbiome composition, however, the physiological processes underlying the heritability are largely unknown. Inter-individual differences in host factors secreted into the gut lumen may lead to variation in microbiome composition. One such fac...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/s12864-016-3290-1

    authors: Davenport ER,Goodrich JK,Bell JT,Spector TD,Ley RE,Clark AG

    update_date:2016-11-21 00:00:00

  • The cytochrome P450 (CYP) gene superfamily in Daphnia pulex.

    abstract:BACKGROUND:Cytochrome P450s (CYPs) in animals fall into two categories: those that synthesize or metabolize endogenous molecules and those that interact with exogenous chemicals from the diet or the environment. The latter form a critical component of detoxification systems. RESULTS:Data mining and manual curation of ...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/1471-2164-10-169

    authors: Baldwin WS,Marko PB,Nelson DR

    update_date:2009-04-21 00:00:00

  • ArachnoServer: a database of protein toxins from spiders.

    abstract:BACKGROUND:Venomous animals incapacitate their prey using complex venoms that can contain hundreds of unique protein toxins. The realisation that many of these toxins may have pharmaceutical and insecticidal potential due to their remarkable potency and selectivity against target receptors has led to an explosion in th...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/1471-2164-10-375

    authors: Wood DL,Miljenović T,Cai S,Raven RJ,Kaas Q,Escoubas P,Herzig V,Wilson D,King GF

    update_date:2009-08-13 00:00:00

  • Transcript profiling of Populus tomentosa genes in normal, tension, and opposite wood by RNA-seq.

    abstract:BACKGROUND:Wood formation affects the chemical and physical properties of wood, and thus affects its utility as a building material or a feedstock for biofuels, pulp and paper. To obtain genome-wide insights on the transcriptome changes and regulatory networks in wood formation, we used high-throughput RNA sequencing t...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/s12864-015-1390-y

    authors: Chen J,Chen B,Zhang D

    update_date:2015-03-10 00:00:00

  • Transcriptome analysis of a respiratory Saccharomyces cerevisiae strain suggests the expression of its phenotype is glucose insensitive and predominantly controlled by Hap4, Cat8 and Mig1.

    abstract:BACKGROUND:We previously described the first respiratory Saccharomyces cerevisiae strain, KOY.TM6*P, by integrating the gene encoding a chimeric hexose transporter, Tm6*, into the genome of an hxt null yeast. Subsequently we transferred this respiratory phenotype in the presence of up to 50 g/L glucose to a yeast strai...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/1471-2164-9-365

    authors: Bonander N,Ferndahl C,Mostad P,Wilks MD,Chang C,Showe L,Gustafsson L,Larsson C,Bill RM

    update_date:2008-07-31 00:00:00

  • The Babesia bovis gene and promoter model: an update from full-length EST analysis.

    abstract:BACKGROUND:Babesia bovis is an apicomplexan parasite that causes babesiosis in infected cattle. Genomes of pathogens contain promising information that can facilitate the development of methods for controlling infections. Although the genome of B. bovis is publicly available, annotated gene models are not highly reli...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/1471-2164-15-678

    authors: Yamagishi J,Wakaguri H,Yokoyama N,Yamashita R,Suzuki Y,Xuan X,Igarashi I

    update_date:2014-08-13 00:00:00

  • Transcriptional responses of PBMC in psychosocially stressed animals indicate an alerting of the immune system in female but not in castrated male pigs.

    abstract:BACKGROUND:Brain and immune system are linked in a bi-directional manner. To date, it remained largely unknown why immune components become suppressed, enhanced, or remain unaffected in relation to psychosocial stress. Therefore, we mixed unfamiliar pigs with different levels of aggressiveness. We separated castrated m...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/1471-2164-15-967

    authors: Oster M,Muráni E,Ponsuksili S,D'Eath RB,Turner SP,Evans G,Thölking L,Kurt E,Klont R,Foury A,Mormède P,Wimmers K

    update_date:2014-11-08 00:00:00

  • Reconstruction of temporal activity of microRNAs from gene expression data in breast cancer cell line.

    abstract:BACKGROUND:MicroRNAs (miRNAs) are small non-coding RNAs that regulate genes at the post-transcriptional level in spatiotemporal manner. Several miRNAs are identified as prognostic and diagnostic markers in many human cancers. Estimation of the temporal activities of the miRNAs is an important step in the way to underst...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/s12864-015-2260-3

    authors: Jayavelu ND,Bar N

    update_date:2015-12-18 00:00:00

  • Coevolution of paired receptors in Xenopus carcinoembryonic antigen-related cell adhesion molecule families suggests appropriation as pathogen receptors.

    abstract:BACKGROUND:In mammals, CEACAM1 and closely related members represent paired receptors with similar extracellular ligand-binding regions and cytoplasmic domains with opposing functions. Human CEACAM1 and CEACAM3 which have inhibitory ITIM/ITSM and activating ITAM-like motifs, respectively, in their cytoplasmic regions a...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/s12864-016-3279-9

    authors: Zimmermann W,Kammerer R

    update_date:2016-11-16 00:00:00

  • Preferred and avoided codon pairs in three domains of life.

    abstract:BACKGROUND:Alternative synonymous codons are not used with equal frequencies. In addition, the contexts of codons - neighboring nucleotides and neighboring codons - can have certain patterns. The codon context can influence both translational accuracy and elongation rates. However, it is not known how strong or conserv...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/1471-2164-9-463

    authors: Tats A,Tenson T,Remm M

    update_date:2008-10-08 00:00:00

  • Comparative genomics of European avian pathogenic E. Coli (APEC).

    abstract:BACKGROUND:Avian pathogenic Escherichia coli (APEC) causes colibacillosis, which results in significant economic losses to the poultry industry worldwide. However, the diversity between isolates remains poorly understood. Here, a total of 272 APEC isolates collected from the United Kingdom (UK), Italy and Germany were ...

    journal_title:BMC genomics

    pub_type: Journal article

    doi:10.1186/s12864-016-3289-7

    authors: Cordoni G,Woodward MJ,Wu H,Alanazi M,Wallis T,La Ragione RM

    update_date:2016-11-22 00:00:00