Optimizing de novo assembly of short-read RNA-seq data for phylogenomics.

Abstract:

BACKGROUND: RNA-seq has shown huge potential for phylogenomic inference in non-model organisms. However, errors, incompleteness, and redundant transcripts assembled for each gene in de novo assembly of short reads introduce noise into analyses and leave a large amount of missing data in the aligned matrix. To address these problems, we compare de novo assemblies of paired-end 90 bp RNA-seq reads using Oases, Trinity, Trans-ABySS, and SOAPdenovo-Trans to transcripts from the genome annotation of the model plant Ricinus communis. By doing so, we evaluate strategies for optimizing total gene coverage and minimizing assembly chimeras and redundancy.
RESULTS: We found that the frequency and structure of chimeras vary dramatically among software packages. The differences were largely due to the number of trans-self chimeras that contain repeats in the opposite direction. More than half of the total chimeras in Oases and Trinity were trans-self chimeras. Within each package, we found a trade-off between maximizing reference coverage and minimizing redundancy and chimera rate. To reduce redundancy, we investigated three methods: 1) using CAP3 and CD-HIT-EST to combine highly similar transcripts; 2) retaining only the transcript with the highest read coverage, or removing the transcript with the lowest read coverage, for each subcomponent in Trinity; and 3) filtering Oases single k-mer assemblies by number of transcripts per locus and relative transcript length, and then selecting the transcript with the highest read coverage. We then used blastx results against model protein sequences to effectively remove trans chimeras. After optimization, seven assembly strategies among all four packages successfully assembled 42.9-47.1% of reference genes to more than 200 bp, with a chimera rate of 0.92-2.21% and, on average, 1.8-3.1 transcripts per reference gene assembled.
CONCLUSIONS: With rapidly improving sequencing and assembly tools, our study provides a framework to benchmark and optimize performance before choosing tools or parameter combinations for analyzing short-read RNA-seq data. Our study demonstrates that choice of assembly package, k-mer sizes, post-assembly redundancy reduction and chimera cleanup, and strand-specific RNA-seq library preparation and assembly dramatically improve gene coverage by non-redundant and non-chimeric transcripts that are optimized for downstream phylogenomic analyses.
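
The second redundancy-reduction method in the abstract (keeping the highest-coverage transcript, or dropping the lowest-coverage one, within each Trinity subcomponent) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes transcript IDs follow a `compN_cM_seqK` pattern in which `compN_cM` names the subcomponent, and that per-transcript read coverage has already been computed and is supplied as a plain dictionary; the function names and the input layout are hypothetical.

```python
from collections import defaultdict


def subcomponent_of(transcript_id):
    """Return the subcomponent prefix of an ID such as 'comp12_c0_seq3'.

    The 'compN_cM_seqK' naming scheme is an assumption of this sketch.
    """
    return transcript_id.rsplit("_seq", 1)[0]


def keep_highest_coverage(coverage):
    """Map each subcomponent to its single best-covered transcript.

    coverage: dict of transcript ID -> read coverage (hypothetical input).
    Ties are resolved in favor of the transcript seen first.
    """
    best = {}
    for tid, cov in coverage.items():
        sub = subcomponent_of(tid)
        if sub not in best or cov > coverage[best[sub]]:
            best[sub] = tid
    return best


def drop_lowest_coverage(coverage):
    """Remove only the lowest-coverage transcript from each subcomponent
    that holds more than one transcript; return the kept IDs, sorted."""
    groups = defaultdict(list)
    for tid in coverage:
        groups[subcomponent_of(tid)].append(tid)
    kept = []
    for tids in groups.values():
        if len(tids) > 1:
            worst = min(tids, key=lambda t: coverage[t])
            tids = [t for t in tids if t != worst]
        kept.extend(tids)
    return sorted(kept)


if __name__ == "__main__":
    # Made-up coverage values, for illustration only.
    example = {
        "comp0_c0_seq1": 5.0,
        "comp0_c0_seq2": 12.5,
        "comp1_c0_seq1": 7.0,
    }
    print(keep_highest_coverage(example))
    print(drop_lowest_coverage(example))
```

The first function is the stricter filter, leaving one transcript per subcomponent, while the second removes a single transcript per multi-transcript subcomponent and so retains more isoforms. This mirrors the trade-off between reference coverage and redundancy described in the abstract.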

journal_name

BMC Genomics

journal_title

BMC genomics

authors

Yang Y,Smith SA

doi

10.1186/1471-2164-14-328

pub_date

2013-05-14 00:00:00

pages

328

issn

1471-2164

pii

1471-2164-14-328

journal_volume

14

pub_type

Journal Article
  • Diverse histone modifications on histone 3 lysine 9 and their relation to DNA methylation in specifying gene silencing.

    abstract:BACKGROUND:Previous studies of individual genes have shown that in a self-enforcing way, dimethylation at histone 3 lysine 9 (dimethyl-H3K9) and DNA methylation cooperate to maintain a repressive mode of inactive genes. Less clear is whether this cooperation is generalized in mammalian genomes, such as mouse genome. He...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-8-131

    authors: Wu J,Wang SH,Potter D,Liu JC,Smith LT,Wu YZ,Huang TH,Plass C

    update_date:2007-05-24 00:00:00

  • Multiple genetic loci define Ca++ utilization by bloodstream malaria parasites.

    abstract:BACKGROUND:Bloodstream malaria parasites require Ca++ for their development, but the sites and mechanisms of Ca++ utilization are not well understood. We hypothesized that there may be differences in Ca++ uptake or utilization by genetically distinct lines of P. falciparum. These differences, if identified, may provide...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-5418-y

    authors: Apolis L,Olivas J,Srinivasan P,Kushwaha AK,Desai SA

    update_date:2019-01-16 00:00:00

  • Computational discovery and RT-PCR validation of novel Burkholderia conserved and Burkholderia pseudomallei unique sRNAs.

    abstract:BACKGROUND:The sRNAs of bacterial pathogens are known to be involved in various cellular roles including environmental adaptation as well as regulation of virulence and pathogenicity. It is expected that sRNAs may also have similar functions for Burkholderia pseudomallei, a soil bacterium that can adapt to diverse envi...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-S7-S13

    authors: Khoo JS,Chai SF,Mohamed R,Nathan S,Firdaus-Raih M

    update_date:2012-01-01 00:00:00

  • How many human genes can be defined as housekeeping with current expression data?

    abstract:BACKGROUND:Housekeeping (HK) genes are ubiquitously expressed in all tissue/cell types and constitute a basal transcriptome for the maintenance of basic cellular functions. Partitioning transcriptomes into HK and tissue-specific (TS) genes relatively is fundamental for studying gene expression and cellular differentiat...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-172

    authors: Zhu J,He F,Song S,Wang J,Yu J

    update_date:2008-04-16 00:00:00

  • Transcriptomic time-series analysis of early development in olive from germinated embryos to juvenile tree.

    abstract:BACKGROUND:Despite its relevance, almost no studies account for the genetic control in the early stages of tree development, i.e. from germination on. This study seeks to make a quite complete transcriptome for olive development and to elucidate the dynamic regulation of the transcriptomic response during the early-juv...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-5232-6

    authors: Jiménez-Ruiz J,de la O Leyva-Pérez M,Vidoy-Mercado I,Barceló A,Luque F

    update_date:2018-11-19 00:00:00

  • Computational prediction and experimental validation of evolutionarily conserved microRNA target genes in bilaterian animals.

    abstract:BACKGROUND:In many eukaryotes, microRNAs (miRNAs) bind to complementary sites in the 3'-untranslated regions (3'-UTRs) of target messenger RNAs (mRNAs) and regulate their expression at the stage of translation. Recent studies have revealed that many miRNAs are evolutionarily conserved; however, the evolution of their t...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-101

    authors: Takane K,Fujishima K,Watanabe Y,Sato A,Saito N,Tomita M,Kanai A

    update_date:2010-02-09 00:00:00

  • Histone H3 lysine 4 methyltransferase is required for facultative heterochromatin at specific loci.

    abstract:BACKGROUND:Histone H3 lysine 4 tri-methylation (H3K4me3) and histone H3 lysine 9 tri-methylation (H3K9me3) are widely perceived to be opposing and often mutually exclusive chromatin modifications. However, both are needed for certain light-activated genes in Neurospora crassa (Neurospora), including frequency (frq) and...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-5729-7

    authors: Zhu Q,Ramakrishnan M,Park J,Belden WJ

    update_date:2019-05-08 00:00:00

  • Dynamic, mating-induced gene expression changes in female head and brain tissues of Drosophila melanogaster.

    abstract:BACKGROUND:Drosophila melanogaster females show changes in behavior and physiology after mating that are thought to maximize the number of progeny resulting from the most recent copulation. Sperm and seminal fluid proteins induce post-mating changes in females, however, very little is known about the resulting gene exp...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-541

    authors: Dalton JE,Kacheria TS,Knott SR,Lebo MS,Nishitani A,Sanders LE,Stirling EJ,Winbush A,Arbeitman MN

    update_date:2010-10-06 00:00:00

  • Construction of relatedness matrices using genotyping-by-sequencing data.

    abstract:BACKGROUND:Genotyping-by-sequencing (GBS) is becoming an attractive alternative to array-based methods for genotyping individuals for a large number of single nucleotide polymorphisms (SNPs). Costs can be lowered by reducing the mean sequencing depth, but this results in genotype calls of lower quality. A common analys...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-2252-3

    authors: Dodds KG,McEwan JC,Brauning R,Anderson RM,van Stijn TC,Kristjánsson T,Clarke SM

    update_date:2015-12-09 00:00:00

  • Secondary structure in the target as a confounding factor in synthetic oligomer microarray design.

    abstract:BACKGROUND:Secondary structure in the target is a property not usually considered in software applications for design of optimal custom oligonucleotide probes. It is frequently assumed that eliminating self-complementarity, or screening for secondary structure in the probe, is sufficient to avoid interference with hybr...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-6-31

    authors: Ratushna VG,Weller JW,Gibas CJ

    update_date:2005-03-08 00:00:00

  • The purplish bifurcate mussel Mytilisepta virgata gene expression atlas reveals a remarkable tissue functional specialization.

    abstract:BACKGROUND:Mytilisepta virgata is a marine mussel commonly found along the coasts of Japan. Although this species has been the subject of occasional studies concerning its ecological role, growth and reproduction, it has been so far almost completely neglected from a genetic and molecular point of view. In the present ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-4012-z

    authors: Gerdol M,Fujii Y,Hasan I,Koike T,Shimojo S,Spazzali F,Yamamoto K,Ozeki Y,Pallavicini A,Fujita H

    update_date:2017-08-08 00:00:00

  • Evolution of cis- and trans-regulatory divergence in the chicken genome between two contrasting breeds analyzed using three tissue types at one-day-old.

    abstract:BACKGROUND:Gene expression variation is a key underlying factor influencing phenotypic variation, and can occur via cis- or trans-regulation. To understand the role of cis- and trans-regulatory variation on population divergence in chicken, we developed reciprocal crosses of two chicken breeds, White Leghorn and Cornis...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-6342-5

    authors: Wang Q,Jia Y,Wang Y,Jiang Z,Zhou X,Zhang Z,Nie C,Li J,Yang N,Qu L

    update_date:2019-12-05 00:00:00

  • Comparative genomics of 84 Pectobacterium genomes reveals the variations related to a pathogenic lifestyle.

    abstract:BACKGROUND:Pectobacterium spp. are necrotrophic bacterial plant pathogens of the family Pectobacteriaceae, responsible for a wide spectrum of diseases of important crops and ornamental plants including soft rot, blackleg, and stem wilt. P. carotovorum is a genetically heterogeneous species consisting of three valid sub...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-5269-6

    authors: Li X,Ma Y,Liang S,Tian Y,Yin S,Xie S,Xie H

    update_date:2018-12-07 00:00:00

  • Proteomic analysis of Citrus sinensis roots and leaves in response to long-term magnesium-deficiency.

    abstract:BACKGROUND:Magnesium (Mg)-deficiency is frequently observed in Citrus plantations and is responsible for the loss of productivity and poor fruit quality. Knowledge on the effects of Mg-deficiency on upstream targets is scarce. Seedlings of 'Xuegan' [Citrus sinensis (L.) Osbeck] were irrigated with Mg-deficient (0 mM Mg...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1462-z

    authors: Peng HY,Qi YP,Lee J,Yang LT,Guo P,Jiang HX,Chen LS

    update_date:2015-03-31 00:00:00

  • Insights into the Musa genome: syntenic relationships to rice and between Musa species.

    abstract:BACKGROUND:Musa species (Zingiberaceae, Zingiberales) including bananas and plantains are collectively the fourth most important crop in developing countries. Knowledge concerning Musa genome structure and the origin of distinct cultivars has greatly increased over the last few years. Until now, however, no large-scale...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-58

    authors: Lescot M,Piffanelli P,Ciampi AY,Ruiz M,Blanc G,Leebens-Mack J,da Silva FR,Santos CM,D'Hont A,Garsmeur O,Vilarinhos AD,Kanamori H,Matsumoto T,Ronning CM,Cheung F,Haas BJ,Althoff R,Arbogast T,Hine E,Pappas GJ Jr,Sas

    update_date:2008-01-30 00:00:00

  • The complete mitochondrial genome of the early flowering plant Nymphaea colorata is highly repetitive with low recombination.

    abstract:BACKGROUND:Mitochondrial genomes of flowering plants (angiosperms) are highly dynamic in genome structure. The mitogenome of the earliest angiosperm Amborella is remarkable in carrying rampant foreign DNAs, in contrast to Liriodendron, the other only known early angiosperm mitogenome that is described as 'fossilized'. ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-4991-4

    authors: Dong S,Zhao C,Chen F,Liu Y,Zhang S,Wu H,Zhang L,Liu Y

    update_date:2018-08-14 00:00:00

  • Correction to: Proteotranscriptomics assisted gene annotation and spatial proteomics of Bombyx mori BmN4 cell line.

    abstract:An amendment to this paper has been published and can be accessed via the original article. ...

    journal_title:BMC genomics

    pub_type: Published Erratum

    doi:10.1186/s12864-020-07211-8

    authors: Levin M,Scheibe M,Butter F

    update_date:2020-11-12 00:00:00

  • Polygenic and sex specific architecture for two maturation traits in farmed Atlantic salmon.

    abstract:BACKGROUND:A key developmental transformation in the life of all vertebrates is the transition to sexual maturity, whereby individuals are capable of reproducing for the first time. In the farming of Atlantic salmon, early maturation prior to harvest size has serious negative production impacts. RESULTS:We report geno...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-5525-4

    authors: Mohamed AR,Verbyla KL,Al-Mamun HA,McWilliam S,Evans B,King H,Kube P,Kijas JW

    update_date:2019-02-15 00:00:00

  • Developing and applying a gene functional association network for anti-angiogenic kinase inhibitor activity assessment in an angiogenesis co-culture model.

    abstract:BACKGROUND:Tumor angiogenesis is a highly regulated process involving intercellular communication as well as the interactions of multiple downstream signal transduction pathways. Disrupting one or even a few angiogenesis pathways is often insufficient to achieve sustained therapeutic benefits due to the complexity of a...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-264

    authors: Chen Y,Wei T,Yan L,Lawrence F,Qian HR,Burkholder TP,Starling JJ,Yingling JM,Shou J

    update_date:2008-06-02 00:00:00

  • A genomics-based systems approach towards drug repositioning for rheumatoid arthritis.

    abstract:BACKGROUND:Rheumatoid arthritis (RA) is a chronic autoimmune disease characterized by inflammation and destruction of synovial joints. RA affects up to 1 % of the population worldwide. Currently, there are no drugs that can cure RA or achieve sustained remission. The unknown cause of the disease represents a significan...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-2910-0

    authors: Xu R,Wang Q

    update_date:2016-08-22 00:00:00

  • Detecting fitness epistasis in recently admixed populations with genome-wide data.

    abstract:BACKGROUND:Fitness epistasis, the interaction effect of genes at different loci on fitness, makes an important contribution to adaptive evolution. Although fitness interaction evidence has been observed in model organisms, it is more difficult to detect and remains poorly understood in human populations as a result of ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-06874-7

    authors: Ni X,Zhou M,Wang H,He KY,Broeckel U,Hanis C,Kardia S,Redline S,Cooper RS,Tang H,Zhu X

    update_date:2020-07-11 00:00:00

  • Histological and global gene expression analysis of the 'lactating' pigeon crop.

    abstract:BACKGROUND:Both male and female pigeons have the ability to produce a nutrient solution in their crop for the nourishment of their young. The production of the nutrient solution has been likened to lactation in mammals, and hence the product has been called pigeon 'milk'. It has been shown that pigeon 'milk' is essenti...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-12-452

    authors: Gillespie MJ,Haring VR,McColl KA,Monaghan P,Donald JA,Nicholas KR,Moore RJ,Crowley TM

    update_date:2011-09-19 00:00:00

  • Selective constraint, background selection, and mutation accumulation variability within and between human populations.

    abstract:BACKGROUND:Regions of the genome that are under evolutionary constraint across multiple species have previously been used to identify functional sequences in the human genome. Furthermore, it is known that there is an inverse relationship between evolutionary constraint and the allele frequency of a mutation segregatin...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-495

    authors: Hodgkinson A,Casals F,Idaghdour Y,Grenier JC,Hernandez RD,Awadalla P

    update_date:2013-07-23 00:00:00

  • The Epc-N domain: a predicted protein-protein interaction domain found in select chromatin associated proteins.

    abstract:BACKGROUND:An underlying tenet of the epigenetic code hypothesis is the existence of protein domains that can recognize various chromatin structures. To date, two major candidates have emerged: (i) the bromodomain, which can recognize certain acetylation marks and (ii) the chromodomain, which can recognize certain meth...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-7-6

    authors: Perry J

    update_date:2006-01-16 00:00:00

  • Local hopping mobile DNA implicated in pseudogene formation and reductive evolution in an obligate cyanobacteria-plant symbiosis.

    abstract:BACKGROUND:Insertion sequences (ISs) are approximately 1 kbp long "jumping" genes found in prokaryotes. ISs encode the protein Transposase, which facilitates the excision and reinsertion of ISs in genomes, making these sequences a type of class I ("cut-and-paste") Mobile Genetic Elements. ISs are proposed to be involve...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1386-7

    authors: Vigil-Stenman T,Larsson J,Nylander JA,Bergman B

    update_date:2015-03-17 00:00:00

  • Uncovering molecular events associated with the chemosuppressive effects of flaxseed: a microarray analysis of the laying hen model of ovarian cancer.

    abstract:BACKGROUND:The laying hen model of spontaneous epithelial ovarian cancer (EOC) is unique in that it is the only model that enables observations of early events in disease progression and is therefore also uniquely suited for chemoprevention trials. Previous studies on the effect of dietary flaxseed in laying hens have ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-709

    authors: Hales KH,Speckman SC,Kurrey NK,Hales DB

    update_date:2014-08-24 00:00:00

  • Delineation of condition specific Cis- and Trans-acting elements in plant promoters under various Endo- and exogenous stimuli.

    abstract:BACKGROUND:Transcription factors (TFs) play essential roles during plant development and response to environmental stresses. However, the relationships among transcription factors, cis-acting elements and target gene expression under endo- and exogenous stimuli have not been systematically characterized. RESULTS:Here,...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-4469-4

    authors: Chow CN,Chiang-Hsieh YF,Chien CH,Zheng HQ,Lee TY,Wu NY,Tseng KC,Hou PF,Chang WC

    update_date:2018-05-09 00:00:00

  • De novo assembly, characterization and functional annotation of Senegalese sole (Solea senegalensis) and common sole (Solea solea) transcriptomes: integration in a database and design of a microarray.

    abstract:BACKGROUND:Senegalese sole (Solea senegalensis) and common sole (S. solea) are two economically and evolutionary important flatfish species both in fisheries and aquaculture. Although some genomic resources and tools were recently described in these species, further sequencing efforts are required to establish a comple...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-952

    authors: Benzekri H,Armesto P,Cousin X,Rovira M,Crespo D,Merlo MA,Mazurais D,Bautista R,Guerrero-Fernández D,Fernandez-Pozo N,Ponce M,Infante C,Zambonino JL,Nidelet S,Gut M,Rebordinos L,Planas JV,Bégout ML,Claros MG,Manchado

    update_date:2014-11-03 00:00:00

  • A graph-theoretic approach for classification and structure prediction of transmembrane β-barrel proteins.

    abstract:BACKGROUND:Transmembrane β-barrel proteins are a special class of transmembrane proteins which play several key roles in human body and diseases. Due to experimental difficulties, the number of transmembrane β-barrel proteins with known structures is very small. Over the years, a number of learning-based methods have b...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-S2-S5

    authors: Tran Vdu T,Chassignet P,Sheikh S,Steyaert JM

    update_date:2012-04-12 00:00:00

  • The I2 resistance gene homologues in Solanum have complex evolutionary patterns and are targeted by miRNAs.

    abstract:BACKGROUND:Several resistance traits, including the I2 resistance against tomato fusarium wilt, were mapped to the long arm of chromosome 11 of Solanum. However, the structure and evolution of this locus remain poorly understood. RESULTS:Comparative analysis showed that the structure and evolutionary patterns of the I...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-743

    authors: Wei C,Kuang H,Li F,Chen J

    update_date:2014-08-30 00:00:00