Abstract:
BACKGROUND: RNA-seq has shown huge potential for phylogenomic inference in non-model organisms. However, errors, incompleteness, and redundancy among the transcripts assembled for each gene in de novo assembly of short reads introduce noise into analyses and leave a large amount of missing data in the aligned matrix. To address these problems, we compare de novo assemblies of paired-end 90 bp RNA-seq reads produced by Oases, Trinity, Trans-ABySS and SOAPdenovo-Trans with transcripts from the genome annotation of the model plant Ricinus communis. By doing so, we evaluate strategies for optimizing total gene coverage and minimizing assembly chimeras and redundancy.
RESULTS: We found that the frequency and structure of chimeras vary dramatically among software packages. The differences were largely due to the number of trans-self chimeras that contain repeats in the opposite direction. More than half of the total chimeras in Oases and Trinity were trans-self chimeras. Within each package, we found a trade-off between maximizing reference coverage and minimizing redundancy and chimera rate. To reduce redundancy, we investigated three methods: 1) using CAP3 and CD-HIT-EST to combine highly similar transcripts; 2) for each subcomponent in Trinity, either retaining only the transcript with the highest read coverage or removing the transcript with the lowest read coverage; and 3) filtering Oases single k-mer assemblies by number of transcripts per locus and relative transcript length, and then selecting the transcript with the highest read coverage. We then used blastx results against model protein sequences to remove trans chimeras. After optimization, seven assembly strategies across all four packages assembled 42.9-47.1% of reference genes to more than 200 bp, with a chimera rate of 0.92-2.21% and an average of 1.8-3.1 transcripts per assembled reference gene.
CONCLUSIONS: With sequencing and assembly tools improving rapidly, our study provides a framework to benchmark and optimize performance before choosing tools or parameter combinations for analyzing short-read RNA-seq data. Our study demonstrates that the choice of assembly package and k-mer sizes, post-assembly redundancy reduction and chimera cleanup, and strand-specific RNA-seq library preparation and assembly can dramatically improve gene coverage by non-redundant, non-chimeric transcripts that are optimized for downstream phylogenomic analyses.
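As a rough illustration of the second redundancy-reduction method (keeping only the transcript with the highest read coverage in each Trinity subcomponent), the selection step might be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the coverage values, and the `compN_cM_seqK` identifier layout are all assumed here for illustration, and read coverage is assumed to have been computed beforehand.

```python
def subcomponent_of(transcript_id):
    """Return the subcomponent part of a transcript ID.

    Assumption: IDs look like "comp12_c0_seq3", so everything before
    the final "_seq" names the subcomponent.
    """
    return transcript_id.rsplit("_seq", 1)[0]


def keep_highest_coverage(coverage):
    """Keep one transcript per subcomponent: the one with the highest coverage.

    `coverage` maps transcript ID -> read coverage (hypothetical values,
    computed elsewhere). Ties keep the transcript seen first.
    """
    best = {}  # subcomponent -> ID of the best transcript seen so far
    for transcript_id, cov in coverage.items():
        group = subcomponent_of(transcript_id)
        if group not in best or cov > coverage[best[group]]:
            best[group] = transcript_id
    return sorted(best.values())


# Hypothetical coverage table for four assembled transcripts.
example_coverage = {
    "comp1_c0_seq1": 12.0,
    "comp1_c0_seq2": 30.5,
    "comp1_c1_seq1": 4.0,
    "comp2_c0_seq1": 7.2,
}
retained = keep_highest_coverage(example_coverage)
```

The alternative described in the abstract, removing only the lowest-coverage transcript per subcomponent, would keep more isoforms at the cost of more redundancy.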
journal_name:BMC Genomics
journal_title:BMC genomics
authors:Yang Y, Smith SA
doi:10.1186/1471-2164-14-328
subject:Has Abstract
pub_date:2013-05-14 00:00:00
pages:328
issn:1471-2164
pii:1471-2164-14-328
journal_volume:14
pub_type: journal article
Related literature
BMC Genomics literature collection
abstract:BACKGROUND:Previous studies of individual genes have shown that in a self-enforcing way, dimethylation at histone 3 lysine 9 (dimethyl-H3K9) and DNA methylation cooperate to maintain a repressive mode of inactive genes. Less clear is whether this cooperation is generalized in mammalian genomes, such as mouse genome. He...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/1471-2164-8-131
update_date:2007-05-24 00:00:00
abstract:BACKGROUND:Bloodstream malaria parasites require Ca++ for their development, but the sites and mechanisms of Ca++ utilization are not well understood. We hypothesized that there may be differences in Ca++ uptake or utilization by genetically distinct lines of P. falciparum. These differences, if identified, may provide...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/s12864-018-5418-y
update_date:2019-01-16 00:00:00
abstract:BACKGROUND:The sRNAs of bacterial pathogens are known to be involved in various cellular roles including environmental adaptation as well as regulation of virulence and pathogenicity. It is expected that sRNAs may also have similar functions for Burkholderia pseudomallei, a soil bacterium that can adapt to diverse envi...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/1471-2164-13-S7-S13
update_date:2012-01-01 00:00:00
abstract:BACKGROUND:Housekeeping (HK) genes are ubiquitously expressed in all tissue/cell types and constitute a basal transcriptome for the maintenance of basic cellular functions. Partitioning transcriptomes into HK and tissue-specific (TS) genes relatively is fundamental for studying gene expression and cellular differentiat...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/1471-2164-9-172
update_date:2008-04-16 00:00:00
abstract:BACKGROUND:Despite its relevance, almost no studies account for the genetic control in the early stages of tree development, i.e. from germination on. This study seeks to make a quite complete transcriptome for olive development and to elucidate the dynamic regulation of the transcriptomic response during the early-juv...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/s12864-018-5232-6
update_date:2018-11-19 00:00:00
abstract:BACKGROUND:In many eukaryotes, microRNAs (miRNAs) bind to complementary sites in the 3'-untranslated regions (3'-UTRs) of target messenger RNAs (mRNAs) and regulate their expression at the stage of translation. Recent studies have revealed that many miRNAs are evolutionarily conserved; however, the evolution of their t...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/1471-2164-11-101
update_date:2010-02-09 00:00:00
abstract:BACKGROUND:Histone H3 lysine 4 tri-methylation (H3K4me3) and histone H3 lysine 9 tri-methylation (H3K9me3) are widely perceived to be opposing and often mutually exclusive chromatin modifications. However, both are needed for certain light-activated genes in Neurospora crassa (Neurospora), including frequency (frq) and...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/s12864-019-5729-7
update_date:2019-05-08 00:00:00
abstract:BACKGROUND:Drosophila melanogaster females show changes in behavior and physiology after mating that are thought to maximize the number of progeny resulting from the most recent copulation. Sperm and seminal fluid proteins induce post-mating changes in females, however, very little is known about the resulting gene exp...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/1471-2164-11-541
update_date:2010-10-06 00:00:00
abstract:BACKGROUND:Genotyping-by-sequencing (GBS) is becoming an attractive alternative to array-based methods for genotyping individuals for a large number of single nucleotide polymorphisms (SNPs). Costs can be lowered by reducing the mean sequencing depth, but this results in genotype calls of lower quality. A common analys...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/s12864-015-2252-3
update_date:2015-12-09 00:00:00
abstract:BACKGROUND:Secondary structure in the target is a property not usually considered in software applications for design of optimal custom oligonucleotide probes. It is frequently assumed that eliminating self-complementarity, or screening for secondary structure in the probe, is sufficient to avoid interference with hybr...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/1471-2164-6-31
update_date:2005-03-08 00:00:00
abstract:BACKGROUND:Mytilisepta virgata is a marine mussel commonly found along the coasts of Japan. Although this species has been the subject of occasional studies concerning its ecological role, growth and reproduction, it has been so far almost completely neglected from a genetic and molecular point of view. In the present ...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/s12864-017-4012-z
update_date:2017-08-08 00:00:00
abstract:BACKGROUND:Gene expression variation is a key underlying factor influencing phenotypic variation, and can occur via cis- or trans-regulation. To understand the role of cis- and trans-regulatory variation on population divergence in chicken, we developed reciprocal crosses of two chicken breeds, White Leghorn and Cornis...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/s12864-019-6342-5
update_date:2019-12-05 00:00:00
abstract:BACKGROUND:Pectobacterium spp. are necrotrophic bacterial plant pathogens of the family Pectobacteriaceae, responsible for a wide spectrum of diseases of important crops and ornamental plants including soft rot, blackleg, and stem wilt. P. carotovorum is a genetically heterogeneous species consisting of three valid sub...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/s12864-018-5269-6
update_date:2018-12-07 00:00:00
abstract:BACKGROUND:Magnesium (Mg)-deficiency is frequently observed in Citrus plantations and is responsible for the loss of productivity and poor fruit quality. Knowledge on the effects of Mg-deficiency on upstream targets is scarce. Seedlings of 'Xuegan' [Citrus sinensis (L.) Osbeck] were irrigated with Mg-deficient (0 mM Mg...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/s12864-015-1462-z
update_date:2015-03-31 00:00:00
abstract:BACKGROUND:Musa species (Zingiberaceae, Zingiberales) including bananas and plantains are collectively the fourth most important crop in developing countries. Knowledge concerning Musa genome structure and the origin of distinct cultivars has greatly increased over the last few years. Until now, however, no large-scale...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/1471-2164-9-58
update_date:2008-01-30 00:00:00
abstract:BACKGROUND:Mitochondrial genomes of flowering plants (angiosperms) are highly dynamic in genome structure. The mitogenome of the earliest angiosperm Amborella is remarkable in carrying rampant foreign DNAs, in contrast to Liriodendron, the other only known early angiosperm mitogenome that is described as 'fossilized'. ...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/s12864-018-4991-4
update_date:2018-08-14 00:00:00
abstract:An amendment to this paper has been published and can be accessed via the original article. ...
journal_title:BMC genomics
pub_type: published erratum
doi:10.1186/s12864-020-07211-8
update_date:2020-11-12 00:00:00
abstract:BACKGROUND:A key developmental transformation in the life of all vertebrates is the transition to sexual maturity, whereby individuals are capable of reproducing for the first time. In the farming of Atlantic salmon, early maturation prior to harvest size has serious negative production impacts. RESULTS:We report geno...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/s12864-019-5525-4
update_date:2019-02-15 00:00:00
abstract:BACKGROUND:Tumor angiogenesis is a highly regulated process involving intercellular communication as well as the interactions of multiple downstream signal transduction pathways. Disrupting one or even a few angiogenesis pathways is often insufficient to achieve sustained therapeutic benefits due to the complexity of a...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/1471-2164-9-264
update_date:2008-06-02 00:00:00
abstract:BACKGROUND:Rheumatoid arthritis (RA) is a chronic autoimmune disease characterized by inflammation and destruction of synovial joints. RA affects up to 1 % of the population worldwide. Currently, there are no drugs that can cure RA or achieve sustained remission. The unknown cause of the disease represents a significan...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/s12864-016-2910-0
update_date:2016-08-22 00:00:00
abstract:BACKGROUND:Fitness epistasis, the interaction effect of genes at different loci on fitness, makes an important contribution to adaptive evolution. Although fitness interaction evidence has been observed in model organisms, it is more difficult to detect and remains poorly understood in human populations as a result of ...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/s12864-020-06874-7
update_date:2020-07-11 00:00:00
abstract:BACKGROUND:Both male and female pigeons have the ability to produce a nutrient solution in their crop for the nourishment of their young. The production of the nutrient solution has been likened to lactation in mammals, and hence the product has been called pigeon 'milk'. It has been shown that pigeon 'milk' is essenti...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/1471-2164-12-452
update_date:2011-09-19 00:00:00
abstract:BACKGROUND:Regions of the genome that are under evolutionary constraint across multiple species have previously been used to identify functional sequences in the human genome. Furthermore, it is known that there is an inverse relationship between evolutionary constraint and the allele frequency of a mutation segregatin...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/1471-2164-14-495
update_date:2013-07-23 00:00:00
abstract:BACKGROUND:An underlying tenet of the epigenetic code hypothesis is the existence of protein domains that can recognize various chromatin structures. To date, two major candidates have emerged: (i) the bromodomain, which can recognize certain acetylation marks and (ii) the chromodomain, which can recognize certain meth...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/1471-2164-7-6
update_date:2006-01-16 00:00:00
abstract:BACKGROUND:Insertion sequences (ISs) are approximately 1 kbp long "jumping" genes found in prokaryotes. ISs encode the protein Transposase, which facilitates the excision and reinsertion of ISs in genomes, making these sequences a type of class I ("cut-and-paste") Mobile Genetic Elements. ISs are proposed to be involve...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/s12864-015-1386-7
update_date:2015-03-17 00:00:00
abstract:BACKGROUND:The laying hen model of spontaneous epithelial ovarian cancer (EOC) is unique in that it is the only model that enables observations of early events in disease progression and is therefore also uniquely suited for chemoprevention trials. Previous studies on the effect of dietary flaxseed in laying hens have ...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/1471-2164-15-709
update_date:2014-08-24 00:00:00
abstract:BACKGROUND:Transcription factors (TFs) play essential roles during plant development and response to environmental stresses. However, the relationships among transcription factors, cis-acting elements and target gene expression under endo- and exogenous stimuli have not been systematically characterized. RESULTS:Here,...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/s12864-018-4469-4
update_date:2018-05-09 00:00:00
abstract:BACKGROUND:Senegalese sole (Solea senegalensis) and common sole (S. solea) are two economically and evolutionary important flatfish species both in fisheries and aquaculture. Although some genomic resources and tools were recently described in these species, further sequencing efforts are required to establish a comple...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/1471-2164-15-952
update_date:2014-11-03 00:00:00
abstract:BACKGROUND:Transmembrane β-barrel proteins are a special class of transmembrane proteins which play several key roles in human body and diseases. Due to experimental difficulties, the number of transmembrane β-barrel proteins with known structures is very small. Over the years, a number of learning-based methods have b...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/1471-2164-13-S2-S5
update_date:2012-04-12 00:00:00
abstract:BACKGROUND:Several resistance traits, including the I2 resistance against tomato fusarium wilt, were mapped to the long arm of chromosome 11 of Solanum. However, the structure and evolution of this locus remain poorly understood. RESULTS:Comparative analysis showed that the structure and evolutionary patterns of the I...
journal_title:BMC genomics
pub_type: journal article
doi:10.1186/1471-2164-15-743
update_date:2014-08-30 00:00:00