Comparative performance of transcriptome assembly methods for non-model organisms.

Abstract:

BACKGROUND: The technological revolution in next-generation sequencing has brought unprecedented opportunities to study any organism of interest at the genomic or transcriptomic level. Transcriptome assembly is a crucial first step for studying the molecular basis of phenotypes of interest using RNA-Sequencing (RNA-Seq). However, the optimal strategy for assembling vast amounts of short RNA-Seq reads remains unresolved, especially for organisms without a sequenced genome. This study compared four transcriptome assembly methods: a widely used de novo assembler (Trinity), two transcriptome re-assembly strategies utilizing proteomic and genomic resources from closely related species (reference-based re-assembly and TransPS), and a genome-guided assembler (Cufflinks).

RESULTS: These four assembly strategies were compared using a comprehensive transcriptomic database of Aedes albopictus, for which a genome sequence has recently been completed. The quality of the various assemblies was assessed by the number of contigs generated, contig length distribution, percent paired-end read mapping, and gene model representation via BLASTX. Our results reveal that de novo assembly generates a similar number of gene models to genome-guided assembly with a fragmented reference, but produces the highest level of redundancy and requires the most computational power. Using a closely related reference genome to guide transcriptome assembly can generate biased contig sequences. Increasing the number of reads used in the transcriptome assembly tends to increase the redundancy within the assembly and to decrease both median contig length and percent identity between contigs and reference protein sequences.

CONCLUSIONS: This study provides general guidance for transcriptome assembly of RNA-Seq data from organisms with or without a sequenced genome. The optimal transcriptome assembly strategy will depend upon the subsequent downstream analyses. However, our results emphasize the efficacy of de novo assembly, which can be as effective as genome-guided assembly when the reference genome assembly is fragmented. If a genome assembly and sufficient computational resources are available, it can be beneficial to combine de novo and genome-guided assemblies. Caution should be exercised when using a closely related reference genome to guide transcriptome assembly. The quantity of read pairs used in the transcriptome assembly does not necessarily correlate with the quality of the assembly.
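The abstract names the measures used to compare assemblies (number of contigs, contig length distribution, percent paired-end read mapping, and gene model representation via BLASTX) and reports trends in redundancy, median contig length, and percent identity to reference proteins. The Python sketch that follows shows one way such summaries might be tallied; it is not the authors' pipeline. All function names are hypothetical. BLASTX results are assumed to be in BLAST+'s default tabular format (`-outfmt 6`, bitscore in column 12), mapped and total read-pair counts are assumed to come from an aligner's summary, and "contigs per gene model" is used only as a rough redundancy proxy, not the study's definition of redundancy.

```python
import statistics
from typing import Dict, Iterable, Tuple

# Hypothetical helpers for summarizing a transcriptome assembly; not the study's code.


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each contig ID (first token of the '>' header) to its sequence length."""
    lengths: Dict[str, int] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)  # sequences may wrap over several lines
    return lengths


def length_summary(lengths: Dict[str, int]) -> Dict[str, float]:
    """Contig count plus a few points of the contig length distribution."""
    values = sorted(lengths.values())
    if not values:
        raise ValueError("no contigs found")
    return {
        "n_contigs": len(values),
        "min_length": values[0],
        "median_length": statistics.median(values),
        "max_length": values[-1],
        "total_bases": sum(values),
    }


def percent_pairs_mapped(mapped_pairs: int, total_pairs: int) -> float:
    """Percent of read pairs mapped back to the assembly (counts assumed to come from an aligner)."""
    if total_pairs <= 0:
        raise ValueError("total_pairs must be positive")
    return 100.0 * mapped_pairs / total_pairs


def blastx_best_hit_summary(lines: Iterable[str]) -> Dict[str, float]:
    """Summarize BLASTX hits, assuming default tabular output (-outfmt 6).

    Columns assumed: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    best: Dict[str, Tuple[float, str, float]] = {}  # contig -> (bitscore, protein, pident)
    for raw in lines:
        if not raw.strip() or raw.startswith("#"):
            continue
        fields = raw.rstrip("\n").split("\t")
        contig, protein = fields[0], fields[1]
        pident, bitscore = float(fields[2]), float(fields[11])
        # Keep only the highest-scoring reference protein for each contig.
        if contig not in best or bitscore > best[contig][0]:
            best[contig] = (bitscore, protein, pident)
    if not best:
        raise ValueError("no BLASTX hits found")
    proteins = {protein for _, protein, _ in best.values()}
    return {
        "contigs_with_hit": len(best),
        "gene_models_hit": len(proteins),
        # Assumed redundancy proxy: average number of contigs sharing one best-hit protein.
        "contigs_per_gene_model": len(best) / len(proteins),
        "median_best_hit_pident": statistics.median([p for _, _, p in best.values()]),
    }


if __name__ == "__main__":
    fasta = [">contig_1 len=5", "ACGTA", ">contig_2", "ACGTACGT", "ACG", ">contig_3", "ACGTACG"]
    lengths = read_fasta_lengths(fasta)
    print(lengths)  # {'contig_1': 5, 'contig_2': 11, 'contig_3': 7}
    print(length_summary(lengths)["median_length"])  # 7
    print(percent_pairs_mapped(850, 1000))  # 85.0
```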

journal_name

BMC Genomics

journal_title

BMC genomics

authors

Huang X,Chen XG,Armbruster PA

doi

10.1186/s12864-016-2923-8

subject

Has Abstract

pub_date

2016-07-27 00:00:00

pages

523

issn

1471-2164

pii

10.1186/s12864-016-2923-8

journal_volume

17

pub_type

Journal Article
  • In silico secretome analysis approach for next generation sequencing transcriptomic data.

    abstract:BACKGROUND:Excretory/secretory proteins (ESPs) play a major role in parasitic infection as they are present at the host-parasite interface and regulate host immune system. In case of parasitic helminths, transcriptomics has been used extensively to understand the molecular basis of parasitism and for developing novel t...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-12-S3-S14

    authors: Garg G,Ranganathan S

    update_date:2011-11-30 00:00:00

  • Anthocyanin biosynthetic genes in Brassica rapa.

    abstract:BACKGROUND:Anthocyanins are a group of flavonoid compounds. As a group of important secondary metabolites, they perform several key biological functions in plants. Anthocyanins also play beneficial health roles as potentially protective factors against cancer and heart disease. To elucidate the anthocyanin biosynthetic...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-426

    authors: Guo N,Cheng F,Wu J,Liu B,Zheng S,Liang J,Wang X

    update_date:2014-06-04 00:00:00

  • Histological and global gene expression analysis of the 'lactating' pigeon crop.

    abstract:BACKGROUND:Both male and female pigeons have the ability to produce a nutrient solution in their crop for the nourishment of their young. The production of the nutrient solution has been likened to lactation in mammals, and hence the product has been called pigeon 'milk'. It has been shown that pigeon 'milk' is essenti...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-12-452

    authors: Gillespie MJ,Haring VR,McColl KA,Monaghan P,Donald JA,Nicholas KR,Moore RJ,Crowley TM

    update_date:2011-09-19 00:00:00

  • Treatment-independent miRNA signature in blood of Wilms tumor patients.

    abstract:BACKGROUND:Blood-born miRNA signatures have recently been reported for various tumor diseases. Here, we compared the miRNA signature in Wilms tumor patients prior and after preoperative chemotherapy according to SIOP protocol 2001. RESULTS:We did not find a significant difference between miRNA signature of both groups...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-379

    authors: Schmitt J,Backes C,Nourkami-Tutdibi N,Leidinger P,Deutscher S,Beier M,Gessler M,Graf N,Lenhof HP,Keller A,Meese E

    update_date:2012-08-07 00:00:00

  • Subtelomere organization in the genome of the microsporidian Encephalitozoon cuniculi: patterns of repeated sequences and physicochemical signatures.

    abstract:BACKGROUND:The microsporidian Encephalitozoon cuniculi is an obligate intracellular eukaryotic pathogen with a small nuclear genome (2.9 Mbp) consisting of 11 chromosomes. Although each chromosome end is known to contain a single rDNA unit, the incomplete assembly of subtelomeric regions following sequencing of the gen...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1920-7

    authors: Dia N,Lavie L,Faye N,Méténier G,Yeramian E,Duroure C,Toguebaye BS,Frutos R,Niang MN,Vivarès CP,Ben Mamoun C,Cornillot E

    update_date:2016-01-07 00:00:00

  • PLAIDOH: a novel method for functional prediction of long non-coding RNAs identifies cancer-specific LncRNA activities.

    abstract:BACKGROUND:Long non-coding RNAs (lncRNAs) exhibit remarkable cell-type specificity and disease association. LncRNA's functional versatility includes epigenetic modification, nuclear domain organization, transcriptional control, regulation of RNA splicing and translation, and modulation of protein activity. However, mos...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-5497-4

    authors: Pyfrom SC,Luo H,Payton JE

    update_date:2019-02-15 00:00:00

  • Relating past and present diet to phenotypic and transcriptomic variation in the fruit fly.

    abstract:BACKGROUND:Sub-optimal developmental diets often have adverse effects on long-term fitness and health. One hypothesis is that such effects are caused by mismatches between the developmental and adult environment, and may be mediated by persistent changes in gene expression. However, there are few experimental tests of ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-3968-z

    authors: May CM,Zwaan BJ

    update_date:2017-08-22 00:00:00

  • PNAC: a protein nucleolar association classifier.

    abstract:BACKGROUND:Although primarily known as the site of ribosome subunit production, the nucleolus is involved in numerous and diverse cellular processes. Recent large-scale proteomics projects have identified thousands of human proteins that associate with the nucleolus. However, in most cases, we know neither the fraction...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-12-74

    authors: Scott MS,Boisvert FM,Lamond AI,Barton GJ

    update_date:2011-01-27 00:00:00

  • Estimating the total genome length of a metagenomic sample using k-mers.

    abstract:BACKGROUND:Metagenomic sequencing is a powerful technology for studying the mixture of microbes or the microbiomes on human and in the environment. One basic task of analyzing metagenomic data is to identify the component genomes in the community. This task is challenging due to the complexity of microbiome composition...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-5467-x

    authors: Hua K,Zhang X

    update_date:2019-04-04 00:00:00

  • NovelFam3000--uncharacterized human protein domains conserved across model organisms.

    abstract:BACKGROUND:Despite significant efforts from the research community, an extensive portion of the proteins encoded by human genes lack an assigned cellular function. Most metazoan proteins are composed of structural and/or functional domains, of which many appear in multiple proteins. Once a domain is characterized in on...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-7-48

    authors: Kemmer D,Podowski RM,Arenillas D,Lim J,Hodges E,Roth P,Sonnhammer EL,Höög C,Wasserman WW

    update_date:2006-03-13 00:00:00

  • Evaluation and optimisation of indel detection workflows for ion torrent sequencing of the BRCA1 and BRCA2 genes.

    abstract:BACKGROUND:The Ion Torrent PGM is a popular benchtop sequencer that shows promise in replacing conventional Sanger sequencing as the gold standard for mutation detection. Despite the PGM's reported high accuracy in calling single nucleotide variations, it tends to generate many false positive calls in detecting inserti...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-516

    authors: Yeo ZX,Wong JC,Rozen SG,Lee AS

    update_date:2014-06-24 00:00:00

  • The missing link: Bordetella petrii is endowed with both the metabolic versatility of environmental bacteria and virulence traits of pathogenic Bordetellae.

    abstract:BACKGROUND:Bordetella petrii is the only environmental species hitherto found among the otherwise host-restricted and pathogenic members of the genus Bordetella. Phylogenetically, it connects the pathogenic Bordetellae and environmental bacteria of the genera Achromobacter and Alcaligenes, which are opportunistic patho...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-449

    authors: Gross R,Guzman CA,Sebaihia M,dos Santos VA,Pieper DH,Koebnik R,Lechner M,Bartels D,Buhrmester J,Choudhuri JV,Ebensen T,Gaigalat L,Herrmann S,Khachane AN,Larisch C,Link S,Linke B,Meyer F,Mormann S,Nakunst D,Rückert

    update_date:2008-09-30 00:00:00

  • The intestinal microbiome of fish under starvation.

    abstract:BACKGROUND:Starvation not only affects the nutritional and health status of the animals, but also the microbial composition in the host's intestine. Next-generation sequencing provides a unique opportunity to explore gut microbial communities and their interactions with hosts. However, studies on gut microbiomes have b...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-266

    authors: Xia JH,Lin G,Fu GH,Wan ZY,Lee M,Wang L,Liu XJ,Yue GH

    update_date:2014-04-05 00:00:00

  • Genome-wide host responses against infectious laryngotracheitis virus vaccine infection in chicken embryo lung cells.

    abstract:BACKGROUND:Infectious laryngotracheitis virus (ILTV; gallid herpesvirus 1) infection causes high mortality and huge economic losses in the poultry industry. To protect chickens against ILTV infection, chicken-embryo origin (CEO) and tissue-culture origin (TCO) vaccines have been used. However, the transmission of vacci...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-143

    authors: Lee J,Bottje WG,Kong BW

    update_date:2012-04-24 00:00:00

  • A genomic perspective on the potential of Actinobacillus succinogenes for industrial succinate production.

    abstract:BACKGROUND:Succinate is produced petrochemically from maleic anhydride to satisfy a small specialty chemical market. If succinate could be produced fermentatively at a price competitive with that of maleic anhydride, though, it could replace maleic anhydride as the precursor of many bulk chemicals, transforming a multi...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-680

    authors: McKinlay JB,Laivenieks M,Schindler BD,McKinlay AA,Siddaramappa S,Challacombe JF,Lowry SR,Clum A,Lapidus AL,Burkhart KB,Harkins V,Vieille C

    update_date:2010-11-30 00:00:00

  • ESAP plus: a web-based server for EST-SSR marker development.

    abstract:BACKGROUND:Simple sequence repeats (SSRs) have become widely used as molecular markers in plant genetic studies due to their abundance, high allelic variation at each locus and simplicity to analyze using conventional PCR amplification. To study plants with unknown genome sequence, SSR markers from Expressed Sequence T...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-3328-4

    authors: Ponyared P,Ponsawat J,Tongsima S,Seresangtakul P,Akkasaeng C,Tantisuwichwong N

    update_date:2016-12-22 00:00:00

  • MicroRNA modulate alveolar epithelial response to cyclic stretch.

    abstract:BACKGROUND:MicroRNAs (miRNAs) are post-transcriptional regulators of gene expression implicated in multiple cellular processes. Cyclic stretch of alveoli is characteristic of mechanical ventilation, and is postulated to be partly responsible for the lung injury and inflammation in ventilator-induced lung injury. We pro...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-154

    authors: Yehya N,Yerrapureddy A,Tobias J,Margulies SS

    update_date:2012-04-26 00:00:00

  • Single Cell Explorer, collaboration-driven tools to leverage large-scale single cell RNA-seq data.

    abstract:BACKGROUND:Single cell transcriptome sequencing has become an increasingly valuable technology for dissecting complex biology at a resolution impossible with bulk sequencing. However, the gap between the technical expertise required to effectively work with the resultant high dimensional data and the biological experti...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-6053-y

    authors: Feng D,Whitehurst CE,Shan D,Hill JD,Yue YG

    update_date:2019-08-27 00:00:00

  • Characterization of the transcriptome of an ecologically important avian species, the Vinous-throated Parrotbill Paradoxornis webbianus bulomachus (Paradoxornithidae; Aves).

    abstract:BACKGROUND:Adaptive divergence driven by environmental heterogeneity has long been a fascinating topic in ecology and evolutionary biology. The study of the genetic basis of adaptive divergence has, however, been greatly hampered by a lack of genomic information. The recent development of transcriptome sequencing provi...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-149

    authors: Chu JH,Lin RC,Yeh CF,Hsu YC,Li SH

    update_date:2012-04-24 00:00:00

  • Celiac disease T-cell epitopes from gamma-gliadins: immunoreactivity depends on the genome of origin, transcript frequency, and flanking protein variation.

    abstract:BACKGROUND:Celiac disease (CD) is caused by an uncontrolled immune response to gluten, a heterogeneous mixture of wheat storage proteins. The CD-toxicity of these proteins and their derived peptides is depending on the presence of specific T-cell epitopes (9-mer peptides; CD epitopes) that mediate the stimulation of HL...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-277

    authors: Salentijn EM,Mitea DC,Goryunova SV,van der Meer IM,Padioleau I,Gilissen LJ,Koning F,Smulders MJ

    update_date:2012-06-22 00:00:00

  • Comparative genomics of downy mildews reveals potential adaptations to biotrophy.

    abstract:BACKGROUND:Spinach downy mildew caused by the oomycete Peronospora effusa is a significant burden on the expanding spinach production industry, especially for organic farms where synthetic fungicides cannot be deployed to control the pathogen. P. effusa is highly variable and 15 new races have been recognized in the pa...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-5214-8

    authors: Fletcher K,Klosterman SJ,Derevnina L,Martin F,Bertier LD,Koike S,Reyes-Chin-Wo S,Mou B,Michelmore R

    update_date:2018-11-29 00:00:00

  • Polyphenism in social insects: insights from a transcriptome-wide analysis of gene expression in the life stages of the key pollinator, Bombus terrestris.

    abstract:BACKGROUND:Understanding polyphenism, the ability of a single genome to express multiple morphologically and behaviourally distinct phenotypes, is an important goal for evolutionary and developmental biology. Polyphenism has been key to the evolution of the Hymenoptera, and particularly the social Hymenoptera where the...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-12-623

    authors: Colgan TJ,Carolan JC,Bridgett SJ,Sumner S,Blaxter ML,Brown MJ

    update_date:2011-12-20 00:00:00

  • A systems-based approach to analyse the host response in murine lung macrophages challenged with respiratory syncytial virus.

    abstract:BACKGROUND:Respiratory syncytial virus (RSV) is an important cause of lower respiratory tract infection in young children. The degree of disease severity is determined by the host response to infection. Lung macrophages play an important early role in the host response to infection and we have used a systems-based appr...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-190

    authors: Ravi LI,Li L,Sutejo R,Chen H,Wong PS,Tan BH,Sugrue RJ

    update_date:2013-03-18 00:00:00

  • A network-based integrative approach to prioritize reliable hits from multiple genome-wide RNAi screens in Drosophila.

    abstract:BACKGROUND:The recently developed RNA interference (RNAi) technology has created an unprecedented opportunity which allows the function of individual genes in whole organisms or cell lines to be interrogated at genome-wide scale. However, multiple issues, such as off-target effects or low efficacies in knocking down ce...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-10-220

    authors: Wang L,Tu Z,Sun F

    update_date:2009-05-12 00:00:00

  • Evolutionary engineering of a wine yeast strain revealed a key role of inositol and mannoprotein metabolism during low-temperature fermentation.

    abstract:BACKGROUND:Wine produced at low temperature is often considered to improve sensory qualities. However, there are certain drawbacks to low temperature fermentations: e.g. low growth rate, long lag phase, and sluggish or stuck fermentations. Selection and development of new Saccharomyces cerevisiae strains well adapted a...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1755-2

    authors: López-Malo M,García-Rios E,Melgar B,Sanchez MR,Dunham MJ,Guillamón JM

    update_date:2015-07-22 00:00:00

  • Transcriptome analysis of Sacha Inchi (Plukenetia volubilis L.) seeds at two developmental stages.

    abstract:BACKGROUND:Sacha Inchi (Plukenetia volubilis L., Euphorbiaceae) is a potential oilseed crop because the seeds of this plant are rich in unsaturated fatty acids (FAs). In particular, the fatty acid composition of its seed oil differs markedly in containing large quantities of α-linolenic acid (18C:3, a kind of ω-3 FAs)....

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-716

    authors: Wang X,Xu R,Wang R,Liu A

    update_date:2012-12-20 00:00:00

  • Origin and fate of pseudogenes in Hemiascomycetes: a comparative analysis.

    abstract:BACKGROUND:Pseudogenes are ubiquitous genetic elements that derive from functional genes after mutational inactivation. Characterization of pseudogenes is important to understand genome dynamics and evolution, and its significance increases when several genomes of related organisms can be compared. Among yeasts, only t...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-260

    authors: Lafontaine I,Dujon B

    update_date:2010-04-22 00:00:00

  • Association of the matrix attachment region recognition signature with coding regions in Caenorhabditis elegans.

    abstract:BACKGROUND:Matrix attachment regions (MAR) are the sites on genomic DNA that interact with the nuclear matrix. There is increasing evidence for the involvement of MAR in regulation of gene expression. The unsuitability of experimental detection of MAR for genome-wide analyses has led to the development of computational...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-8-418

    authors: Anthony A,Blaxter M

    update_date:2007-11-15 00:00:00

  • Integrative genomic and functional profiling of the pancreatic cancer genome.

    abstract:BACKGROUND:Pancreatic cancer is a deadly disease with a five-year survival of less than 5%. A better understanding of the underlying biology may suggest novel therapeutic targets. Recent surveys of the pancreatic cancer genome have uncovered numerous new alterations; yet systematic functional characterization of candid...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-624

    authors: Shain AH,Salari K,Giacomini CP,Pollack JR

    update_date:2013-09-16 00:00:00

  • A graph-theoretic approach for classification and structure prediction of transmembrane β-barrel proteins.

    abstract:BACKGROUND:Transmembrane β-barrel proteins are a special class of transmembrane proteins which play several key roles in human body and diseases. Due to experimental difficulties, the number of transmembrane β-barrel proteins with known structures is very small. Over the years, a number of learning-based methods have b...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-S2-S5

    authors: Tran Vdu T,Chassignet P,Sheikh S,Steyaert JM

    update_date:2012-04-12 00:00:00