A comprehensive evaluation of Ensembl, RefSeq, and UCSC annotations in the context of RNA-seq read mapping and gene quantification.

Abstract:

BACKGROUND: RNA-Seq has become increasingly popular in transcriptome profiling. One aspect of transcriptome research is to quantify the expression levels of genomic elements, such as genes, their transcripts and exons. Acquiring a transcriptome expression profile requires genomic elements to be defined in the context of the genome. Multiple human genome annotation databases exist, including RefGene (RefSeq Gene), Ensembl, and the UCSC annotation database. The impact of the choice of annotation on gene expression estimates remains insufficiently investigated.
RESULTS: In this paper, we systematically characterized the impact of genome annotation choice on read mapping and transcriptome quantification by analyzing an RNA-Seq dataset generated by the Human Body Map 2.0 Project. A gene model affects the mapping of non-junction reads differently from that of junction reads. For the RNA-Seq dataset with a read length of 75 bp, on average, 95% of non-junction reads were mapped to exactly the same genomic location regardless of which gene model was used. By contrast, this percentage dropped to 53% for junction reads. In addition, about 30% of junction reads failed to align without the assistance of a gene model, while 10-15% mapped alternatively. There are 21,958 common genes among the RefGene, Ensembl, and UCSC annotations. When we compared the gene quantification results obtained with the RefGene and Ensembl annotations, 20% of genes were not expressed and thus had a zero count in both annotations. Surprisingly, identical gene quantification results were obtained for only 16.3% (about one sixth) of genes. The expression levels of approximately 28.1% of genes differed by 5% or more, and of those, the relative expression levels of 9.3% of genes (2,038 genes) differed by 50% or more. The case studies revealed that differences in gene definitions between gene models frequently result in inconsistent gene quantification.
CONCLUSIONS: We demonstrated that the choice of a gene model has a dramatic effect on both gene quantification and differential analysis. Our research will help RNA-Seq data analysts to make an informed choice of gene model in practical RNA-Seq data analysis.
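
The comparison categories reported in the abstract (zero count in both annotations, identical counts, counts differing by 5% or more, counts differing by 50% or more) can be illustrated with a minimal Python sketch. The abstract does not state how the relative difference was computed, so the definition used here, |a - b| / max(a, b), is an assumption, as are the function names, the reading of "identical" as equal non-zero counts, and the toy gene identifiers and counts.

```python
from collections import Counter


def relative_difference(a, b):
    """Assumed definition: |a - b| / max(a, b); 0.0 when both counts are zero."""
    top = max(a, b)
    return 0.0 if top == 0 else abs(a - b) / top


def classify(a, b):
    """Place one gene's pair of counts into a comparison category."""
    if a == 0 and b == 0:
        return "not_expressed"
    diff = relative_difference(a, b)
    if diff == 0:
        return "identical"
    if diff >= 0.50:
        return "differs_50pct_or_more"
    if diff >= 0.05:
        return "differs_5_to_50pct"
    return "differs_under_5pct"


def summarize(counts_a, counts_b):
    """Fraction of genes per category, restricted to genes present in both annotations."""
    common = sorted(set(counts_a) & set(counts_b))
    if not common:
        return {}
    tally = Counter(classify(counts_a[g], counts_b[g]) for g in common)
    return {label: n / len(common) for label, n in tally.items()}


# Toy read counts (hypothetical, not data from the paper).
refgene_counts = {"GENE_A": 0, "GENE_B": 120, "GENE_C": 100,
                  "GENE_D": 40, "GENE_E": 200, "GENE_X": 5}
ensembl_counts = {"GENE_A": 0, "GENE_B": 120, "GENE_C": 98,
                  "GENE_D": 100, "GENE_E": 180, "GENE_Y": 7}

for label, fraction in sorted(summarize(refgene_counts, ensembl_counts).items()):
    print(f"{label}: {fraction:.1%}")
```

Only genes present in both dictionaries are compared, mirroring the abstract's restriction to genes common to the annotations. In this sketch the 5% and 50% bins are mutually exclusive, whereas the abstract reports the 50% group as a subset of the 5% group, so the two bins would be summed to reproduce a figure like the reported 28.1%.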

journal_name

BMC Genomics

authors

Zhao S,Zhang B

doi

10.1186/s12864-015-1308-8

pub_date

2015-02-18 00:00:00

pages

97

issn

1471-2164

pii

s12864-015-1308-8

journal_volume

16

pub_type

Journal Article
  • Evidence of uneven selective pressure on different subsets of the conserved human genome; implications for the significance of intronic and intergenic DNA.

    abstract:BACKGROUND:Human genetic variation produces the wide range of phenotypic differences that make us individual. However, little is known about the distribution of variation in the most conserved functional regions of the human genome. We examined whether different subsets of the conserved human genome have been subjected...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-10-614

    authors: Davidson S,Starkey A,MacKenzie A

    updated:2009-12-16 00:00:00

  • Single-cell transcriptomics using spliced leader PCR: Evidence for multiple losses of photosynthesis in polykrikoid dinoflagellates.

    abstract:BACKGROUND:Most microbial eukaryotes are uncultivated and thus poorly suited to standard genomic techniques. This is the case for Polykrikos lebouriae, a dinoflagellate with ultrastructurally aberrant plastids. It has been suggested that these plastids stem from a novel symbiosis with either a diatom or haptophyte, but...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1636-8

    authors: Gavelis GS,White RA,Suttle CA,Keeling PJ,Leander BS

    updated:2015-07-17 00:00:00

  • Transcriptome profiling provides insights into dormancy release during cold storage of Lilium pumilum.

    abstract:BACKGROUND:Bulbs of the ornamental flower Lilium pumilum enter a period of dormancy after flowering in spring, and require exposure to cold for a period of time in order to release dormancy. Previous studies focused mainly on anatomical, physiological and biochemical changes during dormancy release. There are no dorman...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-4536-x

    authors: Wang W,Su X,Tian Z,Liu Y,Zhou Y,He M

    updated:2018-03-14 00:00:00

  • Expanding dynamics of the virulence-related gene variations in the toxigenic Vibrio cholerae serogroup O1.

    abstract:BACKGROUND:Toxigenic Vibrio cholerae serogroup O1 is the causative pathogen in the sixth and seventh cholera pandemics. Cholera toxin is the major virulent factor but other virulence and virulence-related factors play certain roles in the pathogenesis and survival in the host. Along with the evolution of the epidemic s...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-5725-y

    authors: Li Z,Pang B,Wang D,Li J,Xu J,Fang Y,Lu X,Kan B

    updated:2019-05-09 00:00:00

  • DNA methylation regulates discrimination of enhancers from promoters through a H3K4me1-H3K4me3 seesaw mechanism.

    abstract:BACKGROUND:DNA methylation at promoters is largely correlated with inhibition of gene expression. However, the role of DNA methylation at enhancers is not fully understood, although a crosstalk with chromatin marks is expected. Actually, there exist contradictory reports about positive and negative correlations between...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-4353-7

    authors: Sharifi-Zarchi A,Gerovska D,Adachi K,Totonchi M,Pezeshk H,Taft RJ,Schöler HR,Chitsaz H,Sadeghi M,Baharvand H,Araúzo-Bravo MJ

    updated:2017-12-12 00:00:00

  • The landscape of mitochondrial small non-coding RNAs in the PGCs of male mice, spermatogonia, gametes and in zygotes.

    abstract:BACKGROUND:Mitochondria are organelles that fulfill a fundamental role in cell bioenergetics, as well as in other processes like cell signaling and death. Small non-coding RNAs (sncRNA) are now being considered as pivotal post-transcriptional regulators, widening the landscape of their diversity and functions. In mamma...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-5020-3

    authors: Larriba E,Rial E,Del Mazo J

    updated:2018-08-28 00:00:00

  • Identification of small RNAs in Francisella tularensis.

    abstract:BACKGROUND:Regulation of bacterial gene expression by small RNAs (sRNAs) has proved to be important for many biological processes. Francisella tularensis is a highly pathogenic Gram-negative bacterium that causes the disease tularaemia in humans and animals. Relatively little is known about the regulatory networks exi...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-625

    authors: Postic G,Frapy E,Dupuis M,Dubail I,Livny J,Charbit A,Meibom KL

    updated:2010-11-10 00:00:00

  • New enumeration algorithm for protein structure comparison and classification.

    abstract:BACKGROUND:Protein structure comparison and classification is an effective method for exploring protein structure-function relations. This problem is computationally challenging. Many different computational approaches for protein structure comparison apply the secondary structure elements (SSEs) representation of prot...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-S2-S1

    authors: Ashby C,Johnson D,Walker K,Kanj IA,Xia G,Huang X

    updated:2013-01-01 00:00:00

  • Dosage sensitivity of X-linked genes in human embryonic single cells.

    abstract:BACKGROUND:During the evolution of mammalian sex chromosomes, the degeneration of Y-linked homologs has led to a dosage imbalance between X-linked and autosomal genes. The evolutionary resolution to such dosage imbalance, as hypothesized by Susumu Ohno fifty years ago, should be doubling the expression of X-linked gene...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-5432-8

    authors: Yang JR,Chen X

    updated:2019-01-14 00:00:00

  • An early response regulatory cluster induced by low temperature and hydrogen peroxide in seedlings of chilling-tolerant japonica rice.

    abstract:BACKGROUND:Plants respond to low temperature through an intricately coordinated transcriptional network. The CBF/DREB-regulated network of genes has been shown to play a prominent role in freeze-tolerance of Arabidopsis through the process of cold acclimation (CA). Recent evidence also showed that the CBF/DREB regulon ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-8-175

    authors: Cheng C,Yun KY,Ressom HW,Mohanty B,Bajic VB,Jia Y,Yun SJ,de los Reyes BG

    updated:2007-06-18 00:00:00

  • Genome-wide analysis of the effect of histone modifications on the coexpression of neighboring genes in Saccharomyces cerevisiae.

    abstract:BACKGROUND:Neighboring gene pairs in the genome of Saccharomyces cerevisiae have a tendency to be expressed at the same time. The distribution of histone modifications along chromatin fibers is suggested to be an important mechanism responsible for such coexpression. However, the extent of the contribution of histone m...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-550

    authors: Deng Y,Dai X,Xiang Q,Dai Z,He C,Wang J,Feng J

    updated:2010-10-09 00:00:00

  • Generation and analysis of expression sequence tags from haustoria of the wheat stripe rust fungus Puccinia striiformis f. sp. tritici.

    abstract:BACKGROUND:Stripe rust, caused by Puccinia striiformis f. sp. tritici (Pst), is one of the most destructive diseases of wheat (Triticum aestivum L.) worldwide. In spite of its agricultural importance, the genomics and genetics of the pathogen are poorly characterized. Pst transcripts from urediniospores and germinated ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-10-626

    authors: Yin C,Chen X,Wang X,Han Q,Kang Z,Hulbert SH

    updated:2009-12-23 00:00:00

  • Chronic wounds alter the proteome profile in skin mucus of farmed gilthead seabream.

    abstract:BACKGROUND:Skin and its mucus are known to be the first barrier of defence against any external stressors. In fish, skin wounds frequently appear as a result of intensive culture and also some diseases have skin ulcers as external clinical signs. However, there is no information about the changes produced by the wounds...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-4349-3

    authors: Cordero H,Brinchmann MF,Cuesta A,Esteban MA

    updated:2017-12-02 00:00:00

  • Characterization of genome-wide segmental duplications reveals a common genomic feature of association with immunity among domestic animals.

    abstract:BACKGROUND:Segmental duplications (SDs) commonly exist in plant and animal genomes, playing crucial roles in genomic rearrangement, gene innovation and the formation of copy number variants. However, they have received little attention in most livestock species. RESULTS:Aiming at characterizing SDs across the genomes ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-3690-x

    authors: Feng X,Jiang J,Padhi A,Ning C,Fu J,Wang A,Mrode R,Liu JF

    updated:2017-04-12 00:00:00

  • Genomic expression during human myelopoiesis.

    abstract:BACKGROUND:Human myelopoiesis is an exciting biological model for cellular differentiation since it represents a plastic process where multipotent stem cells gradually limit their differentiation potential, generating different precursor cells which finally evolve into distinct terminally differentiated cells. This stu...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-8-264

    authors: Ferrari F,Bortoluzzi S,Coppe A,Basso D,Bicciato S,Zini R,Gemelli C,Danieli GA,Ferrari S

    updated:2007-08-03 00:00:00

  • Organogenic nodule development in hop (Humulus lupulus L.): transcript and metabolic responses.

    abstract:BACKGROUND:Hop (Humulus lupulus L.) is an economically important plant forming organogenic nodules which can be used for genetic transformation and micropropagation. We are interested in the mechanisms underlying reprogramming of cells through stress and hormone treatments. RESULTS:An integrated molecular and metabolo...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-445

    authors: Fortes AM,Santos F,Choi YH,Silva MS,Figueiredo A,Sousa L,Pessoa F,Santos BA,Sebastiana M,Palme K,Malhó R,Verpoorte R,Pais MS

    updated:2008-09-29 00:00:00

  • Identification of dysfunctional modules and disease genes in congenital heart disease by a network-based approach.

    abstract:BACKGROUND:The incidence of congenital heart disease (CHD) is continuously increasing among infants born alive nowadays, making it one of the leading causes of infant morbidity worldwide. Various studies suggest that both genetic and environmental factors lead to CHD, and therefore identifying its candidate genes and d...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-12-592

    authors: He D,Liu ZP,Chen L

    updated:2011-12-02 00:00:00

  • The comparison of four mitochondrial genomes reveals cytoplasmic male sterility candidate genes in cotton.

    abstract:BACKGROUND:The mitochondrial genomes of higher plants vary remarkably in size, structure and sequence content, as demonstrated by the accumulation and activity of repetitive DNA sequences. Incompatibility between mitochondrial genome and nuclear genome leads to non-functional male reproductive organs and results in cyt...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-5122-y

    authors: Li S,Chen Z,Zhao N,Wang Y,Nie H,Hua J

    updated:2018-10-26 00:00:00

  • Altered gene expression in the superior temporal gyrus in schizophrenia.

    abstract:BACKGROUND:The superior temporal gyrus (STG), which encompasses the primary auditory cortex, is believed to be a major anatomical substrate for speech, language and communication. The STG connects to the limbic system (hippocampus and amygdala), the thalamus and neocortical association areas in the prefrontal cortex, a...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-199

    authors: Bowden NA,Scott RJ,Tooney PA

    updated:2008-04-29 00:00:00

  • In silico miRNA prediction in metazoan genomes: balancing between sensitivity and specificity.

    abstract:BACKGROUND:MicroRNAs (miRNAs), short approximately 21-nucleotide RNA molecules, play an important role in post-transcriptional regulation of gene expression. The number of known miRNA hairpins registered in the miRBase database is rapidly increasing, but recent reports suggest that many miRNAs with restricted temporal ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-10-204

    authors: van der Burgt A,Fiers MW,Nap JP,van Ham RC

    updated:2009-04-30 00:00:00

  • Generation of SNP datasets for orangutan population genomics using improved reduced-representation sequencing and direct comparisons of SNP calling algorithms.

    abstract:BACKGROUND:High-throughput sequencing has opened up exciting possibilities in population and conservation genetics by enabling the assessment of genetic variation at genome-wide scales. One approach to reduce genome complexity, i.e. investigating only parts of the genome, is reduced-representation library (RRL) sequenc...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-16

    authors: Greminger MP,Stölting KN,Nater A,Goossens B,Arora N,Bruggmann R,Patrignani A,Nussberger B,Sharma R,Kraus RH,Ambu LN,Singleton I,Chikhi L,van Schaik CP,Krützen M

    updated:2014-01-10 00:00:00

  • Generation of a de novo transcriptome from equine lamellar tissue.

    abstract:BACKGROUND:Laminitis, the structural failure of interdigitated tissue that suspends the distal skeleton within the hoof capsule, is a devastating disease that is the second leading cause of both lameness and euthanasia in the horse. Current transcriptomic research focuses on the expression of known genes. However, as t...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1948-8

    authors: Holl HM,Gao S,Fei Z,Andrews C,Brooks SA

    updated:2015-10-03 00:00:00

  • Mosaic autosomal aneuploidies are detectable from single-cell RNAseq data.

    abstract:BACKGROUND:Aneuploidies are copy number variants that affect entire chromosomes. They are seen commonly in cancer, embryonic stem cells, human embryos, and in various trisomic diseases. Aneuploidies frequently affect only a subset of cells in a sample; this is known as "mosaic" aneuploidy. A cell that harbours an aneup...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-4253-x

    authors: Griffiths JA,Scialdone A,Marioni JC

    updated:2017-11-25 00:00:00

  • Comparative genomic analyses reveal broad diversity in botulinum-toxin-producing Clostridia.

    abstract:BACKGROUND:Clostridium botulinum is a diverse group of bacteria characterized by the production of botulinum neurotoxin. Botulinum neurotoxins are classified into serotypes (BoNT/A-G), which are produced by six species/Groups of Clostridia, but the genetic background of the bacteria remains poorly understood. The purpo...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-2502-z

    authors: Williamson CH,Sahl JW,Smith TJ,Xie G,Foley BT,Smith LA,Fernández RA,Lindström M,Korkeala H,Keim P,Foster J,Hill K

    updated:2016-03-03 00:00:00

  • Comparison of gene coverage of mouse oligonucleotide microarray platforms.

    abstract:BACKGROUND:The increasing use of DNA microarrays for genetical genomics studies generates a need for platforms with complete coverage of the genome. We have compared the effective gene coverage in the mouse genome of different commercial and noncommercial oligonucleotide microarray platforms by performing an in-house g...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-7-58

    authors: Verdugo RA,Medrano JF

    updated:2006-03-21 00:00:00

  • Assessing structural variation in a personal genome-towards a human reference diploid genome.

    abstract:BACKGROUND:Characterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing. While multiple data types and methods are available to detect these structural variants (SVs), they remain less characterized than smaller variants because of SV diversity, complexity,...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1479-3

    authors: English AC,Salerno WJ,Hampton OA,Gonzaga-Jauregui C,Ambreth S,Ritter DI,Beck CR,Davis CF,Dahdouli M,Ma S,Carroll A,Veeraraghavan N,Bruestle J,Drees B,Hastie A,Lam ET,White S,Mishra P,Wang M,Han Y,Zhang F,Stankie

    updated:2015-04-11 00:00:00

  • Assessing runs of Homozygosity: a comparison of SNP Array and whole genome sequence low coverage data.

    abstract:BACKGROUND:Runs of Homozygosity (ROH) are genomic regions where identical haplotypes are inherited from each parent. Since their first detection due to technological advances in the late 1990s, ROHs have been shedding light on human population history and deciphering the genetic basis of monogenic and complex traits an...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-4489-0

    authors: Ceballos FC,Hazelhurst S,Ramsay M

    updated:2018-01-30 00:00:00

  • ABSSeq: a new RNA-Seq analysis method based on modelling absolute expression differences.

    abstract:BACKGROUND:The recent advances in next generation sequencing technology have made the sequencing of RNA (i.e., RNA-Seq) an extremely popular approach for gene expression analysis. Identification of significant differential expression represents a crucial initial step in these analyses, on which most subsequent inference...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-2848-2

    authors: Yang W,Rosenstiel PC,Schulenburg H

    updated:2016-08-04 00:00:00

  • A high-density genetic map of Arachis duranensis, a diploid ancestor of cultivated peanut.

    abstract:BACKGROUND:Cultivated peanut (Arachis hypogaea) is an allotetraploid species whose ancestral genomes are most likely derived from the A-genome species, A. duranensis, and the B-genome species, A. ipaensis. The very recent (several millennia) evolutionary origin of A. hypogaea has imposed a bottleneck for allelic and ph...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-469

    authors: Nagy ED,Guo Y,Tang S,Bowers JE,Okashah RA,Taylor CA,Zhang D,Khanal S,Heesacker AF,Khalilian N,Farmer AD,Carrasquilla-Garcia N,Penmetsa RV,Cook D,Stalker HT,Nielsen N,Ozias-Akins P,Knapp SJ

    updated:2012-09-11 00:00:00

  • Integrated proteomic and metabolomic analysis to study the effects of spaceflight on Candida albicans.

    abstract:BACKGROUND:Candida albicans is an opportunistic pathogenic yeast, which could become pathogenic in various stressful environmental factors including the spaceflight environment. In this study, we aim to explore the phenotypic changes and possible mechanisms of C. albicans after exposure to spaceflight conditions. RESU...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-6476-5

    authors: Wang J,Liu Y,Zhao G,Gao J,Liu J,Wu X,Xu C,Li Y

    updated:2020-01-17 00:00:00