Abstract:
BACKGROUND: RNA-Seq has become increasingly popular in transcriptome profiling. One aspect of transcriptome research is quantifying the expression levels of genomic elements, such as genes, their transcripts and exons. Acquiring a transcriptome expression profile requires genomic elements to be defined in the context of the genome. Multiple human genome annotation databases exist, including RefGene (RefSeq Gene), Ensembl, and the UCSC annotation database. The impact of the choice of annotation on estimating gene expression remains insufficiently investigated.
RESULTS: In this paper, we systematically characterized the impact of genome annotation choice on read mapping and transcriptome quantification by analyzing an RNA-Seq dataset generated by the Human Body Map 2.0 Project. The impact of a gene model on the mapping of non-junction reads differs from its impact on junction reads. For the RNA-Seq dataset with a read length of 75 bp, on average, 95% of non-junction reads were mapped to exactly the same genomic location regardless of which gene model was used. By contrast, this percentage dropped to 53% for junction reads. In addition, about 30% of junction reads failed to align without the assistance of a gene model, while 10-15% mapped alternatively. There are 21,958 common genes among the RefGene, Ensembl, and UCSC annotations. When we compared the gene quantification results in the RefGene and Ensembl annotations, 20% of genes were not expressed and thus had a zero count in both annotations. Surprisingly, identical gene quantification results were obtained for only 16.3% (about one sixth) of genes. For approximately 28.1% of genes, expression levels differed by 5% or more, and of those, the relative expression levels for 9.3% of genes (equivalent to 2038) differed by 50% or more. The case studies revealed that differences in gene definition among gene models frequently result in inconsistent gene quantification.
CONCLUSIONS: We demonstrated that the choice of a gene model has a dramatic effect on both gene quantification and differential analysis. Our research will help RNA-Seq data analysts make an informed choice of gene model in practical RNA-Seq data analysis.
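The concordance categories reported in RESULTS (not expressed, identical, differing by 5% or more, differing by 50% or more) can be illustrated with a minimal Python sketch. The abstract does not state how the relative difference was computed, so the formula used here, |a - b| / max(a, b), is an assumption, as are the function name `compare_quantification` and the toy gene counts; this is not the authors' pipeline.

```python
def compare_quantification(counts_a, counts_b):
    """Summarize how per-gene read counts agree between two annotations.

    counts_a, counts_b: dicts mapping gene name -> read count, one per
    annotation (hypothetical inputs). Only genes present in both are compared.
    Returns the fraction of shared genes falling into each category.
    """
    shared = sorted(set(counts_a) & set(counts_b))
    if not shared:
        raise ValueError("the two annotations share no genes")

    tally = {
        "not_expressed": 0,   # zero count in both annotations
        "identical": 0,       # same non-zero count in both annotations
        "diff_ge_5pct": 0,    # relative difference of 5% or more
        "diff_ge_50pct": 0,   # relative difference of 50% or more (subset of the above)
    }
    for gene in shared:
        a, b = counts_a[gene], counts_b[gene]
        if a == 0 and b == 0:
            tally["not_expressed"] += 1
        elif a == b:
            tally["identical"] += 1
        else:
            # Assumed definition of relative difference; the source does not specify one.
            rel_diff = abs(a - b) / max(a, b)
            if rel_diff >= 0.05:
                tally["diff_ge_5pct"] += 1
            if rel_diff >= 0.50:
                tally["diff_ge_50pct"] += 1

    return {category: n / len(shared) for category, n in tally.items()}


# Toy counts for illustration only; these are not data from the study.
refgene_counts = {"G1": 0, "G2": 100, "G3": 100, "G4": 100, "G5": 10}
ensembl_counts = {"G1": 0, "G2": 100, "G3": 97, "G4": 90, "G5": 40, "G6": 5}
summary = compare_quantification(refgene_counts, ensembl_counts)
```

In this sketch the 50% category is counted as a subset of the 5% category, which matches the "of those" wording in RESULTS. Genes that differ by less than 5% fall into none of the four categories.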
journal_name: BMC Genomics
journal_title: BMC genomics
authors: Zhao S, Zhang B
doi: 10.1186/s12864-015-1308-8
subject: Has Abstract
pub_date: 2015-02-18
pages: 97
issn: 1471-2164
pii: s12864-015-1308-8
journal_volume: 16
pub_type: Journal article