Local sequence and sequencing depth dependent accuracy of RNA-seq reads.

Abstract:

BACKGROUND:Many biases and spurious effects are inherent in RNA-seq technology, resulting in a non-uniform distribution of sequencing read counts for each base position in a gene. Therefore, a base-level strategy is required to model the non-uniformity. Also, the properties of sequencing read counts can be leveraged to achieve a more precise estimation of the mean and variance of measurement. RESULTS:In this study, we aimed to unveil the effects on RNA-seq accuracy from multiple factors and develop accurate modeling of RNA-seq reads in comparison. We found that the overdispersion rate decreased when sequencing depth increased on the base level. Moreover, the influence of local sequence(s) on the overdispersion rate was notable but no longer significant after adjusting the effect from sequencing depth. Based on these findings, we propose a desirable beta-binomial model with a dynamic overdispersion rate on the base-level proportion of sequencing read counts from two samples. CONCLUSIONS:The current study provides thorough insights into the impact of overdispersion at the position level and especially into its relationship with sequencing depth, local sequence, and preparation protocol. These properties of RNA-seq will aid in improvement of the quality control procedure and development of statistical methods for RNA-seq downstream analyses.

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Cai G,Liang S,Zheng X,Xiao F

doi

10.1186/s12859-017-1780-z

subject

Has Abstract

pub_date

2017-08-09 00:00:00

pages

364

issue

1

issn

1471-2105

pii

10.1186/s12859-017-1780-z

journal_volume

18

pub_type

杂志文章
  • Reporting FDR analogous confidence intervals for the log fold change of differentially expressed genes.

    abstract:BACKGROUND:Gene expression experiments are common in molecular biology, for example in order to identify genes which play a certain role in a specified biological framework. For that purpose expression levels of several thousand genes are measured simultaneously using DNA microarrays. Comparing two distinct groups of t...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-288

    authors: Jung K,Friede T,Beissbarth T

    更新日期:2011-07-15 00:00:00

  • Hierarchical modularity of nested bow-ties in metabolic networks.

    abstract:BACKGROUND:The exploration of the structural topology and the organizing principles of genome-based large-scale metabolic networks is essential for studying possible relations between structure and functionality of metabolic networks. Topological analysis of graph models has often been applied to study the structural c...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-386

    authors: Zhao J,Yu H,Luo JH,Cao ZW,Li YX

    更新日期:2006-08-18 00:00:00

  • MOSBIE: a tool for comparison and analysis of rule-based biochemical models.

    abstract:BACKGROUND:Mechanistic models that describe the dynamical behaviors of biochemical systems are common in computational systems biology, especially in the realm of cellular signaling. The development of families of such models, either by a single research group or by different groups working within the same area, presen...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-15-316

    authors: Wenskovitch JE Jr,Harris LA,Tapia JJ,Faeder JR,Marai GE

    更新日期:2014-09-25 00:00:00

  • Spot quantification in two dimensional gel electrophoresis image analysis: comparison of different approaches and presentation of a novel compound fitting algorithm.

    abstract:BACKGROUND:Various computer-based methods exist for the detection and quantification of protein spots in two dimensional gel electrophoresis images. Area-based methods are commonly used for spot quantification: an area is assigned to each spot and the sum of the pixel intensities in that area, the so-called volume, is ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-15-181

    authors: Brauner JM,Groemer TW,Stroebel A,Grosse-Holz S,Oberstein T,Wiltfang J,Kornhuber J,Maler JM

    更新日期:2014-06-11 00:00:00

  • Fast batch searching for protein homology based on compression and clustering.

    abstract:BACKGROUND:In bioinformatics community, many tasks associate with matching a set of protein query sequences in large sequence datasets. To conduct multiple queries in the database, a common used method is to run BLAST on each original querey or on the concatenated queries. It is inefficient since it doesn't exploit the...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1938-8

    authors: Ge H,Sun L,Yu J

    更新日期:2017-11-21 00:00:00

  • Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient.

    abstract:BACKGROUND:Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great "Tree of Life" (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-14-158

    authors: Stoltzfus A,Lapp H,Matasci N,Deus H,Sidlauskas B,Zmasek CM,Vaidya G,Pontelli E,Cranston K,Vos R,Webb CO,Harmon LJ,Pirrung M,O'Meara B,Pennell MW,Mirarab S,Rosenberg MS,Balhoff JP,Bik HM,Heath TA,Midford PE,Brown

    更新日期:2013-05-13 00:00:00

  • MiRFinder: an improved approach and software implementation for genome-wide fast microRNA precursor scans.

    abstract:BACKGROUND:MicroRNAs (miRNAs) are recognized as one of the most important families of non-coding RNAs that serve as important sequence-specific post-transcriptional regulators of gene expression. Identification of miRNAs is an important requirement for understanding the mechanisms of post-transcriptional regulation. Hu...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-341

    authors: Huang TH,Fan B,Rothschild MF,Hu ZL,Li K,Zhao SH

    更新日期:2007-09-17 00:00:00

  • Is EC class predictable from reaction mechanism?

    abstract:BACKGROUND:We investigate the relationships between the EC (Enzyme Commission) class, the associated chemical reaction, and the reaction mechanism by building predictive models using Support Vector Machine (SVM), Random Forest (RF) and k-Nearest Neighbours (kNN). We consider two ways of encoding the reaction mechanism ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-60

    authors: Nath N,Mitchell JB

    更新日期:2012-04-24 00:00:00

  • AT excursion: a new approach to predict replication origins in viral genomes by locating AT-rich regions.

    abstract:BACKGROUND:Replication origins are considered important sites for understanding the molecular mechanisms involved in DNA replication. Many computational methods have been developed for predicting their locations in archaeal, bacterial and eukaryotic genomes. However, a prediction method designed for a particular kind o...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-163

    authors: Chew DS,Leung MY,Choi KP

    更新日期:2007-05-21 00:00:00

  • Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method.

    abstract:BACKGROUND:Many processes in molecular biology involve the recognition of short sequences of nucleic-or amino acids, such as the binding of immunogenic peptides to major histocompatibility complex (MHC) molecules. From experimental data, a model of the sequence specificity of these processes can be constructed, such as...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-132

    authors: Peters B,Sette A

    更新日期:2005-05-31 00:00:00

  • High-Throughput GoMiner, an 'industrial-strength' integrative gene ontology tool for interpretation of multiple-microarray experiments, with application to studies of Common Variable Immune Deficiency (CVID).

    abstract:BACKGROUND:We previously developed GoMiner, an application that organizes lists of 'interesting' genes (for example, under-and overexpressed genes from a microarray experiment) for biological interpretation in the context of the Gene Ontology. The original version of GoMiner was oriented toward visualization and interp...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-168

    authors: Zeeberg BR,Qin H,Narasimhan S,Sunshine M,Cao H,Kane DW,Reimers M,Stephens RM,Bryant D,Burt SK,Elnekave E,Hari DM,Wynn TA,Cunningham-Rundles C,Stewart DM,Nelson D,Weinstein JN

    更新日期:2005-07-05 00:00:00

  • TAMEE: data management and analysis for tissue microarrays.

    abstract:BACKGROUND:With the introduction of tissue microarrays (TMAs) researchers can investigate gene and protein expression in tissues on a high-throughput scale. TMAs generate a wealth of data calling for extended, high level data management. Enhanced data analysis and systematic data management are required for traceabilit...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-81

    authors: Thallinger GG,Baumgartner K,Pirklbauer M,Uray M,Pauritsch E,Mehes G,Buck CR,Zatloukal K,Trajanoski Z

    更新日期:2007-03-07 00:00:00

  • Epiviz: a view inside the design of an integrated visual analysis software for genomics.

    abstract:BACKGROUND:Computational and visual data analysis for genomics has traditionally involved a combination of tools and resources, of which the most ubiquitous consist of genome browsers, focused mainly on integrative visualization of large numbers of big datasets, and computational environments, focused on data modeling ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-16-S11-S4

    authors: Chelaru F,Corrada Bravo H

    更新日期:2015-01-01 00:00:00

  • Evaluating metagenomics tools for genome binning with real metagenomic datasets and CAMI datasets.

    abstract:BACKGROUND:Shotgun metagenomics based on untargeted sequencing can explore the taxonomic profile and the function of unknown microorganisms in samples, and complement the shortage of amplicon sequencing. Binning assembled sequences into individual groups, which represent microbial genomes, is the key step and a major c...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03667-3

    authors: Yue Y,Huang H,Qi Z,Dou HM,Liu XY,Han TF,Chen Y,Song XJ,Zhang YH,Tu J

    更新日期:2020-07-28 00:00:00

  • OmicsARules: a R package for integration of multi-omics datasets via association rules mining.

    abstract:BACKGROUND:The improvements of high throughput technologies have produced large amounts of multi-omics experiments datasets. Initial analysis of these data has revealed many concurrent gene alterations within single dataset or/and among multiple omics datasets. Although powerful bioinformatics pipelines have been devel...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3171-0

    authors: Chen D,Zhang F,Zhao Q,Xu J

    更新日期:2019-11-08 00:00:00

  • Application of text-mining for updating protein post-translational modification annotation in UniProtKB.

    abstract:BACKGROUND:The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body of information we have...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-14-104

    authors: Veuthey AL,Bridge A,Gobeill J,Ruch P,McEntyre JR,Bougueleret L,Xenarios I

    更新日期:2013-03-22 00:00:00

  • CollapsABEL: an R library for detecting compound heterozygote alleles in genome-wide association studies.

    abstract:BACKGROUND:Compound Heterozygosity (CH) in classical genetics is the presence of two different recessive mutations at a particular gene locus. A relaxed form of CH alleles may account for an essential proportion of the missing heritability, i.e. heritability of phenotypes so far not accounted for by single genetic vari...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-016-1006-9

    authors: Zhong K,Karssen LC,Kayser M,Liu F

    更新日期:2016-04-08 00:00:00

  • A comparative study of conservation and variation scores.

    abstract:BACKGROUND:Conservation and variation scores are used when evaluating sites in a multiple sequence alignment, in order to identify residues critical for structure or function. A variety of scores are available today but it is not clear how different scores relate to each other. RESULTS:We applied 25 conservation and v...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-388

    authors: Johansson F,Toh H

    更新日期:2010-07-21 00:00:00

  • TE-Tracker: systematic identification of transposition events through whole-genome resequencing.

    abstract:BACKGROUND:Transposable elements (TEs) are DNA sequences that are able to move from their location in the genome by cutting or copying themselves to another locus. As such, they are increasingly recognized as impacting all aspects of genome function. With the dramatic reduction in cost of DNA sequencing, it is now poss...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-014-0377-z

    authors: Gilly A,Etcheverry M,Madoui MA,Guy J,Quadrana L,Alberti A,Martin A,Heitkam T,Engelen S,Labadie K,Le Pen J,Wincker P,Colot V,Aury JM

    更新日期:2014-11-19 00:00:00

  • Probe-specific mixed-model approach to detect copy number differences using multiplex ligation-dependent probe amplification (MLPA).

    abstract:BACKGROUND:MLPA method is a potentially useful semi-quantitative method to detect copy number alterations in targeted regions. In this paper, we propose a method for the normalization procedure based on a non-linear mixed-model, as well as a new approach for determining the statistical significance of altered probes ba...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-261

    authors: González JR,Carrasco JL,Armengol L,Villatoro S,Jover L,Yasui Y,Estivill X

    更新日期:2008-06-04 00:00:00

  • TooT-T: discrimination of transport proteins from non-transport proteins.

    abstract:BACKGROUND:Membrane transport proteins (transporters) play an essential role in every living cell by transporting hydrophilic molecules across the hydrophobic membranes. While the sequences of many membrane proteins are known, their structure and function is still not well characterized and understood, owing to the imm...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3311-6

    authors: Alballa M,Butler G

    更新日期:2020-04-23 00:00:00

  • Drug-target interaction prediction using semi-bipartite graph model and deep learning.

    abstract:BACKGROUND:Identifying drug-target interaction is a key element in drug discovery. In silico prediction of drug-target interaction can speed up the process of identifying unknown interactions between drugs and target proteins. In recent studies, handcrafted features, similarity metrics and machine learning methods have...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-3518-6

    authors: Eslami Manoochehri H,Nourani M

    更新日期:2020-07-06 00:00:00

  • Text-derived concept profiles support assessment of DNA microarray data for acute myeloid leukemia and for androgen receptor stimulation.

    abstract:BACKGROUND:High-throughput experiments, such as with DNA microarrays, typically result in hundreds of genes potentially relevant to the process under study, rendering the interpretation of these experiments problematic. Here, we propose and evaluate an approach to find functional associations between large numbers of g...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-14

    authors: Jelier R,Jenster G,Dorssers LC,Wouters BJ,Hendriksen PJ,Mons B,Delwel R,Kors JA

    更新日期:2007-01-18 00:00:00

  • ProMEX: a mass spectral reference database for proteins and protein phosphorylation sites.

    abstract:BACKGROUND:In the last decade, techniques were established for the large scale genome-wide analysis of proteins, RNA, and metabolites, and database solutions have been developed to manage the generated data sets. The Golm Metabolome Database for metabolite data (GMD) represents one such effort to make these data broadl...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-216

    authors: Hummel J,Niemann M,Wienkoop S,Schulze W,Steinhauser D,Selbig J,Walther D,Weckwerth W

    更新日期:2007-06-23 00:00:00

  • Reverse engineering gene regulatory networks: coupling an optimization algorithm with a parameter identification technique.

    abstract:BACKGROUND:To infer gene regulatory networks from time series gene profiles, two important tasks that are related to biological systems must be undertaken. One task is to determine a valid network structure that has topological properties that can influence the network dynamics profoundly. The other task is to optimize...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-15-S15-S8

    authors: Hsiao YT,Lee WP

    更新日期:2014-01-01 00:00:00

  • An integrated approach to the prediction of domain-domain interactions.

    abstract:BACKGROUND:The development of high-throughput technologies has produced several large scale protein interaction data sets for multiple species, and significant efforts have been made to analyze the data sets in order to understand protein activities. Considering that the basic units of protein interactions are domain i...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-269

    authors: Lee H,Deng M,Sun F,Chen T

    更新日期:2006-05-25 00:00:00

  • Overview of the Cancer Genetics and Pathway Curation tasks of BioNLP Shared Task 2013.

    abstract:BACKGROUND:Since their introduction in 2009, the BioNLP Shared Task events have been instrumental in advancing the development of methods and resources for the automatic extraction of information from the biomedical literature. In this paper, we present the Cancer Genetics (CG) and Pathway Curation (PC) tasks, two even...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-16-S10-S2

    authors: Pyysalo S,Ohta T,Rak R,Rowley A,Chun HW,Jung SJ,Choi SP,Tsujii J,Ananiadou S

    更新日期:2015-01-01 00:00:00

  • In silico docking of urokinase plasminogen activator and integrins.

    abstract:BACKGROUND:Urokinase, its receptor and the integrins are functionally associated and involved in regulation of cell signaling, migration, adhesion and proliferation. No structural information is available on this potential multimolecular complex. However, the tri-dimensional structure of urokinase, urokinase receptor a...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-S2-S8

    authors: Degryse B,Fernandez-Recio J,Citro V,Blasi F,Cubellis MV

    更新日期:2008-03-26 00:00:00

  • Evolutionary Pareto-optimization of stably folding peptides.

    abstract:BACKGROUND:As a rule, peptides are more flexible and unstructured than proteins with their substantial stabilizing hydrophobic cores. Nevertheless, a few stably folding peptides have been discovered. This raises the question whether there may be more such peptides that are unknown as yet. These molecules could be helpf...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-109

    authors: Gronwald W,Hohm T,Hoffmann D

    更新日期:2008-02-19 00:00:00

  • Efficient prediction of human protein-protein interactions at a global scale.

    abstract:BACKGROUND:Our knowledge of global protein-protein interaction (PPI) networks in complex organisms such as humans is hindered by technical limitations of current methods. RESULTS:On the basis of short co-occurring polypeptide regions, we developed a tool called MP-PIPE capable of predicting a global human PPI network ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-014-0383-1

    authors: Schoenrock A,Samanfar B,Pitre S,Hooshyar M,Jin K,Phillips CA,Wang H,Phanse S,Omidi K,Gui Y,Alamgir M,Wong A,Barrenäs F,Babu M,Benson M,Langston MA,Green JR,Dehne F,Golshani A

    更新日期:2014-12-10 00:00:00