Abstract:
BACKGROUND:Many biases and spurious effects are inherent in RNA-seq technology, resulting in a non-uniform distribution of sequencing read counts for each base position in a gene. Therefore, a base-level strategy is required to model the non-uniformity. Also, the properties of sequencing read counts can be leveraged to achieve a more precise estimation of the mean and variance of measurement. RESULTS:In this study, we aimed to unveil the effects on RNA-seq accuracy from multiple factors and develop accurate modeling of RNA-seq reads in comparison. We found that the overdispersion rate decreased when sequencing depth increased on the base level. Moreover, the influence of local sequence(s) on the overdispersion rate was notable but no longer significant after adjusting the effect from sequencing depth. Based on these findings, we propose a desirable beta-binomial model with a dynamic overdispersion rate on the base-level proportion of sequencing read counts from two samples. CONCLUSIONS:The current study provides thorough insights into the impact of overdispersion at the position level and especially into its relationship with sequencing depth, local sequence, and preparation protocol. These properties of RNA-seq will aid in improvement of the quality control procedure and development of statistical methods for RNA-seq downstream analyses.
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Cai G,Liang S,Zheng X,Xiao Fdoi
10.1186/s12859-017-1780-zsubject
Has Abstractpub_date
2017-08-09 00:00:00pages
364issue
1issn
1471-2105pii
10.1186/s12859-017-1780-zjournal_volume
18pub_type
杂志文章abstract:BACKGROUND:Gene expression experiments are common in molecular biology, for example in order to identify genes which play a certain role in a specified biological framework. For that purpose expression levels of several thousand genes are measured simultaneously using DNA microarrays. Comparing two distinct groups of t...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-288
更新日期:2011-07-15 00:00:00
abstract:BACKGROUND:The exploration of the structural topology and the organizing principles of genome-based large-scale metabolic networks is essential for studying possible relations between structure and functionality of metabolic networks. Topological analysis of graph models has often been applied to study the structural c...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-386
更新日期:2006-08-18 00:00:00
abstract:BACKGROUND:Mechanistic models that describe the dynamical behaviors of biochemical systems are common in computational systems biology, especially in the realm of cellular signaling. The development of families of such models, either by a single research group or by different groups working within the same area, presen...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-316
更新日期:2014-09-25 00:00:00
abstract:BACKGROUND:Various computer-based methods exist for the detection and quantification of protein spots in two dimensional gel electrophoresis images. Area-based methods are commonly used for spot quantification: an area is assigned to each spot and the sum of the pixel intensities in that area, the so-called volume, is ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-181
更新日期:2014-06-11 00:00:00
abstract:BACKGROUND:In bioinformatics community, many tasks associate with matching a set of protein query sequences in large sequence datasets. To conduct multiple queries in the database, a common used method is to run BLAST on each original querey or on the concatenated queries. It is inefficient since it doesn't exploit the...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1938-8
更新日期:2017-11-21 00:00:00
abstract:BACKGROUND:Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great "Tree of Life" (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-158
更新日期:2013-05-13 00:00:00
abstract:BACKGROUND:MicroRNAs (miRNAs) are recognized as one of the most important families of non-coding RNAs that serve as important sequence-specific post-transcriptional regulators of gene expression. Identification of miRNAs is an important requirement for understanding the mechanisms of post-transcriptional regulation. Hu...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-341
更新日期:2007-09-17 00:00:00
abstract:BACKGROUND:We investigate the relationships between the EC (Enzyme Commission) class, the associated chemical reaction, and the reaction mechanism by building predictive models using Support Vector Machine (SVM), Random Forest (RF) and k-Nearest Neighbours (kNN). We consider two ways of encoding the reaction mechanism ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-60
更新日期:2012-04-24 00:00:00
abstract:BACKGROUND:Replication origins are considered important sites for understanding the molecular mechanisms involved in DNA replication. Many computational methods have been developed for predicting their locations in archaeal, bacterial and eukaryotic genomes. However, a prediction method designed for a particular kind o...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-163
更新日期:2007-05-21 00:00:00
abstract:BACKGROUND:Many processes in molecular biology involve the recognition of short sequences of nucleic-or amino acids, such as the binding of immunogenic peptides to major histocompatibility complex (MHC) molecules. From experimental data, a model of the sequence specificity of these processes can be constructed, such as...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-6-132
更新日期:2005-05-31 00:00:00
abstract:BACKGROUND:We previously developed GoMiner, an application that organizes lists of 'interesting' genes (for example, under-and overexpressed genes from a microarray experiment) for biological interpretation in the context of the Gene Ontology. The original version of GoMiner was oriented toward visualization and interp...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-6-168
更新日期:2005-07-05 00:00:00
abstract:BACKGROUND:With the introduction of tissue microarrays (TMAs) researchers can investigate gene and protein expression in tissues on a high-throughput scale. TMAs generate a wealth of data calling for extended, high level data management. Enhanced data analysis and systematic data management are required for traceabilit...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-81
更新日期:2007-03-07 00:00:00
abstract:BACKGROUND:Computational and visual data analysis for genomics has traditionally involved a combination of tools and resources, of which the most ubiquitous consist of genome browsers, focused mainly on integrative visualization of large numbers of big datasets, and computational environments, focused on data modeling ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-16-S11-S4
更新日期:2015-01-01 00:00:00
abstract:BACKGROUND:Shotgun metagenomics based on untargeted sequencing can explore the taxonomic profile and the function of unknown microorganisms in samples, and complement the shortage of amplicon sequencing. Binning assembled sequences into individual groups, which represent microbial genomes, is the key step and a major c...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-03667-3
更新日期:2020-07-28 00:00:00
abstract:BACKGROUND:The improvements of high throughput technologies have produced large amounts of multi-omics experiments datasets. Initial analysis of these data has revealed many concurrent gene alterations within single dataset or/and among multiple omics datasets. Although powerful bioinformatics pipelines have been devel...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3171-0
更新日期:2019-11-08 00:00:00
abstract:BACKGROUND:The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body of information we have...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-104
更新日期:2013-03-22 00:00:00
abstract:BACKGROUND:Compound Heterozygosity (CH) in classical genetics is the presence of two different recessive mutations at a particular gene locus. A relaxed form of CH alleles may account for an essential proportion of the missing heritability, i.e. heritability of phenotypes so far not accounted for by single genetic vari...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-1006-9
更新日期:2016-04-08 00:00:00
abstract:BACKGROUND:Conservation and variation scores are used when evaluating sites in a multiple sequence alignment, in order to identify residues critical for structure or function. A variety of scores are available today but it is not clear how different scores relate to each other. RESULTS:We applied 25 conservation and v...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-388
更新日期:2010-07-21 00:00:00
abstract:BACKGROUND:Transposable elements (TEs) are DNA sequences that are able to move from their location in the genome by cutting or copying themselves to another locus. As such, they are increasingly recognized as impacting all aspects of genome function. With the dramatic reduction in cost of DNA sequencing, it is now poss...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-014-0377-z
更新日期:2014-11-19 00:00:00
abstract:BACKGROUND:MLPA method is a potentially useful semi-quantitative method to detect copy number alterations in targeted regions. In this paper, we propose a method for the normalization procedure based on a non-linear mixed-model, as well as a new approach for determining the statistical significance of altered probes ba...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-261
更新日期:2008-06-04 00:00:00
abstract:BACKGROUND:Membrane transport proteins (transporters) play an essential role in every living cell by transporting hydrophilic molecules across the hydrophobic membranes. While the sequences of many membrane proteins are known, their structure and function is still not well characterized and understood, owing to the imm...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3311-6
更新日期:2020-04-23 00:00:00
abstract:BACKGROUND:Identifying drug-target interaction is a key element in drug discovery. In silico prediction of drug-target interaction can speed up the process of identifying unknown interactions between drugs and target proteins. In recent studies, handcrafted features, similarity metrics and machine learning methods have...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-3518-6
更新日期:2020-07-06 00:00:00
abstract:BACKGROUND:High-throughput experiments, such as with DNA microarrays, typically result in hundreds of genes potentially relevant to the process under study, rendering the interpretation of these experiments problematic. Here, we propose and evaluate an approach to find functional associations between large numbers of g...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-14
更新日期:2007-01-18 00:00:00
abstract:BACKGROUND:In the last decade, techniques were established for the large scale genome-wide analysis of proteins, RNA, and metabolites, and database solutions have been developed to manage the generated data sets. The Golm Metabolome Database for metabolite data (GMD) represents one such effort to make these data broadl...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-216
更新日期:2007-06-23 00:00:00
abstract:BACKGROUND:To infer gene regulatory networks from time series gene profiles, two important tasks that are related to biological systems must be undertaken. One task is to determine a valid network structure that has topological properties that can influence the network dynamics profoundly. The other task is to optimize...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-S15-S8
更新日期:2014-01-01 00:00:00
abstract:BACKGROUND:The development of high-throughput technologies has produced several large scale protein interaction data sets for multiple species, and significant efforts have been made to analyze the data sets in order to understand protein activities. Considering that the basic units of protein interactions are domain i...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-269
更新日期:2006-05-25 00:00:00
abstract:BACKGROUND:Since their introduction in 2009, the BioNLP Shared Task events have been instrumental in advancing the development of methods and resources for the automatic extraction of information from the biomedical literature. In this paper, we present the Cancer Genetics (CG) and Pathway Curation (PC) tasks, two even...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-16-S10-S2
更新日期:2015-01-01 00:00:00
abstract:BACKGROUND:Urokinase, its receptor and the integrins are functionally associated and involved in regulation of cell signaling, migration, adhesion and proliferation. No structural information is available on this potential multimolecular complex. However, the tri-dimensional structure of urokinase, urokinase receptor a...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-S2-S8
更新日期:2008-03-26 00:00:00
abstract:BACKGROUND:As a rule, peptides are more flexible and unstructured than proteins with their substantial stabilizing hydrophobic cores. Nevertheless, a few stably folding peptides have been discovered. This raises the question whether there may be more such peptides that are unknown as yet. These molecules could be helpf...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-109
更新日期:2008-02-19 00:00:00
abstract:BACKGROUND:Our knowledge of global protein-protein interaction (PPI) networks in complex organisms such as humans is hindered by technical limitations of current methods. RESULTS:On the basis of short co-occurring polypeptide regions, we developed a tool called MP-PIPE capable of predicting a global human PPI network ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-014-0383-1
更新日期:2014-12-10 00:00:00