Abstract:
BACKGROUND:High-throughput Next-Generation Sequencing (NGS) techniques are advancing genomics and molecular biology research. This technology generates substantially large data which puts up a major challenge to the scientists for an efficient, cost and time effective solution to analyse such data. Further, for the different types of NGS data, there are certain common challenging steps involved in analysing those data. Spliced alignment is one such fundamental step in NGS data analysis which is extremely computational intensive as well as time consuming. There exists serious problem even with the most widely used spliced alignment tools. TopHat is one such widely used spliced alignment tools which although supports multithreading, does not efficiently utilize computational resources in terms of CPU utilization and memory. Here we have introduced PVT (Pipelined Version of TopHat) where we take up a modular approach by breaking TopHat's serial execution into a pipeline of multiple stages, thereby increasing the degree of parallelization and computational resource utilization. Thus we address the discrepancies in TopHat so as to analyze large NGS data efficiently. RESULTS:We analysed the SRA dataset (SRX026839 and SRX026838) consisting of single end reads and SRA data SRR1027730 consisting of paired-end reads. We used TopHat v2.0.8 to analyse these datasets and noted the CPU usage, memory footprint and execution time during spliced alignment. With this basic information, we designed PVT, a pipelined version of TopHat that removes the redundant computational steps during 'spliced alignment' and breaks the job into a pipeline of multiple stages (each comprising of different step(s)) to improve its resource utilization, thus reducing the execution time. CONCLUSIONS:PVT provides an improvement over TopHat for spliced alignment of NGS data analysis. PVT thus resulted in the reduction of the execution time to ~23% for the single end read dataset. Further, PVT designed for paired end reads showed an improved performance of ~41% over TopHat (for the chosen data) with respect to execution time. Moreover we propose PVT-Cloud which implements PVT pipeline in cloud computing system.
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Maji RK,Sarkar A,Khatua S,Dasgupta S,Ghosh Zdoi
10.1186/1471-2105-15-167subject
Has Abstractpub_date
2014-06-04 00:00:00pages
167issn
1471-2105pii
1471-2105-15-167journal_volume
15pub_type
杂志文章abstract:BACKGROUND:This paper summarises the lessons and experiences gained from a case study of the application of semantic web technologies to the integration of data from the bacterial species Francisella tularensis novicida (Fn). Fn data sources are disparate and heterogeneous, as multiple laboratories across the world, us...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-S10-S3
更新日期:2009-10-01 00:00:00
abstract:BACKGROUND:Many biases and spurious effects are inherent in RNA-seq technology, resulting in a non-uniform distribution of sequencing read counts for each base position in a gene. Therefore, a base-level strategy is required to model the non-uniformity. Also, the properties of sequencing read counts can be leveraged to...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1780-z
更新日期:2017-08-09 00:00:00
abstract:BACKGROUND:Heritability of a phenotypic or molecular trait measures the proportion of variance that is attributable to genotypic variance. It is an important concept in breeding and genetics. Few methods are available for calculating heritability for traits derived from high-throughput sequencing. RESULTS:We propose s...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1539-6
更新日期:2017-03-02 00:00:00
abstract:BACKGROUND:Associating literature with pathways poses new challenges to the Text Mining (TM) community. There are three main challenges to this task: (1) the identification of the mapping position of a specific entity or reaction in a given pathway, (2) the recognition of the causal relationships among multiple reactio...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-S3-S5
更新日期:2008-04-11 00:00:00
abstract:BACKGROUND:SSWAP (Simple Semantic Web Architecture and Protocol; pronounced "swap") is an architecture, protocol, and platform for using reasoning to semantically integrate heterogeneous disparate data and services on the web. SSWAP was developed as a hybrid semantic web services technology to overcome limitations foun...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-309
更新日期:2009-09-23 00:00:00
abstract:BACKGROUND:Selective pressures at the DNA level shape genes into profiles consisting of patterns of rapidly evolving sites and sites withstanding change. These profiles remain detectable even when protein sequences become extensively diverged. A common task in molecular biology is to infer functional, structural or evo...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0688-8
更新日期:2015-08-14 00:00:00
abstract:BACKGROUND:As high-throughput sequencing applications continue to evolve, the rapid growth in quantity and variety of sequence-based data calls for the development of new software libraries and tools for data analysis and visualization. Often, effective use of these tools requires computational skills beyond those of m...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-03577-4
更新日期:2020-06-29 00:00:00
abstract:BACKGROUND:Since proteins perform their functions by interacting with one another and with other biomolecules, reconstructing a map of the protein-protein interactions of a cell, experimentally or computationally, is an important first step toward understanding cellular function and machinery of a proteome. Solely deri...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-S5-S16
更新日期:2006-12-18 00:00:00
abstract:BACKGROUND:Accurate somatic mutation-calling is essential for insightful mutation analyses in cancer studies. Several mutation-callers are publicly available and more are likely to appear. Nonetheless, mutation-calling is still challenging and there is unlikely to be one established caller that systematically outperfor...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-154
更新日期:2014-05-21 00:00:00
abstract:BACKGROUND:Metabolic networks reflect the relationships between metabolites (biomolecules) and the enzymes (proteins), and are of particular interest since they describe all chemical reactions of an organism. The metabolic networks are constructed from the genome sequence of an organism, and the graphs can be used to s...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3112-y
更新日期:2019-10-15 00:00:00
abstract:BACKGROUND:The incorporation of biological knowledge can enhance the analysis of biomedical data. We present a novel method that uses a proteomic knowledge base to enhance the performance of a rule-learning algorithm in identifying putative biomarkers of disease from high-dimensional proteomic mass spectral data. In pa...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-S9-S16
更新日期:2009-09-17 00:00:00
abstract:BACKGROUND:Integrative network methods are commonly used for interpretation of high-throughput experimental biological data: transcriptomics, proteomics, metabolomics and others. One of the common approaches is finding a connected subnetwork of a global interaction network that best encompasses significant individual c...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-03572-9
更新日期:2020-11-18 00:00:00
abstract:BACKGROUND:Microsatellite (simple sequence repeat - SSR) and single nucleotide polymorphism (SNP) markers are two types of important genetic markers useful in genetic mapping and genotyping. Often, large-scale genomic research projects require high-throughput computer-assisted primer design. Numerous such web-based or ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-253
更新日期:2008-05-29 00:00:00
abstract:BACKGROUND:Hidden variability is a fundamentally important issue in the context of gene expression studies. Collected tissue samples may have a wide variety of hidden effects that may alter their transcriptional landscape significantly. As a result their actual differential expression pattern can be potentially distort...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-236
更新日期:2013-07-24 00:00:00
abstract:BACKGROUND:DNA methylation patterns store epigenetic information in the vast majority of eukaryotic species. The relatively high costs and technical challenges associated with the detection of DNA methylation however have created a bias in the number of methylation studies towards model organisms. Consequently, it rema...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2115-4
更新日期:2018-03-27 00:00:00
abstract:BACKGROUND:Sharing sets of chemical data (e.g., chemical properties, docking scores, etc.) among collaborators with diverse skill sets is a common task in computer-aided drug design and medicinal chemistry. The ability to associate this data with images of the relevant molecular structures greatly facilitates scientifi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-159
更新日期:2014-05-23 00:00:00
abstract:BACKGROUND:Formal classification of a large collection of protein structures aids the understanding of evolutionary relationships among them. Classifications involving manual steps, such as SCOP and CATH, face the challenge of increasing volume of available structures. Automatic methods such as FSSP or Dali Domain Dict...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-74
更新日期:2008-01-31 00:00:00
abstract:BACKGROUND:Protein-protein interactions play a key role in biological processes of proteins within a cell. Recent high-throughput techniques have generated protein-protein interaction data in a genome-scale. A wide range of computational approaches have been applied to interactome network analysis for uncovering functi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-S3-S3
更新日期:2010-04-29 00:00:00
abstract:BACKGROUND:To further our understanding of immunopeptidomics, improved tools are needed to identify peptides presented by major histocompatibility complex class I (MHC-I). Many existing tools are limited by their reliance upon chemical affinity data, which is less biologically relevant than sampling by mass spectrometr...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2561-z
更新日期:2019-01-05 00:00:00
abstract:BACKGROUND:Several studies demonstrated the feasibility of predicting bacterial antibiotic resistance phenotypes from whole-genome sequences, the prediction process usually amounting to detecting the presence of genes involved in antibiotic resistance mechanisms, or of specific mutations, previously identified from a t...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2403-z
更新日期:2018-10-17 00:00:00
abstract:BACKGROUND:For the development of genome assembly tools, some comprehensive and efficiently computable validation measures are required to assess the quality of the assembly. The mostly used N50 measure summarizes the assembly results by the length of the scaffold (or contig) overlapping the midpoint of the length-orde...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-255
更新日期:2012-10-03 00:00:00
abstract:BACKGROUND:The biological network is highly dynamic. Functional relations between genes can be activated or deactivated depending on the biological conditions. On the genome-scale network, subnetworks that gain or lose local expression consistency may shed light on the regulatory mechanisms related to the changing biol...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3046-4
更新日期:2019-12-24 00:00:00
abstract:BACKGROUND:Non-linearities in observed log-ratios of gene expressions, also known as intensity dependent log-ratios, can often be accounted for by global biases in the two channels being compared. Any step in a microarray process may introduce such offsets and in this article we study the biases introduced by the micro...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-5-177
更新日期:2004-11-12 00:00:00
abstract:BACKGROUND:Sequence comparison is one of the most prominent tools in biological research, and is instrumental in studying gene function and evolution. The rapid development of high-throughput technologies for measuring protein interactions calls for extending this fundamental operation to the level of pathways in prote...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-199
更新日期:2006-04-10 00:00:00
abstract:BACKGROUND:Histopathology image analysis is a gold standard for cancer recognition and diagnosis. Automatic analysis of histopathology images can help pathologists diagnose tumor and cancer subtypes, alleviating the workload of pathologists. There are two basic types of tasks in digital histopathology image analysis: i...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1685-x
更新日期:2017-05-26 00:00:00
abstract:BACKGROUND:In many bacteria, intragenomic diversity in synonymous codon usage among genes has been reported. However, no quantitative attempt has been made to compare the diversity levels among different genomes. Here, we introduce a mean dissimilarity-based index (Dmean) for quantifying the level of diversity in synon...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-167
更新日期:2009-06-01 00:00:00
abstract:BACKGROUND:The ability to confidently predict health outcomes from gene expression would catalyze a revolution in molecular diagnostics. Yet, the goal of developing actionable, robust, and reproducible predictive signatures of phenotypes such as clinical outcome has not been attained in almost any disease area. Here, w...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-3427-8
更新日期:2020-03-20 00:00:00
abstract:BACKGROUND:Abundant information about gene products is stored in online searchable databases such as annotation or literature. To efficiently obtain and digest such information, there is a pressing need for automated information-summarization and functional-similarity clustering of genes. RESULTS:We have developed a n...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-392
更新日期:2006-08-29 00:00:00
abstract:BACKGROUND:Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machine (SVM) are the most effective and accurate methods for solving these problems. A key step to improve the performance of the SVM-based methods is to f...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-510
更新日期:2008-12-01 00:00:00
abstract:BACKGROUND:Antibacterial peptides are one of the effecter molecules of innate immune system. Over the last few decades several antibacterial peptides have successfully approved as drug by FDA, which has prompted an interest in these antibacterial peptides. In our recent study we analyzed 999 antibacterial peptides, whi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-S1-S19
更新日期:2010-01-18 00:00:00