Abstract:
BACKGROUND:High throughput DNA/RNA sequencing has revolutionized biological and clinical research. Sequencing is widely used, and generates very large amounts of data, mainly due to reduced cost and advanced technologies. Quickly assessing the quality of giga-to-tera base levels of sequencing data has become a routine but important task. Identification and elimination of low-quality sequence data is crucial for reliability of downstream analysis results. There is a need for a high-speed tool that uses optimized parallel programming for batch processing and simply gauges the quality of sequencing data from multiple datasets independent of any other processing steps. RESULTS:FQStat is a stand-alone, platform-independent software tool that assesses the quality of FASTQ files using parallel programming. Based on the machine architecture and input data, FQStat automatically determines the number of cores and the amount of memory to be allocated per file for optimum performance. Our results indicate that in a core-limited case, core assignment overhead exceeds the benefit of additional cores. In a core-unlimited case, there is a saturation point reached in performance by increasingly assigning additional cores per file. We also show that memory allocation per file has a lower priority in performance when compared to the allocation of cores. FQStat's output is summarized in HTML web page, tab-delimited text file, and high-resolution image formats. FQStat calculates and plots read count, read length, quality score, and high-quality base statistics. FQStat identifies and marks low-quality sequencing data to suggest removal from downstream analysis. We applied FQStat on real sequencing data to optimize performance and to demonstrate its capabilities. We also compared FQStat's performance to similar quality control (QC) tools that utilize parallel programming and attained improvements in run time. CONCLUSIONS:FQStat is a user-friendly tool with a graphical interface that employs a parallel programming architecture and automatically optimizes its performance to generate quality control statistics for sequencing data. Unlike existing tools, these statistics are calculated for multiple datasets and separately at the "lane," "sample," and "experiment" level to identify subsets of the samples with low quality, thereby preventing the loss of complete samples when reliable data can still be obtained.
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Chanumolu SK,Albahrani M,Otu HHdoi
10.1186/s12859-019-3015-ysubject
Has Abstractpub_date
2019-08-15 00:00:00pages
424issue
1issn
1471-2105pii
10.1186/s12859-019-3015-yjournal_volume
20pub_type
杂志文章abstract:BACKGROUND:In quantitative proteomics, peptide mapping is a valuable approach to combine positional quantitative information with topographical and domain information of proteins. Quantitative proteomic analysis of cell surface shedding is an exemplary application area of this approach. RESULTS:We developed ImproViser...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-207
更新日期:2014-06-19 00:00:00
abstract:BACKGROUND:The detection of genomic copy number alterations (CNA) in cancer based on SNP arrays requires methods that take into account tumour specific factors such as normal cell contamination and tumour heterogeneity. A number of tools have been recently developed but their performance needs yet to be thoroughly asse...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-192
更新日期:2012-08-07 00:00:00
abstract:BACKGROUND:Proton Magnetic Resonance (MR) Spectroscopy (MRS) is a widely available technique for those clinical centres equipped with MR scanners. Unlike the rest of MR-based techniques, MRS yields not images but spectra of metabolites in the tissues. In pathological situations, the MRS profile changes and this has bee...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-581
更新日期:2010-11-29 00:00:00
abstract:BACKGROUND:Drug combinations have the potential to improve efficacy while limiting toxicity. To robustly identify synergistic combinations, high-throughput screens using full dose-response surface are desirable but require an impractical number of data points. Screening of a sparse number of doses per drug allows to sc...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-2642-7
更新日期:2019-02-18 00:00:00
abstract:BACKGROUND:Genome and metagenome studies have identified thousands of protein families whose functions are poorly understood and for which techniques for functional characterization provide only partial information. For such proteins, the genome context can give further information about their functional context. RESU...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-141
更新日期:2011-05-09 00:00:00
abstract:BACKGROUND:Mechanotransduction in bone cells plays a pivotal role in osteoblast differentiation and bone remodelling. Mechanotransduction provides the link between modulation of the extracellular matrix by mechanical load and intracellular activity. By controlling the balance between the intracellular and extracellular...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-3394-0
更新日期:2020-03-18 00:00:00
abstract:BACKGROUND:Bioluminescent proteins (BLPs) widely exist in many living organisms. As BLPs are featured by the capability of emitting lights, they can be served as biomarkers and easily detected in biomedical research, such as gene expression analysis and signal transduction pathways. Therefore, accurate identification o...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1709-6
更新日期:2017-06-05 00:00:00
abstract:BACKGROUND:Contact-guided protein structure prediction methods are becoming more and more successful because of the latest advances in residue-residue contact prediction. To support contact-driven structure prediction, effective tools that can quickly build tertiary structural models of good quality from predicted cont...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2032-6
更新日期:2018-01-25 00:00:00
abstract:BACKGROUND:The traditional phylogeny analysis within gene family is mainly based on DNA or amino acid sequence homologies. However, these phylogenetic tree analyses are not suitable for those "non-traditional" gene families like microRNA with very short sequences. For the normal protein-coding gene families, low bootst...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-79
更新日期:2011-03-18 00:00:00
abstract:BACKGROUND:The human immunodeficiency virus type 1 (HIV-1) aspartic protease is an important enzyme owing to its imperative part in viral development and a causative agent of deadliest disease known as acquired immune deficiency syndrome (AIDS). Development of HIV-1 protease inhibitors can help understand the specifici...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-1337-6
更新日期:2016-12-23 00:00:00
abstract:BACKGROUND:Antibacterial peptides are one of the effecter molecules of innate immune system. Over the last few decades several antibacterial peptides have successfully approved as drug by FDA, which has prompted an interest in these antibacterial peptides. In our recent study we analyzed 999 antibacterial peptides, whi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-S1-S19
更新日期:2010-01-18 00:00:00
abstract:BACKGROUND:Automated protein function prediction methods are needed to keep pace with high-throughput sequencing. With the existence of many programs and databases for inferring different protein functions, a pipeline that properly integrates these resources will benefit from the advantages of each method. However, int...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-52
更新日期:2008-01-25 00:00:00
abstract:BACKGROUND:Scientific workflows management systems are increasingly used to specify and manage bioinformatics experiments. Their programming model appeals to bioinformaticians, who can use them to easily specify complex data processing pipelines. Such a model is underpinned by a graph structure, where nodes represent b...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-S1-S12
更新日期:2014-01-01 00:00:00
abstract:BACKGROUND:Over the course of the last few years there has been a significant amount of research performed on ontology-based formalization of phenotype descriptions. In order to fully capture the intrinsic value and knowledge expressed within them, we need to take advantage of their inner structure, which implicitly co...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-265
更新日期:2012-10-15 00:00:00
abstract:BACKGROUND:A massive amount of proteomic data is generated on a daily basis, nonetheless annotating all sequences is costly and often unfeasible. As a countermeasure, machine learning methods have been used to automatically annotate new protein functions. More specifically, many studies have investigated hierarchical m...
journal_title:BMC bioinformatics
pub_type: 杂志文章,评审
doi:10.1186/s12859-019-3060-6
更新日期:2019-09-23 00:00:00
abstract:BACKGROUND:New web-based technologies provide an excellent opportunity for sharing and accessing information and using web as a platform for interaction and collaboration. Although several specialized tools are available for analyzing DNA sequence information, conventional web-based tools have not been utilized for bio...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-S14-S4
更新日期:2009-11-10 00:00:00
abstract:BACKGROUND:SSWAP (Simple Semantic Web Architecture and Protocol; pronounced "swap") is an architecture, protocol, and platform for using reasoning to semantically integrate heterogeneous disparate data and services on the web. SSWAP was developed as a hybrid semantic web services technology to overcome limitations foun...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-309
更新日期:2009-09-23 00:00:00
abstract:BACKGROUND:With the introduction of tissue microarrays (TMAs) researchers can investigate gene and protein expression in tissues on a high-throughput scale. TMAs generate a wealth of data calling for extended, high level data management. Enhanced data analysis and systematic data management are required for traceabilit...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-81
更新日期:2007-03-07 00:00:00
abstract:BACKGROUND:The rapid pace of bioscience research makes it very challenging to track relevant articles in one's area of interest. MEDLINE, a primary source for biomedical literature, offers access to more than 20 million citations with three-quarters of a million new ones added each year. Thus it is not surprising to se...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0630-0
更新日期:2015-06-20 00:00:00
abstract:BACKGROUND:Interaction of a drug or chemical with a biological system can result in a gene-expression profile or signature characteristic of the event. Using a suitably robust algorithm these signatures can potentially be used to connect molecules with similar pharmacological or toxicological properties by gene express...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-258
更新日期:2008-06-02 00:00:00
abstract:BACKGROUND:Endometrial cancers (ECs) are one of the most common types of malignant tumor in females. Substantial efforts had been made to identify significantly mutated genes (SMGs) in ECs and use them as biomarkers for the classification of histological subtypes and the prediction of clinical outcomes. However, the im...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1891-6
更新日期:2017-12-28 00:00:00
abstract:BACKGROUND:The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body of information we have...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-104
更新日期:2013-03-22 00:00:00
abstract:BACKGROUND:Direct in vivo investigation of human metabolism is complicated by the distinct metabolic functions of various sub-cellular organelles. Diverse micro-environments in different organelles may lead to distinct functions of the same protein and the use of different enzymes for the same metabolic reaction. To be...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-393
更新日期:2010-07-22 00:00:00
abstract:BACKGROUND:Historically, two categories of computational algorithms (alignment-based and alignment-free) have been applied to sequence comparison-one of the most fundamental issues in bioinformatics. Multiple sequence alignment, although dominantly used by biologists, possesses both fundamental as well as computational...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-S6-S15
更新日期:2008-05-28 00:00:00
abstract:BACKGROUND:Recently differential variability has been showed to be valuable in evaluating the association of DNA methylation to the risks of complex human diseases. The statistical tests based on both differential methylation level and differential variability can be more powerful than those based only on differential ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2185-3
更新日期:2018-05-18 00:00:00
abstract:BACKGROUND:FREGENE simulates sequence-level data over large genomic regions in large populations. Because, unlike coalescent simulators, it works forwards through time, it allows complex scenarios of selection, demography, and recombination to be modelled simultaneously. Detailed tracking of sites under selection is im...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-364
更新日期:2008-09-08 00:00:00
abstract:BACKGROUND:Many cancer genomes are extensively rearranged with highly aberrant chromosomal karyotypes. Structural and copy number variations in cancer genomes can be determined via abnormal mapping of sequenced reads to the reference genome. Recently it became possible to reconcile both of these types of large-scale va...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3208-4
更新日期:2019-12-17 00:00:00
abstract:BACKGROUND:REX1 and REX2 are protein components of the RNA editing complex (the editosome) and function as exouridylylases. The exact roles of REX1 and REX2 in the editosome are unclear and the consequences of the presence of two related proteins are not fully understood. Here, a variety of computational studies were p...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-305
更新日期:2006-06-16 00:00:00
abstract:BACKGROUND:Data are the evidentiary basis for scientific hypotheses, analyses and publication, for policy formation and for decision-making. They are essential to the evaluation and testing of results by peer scientists both present and future. There is broad consensus in the scientific and conservation communities tha...
journal_title:BMC bioinformatics
pub_type: 指南,杂志文章
doi:10.1186/1471-2105-12-S15-S1
更新日期:2011-01-01 00:00:00
abstract:BACKGROUND:The molecular recognition based on the complementary base pairing of deoxyribonucleic acid (DNA) is the fundamental principle in the fields of genetics, DNA nanotechnology and DNA computing. We present an exhaustive DNA sequence design algorithm that allows to generate sets containing a maximum number of seq...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-138
更新日期:2012-06-20 00:00:00