Abstract:
BACKGROUND:Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts which would elicit a similar phenotype, the underlying computational task is often a multi-dimensional similarity test. As datasets continue to grow, improvements to the efficiency, sensitivity or specificity of such computation will have broad impact because they allow scientists to explore the wealth of scientific data more completely. RESULTS:The Blazing Signature Filter (BSF) is a highly efficient pairwise similarity algorithm which enables extensive data mining within a reasonable amount of time. The algorithm transforms datasets into binary metrics, allowing it to use computationally efficient bit operators and provide a coarse measure of similarity. We demonstrate the utility of our algorithm using two common bioinformatics tasks: identifying data sets with similar gene expression profiles, and comparing annotated genomes. CONCLUSIONS:The BSF is a highly efficient pairwise similarity algorithm that can scale to billions of comparisons without the need for specialized hardware.
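The idea in RESULTS (binarize each dataset, then compare with bit operators) can be illustrated with a minimal sketch. This is not the authors' implementation: the threshold rule, the function names (binarize, shared_count, pairwise_shared) and the use of Python integers as bit vectors are all assumptions made for illustration only.

```python
from typing import List, Sequence


def binarize(values: Sequence[float], threshold: float) -> int:
    """Pack a numeric profile into one integer (hypothetical helper).

    Bit i is set when values[i] exceeds the threshold. The thresholding
    rule is an assumption; the abstract does not specify one.
    """
    bits = 0
    for i, v in enumerate(values):
        if v > threshold:
            bits |= 1 << i
    return bits


def shared_count(a: int, b: int) -> int:
    """Coarse similarity: number of positions set in both signatures.

    A single bitwise AND followed by a population count.
    """
    return bin(a & b).count("1")


def pairwise_shared(signatures: List[int]) -> List[List[int]]:
    """All-against-all matrix of shared-bit counts."""
    n = len(signatures)
    return [
        [shared_count(signatures[i], signatures[j]) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Three toy expression profiles over four genes (made-up numbers).
    profiles = [
        [2.5, 0.1, 3.0, 0.0],
        [2.1, 0.2, 0.1, 1.9],
        [0.0, 0.0, 2.2, 2.4],
    ]
    sigs = [binarize(p, threshold=1.0) for p in profiles]
    for row in pairwise_shared(sigs):
        print(row)
```

Because each comparison reduces to one AND and one bit count over packed words, the cost per pair stays small, which is consistent with the abstract's claim that the approach scales without specialized hardware; a production version would more likely use fixed-width machine words than arbitrary-precision integers.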
journal_name:BMC Bioinformatics
journal_title:BMC bioinformatics
authors:Lee JY, Fujimoto GM, Wilson R, Wiley HS, Payne SH
doi:10.1186/s12859-018-2210-6
subject:Has Abstract
pub_date:2018-06-11 00:00:00
pages:221
issue:1
issn:1471-2105
pii:10.1186/s12859-018-2210-6
journal_volume:19
pub_type: Journal Article
abstract:BACKGROUND:The nonparametric trend test (NPT) is well suited for identifying the genetic variants associated with quantitative traits when the trait values do not satisfy the normal distribution assumption. If the genetic model, defined according to the mode of inheritance, is known, the NPT derived under the given g...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-016-0888-x
update_date:2016-01-28 00:00:00
abstract:Following publication of the original article [1], the author reported that there are several errors in the original article. ...
journal_title:BMC bioinformatics
pub_type: Journal Article, Published Erratum
doi:10.1186/s12859-019-3318-z
update_date:2020-01-22 00:00:00
abstract:BACKGROUND:The statistical modeling of biomedical corpora could yield integrated, coarse-to-fine views of biological phenomena that complement discoveries made from analysis of molecular sequence and profiling data. Here, the potential of such modeling is demonstrated by examining the 5,225 free-text items in the Caeno...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-7-250
update_date:2006-05-08 00:00:00
abstract:BACKGROUND:When constructing new biomarker or gene signature scores for time-to-event outcomes, the underlying aims are to develop a discrimination model that helps to predict whether patients have a poor or good prognosis and to identify the most influential variables for this task. In practice, this is often done fit...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-016-1149-8
update_date:2016-07-22 00:00:00
abstract:BACKGROUND:Next-generation sequencing technologies allow researchers to obtain millions of sequence reads in a single experiment. One important use of the technology is the sequencing of small non-coding regulatory RNAs and the identification of the genomic locales from which they originate. Currently, there is a pauci...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-11-93
update_date:2010-02-18 00:00:00
abstract:BACKGROUND:Direct in vivo investigation of human metabolism is complicated by the distinct metabolic functions of various sub-cellular organelles. Diverse micro-environments in different organelles may lead to distinct functions of the same protein and the use of different enzymes for the same metabolic reaction. To be...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-11-393
update_date:2010-07-22 00:00:00
abstract:BACKGROUND:Prioritizing disease genes is trying to identify potential disease causing genes for a given phenotype, which can be applied to reveal the inherited basis of human diseases and facilitate drug development. Our motivation is inspired by label propagation algorithm and the false positive protein-protein intera...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-018-2040-6
update_date:2018-02-08 00:00:00
abstract:BACKGROUND:Many centrality measures have been proposed to mine and characterize the correlations between network topological properties and protein essentiality. However, most of them show limited prediction accuracy, and the number of common predicted essential proteins by different methods is very small. RESULTS:In ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-016-1166-7
update_date:2016-08-25 00:00:00
abstract:BACKGROUND:Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings i...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-10-421
update_date:2009-12-15 00:00:00
abstract:BACKGROUND:High-throughput technology allows for genome-wide measurements at different molecular levels for the same patient, e.g. single nucleotide polymorphisms (SNPs) and gene expression. Correspondingly, it might be beneficial to also integrate complementary information from different molecular levels when building...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-016-1183-6
update_date:2016-08-30 00:00:00
abstract:BACKGROUND:Identifying differentially abundant features between different experimental groups is a common goal for many metabolomics and proteomics studies. However, analyzing data from mass spectrometry (MS) is difficult because the data may not be normally distributed and there is often a large fraction of zero value...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-019-3067-z
update_date:2019-10-17 00:00:00
abstract:The RNA polymerase NS5B of Hepatitis C virus (HCV) is a well-characterised drug target with an active site and four allosteric binding sites. This work presents a workflow for virtual screening and its application to Drug Bank screening targeting the Hepatitis C Virus (HCV) RNA polymerase non-nucleoside binding sites....
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-13-S17-S5
update_date:2012-01-01 00:00:00
abstract:BACKGROUND:Interest in de novo genome assembly has been renewed in the past decade due to rapid advances in high-throughput sequencing (HTS) technologies which generate relatively short reads resulting in highly fragmented assemblies consisting of contigs. Additional long-range linkage information is typically used to ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-15-S9-S9
update_date:2014-01-01 00:00:00
abstract:BACKGROUND:A cross-correlation (XCorr) score function is one of the most popular score functions utilized to search peptide identifications in databases, and many computer programs, such as SEQUEST, Comet, and Tide, currently use this score function. Recently, the HiXCorr algorithm was developed to speed up this score ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-018-2559-6
update_date:2018-12-12 00:00:00
abstract:BACKGROUND:Recent years have seen an increased amount of natural language processing (NLP) work on full text biomedical journal publications. Much of this work is done with Open Access journal articles. Such work assumes that Open Access articles are representative of biomedical publications in general and that methods...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-10-183
update_date:2009-06-15 00:00:00
abstract:BACKGROUND:Many methods for species tree inference require data from a sufficiently large sample of genomic loci in order to produce accurate estimates. However, few studies have attempted to use analytical theory to quantify "sufficiently large". RESULTS:Using the multispecies coalescent model, we report a general an...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-016-1266-4
update_date:2016-11-11 00:00:00
abstract:BACKGROUND:HIV/AIDS is a serious threat to public health. The emergence of drug resistance mutations diminishes the effectiveness of drug therapy for HIV/AIDS. Developing a computational prediction of drug resistance phenotype will enable efficient and timely selection of the best treatment regimens. RESULTS:A unified...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-016-1114-6
update_date:2016-08-31 00:00:00
abstract:BACKGROUND:In omics data integration studies, it is common, for a variety of reasons, for some individuals to not be present in all data tables. Missing row values are challenging to deal with because most statistical methods cannot be directly applied to incomplete datasets. To overcome this issue, we propose a multip...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-016-1273-5
update_date:2016-10-03 00:00:00
abstract:BACKGROUND:The creation of a complete genome-wide map of transcription factor binding sites is essential for understanding gene regulatory networks in vivo. However, current prediction methods generally rely on statistical models that imperfectly model transcription factor binding. Generation of new prediction methods ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-12-62
update_date:2011-02-25 00:00:00
abstract:BACKGROUND:Reliable prediction of antibody, or B-cell, epitopes remains challenging yet highly desirable for the design of vaccines and immunodiagnostics. A correlation between antigenicity, solvent accessibility, and flexibility in proteins was demonstrated. Subsequently, Thornton and colleagues proposed a method for ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-9-514
update_date:2008-12-02 00:00:00
abstract:BACKGROUND:Network co-regulated modules are believed to have the functionality of packaging multiple biological entities, and can thus be assumed to coordinate many biological functions in their network neighbouring regions. RESULTS:Here, we weighted edges of a human protein interaction network and a transcriptional r...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-11-392
update_date:2010-07-22 00:00:00
abstract:BACKGROUND:The improvements of high throughput technologies have produced large amounts of multi-omics experiments datasets. Initial analysis of these data has revealed many concurrent gene alterations within single dataset or/and among multiple omics datasets. Although powerful bioinformatics pipelines have been devel...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-019-3171-0
update_date:2019-11-08 00:00:00
abstract:Time course gene expression experiments are a popular means to infer co-expression. Many methods have been proposed to cluster genes or to build networks based on similarity measures of their expression dynamics. In this paper we apply a correlation based approach to network reconstruction to three datasets of time se...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-8-S1-S16
update_date:2007-03-08 00:00:00
abstract:BACKGROUND:Biological molecules are often asymmetric with respect to stereochemistry, and correct stereochemistry is essential to their function. Molecular dynamics simulations of biomolecules have increasingly become an integral part of biophysical research. However, stereochemical errors in biomolecular structures ca...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-12-190
update_date:2011-05-23 00:00:00
abstract:BACKGROUND:Set comparisons permeate a large number of data analysis workflows, in particular workflows in biological sciences. Venn diagrams are frequently employed for such analysis but current tools are limited. RESULTS:We have developed InteractiVenn, a more flexible tool for interacting with Venn diagrams includin...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-015-0611-3
update_date:2015-05-22 00:00:00
abstract:BACKGROUND:Large-scale genomic studies often identify large gene lists, for example, the genes sharing the same expression patterns. The interpretation of these gene lists is generally achieved by extracting concepts overrepresented in the gene lists. This analysis often depends on manual annotation of genes based on c...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-11-272
update_date:2010-05-20 00:00:00
abstract:BACKGROUND:Identification of functional elements of a genome often requires dividing a sequence of measurements along a genome into segments where adjacent segments have different properties, such as different mean values. Despite dozens of algorithms developed to address this problem in genomics research, methods with...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-018-2140-3
update_date:2018-04-11 00:00:00
abstract:BACKGROUND:We establish that the occurrence of protein folds among genomes can be accurately described with a Weibull function. Systems which exhibit Weibull character can be interpreted with reliability theory commonly used in engineering analysis. For instance, Weibull distributions are widely used in reliability, ma...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-5-101
update_date:2004-07-26 00:00:00
abstract:BACKGROUND:As a novel cancer diagnostic paradigm, mass spectroscopic serum proteomic pattern diagnostics was reported superior to the conventional serologic cancer biomarkers. However, its clinical use is not fully validated yet. An important factor to prevent this young technology to become a mainstream cancer diagnos...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-11-S1-S1
update_date:2010-01-18 00:00:00
abstract:BACKGROUND:Mechanistic models are becoming more and more popular in Systems Biology; identification and control of models underlying biochemical pathways of interest in oncology is a primary goal in this field. Unfortunately the scarce availability of data still limits our understanding of the intrinsic characteristics...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-10-S12-S4
update_date:2009-10-15 00:00:00