Abstract:
BACKGROUND:In bioinformatics community, many tasks associate with matching a set of protein query sequences in large sequence datasets. To conduct multiple queries in the database, a common used method is to run BLAST on each original querey or on the concatenated queries. It is inefficient since it doesn't exploit the common subsequences shared by queries. RESULTS:We propose a compression and cluster based BLASTP (C2-BLASTP) algorithm to further exploit the joint information among the query sequences and the database. Firstly, the queries and database are compressed in turn by procedures of redundancy analysis, redundancy removal and distinction record. Secondly, the database is clustered according to Hamming distance among the subsequences. To improve the sensitivity and selectivity of sequence alignments, ten groups of reduced amino acid alphabets are used. Following this, the hits finding operator is implemented on the clustered database. Furthermore, an execution database is constructed based on the found potential hits, with the objective of mitigating the effect of increasing scale of the sequence database. Finally, the homology search is performed in the execution database. Experiments on NCBI NR database demonstrate the effectiveness of the proposed C2-BLASTP for batch searching of homology in sequence database. The results are evaluated in terms of homology accuracy, search speed and memory usage. CONCLUSIONS:It can be seen that the C2-BLASTP achieves competitive results as compared with some state-of-the-art methods.
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Ge H,Sun L,Yu Jdoi
10.1186/s12859-017-1938-8subject
Has Abstractpub_date
2017-11-21 00:00:00pages
508issue
1issn
1471-2105pii
10.1186/s12859-017-1938-8journal_volume
18pub_type
杂志文章abstract:BACKGROUND:Due to recent technology advancements, disease related knowledge is growing rapidly. It becomes nontrivial to go through all published literature to identify associations between human diseases and genetic, environmental, and life style factors, disease symptoms, and treatment strategies. Here we report DLAD...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2463-0
更新日期:2018-12-28 00:00:00
abstract::We provide a 2007 update on the bioinformatics research in the Asia-Pacific from the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998. From 2002, APBioNet has organized the first International Conference on Bioinformatics (InCoB) bringing together scientists work...
journal_title:BMC bioinformatics
pub_type:
doi:10.1186/1471-2105-9-S1-S1
更新日期:2008-01-01 00:00:00
abstract:BACKGROUND:A phylogeny is the evolutionary history of a group of organisms. To date, sequence data is still the most used data type for phylogenetic reconstruction. Before any sequences can be used for phylogeny reconstruction, they must be aligned, and the quality of the multiple sequence alignment has been shown to a...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-S1-S11
更新日期:2009-01-30 00:00:00
abstract:BACKGROUND:The topology of a biological pathway provides clues as to how a pathway operates, but rationally using this topology information with observed gene expression data remains a challenge. RESULTS:We introduce a new general-purpose analytic method called Mechanistic Bayesian Networks (MBNs) that allows for the ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-433
更新日期:2009-12-18 00:00:00
abstract:BACKGROUND:We previously developed GoMiner, an application that organizes lists of 'interesting' genes (for example, under-and overexpressed genes from a microarray experiment) for biological interpretation in the context of the Gene Ontology. The original version of GoMiner was oriented toward visualization and interp...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-6-168
更新日期:2005-07-05 00:00:00
abstract:BACKGROUND:Next-generation sequencing technologies allow researchers to obtain millions of sequence reads in a single experiment. One important use of the technology is the sequencing of small non-coding regulatory RNAs and the identification of the genomic locales from which they originate. Currently, there is a pauci...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-93
更新日期:2010-02-18 00:00:00
abstract:BACKGROUND:Glycation is a one of the post-translational modifications (PTM) where sugar molecules and residues in protein sequences are covalently bonded. It has become one of the clinically important PTM in recent times attributed to many chronic and age related complications. Being a non-enzymatic reaction, it is a g...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2547-x
更新日期:2019-02-04 00:00:00
abstract:BACKGROUND:Finishing is the process of improving the quality and utility of draft genome sequences generated by shotgun sequencing and computational assembly. Finishing can involve targeted sequencing. Finishing reads may be incorporated by manual or automated means. One automated method uses targeted addition by local...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-457
更新日期:2010-09-10 00:00:00
abstract:BACKGROUND:Microarray has been widely used to measure the gene expression level on the genome scale in the current decade. Many algorithms have been developed to reconstruct gene regulatory networks based on microarray data. Unfortunately, most of these models and algorithms focus on global properties of the expression...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-S11-S15
更新日期:2010-12-14 00:00:00
abstract:BACKGROUND:Gene expression patterns of olfactory receptors (ORs) are an important component of the signal encoding mechanism in the olfactory system since they determine the interactions between odorant ligands and sensory neurons. We have developed the Olfactory Receptor Microarray Database (ORMD) to house OR gene exp...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-231
更新日期:2007-06-30 00:00:00
abstract:BACKGROUND:Historically, two categories of computational algorithms (alignment-based and alignment-free) have been applied to sequence comparison-one of the most fundamental issues in bioinformatics. Multiple sequence alignment, although dominantly used by biologists, possesses both fundamental as well as computational...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-S6-S15
更新日期:2008-05-28 00:00:00
abstract:BACKGROUND:Here we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-0887-y
更新日期:2016-01-20 00:00:00
abstract:BACKGROUND:Protein crystal structures are potentially over-interpreted since they are routinely refined without any restraint on the upper limit of atomic B-factors. Consequently, some of their atoms, undetected in the electron density maps, are allowed to reach extremely large B-factors, even above 100 square Angstrom...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2083-8
更新日期:2018-02-23 00:00:00
abstract:BACKGROUND:Several biological techniques result in the acquisition of functional sets of cDNAs that must be sequenced and analyzed. The emergence of redundant databases such as UniGene and centralized annotation engines such as Entrez Gene has allowed the development of software that can analyze a great number of seque...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-186
更新日期:2008-04-10 00:00:00
abstract:BACKGROUND:Phosphorylated histone H2AX, also known as γH2AX, forms μm-sized nuclear foci at the sites of DNA double-strand breaks (DSBs) induced by ionizing radiation and other agents. Due to their specificity and sensitivity, γH2AX immunoassays have become the gold standard for studying DSB induction and repair. One o...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-3370-8
更新日期:2020-01-28 00:00:00
abstract:BACKGROUND:Biological data resources have become heterogeneous and derive from multiple sources. This introduces challenges in the management and utilization of this data in software development. Although efforts are underway to create a standard format for the transmission and storage of biological data, this objectiv...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-4-51
更新日期:2003-10-28 00:00:00
abstract:BACKGROUND:Comprehensive two-dimensional gas chromatography coupled with mass spectrometry (GC × GC-MS) is a powerful technique which has gained increasing attention over the last two decades. The GC × GC-MS provides much increased separation capacity, chemical selectivity and sensitivity for complex sample analysis an...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-235
更新日期:2011-06-15 00:00:00
abstract:BACKGROUND:Reproducibility of results can have a significant impact on the acceptance of new technologies in gene expression analysis. With the recent introduction of the so-called next-generation sequencing (NGS) technology and established microarrays, one is able to choose between two completely different platforms f...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-176
更新日期:2010-04-08 00:00:00
abstract:BACKGROUND:Leishmania and other members of the Trypanosomatidae family diverged early on in eukaryotic evolution and consequently display unique cellular properties. Their apparent lack of transcriptional regulation is compensated by complex post-transcriptional control mechanisms, including the processing of polycistr...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-158
更新日期:2008-03-20 00:00:00
abstract:BACKGROUND:Under conditions of no strand bias the number of Gs is equal to that of Cs for each DNA strand; similarly, the total number of Ts is equal to that of As. However, within each strand there are considerable local deviations from the A = T and G = C equality. These asymmetries in nucleotide composition have bee...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-21
更新日期:2007-01-23 00:00:00
abstract:BACKGROUND:Expression quantitative trait loci (eQTL) mapping is often used to identify genetic loci and candidate genes correlated with traits. Although usually a group of genes affect complex traits, genes in most eQTL mapping methods are considered as independent. Recently, some eQTL mapping methods have accounted fo...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-1387-9
更新日期:2016-12-13 00:00:00
abstract:BACKGROUND:Eukaryotic whole genome sequences are accumulating at an impressive rate. Effective methods for comparing multiple whole eukaryotic genomes on a large scale are needed. Most attempted solutions involve the production of large scale alignments, and many of these require a high stringency pre-screen for putati...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-5-204
更新日期:2004-12-17 00:00:00
abstract:BACKGROUND:Detecting patterns in high-dimensional multivariate datasets is non-trivial. Clustering and dimensionality reduction techniques often help in discerning inherent structures. In biological datasets such as microbial community composition or gene expression data, observations can be generated from a continuous...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1790-x
更新日期:2017-09-13 00:00:00
abstract:BACKGROUND:The ability to detect nuclei in embryos is essential for studying the development of multicellular organisms. A system of automated nuclear detection has already been tested on a set of four-dimensional (4D) Nomarski differential interference contrast (DIC) microscope images of Caenorhabditis elegans embryos...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-6-125
更新日期:2005-05-24 00:00:00
abstract:BACKGROUND:Collections of transcription factor binding profiles (Transfac, Jaspar) are essential to identify regulatory elements in DNA sequences. Subsets of highly similar profiles complicate large scale analysis of transcription factor binding sites. RESULTS:We propose to identify and group similar profiles using tw...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-6-237
更新日期:2005-09-28 00:00:00
abstract:BACKGROUND:In quantitative proteomics, peptide mapping is a valuable approach to combine positional quantitative information with topographical and domain information of proteins. Quantitative proteomic analysis of cell surface shedding is an exemplary application area of this approach. RESULTS:We developed ImproViser...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-207
更新日期:2014-06-19 00:00:00
abstract:BACKGROUND:A phylogeny postulates shared ancestry relationships among organisms in the form of a binary tree. Phylogenies attempt to answer an important question posed in biology: what are the ancestor-descendent relationships between organisms? At the core of every biological problem lies a phylogenetic component. The...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-66
更新日期:2013-02-26 00:00:00
abstract:BACKGROUND:Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain datab...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-4-16
更新日期:2003-05-06 00:00:00
abstract:BACKGROUND:Because loops connect regular secondary structures, analysis of the former depends directly on the definition of the latter. The numerous assignment methods, however, can offer different definitions. In a previous study, we defined a structural alphabet composed of 16 average protein fragments, which we call...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-5-58
更新日期:2004-05-12 00:00:00
abstract:BACKGROUND:DNA methylation changes are associated with a wide array of biological processes. Bisulfite conversion of DNA followed by high-throughput sequencing is increasingly being used to assess genome-wide methylation at single-base resolution. The relative slowness of most commonly used aligners for processing such...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-337
更新日期:2014-10-18 00:00:00