Abstract:
BACKGROUND:PSI-BLAST, an extremely popular tool for sequence similarity search, features the utilization of Position-Specific Scoring Matrix (PSSM) constructed from a multiple sequence alignment (MSA). PSSM allows the detection of more distant homologs than a general amino acid substitution matrix does. An accurate estimation of the weights for sequences in an MSA is crucially important for PSSM construction. PSI-BLAST divides a given MSA into multiple blocks, for which sequence weights are calculated. When the block width becomes very narrow, the sequence weight calculation can be odd. RESULTS:We demonstrate that PSI-BLAST indeed generates a significant fraction of blocks having width less than 5, thereby degrading the PSI-BLAST performance. We revised the code of PSI-BLAST to prevent the blocks from being narrower than a given minimum block width (MBW). We designate the modified application of PSI-BLAST as PSI-BLASTexB. When MBW is 25, PSI-BLASTexB notably outperforms PSI-BLAST consistently for three independent benchmark sets. The performance boost is even more drastic when an MSA, instead of a sequence, is used as a query. CONCLUSIONS:Our results demonstrate that the generation of narrow-width blocks during the sequence weight calculation is a critically important factor that restricts the PSI-BLAST search performance. By preventing narrow blocks, PSI-BLASTexB upgrades the PSI-BLAST performance remarkably. Binaries and source codes of PSI-BLASTexB (MBW = 25) are available at https://github.com/kyungtaekLIM/PSI-BLASTexB .
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Oda T,Lim K,Tomii Kdoi
10.1186/s12859-017-1686-9subject
Has Abstractpub_date
2017-06-02 00:00:00pages
288issue
1issn
1471-2105pii
10.1186/s12859-017-1686-9journal_volume
18pub_type
杂志文章abstract:BACKGROUND:Protein structure prediction has achieved a lot of progress during the last few decades and a greater number of models for a certain sequence can be predicted. Consequently, assessing the qualities of predicted protein models in perspective is one of the key components of successful protein structure predict...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1691-z
更新日期:2017-05-25 00:00:00
abstract:BACKGROUND:Prediction of protein subcellular localization generally involves many complex factors, and using only one or two aspects of data information may not tell the true story. For this reason, some recent predictive models are deliberately designed to integrate multiple heterogeneous data sources for exploiting m...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-44
更新日期:2011-02-02 00:00:00
abstract:BACKGROUND:Polychromatic flow cytometry is a popular technique that has wide usage in the medical sciences, especially for studying phenotypic properties of cells. The high-dimensionality of data generated by flow cytometry usually makes it difficult to visualize. The naive solution of simply plotting two-dimensional g...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1662-4
更新日期:2017-06-07 00:00:00
abstract:BACKGROUND:As phenotypic features derived from heritable characters, the topologies of metabolic pathways contain both phylogenetic and phenetic components. In the post-genomic era, it is possible to measure the "phylophenetic" contents of different pathways topologies from a global perspective. RESULTS:We reconstruct...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-252
更新日期:2006-05-09 00:00:00
abstract:BACKGROUND:Integrating data from multiple global assays and curated databases is essential to understand the spatio-temporal interactions within cells. Different experiments measure cellular processes at various widths and depths, while databases contain biological information based on established facts or published da...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-203
更新日期:2008-04-21 00:00:00
abstract:BACKGROUND:Protein sequence alignment analyses have become a crucial step for many bioinformatics studies during the past decades. Multiple sequence alignment (MSA) and pair-wise sequence alignment (PSA) are two major approaches in sequence alignment. Former benchmark studies revealed drawbacks of MSA methods on nucleo...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2524-4
更新日期:2018-12-31 00:00:00
abstract:BACKGROUND:The knowledge base-driven pathway analysis is becoming the first choice for many investigators, in that it not only can reduce the complexity of functional analysis by grouping thousands of genes into just several hundred pathways, but also can increase the explanatory power for the experiment by identifying...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-1285-1
更新日期:2016-10-06 00:00:00
abstract:BACKGROUND:Research in life sciences is benefiting from a large availability of formal description techniques and analysis methodologies. These allow both the phenomena investigated to be precisely modeled and virtual experiments to be performed in silico. Such experiments may result in easier, faster, and satisfying a...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-S4-S7
更新日期:2008-04-25 00:00:00
abstract:BACKGROUND:determining beforehand specific positions to align (anchor points) has proved valuable for the accuracy of automated multiple sequence alignment (MSA) software. This feature can be used manually to include biological expertise, or automatically, usually by pairwise similarity searches. Multiple local similar...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-445
更新日期:2010-09-02 00:00:00
abstract:BACKGROUND:The Distributed Annotation System (DAS) allows merging of DNA sequence annotations from multiple sources and provides a single annotation view. A straightforward way to establish a DAS annotation server is to use the "Lightweight DAS" server (LDAS). Onto this type of server, annotations can be uploaded as fl...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-5-55
更新日期:2004-05-07 00:00:00
abstract:BACKGROUND:The traditional phylogeny analysis within gene family is mainly based on DNA or amino acid sequence homologies. However, these phylogenetic tree analyses are not suitable for those "non-traditional" gene families like microRNA with very short sequences. For the normal protein-coding gene families, low bootst...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-79
更新日期:2011-03-18 00:00:00
abstract:BACKGROUND:Transcription factors are known to play key roles in carcinogenesis and therefore, are gaining popularity as potential therapeutic targets in drug development. A 'master regulator' transcription factor often appears to control most of the regulatory activities of the other transcription factors and the assoc...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1499-x
更新日期:2017-02-02 00:00:00
abstract:BACKGROUND:Current malaria diagnosis relies primarily on microscopic examination of Giemsa-stained thick and thin blood films. This method requires vigorously trained technicians to efficiently detect and classify the malaria parasite species such as Plasmodium falciparum (Pf) and Plasmodium vivax (Pv) for an appropria...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-S17-S18
更新日期:2012-01-01 00:00:00
abstract:BACKGROUND:Mecp2 null mice model Rett syndrome (RTT) a human neurological disorder affecting females after apparent normal pre- and peri-natal developmental periods. Neuroanatomical studies in cerebral cortex of RTT mouse models revealed delayed maturation of neuronal morphology and autonomous as well as non-cell auton...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0859-7
更新日期:2016-01-20 00:00:00
abstract:BACKGROUND:Event sequences where different types of events often occur close together arise, e.g., when studying potential transcription factor binding sites (TFBS, events) of certain transcription factors (TF, types) in a DNA sequence. These events tend to occur in bursts: in some genomic regions there are more genes ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-336
更新日期:2008-08-08 00:00:00
abstract:BACKGROUND:Microarrays permit biologists to simultaneously measure the mRNA abundance of thousands of genes. An important issue facing investigators planning microarray experiments is how to estimate the sample size required for good statistical power. What is the projected sample size or number of replicate chips need...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-84
更新日期:2006-02-22 00:00:00
abstract:BACKGROUND:In the adaptive immune system, variable regions of immunoglobulin (IG) are encoded by random recombination of variable (V), diversity (D), and joining (J) gene segments in the germline. Partitioning the functional antibody sequences to their sourcing germline gene segments is vital not only for understanding...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-S12-S20
更新日期:2008-12-12 00:00:00
abstract:BACKGROUND:Reliability and Reproducibility of differentially expressed genes (DEGs) are essential for the biological interpretation of microarray data. The microarray quality control (MAQC) project launched by US Food and Drug Administration (FDA) elucidated that the lists of DEGs generated by intra- and inter-platform...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-143
更新日期:2013-04-29 00:00:00
abstract:BACKGROUND:The exponential growth of gigantic biological data from various sources, such as protein-protein interaction (PPI), genome sequences scaffolding, Mass spectrometry (MS) molecular networking and metabolic flux, demands an efficient way for better visualization and interpretation beyond the conventional, two-d...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-322
更新日期:2013-11-14 00:00:00
abstract:BACKGROUND:Mixed models have a long and fruitful history in statistics. They are pertinent to genomics problems because they are highly versatile, accommodating a wide variety of situations within the same theoretical and algorithmic framework. RESULTS:Qxpak is a package for versatile statistical genomics, specificall...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-202
更新日期:2011-05-25 00:00:00
abstract:BACKGROUND:Inferring gene regulatory network (GRN) has been an important topic in Bioinformatics. Many computational methods infer the GRN from high-throughput expression data. Due to the presence of time delays in the regulatory relationships, High-Order Dynamic Bayesian Network (HO-DBN) is a good model of GRN. Howeve...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0823-6
更新日期:2015-11-25 00:00:00
abstract:BACKGROUND:Cartilage damage is a crucial feature involved in several pathological conditions characterized by joint disorders, such as osteoarthritis and rheumatoid arthritis. Accumulated evidences showed that Wnt/β-catenin pathway plays a role in the pathogenesis of cartilage damage. In addition, it is experimentally ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-2981-4
更新日期:2019-07-31 00:00:00
abstract:BACKGROUND:Typical evolutionary events like recombination, hybridization or gene transfer make necessary the use of phylogenetic networks to properly depict the evolution of DNA and protein sequences. Although several theoretical classes have been proposed to characterize these networks, they make stringent assumptions...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-268
更新日期:2010-05-20 00:00:00
abstract:BACKGROUND:PCR clonal artefacts originating from NGS library preparation can affect both genomic as well as RNA-Seq applications when protocols are pushed to their limits. In RNA-Seq however the artifactual reads are not easy to tell apart from normal read duplication due to natural over-sequencing of highly expressed ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-1276-2
更新日期:2016-10-21 00:00:00
abstract:BACKGROUND:Transposable elements (TEs) are DNA sequences that are able to move from their location in the genome by cutting or copying themselves to another locus. As such, they are increasingly recognized as impacting all aspects of genome function. With the dramatic reduction in cost of DNA sequencing, it is now poss...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-014-0377-z
更新日期:2014-11-19 00:00:00
abstract:BACKGROUND:For successful protein structure prediction by comparative modeling, in addition to identifying a good template protein with known structure, obtaining an accurate sequence alignment between a query protein and a template protein is critical. It has been known that the alignment accuracy can vary significant...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-471
更新日期:2007-12-03 00:00:00
abstract:BACKGROUND:Two important challenges in the analysis of molecular biology information are data (multi-omic information) integration and the detection of patterns across large scale molecular networks and sequences. They are are actually coupled beause the integration of omic information may provide better means to detec...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2175-5
更新日期:2018-07-09 00:00:00
abstract:BACKGROUND AND GOAL:The Random Forest (RF) algorithm for regression and classification has considerably gained popularity since its introduction in 2001. Meanwhile, it has grown to a standard classification approach competing with logistic regression in many innovation-friendly scientific fields. RESULTS:In this conte...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2264-5
更新日期:2018-07-17 00:00:00
abstract:BACKGROUND:Tumors have been hypothesized to be the result of a mixture of oncogenic events, some of which will be reflected in the gene expression of the tumor. Based on this hypothesis a variety of data-driven methods have been employed to decompose tumor expression profiles into component profiles, hypothetically lin...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-S1-S20
更新日期:2009-01-30 00:00:00
abstract:BACKGROUND:Managing and organizing biological knowledge remains a major challenge, due to the complexity of living systems. Recently, systemic representations have been promising in tackling such a challenge at the whole-cell scale. In such representations, the cell is considered as a system composed of interlocked sub...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-03637-9
更新日期:2020-07-23 00:00:00