Abstract:
BACKGROUND:In template-based modeling when using a single template, inter-atomic distances of an unknown protein structure are assumed to be distributed by Gaussian probability density functions, whose center peaks are located at the distances between corresponding atoms in the template structure. The width of the Gaussian distribution, the variability of a spatial restraint, is closely related to the reliability of the restraint information extracted from a template, and it should be accurately estimated for successful template-based protein structure modeling. RESULTS:To predict the variability of the spatial restraints in template-based modeling, we have devised a prediction model, Sigma-RF, by using the random forest (RF) algorithm. The benchmark results on 22 CASP9 targets show that the variability values from Sigma-RF are of higher correlations with the true distance deviation than those from Modeller. We assessed the effect of new sigma values by performing the single-domain homology modeling of 22 CASP9 targets and 24 CASP10 targets. For most of the targets tested, we could obtain more accurate 3D models from the identical alignments by using the Sigma-RF results than by using Modeller ones. CONCLUSIONS:We find that the average alignment quality of residues located between and at two aligned residues, quasi-local information, is the most contributing factor, by investigating the importance of input features used in the RF machine learning. This average alignment quality is shown to be more important than the previously identified quantity of a local information: the product of alignment qualities at two aligned residues.
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Lee J,Lee K,Joung I,Joo K,Brooks BR,Lee Jdoi
10.1186/s12859-015-0526-zsubject
Has Abstractpub_date
2015-03-21 00:00:00pages
94issn
1471-2105pii
10.1186/s12859-015-0526-zjournal_volume
16pub_type
杂志文章abstract:BACKGROUND:The Distributed Annotation System (DAS) allows merging of DNA sequence annotations from multiple sources and provides a single annotation view. A straightforward way to establish a DAS annotation server is to use the "Lightweight DAS" server (LDAS). Onto this type of server, annotations can be uploaded as fl...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-5-55
更新日期:2004-05-07 00:00:00
abstract:BACKGROUND:The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties defines a multidimensional solution space, where seq...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1751-4
更新日期:2017-07-17 00:00:00
abstract:BACKGROUND:Despite the widespread use of high throughput expression platforms and the availability of a desktop implementation of Gene Set Enrichment Analysis (GSEA) that enables non-experts to perform gene set based analyses, the availability of the necessary precompiled gene sets is rare for species other than human....
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-146
更新日期:2014-05-17 00:00:00
abstract:BACKGROUND:The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of t...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-525
更新日期:2008-12-08 00:00:00
abstract:BACKGROUND:Mammalian germ cells undergo meiosis to produce sperm or eggs, haploid cells that are primed to meet and propagate life. Meiosis is initiated by retinoic acid and meiotic prophase is the first and most complex stage of meiosis when homologous chromosomes pair to exchange genetic information. Errors in meiosi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-72
更新日期:2013-02-27 00:00:00
abstract:BACKGROUND:Protein-protein interactions (PPIs) play crucial roles in virtually every aspect of cellular function within an organism. Over the last decade, the development of novel high-throughput techniques has resulted in enormous amounts of data and provided valuable resources for studying protein interactions. Howev...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-S7-S3
更新日期:2012-05-08 00:00:00
abstract:BACKGROUND:Human triosephosphate isomerase (HsTIM) deficiency is a genetic disease caused often by the pathogenic mutation E104D. This mutation, located at the side of an abnormally large cluster of water in the inter-subunit interface, reduces the thermostability of the enzyme. Why and how these water molecules are di...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-S16-S11
更新日期:2013-01-01 00:00:00
abstract:BACKGROUND:Clustering techniques are routinely used in gene expression data analysis to organize the massive data. Clustering techniques arrange a large number of genes or assays into a few clusters while maximizing the intra-cluster similarity and inter-cluster separation. While clustering of genes facilitates learnin...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-40
更新日期:2009-01-30 00:00:00
abstract:BACKGROUND:The cost of DNA sequencing has undergone a dramatical reduction in the past decade. As a result, sequencing technologies have been increasingly applied to genomic research. RNA-Seq is becoming a common technique for surveying gene expression based on DNA sequencing. As it is not clear how increased sequencin...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-S13-S5
更新日期:2012-01-01 00:00:00
abstract:BACKGROUND:Tiling-arrays are applicable to multiple types of biological research questions. Due to its advantages (high sensitivity, resolution, unbiased), the technology is often employed in genome-wide investigations. A major challenge in the analysis of tiling-array data is to define regions-of-interest, i.e., conti...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-275
更新日期:2010-05-21 00:00:00
abstract:BACKGROUND:Sharing sets of chemical data (e.g., chemical properties, docking scores, etc.) among collaborators with diverse skill sets is a common task in computer-aided drug design and medicinal chemistry. The ability to associate this data with images of the relevant molecular structures greatly facilitates scientifi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-159
更新日期:2014-05-23 00:00:00
abstract:BACKGROUND:Transcription factor binding sites (TFBSs) are crucial in the regulation of gene transcription. Recently, chromatin immunoprecipitation followed by cDNA microarray hybridization (ChIP-chip array) has been used to identify potential regulatory sequences, but the procedure can only map the probable protein-DNA...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-S12-S7
更新日期:2008-12-12 00:00:00
abstract:BACKGROUND:Since RNA molecules regulate genes and control alternative splicing by allostery, it is important to develop algorithms to predict RNA conformational switches. Some tools, such as paRNAss, RNAshapes and RNAbor, can be used to predict potential conformational switches; nevertheless, no existent tool can detec...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-S5-S6
更新日期:2012-04-12 00:00:00
abstract:BACKGROUND:Clustering of protein sequences is of key importance in predicting the structure and function of newly sequenced proteins and is also of use for their annotation. With the advent of multiple high-throughput sequencing technologies, new protein sequences are becoming available at an extraordinary rate. The ra...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2080-y
更新日期:2018-03-05 00:00:00
abstract:BACKGROUND:The rapid development of structural genomics has resulted in many "unknown function" proteins being deposited in Protein Data Bank (PDB), thus, the functional prediction of these proteins has become a challenge for structural bioinformatics. Several sequence-based and structure-based methods have been develo...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-439
更新日期:2010-08-27 00:00:00
abstract:BACKGROUND:Short oligonucleotide arrays have several probes measuring the expression level of each target transcript. Therefore the selection of probes is a key component for the quality of measurements. However, once probes have been selected and synthesized on an array, it is still possible to re-evaluate the results...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-5-111
更新日期:2004-08-14 00:00:00
abstract:BACKGROUND:Population stratification is a systematic difference in allele frequencies between subpopulations. This can lead to spurious association findings in the case-control genome wide association studies (GWASs) used to identify single nucleotide polymorphisms (SNPs) associated with disease-linked phenotypes. Meth...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-61
更新日期:2013-02-22 00:00:00
abstract:BACKGROUND:PCR clonal artefacts originating from NGS library preparation can affect both genomic as well as RNA-Seq applications when protocols are pushed to their limits. In RNA-Seq however the artifactual reads are not easy to tell apart from normal read duplication due to natural over-sequencing of highly expressed ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-1276-2
更新日期:2016-10-21 00:00:00
abstract:BACKGROUND:Many statistical methods have been proposed to identify disease biomarkers from gene expression profiles. However, from gene expression profile data alone, statistical methods often fail to identify biologically meaningful biomarkers related to a specific disease under study. In this paper, we develop a nove...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-416
更新日期:2008-10-06 00:00:00
abstract:BACKGROUND:Mixed models have a long and fruitful history in statistics. They are pertinent to genomics problems because they are highly versatile, accommodating a wide variety of situations within the same theoretical and algorithmic framework. RESULTS:Qxpak is a package for versatile statistical genomics, specificall...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-202
更新日期:2011-05-25 00:00:00
abstract:BACKGROUND:MicroRNAs (miRNA) are regulatory genes that target and repress other RNA molecules via sequence-specific binding. Several biological processes are regulated across many organisms by evolutionarily conserved miRNAs. Plants and invertebrates employ their miRNA in defense against viruses by targeting and degrad...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-S2-S3
更新日期:2013-01-01 00:00:00
abstract:BACKGROUND:Ontologies are widely used to represent knowledge in biomedicine. Systematic approaches for detecting errors and disagreements are needed for large ontologies with hundreds or thousands of terms and semantic relationships. A recent approach of defining terms using logical definitions is now increasingly bein...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-418
更新日期:2011-10-27 00:00:00
abstract:BACKGROUND:In real-time PCR, it is necessary to consider the efficiency of amplification (EA) of amplicons in order to determine initial target levels properly. EAs can be deduced from standard curves, but these involve extra effort and cost and may yield invalid EAs. Alternatively, EA can be extracted from individual ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-95
更新日期:2008-02-12 00:00:00
abstract:BACKGROUND:We introduce the decision support system for Protein (Structure) Comparison, Knowledge, Similarity and Information (ProCKSI). ProCKSI integrates various protein similarity measures through an easy to use interface that allows the comparison of multiple proteins simultaneously. It employs the Universal Simila...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-416
更新日期:2007-10-26 00:00:00
abstract:BACKGROUND:DNA copy number variation (CNV) has been recognized as an important source of genetic variation. Array comparative genomic hybridization (aCGH) is commonly used for CNV detection, but the microarray platform has a number of inherent limitations. RESULTS:Here, we describe a method to detect copy number varia...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-80
更新日期:2009-03-06 00:00:00
abstract:BACKGROUND:While researchers have utilized versions of the Affymetrix human GeneChip for the assessment of expression patterns in non human primate (NHP) samples, there has been no comprehensive sequence analysis study undertaken to demonstrate that the probe sequences designed to detect human transcripts are reliably ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-5-165
更新日期:2004-10-26 00:00:00
abstract:BACKGROUND:Analysis of sequence composition is a routine task in genome research. Organisms are characterized by their base composition, dinucleotide relative abundance, codon usage, and so on. Unique subsequences are markers of special interest in genome comparison, expression profiling, and genetic engineering. Relat...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-167
更新日期:2008-03-26 00:00:00
abstract:BACKGROUND:We introduce a novel method, called PuFFIN, that takes advantage of paired-end short reads to build genome-wide nucleosome maps with larger numbers of detected nucleosomes and higher accuracy than existing tools. In contrast to other approaches that require users to optimize several parameters according to t...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-S9-S11
更新日期:2014-01-01 00:00:00
abstract:BACKGROUND:Abruptness of pigment patterns at the periphery of a skin lesion is one of the most important dermoscopic features for detection of malignancy. In current clinical setting, abrupt cutoff of a skin lesion determined by an examination of a dermatologist. This process is subjective, nonquantitative, and error-p...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1892-5
更新日期:2017-12-28 00:00:00
abstract:BACKGROUND:The information in large collections of phylogenetic trees is useful for many comparative genomic studies. Therefore, there is a need for flexible tools that allow exploration of such collections in order to retrieve relevant data as quickly as possible. RESULTS:In this paper, we present TPMS (Tree Pattern-...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-109
更新日期:2013-03-27 00:00:00