Abstract:
BACKGROUND:Protein structure prediction has achieved a lot of progress during the last few decades and a greater number of models for a certain sequence can be predicted. Consequently, assessing the qualities of predicted protein models in perspective is one of the key components of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue, which could be roughly divided into three categories: single methods, quasi-single methods and clustering (or consensus) methods. Although these methods achieve much success at different levels, accurate protein model quality assessment is still an open problem. RESULTS:Here, we present the MQAPRank, a global protein model quality assessment program based on learning-to-rank. The MQAPRank first sorts the decoy models by using single method based on learning-to-rank algorithm to indicate their relative qualities for the target protein. And then it takes the first five models as references to predict the qualities of other models by using average GDT_TS scores between reference models and other models. Benchmarked on CASP11 and 3DRobot datasets, the MQAPRank achieved better performances than other leading protein model quality assessment methods. Recently, the MQAPRank participated in the CASP12 under the group name FDUBio and achieved the state-of-the-art performances. CONCLUSIONS:The MQAPRank provides a convenient and powerful tool for protein model quality assessment with the state-of-the-art performances, it is useful for protein structure prediction and model quality assessment usages.
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Jing X,Dong Qdoi
10.1186/s12859-017-1691-zsubject
Has Abstractpub_date
2017-05-25 00:00:00pages
275issue
1issn
1471-2105pii
10.1186/s12859-017-1691-zjournal_volume
18pub_type
杂志文章abstract:BACKGROUND:With the ever increasing use of computational models in the biosciences, the need to share models and reproduce the results of published studies efficiently and easily is becoming more important. To this end, various standards have been proposed that can be used to describe models, simulations, data or other...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-014-0369-z
更新日期:2014-12-14 00:00:00
abstract:BACKGROUND:Protein subcellular localization is an important determinant of protein function and hence, reliable methods for prediction of localization are needed. A number of prediction algorithms have been developed based on amino acid compositions or on the N-terminal characteristics (signal peptides) of proteins. Ho...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-6-152
更新日期:2005-06-17 00:00:00
abstract:BACKGROUND:Most known eukaryotic genomes contain mobile copied elements called transposable elements. In some species, these elements account for the majority of the genome sequence. They have been subject to many mutations and other genomic events (copies, deletions, captures) during transposition. The identification ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-474
更新日期:2010-09-22 00:00:00
abstract:BACKGROUND:Elevated sequencing error rates are the most predominant obstacle in single-nucleotide polymorphism (SNP) detection, which is a major goal in the bulk of current studies using next-generation sequencing (NGS). Beyond routinely handled generic sources of errors, certain base calling errors relate to specific ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-S5-S1
更新日期:2013-01-01 00:00:00
abstract:BACKGROUND:Detecting patterns in high-dimensional multivariate datasets is non-trivial. Clustering and dimensionality reduction techniques often help in discerning inherent structures. In biological datasets such as microbial community composition or gene expression data, observations can be generated from a continuous...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1790-x
更新日期:2017-09-13 00:00:00
abstract:: ...
journal_title:BMC bioinformatics
pub_type: 历史文章,杂志文章
doi:10.1186/s12859-019-2618-7
更新日期:2019-03-14 00:00:00
abstract:BACKGROUND:During the last few years, DNA sequence analysis has become one of the primary means of taxonomic identification of species, particularly so for species that are minute or otherwise lack distinct, readily obtainable morphological characters. Although the number of sequences available for comparison in public...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-6-178
更新日期:2005-07-18 00:00:00
abstract:BACKGROUND:Introduction of spaced speeds opened a way of sensitivity improvement in homology search without loss of search speed. Since then, the efforts of finding optimal seed which maximizes the sensitivity have been continued today. The sensitivity of a seed is generally computed by its hit probability. However, th...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-S1-S37
更新日期:2010-01-18 00:00:00
abstract:BACKGROUND:The use of classification algorithms is becoming increasingly important for the field of computational biology. However, not only the quality of the classification, but also its biological interpretation is important. This interpretation may be eased if interacting elements can be identified and visualized, ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-139
更新日期:2014-05-12 00:00:00
abstract:BACKGROUND:High-throughput methods can directly detect the set of interacting proteins in model species but the results are often incomplete and exhibit high false positive and false negative rates. A number of researchers have recently presented methods for integrating direct and indirect data for predicting interacti...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-S10-S6
更新日期:2007-01-01 00:00:00
abstract:BACKGROUND:Research related to cancer is vast, and continues in earnest in many directions. Due to the complexity of cancer, a better understanding of tumor growth dynamics can be gleaned from a dynamic computational model. We present a comprehensive, fully executable, spatial and temporal 3D computational model of the...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-1168-5
更新日期:2016-08-24 00:00:00
abstract:BACKGROUND:Two-dimensional data needs to be processed and analysed in almost any experimental laboratory. Some tasks in this context may be performed with generic software such as spreadsheet programs which are available ubiquitously, others may require more specialised software that requires paid licences. Additionall...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-201
更新日期:2012-08-14 00:00:00
abstract:BACKGROUND:The Cell Ontology (CL) is an ontology for the representation of in vivo cell types. As biological ontologies such as the CL grow in complexity, they become increasingly difficult to use and maintain. By making the information in the ontology computable, we can use automated reasoners to detect errors and ass...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-6
更新日期:2011-01-05 00:00:00
abstract:BACKGROUND:Substitution matrices are key parameters for the alignment of two protein sequences, and consequently for most comparative genomics studies. The composition of biological sequences can vary importantly between species and groups of species, and classical matrices such as those in the BLOSUM series fail to ac...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-457
更新日期:2011-11-24 00:00:00
abstract:BACKGROUND:Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts which would elicit a similar phe...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2210-6
更新日期:2018-06-11 00:00:00
abstract:BACKGROUND:Determining differentially expressed genes (DEGs) between biological samples is the key to understand how genotype gives rise to phenotype. RNA-seq and microarray are two main technologies for profiling gene expression levels. However, considerable discrepancy has been found between DEGs detected using the t...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0847-y
更新日期:2016-01-11 00:00:00
abstract:BACKGROUND:As a rule, peptides are more flexible and unstructured than proteins with their substantial stabilizing hydrophobic cores. Nevertheless, a few stably folding peptides have been discovered. This raises the question whether there may be more such peptides that are unknown as yet. These molecules could be helpf...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-109
更新日期:2008-02-19 00:00:00
abstract:BACKGROUND:In this study, we extended the replica exchange Monte Carlo (REMC) sampling method to protein-small molecule docking conformational prediction using RosettaLigand. In contrast to the traditional Monte Carlo (MC) and REMC sampling methods, these methods use multi-objective optimization Pareto front informatio...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1733-6
更新日期:2017-07-10 00:00:00
abstract:BACKGROUND:Visualization tools for deep learning models typically focus on discovering key input features without considering how such low level features are combined in intermediate layers to make decisions. Moreover, many of these methods examine a network's response to specific input examples that may be insufficien...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-2957-4
更新日期:2019-07-19 00:00:00
abstract:BACKGROUND:In recent years, substantial effort has been applied to de novo regulatory motif discovery. At this time, more than 150 software tools exist to detect regulatory binding sites given a set of genomic sequences. As the number of software packages increases, it becomes more important to identify the tools with ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-S9-S6
更新日期:2008-08-12 00:00:00
abstract:BACKGROUND:Complex networks are studied across many fields of science and are particularly important to understand biological processes. Motifs in networks are small connected sub-graphs that occur significantly in higher frequencies than in random networks. They have recently gathered much attention as a useful concep...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-318
更新日期:2009-10-04 00:00:00
abstract:BACKGROUND:Correctly identifying genomic regions enriched with histone modifications and transcription factors is key to understanding their regulatory and developmental roles. Conceptually, these regions are divided into two categories, narrow peaks and broad domains, and different algorithms are used to identify each...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-0991-z
更新日期:2016-03-24 00:00:00
abstract:BACKGROUND:Chemical named entities represent an important facet of biomedical text. RESULTS:We have developed a system to use character-based n-grams, Maximum Entropy Markov Models and rescoring to recognise chemical names and other such entities, and to make confidence estimates for the extracted entities. An adjusta...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-S11-S4
更新日期:2008-11-19 00:00:00
abstract:BACKGROUND:The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of t...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-525
更新日期:2008-12-08 00:00:00
abstract:BACKGROUND:Set comparisons permeate a large number of data analysis workflows, in particular workflows in biological sciences. Venn diagrams are frequently employed for such analysis but current tools are limited. RESULTS:We have developed InteractiVenn, a more flexible tool for interacting with Venn diagrams includin...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0611-3
更新日期:2015-05-22 00:00:00
abstract:BACKGROUND:Time- and dose-to-event phenotypes used in basic science and translational studies are commonly measured imprecisely or incompletely due to limitations of the experimental design or data collection schema. For example, drug-induced toxicities are not reported by the actual time or dose triggering the event, ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-2899-x
更新日期:2019-05-28 00:00:00
abstract:BACKGROUND:Cross-species comparisons of gene neighborhoods (also called genomic contexts) in microbes may provide insight into determining functionally related or co-regulated sets of genes, suggest annotations of previously un-annotated genes, and help to identify horizontal gene transfer events across microbial speci...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-18
更新日期:2013-01-16 00:00:00
abstract:BACKGROUND:Manual chemical data curation from publications is error-prone, time consuming, and hard to maintain up-to-date data sets. Automatic information extraction can be used as a tool to reduce these problems. Since chemical structures usually described in images, information extraction needs to combine structure ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-S17-S9
更新日期:2012-01-01 00:00:00
abstract:BACKGROUND:The low success rate and high cost of drug discovery requires the development of new paradigms to identify molecules of therapeutic value. The Anatomical Therapeutic Chemical (ATC) Code System is a World Health Organization (WHO) proposed classification that assigns multi-level codes to compounds based on th...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1660-6
更新日期:2017-06-07 00:00:00
abstract:BACKGROUND:The amount of gene expression data available in public repositories has grown exponentially in the last years, now requiring new data mining tools to transform them in information easily accessible to biologists. RESULTS:By exploiting expression data publicly available in the Gene Expression Omnibus (GEO) d...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-S1-S6
更新日期:2014-01-01 00:00:00