Abstract:
DNA-binding hot spot residues are the dominant interface residues that contribute most of the binding free energy of protein-DNA interfaces. Because experimental methods for identifying hot spots are expensive and time-consuming, computational approaches are urgently needed to predict hot spots on a large scale. In this work, we systematically assessed 114 features derived from protein sequence, structure, network and solvent accessibility information, together with their combinations and various feature selection strategies, for hot spot prediction. We then trained and compared four commonly used machine learning models, namely support vector machine (SVM), random forest, Naïve Bayes and k-nearest neighbor, for the identification of hot spots using 10-fold cross-validation and an independent test set. Our results show that (1) features based on solvent accessible surface area have a significant effect on hot spot prediction; (2) different but complementary features generally enhance prediction performance; and (3) SVM outperforms the other machine learning methods on both the training and independent test sets. To improve predictive performance, we developed a feature-based method, PrPDH (Prediction of Protein-DNA binding Hot spots), which predicts hot spots in protein-DNA binding interfaces using an SVM trained on 10 selected optimal features. Comparative results on benchmark data sets indicate that our predictor generally achieves better performance than state-of-the-art predictors. A user-friendly web server for PrPDH is freely available at http://bioinfo.ahu.edu.cn:8080/PrPDH.
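The evaluation protocol named in the abstract, 10-fold cross-validation of a classifier, can be illustrated with a minimal sketch. This is not the PrPDH implementation: the two-feature synthetic data, the function names (make_toy_data, stratified_folds, knn_predict, cross_validate) and the choice of k-nearest neighbor with 5 neighbors are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
from collections import Counter


def make_toy_data(n_per_class=50, seed=42):
    """Synthetic stand-in for residue features (hypothetical, not the paper's data)."""
    rng = random.Random(seed)
    features, labels = [], []
    # Label 1 plays the role of "hot spot", label 0 of "non-hot spot".
    for label, center in ((1, 0.3), (0, 0.7)):
        for _ in range(n_per_class):
            features.append((rng.gauss(center, 0.1), rng.gauss(center, 0.1)))
            labels.append(label)
    return features, labels


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with a similar class ratio in each fold."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def knn_predict(train_x, train_y, query, n_neighbors=5):
    """Majority vote among the closest training points (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = Counter(y for _, y in dists[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_validate(features, labels, k=10, n_neighbors=5):
    """Accuracy pooled over all held-out folds of a k-fold cross-validation."""
    correct = 0
    for held_out in stratified_folds(labels, k):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        train_x = [features[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        for i in held_out:
            if knn_predict(train_x, train_y, features[i], n_neighbors) == labels[i]:
                correct += 1
    return correct / len(labels)


if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"10-fold CV accuracy (toy data): {cross_validate(X, y):.3f}")
```

Each sample is held out exactly once, so the pooled accuracy reflects predictions on data the classifier never saw during fitting. Comparing several models, as the abstract describes, would amount to repeating cross_validate with a different prediction function on the same folds.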
journal_name:Brief Bioinform
journal_title:Briefings in bioinformatics
authors:Zhang S, Zhao L, Zheng CH, Xia J
doi:10.1093/bib/bbz037
subject:Has Abstract
pub_date:2020-05-21 00:00:00
pages:1038-1046
issue:3
eissn:1467-5463
issn:1477-4054
pii:5424984
journal_volume:21
pub_type: Journal Article
abstract::The formation of phenotypic traits, such as biomass production, tumor volume and viral abundance, undergoes a complex process in which interactions between genes and developmental stimuli take place at each level of biological organization from cells to organisms. Traditional studies emphasize the impact of genes by d...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbs049
update_date:2014-01-01 00:00:00
abstract::While the number of sequenced genes is increasing dramatically, the number of different protein structural families is expected to be more limited. Changes in enzymatic activity or protein interactions can dramatically modify the role of homologous proteins in different organisms or mutants. However, experimental data...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/2.1.30
update_date:2001-03-01 00:00:00
abstract::Relative changes in mRNA as well as protein levels induced by sublethal doses of antibiotics on bacteria are measured and results visualised in the context of metabolic pathway diagrams. The mRNA levels present at a given time point after the addition of the antibiotic are measured using microarrays from Affymetrix. A...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/1.3.297
update_date:2000-09-01 00:00:00
abstract::Pathway enrichment analysis has been widely used to identify cancer risk pathways, and contributes to elucidating the mechanism of tumorigenesis. However, most existing approaches use outdated pathway information and neglect the complex gene interactions within pathways. Here, we first reviewed the existing wide...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbx091
update_date:2019-01-18 00:00:00
abstract::Plant transcriptome encompasses numerous endogenous, regulatory non-coding RNAs (ncRNAs) that play a major biological role in regulating key physiological mechanisms. While studies have shown that ncRNAs are extremely diverse and ubiquitous, the functions of the vast majority of ncRNAs are still unknown. With ever-inc...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa322
update_date:2020-12-18 00:00:00
abstract::The bipartite network representation of the drug-target interactions (DTIs) in a biosystem enhances understanding of the drugs' multifaceted action modes, suggests therapeutic switching for approved drugs and unveils possible side effects. As experimental testing of DTIs is costly and time-consuming, computational pre...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbx041
update_date:2018-11-27 00:00:00
abstract:MOTIVATION:Long noncoding RNAs (lncRNAs) correspond to a eukaryotic noncoding RNA class that gained great attention in the past years as a higher layer of regulation for gene expression in cells. There is, however, a lack of specific computational approaches to reliably predict lncRNA in plants, which contrast the vari...
journal_title:Briefings in bioinformatics
pub_type: Journal Article, Review
doi:10.1093/bib/bby034
update_date:2019-03-25 00:00:00
abstract::The presence of different transcripts of a gene across samples can be analysed by whole-transcriptome microarrays. Reproducing results from published microarray data represents a challenge owing to the vast amounts of data and the large variety of preprocessing and filtering steps used before the actual analysis is ca...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbt011
update_date:2014-07-01 00:00:00
abstract::The number of bioinformatics tools and resources that support molecular and cell biology approaches is continuously expanding. Moreover, systems and network biology analyses are accompanied more and more by integrated bioinformatics methods. Traditional information-centered university teaching methods often fail, as (...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbt024
update_date:2013-09-01 00:00:00
abstract::Online sequence repositories are teeming with RNA sequencing (RNA-Seq) data from a wide range of eukaryotes. Although most of these data sets contain large numbers of organelle-derived reads, researchers tend to ignore these data, focusing instead on the nuclear-derived transcripts. Consequently, GenBank contains mass...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbw088
update_date:2017-11-01 00:00:00
abstract::Although the central dogma describes the destiny of a gene as 'DNA makes RNA and RNA makes protein', nucleic acids not only store and transmit genetic information but also, surprisingly, take part in vital intracellular activities as regulators of gene expression. Bioinformatics has contributed to knowledge for a series of em...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa023
update_date:2020-04-06 00:00:00
abstract::Integrative analyses of genomic, epigenomic and transcriptomic features for human and various model organisms have revealed that many such features are nonrandomly distributed in the genome. Significant enrichment (or depletion) of genomic features is anticipated to be biologically important. Detection of genomic regi...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbt053
update_date:2014-11-01 00:00:00
abstract::The so-called 'omics' approaches used in modern biology aim at massively characterizing the molecular repertories of living systems at different levels. Metabolomics is one of the last additions to the 'omics' family and it deals with the characterization of the set of metabolites in a given biological system. As meta...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbs055
update_date:2013-11-01 00:00:00
abstract::We discuss and review different ways to map cellular components and their temporal interaction with other such components to different non-spatially explicit mathematical models. The essential choices made in the literature are between discrete and continuous state spaces, between rule and event-based state updates an...
journal_title:Briefings in bioinformatics
pub_type: Journal Article, Review
doi:10.1093/bib/bbp062
update_date:2010-01-01 00:00:00
abstract::Protein dynamics is central to all biological processes, including signal transduction, cellular regulation and biological catalysis. Among them, in-depth exploration of ligand-driven protein dynamics contributes to an optimal understanding of protein function, which is particularly relevant to drug discovery. Hence, ...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbz141
update_date:2020-12-01 00:00:00
abstract::The adoption of agent technologies and multi-agent systems constitutes an emerging area in bioinformatics. In this article, we report on the activity of the Working Group on Agents in Bioinformatics (BIOAGENTS) founded during the first AgentLink III Technical Forum meeting on the 2nd of July, 2004, in Rome. The meetin...
journal_title:Briefings in bioinformatics
pub_type:
doi:10.1093/bib/bbl014
update_date:2007-01-01 00:00:00
abstract::In clinical cancer treatment, genomic alterations often affect the response of patients to anticancer drugs. Studies have shown that molecular features of tumors can be biomarkers predictive of sensitivity or resistance to anticancer agents, but the identification of actionable mutations is often constrained ...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbz109
update_date:2020-12-01 00:00:00
abstract::Alternative polyadenylation (APA) in breast tumor samples results in the removal/addition of cis-regulatory elements such as microRNA (miRNA) target sites in the 3'-untranslated region (3'-UTRs) of genes. Although previous computational APA studies focused on a subset of genes strongly affected by APA (APA genes), we ...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa191
update_date:2020-08-26 00:00:00
abstract::Systematic sequencing of cancer genomes has revealed prevalent heterogeneity, with patients harboring various combinatorial patterns of genetic alteration. In particular, a phenomenon that a group of genes exhibits mutually exclusive patterns has been widespread across cancers, covering a broad spectrum of crucial can...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbx109
update_date:2019-01-18 00:00:00
abstract::Protein remote homology detection is one of the most fundamental and central problems in the study of protein structures and functions, aiming to detect distant evolutionary relationships among proteins via computational methods. During the past decades, many computational approaches have been proposed to sol...
journal_title:Briefings in bioinformatics
pub_type: Journal Article, Review
doi:10.1093/bib/bbw108
update_date:2018-03-01 00:00:00
abstract::Computational and mathematical modelling has become a valuable tool for investigating biological systems. Modelling enables prediction of how biological components interact to deliver system-level properties and extrapolation of biological system performance to contexts and experimental conditions where this is unknow...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bby092
update_date:2018-09-18 00:00:00
abstract::Gene expression profiling holds great potential as a new approach to histological diagnosis and precision medicine of cancers of unknown primary (CUP). Batch effects and different data types greatly decrease the predictive performance of biomarker-based algorithms, and few methods have been widely applied to identify ...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa031
update_date:2020-04-08 00:00:00
abstract::Understanding the genetic basis of human traits/diseases and the underlying mechanisms of how these traits/diseases are affected by genetic variations is critical for public health. Current genome-wide functional genomics data uncovered a large number of functional elements in the noncoding regions of human genome, pr...
journal_title:Briefings in bioinformatics
pub_type: Journal Article, Review
doi:10.1093/bib/bbu018
update_date:2015-05-01 00:00:00
abstract::Heterophylly, i.e. morphological changes in leaves along the axis of an individual plant, is regarded as a strategy used by plants to cope with environmental change. However, little is known of the extent to which heterophylly is controlled by genes and how each underlying gene exerts its effect on heterophyllous vari...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbx011
update_date:2018-07-20 00:00:00
abstract::Gene expression data have played an essential role in many biomedical studies. When the number of genes is large and sample size is limited, there is a 'lack of information' problem, leading to low-quality findings. To tackle this problem, both horizontal and vertical data integrations have been developed, where verti...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa169
update_date:2020-08-14 00:00:00
abstract::Correlated reaction sets (Co-Sets) are mathematically defined modules in biochemical reaction networks which facilitate the study of biological processes by decomposing complex reaction networks into conceptually simple units. According to the degree of association, Co-Sets can be classified into three types: perfect,...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbp068
update_date:2011-03-01 00:00:00
abstract::With the availability of gene expression data by RNA-seq, powerful statistical approaches for grouping similar gene expression profiles across different environments have become increasingly important. We describe and assess a computational model for clustering genes into distinct groups based on the pattern of gene e...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbt029
update_date:2014-07-01 00:00:00
abstract::Effectively representing Medical Subject Headings (MeSH) headings (terms) such as disease and drug as discriminative vectors could greatly improve the performance of downstream computational prediction models. However, these terms are often abstract and difficult to quantify. In this paper, we converted the MeSH tree ...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa037
update_date:2020-03-31 00:00:00
abstract::Information Integrator is an extension to IBM's relational database DB2, which uses data federation to provide benefits to molecular biology researchers through two unique capabilities: increased flexibility in combining data from disparate sources, and SQL access to non-SQL data, easing the task of automating data an...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/4.4.375
update_date:2003-12-01 00:00:00
abstract::Accurately identifying the interactions between genomic factors and the response of cancer drugs plays important roles in drug discovery, drug repositioning and cancer treatment. A number of studies revealed that interactions between genes and drugs were 'many-genes-to-many drugs' interactions, i.e. common modules, op...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa087
update_date:2020-06-26 00:00:00