Abstract:
A number of supervised machine learning models have recently been introduced for the prediction of drug-target interactions based on chemical structure and genomic sequence information. Although these models could offer improved means for many network pharmacology applications, such as repositioning of drugs for new therapeutic uses, the prediction models are often being constructed and evaluated under overly simplified settings that do not reflect the real-life problem in practical applications. Using quantitative drug-target bioactivity assays for kinase inhibitors, as well as a popular benchmarking data set of binary drug-target interactions for enzyme, ion channel, nuclear receptor and G protein-coupled receptor targets, we illustrate here the effects of four factors that may lead to dramatic differences in the prediction results: (i) problem formulation (standard binary classification or more realistic regression formulation), (ii) evaluation data set (drug and target families in the application use case), (iii) evaluation procedure (simple or nested cross-validation) and (iv) experimental setting (whether training and test sets share common drugs and targets, only drugs or targets or neither). Each of these factors should be taken into consideration to avoid reporting overoptimistic drug-target interaction prediction results. We also suggest guidelines on how to make the supervised drug-target interaction prediction studies more realistic in terms of such model formulations and evaluation setups that better address the inherent complexity of the prediction task in the practical applications, as well as novel benchmarking data sets that capture the continuous nature of the drug-target interactions for kinase inhibitors.
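Factor (iv) above can be made concrete with a small sketch. The Python below is an illustration only, not the authors' code: the function name `split_by_setting`, the group labels and the toy drug and target identifiers are all hypothetical. It partitions drug-target pairs according to whether the drug, the target, both or neither have been held out from training.

```python
from itertools import product


def split_by_setting(pairs, test_drugs, test_targets):
    """Partition (drug, target) pairs by which side is held out.

    Hypothetical helper for illustration; the labels are not taken
    from the paper.
    """
    test_drugs, test_targets = set(test_drugs), set(test_targets)
    groups = {"train": [], "new_drug": [], "new_target": [], "both_new": []}
    for drug, target in pairs:
        drug_is_new = drug in test_drugs
        target_is_new = target in test_targets
        if drug_is_new and target_is_new:
            # Neither the drug nor the target occurs in the training pairs.
            groups["both_new"].append((drug, target))
        elif drug_is_new:
            # The target is shared with training, the drug is unseen.
            groups["new_drug"].append((drug, target))
        elif target_is_new:
            # The drug is shared with training, the target is unseen.
            groups["new_target"].append((drug, target))
        else:
            groups["train"].append((drug, target))
    return groups


# Toy data: every combination of three drugs and three targets.
pairs = list(product(["d1", "d2", "d3"], ["t1", "t2", "t3"]))
groups = split_by_setting(pairs, test_drugs={"d3"}, test_targets={"t3"})
for name, members in groups.items():
    print(name, members)
```

The remaining setting, in which test pairs share both their drug and their target with the training set, would correspond to holding out individual pairs from the `train` group rather than whole drugs or targets.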
journal_name: Brief Bioinform
journal_title: Briefings in bioinformatics
authors: Pahikkala T, Airola A, Pietilä S, Shakyawar S, Szwajda A, Tang J, Aittokallio T
doi: 10.1093/bib/bbu010
subject: Has Abstract
pub_date: 2015-03-01 00:00:00
pages: 325-37
issue: 2
eissn: 1467-5463
issn: 1477-4054
pii: bbu010
journal_volume: 16
pub_type: Journal Article
abstract::Protein phosphorylation is a reversible and ubiquitous post-translational modification that primarily occurs at serine, threonine and tyrosine residues and regulates a variety of biological processes. In this paper, we first briefly summarized the current progresses in computational prediction of eukaryotic protein ph...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bby122
update_date:2020-03-23 00:00:00
abstract::Phenotypes have gained increased notoriety in the clinical and biological domain owing to their application in numerous areas such as the discovery of disease genes and drug targets, phylogenetics and pharmacogenomics. Phenotypes, defined as observable characteristics of organisms, can be seen as one of the bridges th...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbv083
update_date:2016-09-01 00:00:00
abstract::RNA editing is a widespread co/posttranscriptional mechanism affecting primary RNAs by specific nucleotide modifications, which plays relevant roles in molecular processes including regulation of gene expression and/or the processing of noncoding RNAs. In recent years, the detection of editing sites has been improved ...
journal_title:Briefings in bioinformatics
pub_type: Journal Article, Review
doi:10.1093/bib/bbx129
update_date:2019-03-22 00:00:00
abstract::Systems pharmacology is an emerging field that integrates systems biology and pharmacology to advance the process of drug discovery, development and the understanding of therapeutic mechanisms. The aim of the present work is to highlight the role that the systems pharmacology plays across the traditional herbal medici...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbt035
update_date:2014-09-01 00:00:00
abstract::The so-called 'omics' approaches used in modern biology aim at massively characterizing the molecular repertories of living systems at different levels. Metabolomics is one of the last additions to the 'omics' family and it deals with the characterization of the set of metabolites in a given biological system. As meta...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbs055
update_date:2013-11-01 00:00:00
abstract::Short tandem repeats are highly polymorphic and associated with a wide range of phenotypic variation, some of which cause neurodegenerative disease in humans. With advances in high-throughput sequencing technologies, there are novel opportunities to study genetic variation. While available sequencing technologies and ...
journal_title:Briefings in bioinformatics
pub_type: Journal Article, Review
doi:10.1093/bib/bbu001
update_date:2015-03-01 00:00:00
abstract::The presence of different transcripts of a gene across samples can be analysed by whole-transcriptome microarrays. Reproducing results from published microarray data represents a challenge owing to the vast amounts of data and the large variety of preprocessing and filtering steps used before the actual analysis is ca...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbt011
update_date:2014-07-01 00:00:00
abstract::Cooperative regulation among multiple microRNAs (miRNAs) is a complex type of posttranscriptional regulation in human; however, the global view of the system-level regulatory principles across cancers is still unclear. Here, we investigated miRNA-miRNA cooperative regulatory landscape across 18 cancer types and summar...
journal_title:Briefings in bioinformatics
pub_type: Journal Article, Review
doi:10.1093/bib/bby038
update_date:2019-09-27 00:00:00
abstract::The increasing ease with which massive genetic information can be obtained from patients or healthy individuals has stimulated the development of interpretive bioinformatics tools as aids in clinical practice. Most such tools analyze evolutionary information and simple physical-chemical properties to predict whether r...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbz146
update_date:2021-01-18 00:00:00
abstract::Since the completion of the Human Genome Project, it has been widely established that most DNA is not transcribed into proteins. These non-protein-coding regions are believed to be moderators within transcriptional and post-transcriptional processes, which play key roles in the onset of diseases. Long non-coding RNAs ...
journal_title:Briefings in bioinformatics
pub_type: Journal Article, Review
doi:10.1093/bib/bbv114
update_date:2017-01-01 00:00:00
abstract::This Briefing reviews the widely used, currently active, up-to-date databases derived from the worldwide Protein Data Bank (PDB) to facilitate browsing, finding and exploring its entries. These databases contain visualization and analysis tools tailored to specific kinds of molecules and interactions, often including ...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbw049
update_date:2017-07-01 00:00:00
abstract::The cell-free DNA (cfDNA) methylation profile in liquid biopsy has been utilized to diagnose early-stage disease and estimate therapy response. However, typical clinical procedures are capable of purifying only very small amounts of cfDNA. Whole-genome bisulfite sequencing (WGBS) is the gold standard for measuring DNA...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa332
update_date:2020-12-15 00:00:00
abstract::Effective drug discovery contributes to the treatment of numerous diseases but is limited by high costs and long cycles. The Quantitative Structure-Activity Relationship (QSAR) method was introduced to evaluate the activity of a large number of compounds virtually, reducing the time and labor costs required for chemic...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa276
update_date:2020-11-03 00:00:00
abstract::A proliferation of chemical, reaction and enzyme databases, new computational methods and software tools for data-driven rational biosynthesis design have emerged in recent years. With the coming of the era of big data, particularly in the bio-medical field, data-driven rational biosynthesis design could potentially b...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbz065
update_date:2020-07-15 00:00:00
abstract::The adoption of agent technologies and multi-agent systems constitutes an emerging area in bioinformatics. In this article, we report on the activity of the Working Group on Agents in Bioinformatics (BIOAGENTS) founded during the first AgentLink III Technical Forum meeting on the 2nd of July, 2004, in Rome. The meetin...
journal_title:Briefings in bioinformatics
pub_type:
doi:10.1093/bib/bbl014
update_date:2007-01-01 00:00:00
abstract::As a group of important plant species in agriculture and biology, polyploids have been increasingly studied in terms of their genome structure and organization. There are two types of polyploids, allopolyploids and autopolyploids, each resulting from a different genetic origin, which undergo meiotic divisions of a dis...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbt075
update_date:2015-01-01 00:00:00
abstract::In view of great difficulties in the pathogenesis analysis of Alzheimer's disease (AD) presently, profiling the modifiable risk factors is crucial for early detection and intervention of AD. However, the causal associations among them have yet to be identified, and the effective integration and application of these da...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa213
update_date:2020-09-21 00:00:00
abstract::Mediation analysis has been a useful tool for investigating the effect of mediators that lie in the path from the independent variable to the outcome. With the increasing dimensionality of mediators such as in (epi)genomics studies, high-dimensional mediation model is needed. In this work, we focus on epigenetic studi...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa113
update_date:2020-07-01 00:00:00
abstract::While leading to millions of people's deaths every year, the treatment of viral infectious diseases remains a huge public health challenge. Therefore, an in-depth understanding of human-virus protein-protein interactions (PPIs) as the molecular interface between a virus and its host cell is of paramount importance to ob...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa425
update_date:2021-01-30 00:00:00
abstract::The structural description of peptide ligands bound to G protein-coupled receptors (GPCRs) is important for the discovery of new drugs and deeper understanding of the molecular mechanisms of life. Here we describe a three-stage protocol for the molecular docking of peptides to GPCRs using a set of different programs: ...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa109
update_date:2020-06-10 00:00:00
abstract::With the availability of gene expression data by RNA-seq, powerful statistical approaches for grouping similar gene expression profiles across different environments have become increasingly important. We describe and assess a computational model for clustering genes into distinct groups based on the pattern of gene e...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbt029
update_date:2014-07-01 00:00:00
abstract::Numerous studies have shown that copy number variation (CNV) in lncRNA regions play critical roles in the initiation and progression of cancer. However, our knowledge about their functionalities is still limited. Here, we firstly provided a computational method to identify lncRNAs with copy number variation (lncRNAs-C...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbz113
update_date:2020-12-01 00:00:00
abstract::Phylogenomic databases provide orthology predictions for species with fully sequenced genomes. Although the goal seems well-defined, the content of these databases differs greatly. Seven ortholog databases (Ensembl Compara, eggNOG, HOGENOM, InParanoid, OMA, OrthoDB, Panther) were compared on the basis of reference tre...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbr034
update_date:2011-09-01 00:00:00
abstract::Fibrosis is a key component in the pathogenic mechanism of a variety of diseases. These diseases involving fibrosis may share common mechanisms and therapeutic targets, and therefore common intervention strategies and medicines may be applicable for these diseases. For this reason, deliberately introducing anti-fibros...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa115
update_date:2020-06-22 00:00:00
abstract::Precision medicine has changed thinking in cancer therapy, highlighting a better understanding of the individual clinical interventions. But what role do the drivers and pathways identified from pan-cancer genome analysis play in the tumor? In this letter, we will highlight the importance of in silico modeling in prec...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbz033
update_date:2020-05-21 00:00:00
abstract::While elementary flux mode (EFM) analysis is now recognized as a cornerstone computational technique for cellular pathway analysis and engineering, EFM application to genome-scale models remains computationally prohibitive. This article provides a review of aspects of EFM computation that elucidates bottlenecks in sca...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbz094
update_date:2020-12-01 00:00:00
abstract::Despite gene expression programs being notoriously complex, RNA abundance is usually assumed as a proxy for transcriptional activity. Recently developed approaches, able to disentangle transcriptional and post-transcriptional regulatory processes, have revealed a more complex scenario. It is now possible to work out h...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa389
update_date:2020-12-22 00:00:00
abstract::Gene expression profiling holds great potential as a new approach to histological diagnosis and precision medicine of cancers of unknown primary (CUP). Batch effects and different data types greatly decrease the predictive performance of biomarker-based algorithms, and few methods have been widely applied to identify ...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa031
update_date:2020-04-08 00:00:00
abstract::A class-imbalanced classifier is a decision rule to predict the class membership of new samples from an available data set where the class sizes differ considerably. When the class sizes are very different, most standard classification algorithms may favor the larger (majority) class resulting in poor accuracy in the ...
journal_title:Briefings in bioinformatics
pub_type: Journal Article, Review
doi:10.1093/bib/bbs006
update_date:2013-01-01 00:00:00
abstract::Mathematical models can serve as a tool to formalize biological knowledge from diverse sources, to investigate biological questions in a formal way, to test experimental hypotheses, to predict the effect of perturbations and to identify underlying mechanisms. We present a pipeline of computational tools that performs ...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbx163
update_date:2019-07-19 00:00:00