Structural and Sequence Similarity Makes a Significant Impact on Machine-Learning-Based Scoring Functions for Protein-Ligand Interactions.

Abstract:

The prediction of protein-ligand binding affinity has recently been improved remarkably by machine-learning-based scoring functions. For example, using a set of simple descriptors representing atomic distance counts, RF-Score improves the Pearson correlation coefficient to about 0.8 on the core set of the PDBbind 2007 database, which is significantly higher than the performance of any conventional scoring function on the same benchmark. A few studies have discussed the performance of machine-learning-based methods, but the reason for this improvement remains unclear. In this study, by systematically controlling the structural and sequence similarity between the training and test proteins of the PDBbind benchmark, we demonstrate that protein structural and sequence similarity has a significant impact on machine-learning-based methods. After removal of the training proteins that structure alignment and sequence alignment identify as highly similar to the test proteins, machine-learning-based methods trained on the new training sets no longer outperform the conventional scoring functions. In contrast, the performance of conventional functions like X-Score is relatively stable no matter what training data are used to fit the weights of its energy terms.
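
The similarity-controlled protocol described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.5 cutoff, the protein identifiers, and the similarity scores are all hypothetical, and the abstract does not state which alignment tools or thresholds were used. The sketch only shows the general idea of dropping training proteins that are too similar to any test protein, and of scoring predictions with the Pearson correlation coefficient.

```python
import math

def filter_training_set(train_ids, test_ids, similarity, cutoff):
    """Keep training proteins whose highest similarity to any test
    protein is below the cutoff. `similarity` maps (train, test)
    pairs to a score in [0, 1]; missing pairs count as 0.0
    (an assumption made for this toy example)."""
    kept = []
    for tr in train_ids:
        max_sim = max(similarity.get((tr, te), 0.0) for te in test_ids)
        if max_sim < cutoff:
            kept.append(tr)
    return kept

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.
    Raises ZeroDivisionError if either sequence has zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical toy data: similarity of each training protein to each test protein.
similarity = {
    ("trainA", "test1"): 0.95,
    ("trainB", "test1"): 0.30,
    ("trainC", "test1"): 0.60,
    ("trainC", "test2"): 0.20,
}
train_ids = ["trainA", "trainB", "trainC"]
test_ids = ["test1", "test2"]

kept = filter_training_set(train_ids, test_ids, similarity, cutoff=0.5)
print(kept)  # ['trainB']
```

In a real experiment, a scoring function would be retrained on the retained complexes only, and `pearson` would then compare its predicted affinities with the measured ones on the test set.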

journal_name

J Chem Inf Model

authors

Li Y,Yang J

doi

10.1021/acs.jcim.7b00049

pub_date

2017-04-24 00:00:00

pages

1007-1012

issue

4

eissn

1549-960X

issn

1549-9596

journal_volume

57

pub_type

Journal Article
  • Universal Activation Index for Class A GPCRs.

    abstract::An index of the activation of Class A G-protein-coupled receptors (GPCRs) has been trained using interhelix distances from a series of microsecond molecular-dynamics simulations and tested for 268 published X-ray structures. In a three-class model that includes intermediate structures, 63% of the active structures are...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00604

    authors: Ibrahim P,Wifling D,Clark T

    update_date:2019-09-23 00:00:00

  • BFMP: a method for discretizing and visualizing pyranose conformations.

    abstract::We report a new classification method for pyranose ring conformations called Best-fit, Four-Membered Plane (BFMP), which describes pyranose ring conformations based on reference planes defined by four atoms. The method is able to characterize all asymmetrical and symmetrical shapes of a pyran ring, is readily automate...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci500325b

    authors: Makeneni S,Foley BL,Woods RJ

    update_date:2014-10-27 00:00:00

  • Assessing different classification methods for virtual screening.

    abstract::How well do different classification methods perform in selecting the ligands of a protein target out of large compound collections not used to train the model? Support vector machines, random forest, artificial neural networks, k-nearest-neighbor classification with genetic-algorithm-optimized feature selection, tren...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci050519k

    authors: Plewczynski D,Spieser SA,Koch U

    update_date:2006-05-01 00:00:00

  • Sharing Data from Molecular Simulations.

    abstract::Given the need for modern researchers to produce open, reproducible scientific output, the lack of standards and best practices for sharing data and workflows used to produce and analyze molecular dynamics (MD) simulations has become an important issue in the field. There are now multiple well-established packages to ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00665

    authors: Abraham M,Apostolov R,Barnoud J,Bauer P,Blau C,Bonvin AMJJ,Chavent M,Chodera J,Čondić-Jurkić K,Delemotte L,Grubmüller H,Howard RJ,Jordan EJ,Lindahl E,Ollila OHS,Selent J,Smith DGA,Stansfeld PJ,Tiemann JKS,Trellet M

    update_date:2019-10-28 00:00:00

  • Random Forest Refinement of Pairwise Potentials for Protein-Ligand Decoy Detection.

    abstract::An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function's ability to identify the most stable structure might be improved by emphasizing the most relevant atom pairwise interactions. However, it is hard to evaluat...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00356

    authors: Pei J,Zheng Z,Kim H,Song LF,Walworth S,Merz MR,Merz KM Jr

    update_date:2019-07-22 00:00:00

  • Estimation of carcinogenicity using molecular fragments tree.

    abstract::Carcinogenicity is an important toxicological endpoint that poses high concern to drug discovery. In this study, we developed a method to extract structural alerts (SAs) and modulating factors of carcinogens on the basis of statistical analyses. First, the Gaston algorithm, a frequent subgraph mining method, was used ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci300266p

    authors: Wang Y,Lu J,Wang F,Shen Q,Zheng M,Luo X,Zhu W,Jiang H,Chen K

    update_date:2012-08-27 00:00:00

  • Chemoinformatics-based classification of prohibited substances employed for doping in sport.

    abstract::Representative molecules from 10 classes of prohibited substances were taken from the World Anti-Doping Agency (WADA) list, augmented by molecules from corresponding activity classes found in the MDDR database. Together with some explicitly allowed compounds, these formed a set of 5245 molecules. Five types of fingerp...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci0601160

    authors: Cannon EO,Bender A,Palmer DS,Mitchell JB

    update_date:2006-11-01 00:00:00

  • Molecular modeling of potential anticancer agents from African medicinal plants.

    abstract::Naturally occurring anticancer compounds represent about half of the chemotherapeutic drugs that have been brought to market against cancer to date. Computer-based or in silico virtual screening methods are often used in lead/hit discovery protocols. In this study, the "drug-likeness" of ~400 compounds from Africa...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci5003697

    authors: Ntie-Kang F,Nwodo JN,Ibezim A,Simoben CV,Karaman B,Ngwa VF,Sippl W,Adikwu MU,Mbaze LM

    update_date:2014-09-22 00:00:00

  • GESSE: Predicting Drug Side Effects from Drug-Target Relationships.

    abstract::The in silico prediction of unwanted side effects (SEs) caused by the promiscuous behavior of drugs and their targets is highly relevant to the pharmaceutical industry. Considerable effort is now being put into computational and experimental screening of several suspected off-target proteins in the hope that SEs might...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00120

    authors: Pérez-Nueno VI,Souchet M,Karaboga AS,Ritchie DW

    update_date:2015-09-28 00:00:00

  • Rapid evaluation of synthetic and molecular complexity for in silico chemistry.

    abstract::Methods that rapidly evaluate molecular complexity and synthetic feasibility are becoming increasingly important for in silico chemistry. We propose a new metric based on relative atomic electronegativities and bond parameters that evaluate both synthetic and molecular complexity (SMCM) starting from chemical structur...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci0501387

    authors: Allu TK,Oprea TI

    update_date:2005-09-01 00:00:00

  • Homology model-guided 3D-QSAR studies of HIV-1 integrase inhibitors.

    abstract::In the present study, we report the exploration of binding modes of potent HIV-1 integrase (IN) inhibitors MK-0518 (raltegravir) and GS-9137 (elvitegravir) as well as chalcone and related amide IN inhibitors we recently synthesized and the development of 3D-QSAR models for integrase inhibition. Homology models of DNA-...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci200485a

    authors: Sharma H,Cheng X,Buolamwini JK

    update_date:2012-02-27 00:00:00

  • Structure-based CoMFA as a predictive model - CYP2C9 inhibitors as a test case.

    abstract::In this study, we tried to establish a general scheme to create a model that could predict the affinity of small compounds to their target proteins. This scheme consists of a search for ligand-binding sites on a protein, a generation of bound conformations (poses) of ligands in each of the sites by docking, identifica...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci800313h

    authors: Yasuo K,Yamaotsu N,Gouda H,Tsujishita H,Hirono S

    update_date:2009-04-01 00:00:00

  • Consensus QSAR models: do the benefits outweigh the complexity?

    abstract::This study has assessed the use of consensus regression, as compared to single multiple linear regression, models for the development of quantitative structure-activity relationships (QSARs). To provide a comparison, four data sets of varying size and complexity were analyzed: silastic membrane flux, toxicity of pheno...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci700016d

    authors: Hewitt M,Cronin MT,Madden JC,Rowe PH,Johnson C,Obi A,Enoch SJ

    update_date:2007-07-01 00:00:00

  • Template CoMFA: the 3D-QSAR Grail?

    abstract::Template CoMFA, a novel alignment methodology for training or test set structures in 3D-QSAR, is introduced. Its two most significant advantages are its complete automation and its ability to derive a single combined model from multiple structural series affecting a biological target. Its only two inputs are one or mo...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci400696v

    authors: Cramer RD,Wendt B

    update_date:2014-02-24 00:00:00

  • Prediction of synthetic accessibility based on commercially available compound databases.

    abstract::A compound's synthetic accessibility (SA) is an important aspect of drug design, since in some cases computer-designed compounds cannot be synthesized. There have been several reports on SA prediction, most of which have focused on the difficulties of synthetic reactions based on retro-synthesis analyses, reaction dat...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci500568d

    authors: Fukunishi Y,Kurosawa T,Mikami Y,Nakamura H

    update_date:2014-12-22 00:00:00

  • Modeling pKa Shift in DNA Triplexes Containing Locked Nucleic Acids.

    abstract::The protonation states for nucleic acid bases are difficult to assess experimentally. In the context of DNA triplex, the protonation state of cytidine in the third strand is particularly important, because it needs to be protonated in order to form Hoogsteen hydrogen bonds. A sugar modification, locked nucleic acid (L...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00741

    authors: Hartono YD,Xu Y,Karshikoff A,Nilsson L,Villa A

    update_date:2018-04-23 00:00:00

  • BiKi Life Sciences: A New Suite for Molecular Dynamics and Related Methods in Drug Discovery.

    abstract::In this paper, we introduce the BiKi Life Sciences suite. This software makes it easy for computational medicinal chemists to run ad hoc molecular dynamics protocols in a novel and task-oriented environment; as a notebook, BiKi (acronym of Binding Kinetics) keeps memory of any activity together with dependencies among...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00680

    authors: Decherchi S,Bottegoni G,Spitaleri A,Rocchia W,Cavalli A

    update_date:2018-02-26 00:00:00

  • SARANEA: a freely available program to mine structure-activity and structure-selectivity relationship information in compound data sets.

    abstract::We introduce SARANEA, an open-source Java application for interactive exploration of structure-activity relationship (SAR) and structure-selectivity relationship (SSR) information in compound sets of any source. SARANEA integrates various SAR and SSR analysis functions and utilizes a network-like similarity graph data...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci900416a

    authors: Lounkine E,Wawer M,Wassermann AM,Bajorath J

    update_date:2010-01-01 00:00:00

  • Three-dimensional quantitative structure-activity relationship of nucleosides acting at the A3 adenosine receptor: analysis of binding and relative efficacy.

    abstract::The binding affinity and relative maximal efficacy of human A3 adenosine receptor (AR) agonists were each subjected to ligand-based three-dimensional quantitative structure-activity relationship analysis. Comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) used a...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci600501z

    authors: Kim SK,Jacobson KA

    update_date:2007-05-01 00:00:00

  • Comparison of Implicit and Explicit Solvation Models for Iota-Cyclodextrin Conformation Analysis from Replica Exchange Molecular Dynamics.

    abstract::Large ring cyclodextrins have become increasingly important for drug delivery applications. In this work, we have performed replica-exchange molecular dynamics simulations using both implicit and explicit water solvation models to study the conformational diversity of iota-cyclodextrin containing 14 α-1,4 glycosidic l...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.6b00595

    authors: Khuntawee W,Kunaseth M,Rungnim C,Intagorn S,Wolschann P,Kungwan N,Rungrotmongkol T,Hannongbua S

    update_date:2017-04-24 00:00:00

  • Expert system for predicting reaction conditions: the Michael reaction case.

    abstract::A generic chemical transformation may often be achieved under various synthetic conditions. However, for any specific reagents, only one or a few among the reported synthetic protocols may be successful. For example, Michael β-addition reactions may proceed under different choices of solvent (e.g., hydrophobic, aproti...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci500698a

    authors: Marcou G,Aires de Sousa J,Latino DA,de Luca A,Horvath D,Rietsch V,Varnek A

    update_date:2015-02-23 00:00:00

  • ReFlex3D: Refined Flexible Alignment of Molecules Using Shape and Electrostatics.

    abstract::We present an algorithm, ReFlex3D, for the refinement of flexible molecular alignments based on their three-dimensional shape and electrostatic properties. The algorithm is designed to be used with fast conformer generators to refine an initial overlay between two molecules and thus to obtain improved overlaps as judg...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00618

    authors: Schmidt TC,Cosgrove DA,Boström J

    update_date:2018-04-23 00:00:00

  • Catalytic Role of Gln202 in the Carboligation Reaction Mechanism of Yeast AHAS: A QM/MM Study.

    abstract::Acetohydroxyacid synthase (AHAS) is a thiamin diphosphate-dependent enzyme involved in the biosynthesis of valine, leucine, isoleucine, and lysine. Experimental evidence has shown that mutation of the Gln202 residue results in a decrease in the enzymatic activity, thus suggesting the main role of the carboligation cat...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00863

    authors: Mendoza F,Medina FE,Jiménez VA,Jaña GA

    update_date:2020-02-24 00:00:00

  • Comments on the article "Evaluation of pK(a) estimation methods on 211 druglike compounds".

    abstract::The recent article "Evaluation of pK(a) Estimation Methods on 211 Druglike Compounds" ( Manchester, J.; et al. J. Chem. Inf. Model. 2010, 50, 565-571 ) reports poor results for the program Epik. Here, we highlight likely sources for the poor performance and describe work done to improve the performance. Running Epik in...

    journal_title:Journal of chemical information and modeling

    pub_type: Comment, Journal Article

    doi:10.1021/ci100332m

    authors: Shelley JC,Calkins D,Sullivan AP

    update_date:2011-01-24 00:00:00

  • SERAPhiC: a benchmark for in silico fragment-based drug design.

    abstract::Our main objective was to compile a data set of high-quality protein-fragment complexes and make it publicly available. Once assembled, the data set was challenged using docking procedures to address the following questions: (i) Can molecular docking correctly reproduce the experimentally solved structures? (ii) How t...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci2003363

    authors: Favia AD,Bottegoni G,Nobeli I,Bisignano P,Cavalli A

    update_date:2011-11-28 00:00:00

  • Posetic quantitative superstructure/activity relationships (QSSARs) for chlorobenzenes.

    abstract::As a result of the widespread industrial use of polychlorinated hydrocarbons, they have accumulated in nearly all types of environmental compartments, especially in aquatic systems. Particularly, chloroaromatics are among the most undesirable industrial effluents because of their persistence and toxicity. To predict c...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci0501342

    authors: Ivanciuc T,Ivanciuc O,Klein DJ

    update_date:2005-07-01 00:00:00

  • Novel Consensus Architecture To Improve Performance of Large-Scale Multitask Deep Learning QSAR Models.

    abstract::Advances in the development of high-throughput screening and automated chemistry have rapidly accelerated the production of chemical and biological data, much of them freely accessible through literature aggregator services such as ChEMBL and PubChem. Here, we explore how to use this comprehensive mapping of chemical ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00526

    authors: Zakharov AV,Zhao T,Nguyen DT,Peryea T,Sheils T,Yasgar A,Huang R,Southall N,Simeonov A

    update_date:2019-11-25 00:00:00

  • Ligand- and structure-based virtual screening for clathrodin-derived human voltage-gated sodium channel modulators.

    abstract::Voltage-gated sodium channels (VGSC) are attractive targets for drug discovery because of the broad therapeutic potential of their modulators. On the basis of the structure of marine alkaloid clathrodin, we have recently discovered novel subtype-selective VGSC modulators I and II that were used as starting points for ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci400505e

    authors: Tomašić T,Hartzoulakis B,Zidar N,Chan F,Kirby RW,Madge DJ,Peigneur S,Tytgat J,Kikelj D

    update_date:2013-12-23 00:00:00

  • How do metabolites differ from their parent molecules and how are they excreted?

    abstract::Understanding which physicochemical properties, or property distributions, are favorable for successful design and development of drugs, nutritional supplements, cosmetics, and agrochemicals is of great importance. In this study we have analyzed molecules from three distinct chemical spaces (i) approved drugs, (ii) hu...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci300487z

    authors: Kirchmair J,Howlett A,Peironcely JE,Murrell DS,Williamson MJ,Adams SE,Hankemeier T,van Buren L,Duchateau G,Klaffke W,Glen RC

    update_date:2013-02-25 00:00:00

  • Probing the Binding Pathway of BRACO19 to a Parallel-Stranded Human Telomeric G-Quadruplex Using Molecular Dynamics Binding Simulation with AMBER DNA OL15 and Ligand GAFF2 Force Fields.

    abstract::Human telomeric DNA G-quadruplex has been identified as a good therapeutic target in cancer treatment. G-quadruplex-specific ligands that stabilize the G-quadruplex have great potential to be developed as anticancer agents. Two crystal structures (an apo form of parallel stranded human telomeric G-quadruplex and its h...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00287

    authors: Machireddy B,Kalra G,Jonnalagadda S,Ramanujachary K,Wu C

    update_date:2017-11-27 00:00:00