Rational selection of training and test sets for the development of validated QSAR models.

Abstract:

Quantitative Structure-Activity Relationship (QSAR) models are increasingly used to screen chemical databases and/or virtual chemical libraries for potentially bioactive molecules. These developments emphasize the importance of rigorous model validation to ensure that the models have acceptable predictive power. Using the k-nearest-neighbors (kNN) variable selection QSAR method to analyze several datasets, we recently demonstrated that the widely accepted leave-one-out (LOO) cross-validated R² (q²) is an inadequate characteristic for assessing the predictive ability of the models [Golbraikh, A., Tropsha, A. Beware of q2! J. Mol. Graphics Mod. 20, 269-276, (2002)]. Herein, we provide additional evidence that there is no correlation between the values of q² for the training set and the accuracy of prediction (R²) for the test set, and we argue that this observation is a general property of any QSAR model developed with LOO cross-validation. We suggest that external validation using rationally selected training and test sets provides a means to establish a reliable QSAR model. We propose several approaches to dividing experimental datasets into training and test sets and apply them in QSAR studies of 48 functionalized amino acid anticonvulsants and a series of 157 epipodophyllotoxin derivatives with antitumor activity. We formulate a set of general criteria for evaluating the predictive power of QSAR models.
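The abstract contrasts the training-set LOO q² with the prediction accuracy R² on an external test set obtained by a rational division of the data. The following is a minimal pure-Python sketch of that comparison, under stated assumptions: plain unweighted kNN regression on Euclidean distance stands in for the paper's kNN variable-selection method (no descriptor selection is performed); the Kennard-Stone algorithm is used as one well-known example of rational division, not necessarily one of the approaches the paper proposes; test-set R² is taken as the squared Pearson correlation between observed and predicted values (one common definition); and all function names and the synthetic data are hypothetical.

```python
import math
import random


def euclid(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, x, k=3):
    """Plain (assumed) kNN regression: mean activity of the k nearest training compounds."""
    order = sorted(range(len(train_X)), key=lambda i: euclid(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def loo_q2(X, y, k=3):
    """LOO cross-validated q2 = 1 - PRESS / sum((y_i - mean(y))^2) over the training set."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(X)):
        # Predict compound i from all other training compounds only.
        pred = knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k)
        press += (y[i] - pred) ** 2
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss_tot


def r2_pearson(obs, pred):
    """Squared Pearson correlation between observed and predicted test-set values."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    vo = sum((o - mo) ** 2 for o in obs)
    vp = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (vo * vp)


def kennard_stone(X, n_train):
    """Kennard-Stone division (assumes n_train >= 2): returns (train indices, test indices)."""
    n = len(X)
    # Seed the training set with the two most distant compounds.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda p: euclid(X[p[0]], X[p[1]]),
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_train:
        # Add the compound farthest from its nearest already-selected neighbour.
        nxt = max(remaining, key=lambda r: min(euclid(X[r], X[s]) for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic, made-up data: 40 "compounds" x 4 descriptors; activity depends on 2 of them.
    X = [[rng.uniform(0, 1) for _ in range(4)] for _ in range(40)]
    y = [2 * x[0] - x[1] + rng.gauss(0, 0.1) for x in X]
    train, test = kennard_stone(X, 30)
    X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
    X_te, y_te = [X[i] for i in test], [y[i] for i in test]
    q2 = loo_q2(X_tr, y_tr, k=3)
    pred = [knn_predict(X_tr, y_tr, x, k=3) for x in X_te]
    print(f"LOO q2 (training) = {q2:.3f}; external R2 (test) = {r2_pearson(y_te, pred):.3f}")
```

Running the script prints both statistics for the synthetic set; the values depend on the random data and are not results from the paper. Repeating the comparison over many candidate models is one way to check, on one's own data, whether a high q² actually tracks a high external R².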

journal_name

J Comput Aided Mol Des

authors

Golbraikh A,Shen M,Xiao Z,Xiao YD,Lee KH,Tropsha A

doi

10.1023/a:1025386326946

subject

Has Abstract

pub_date

2003-02-01 00:00:00

pages

241-53

issue

2-4

eissn

1573-4951

issn

0920-654X

journal_volume

17

pub_type

Journal Article
  • Protein-ligand interfaces are polarized: discovery of a strong trend for intermolecular hydrogen bonds to favor donors on the protein side with implications for predicting and designing ligand complexes.

    abstract::Understanding how proteins encode ligand specificity is fascinating and similar in importance to deciphering the genetic code. For protein-ligand recognition, the combination of an almost infinite variety of interfacial shapes and patterns of chemical groups makes the problem especially challenging. Here we analyze da...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1007/s10822-018-0105-2

    authors: Raschka S,Wolf AJ,Bemister-Buffington J,Kuhn LA

    update_date:2018-04-01 00:00:00

  • Calculation of hydrophobic parameters directly from three-dimensional structures using comparative molecular field analysis.

    abstract::Capacity ratio (log k') values, which are a measure of hydrophobicity, were calculated directly from the three-dimensional structures of 17 furans and 54 triazines using the comparative molecular field analysis approach. The H2O probe and the GRID force field, including hydrogen-bond potentials, yielded excellent corr...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1007/BF00125172

    authors: Kim KH

    update_date:1995-08-01 00:00:00

  • Modern drug design: the implication of using artificial neuronal networks and multiple molecular dynamic simulations.

    abstract::We report the implementation of molecular modeling approaches developed as a part of the 2016 Grand Challenge 2, the blinded competition of computer aided drug design technologies held by the D3R Drug Design Data Resource ( https://drugdesigndata.org/ ). The challenge was focused on the ligands of the farnesoid X rece...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1007/s10822-017-0085-7

    authors: Yakovenko O,Jones SJM

    update_date:2018-01-01 00:00:00

  • Discovery of DNA dyes Hoechst 34580 and 33342 as good candidates for inhibiting amyloid beta formation: in silico and in vitro study.

    abstract::Combining Lipinski's rule with the docking and steered molecular dynamics simulations and using the PubChem data base of about 1.4 million compounds, we have obtained DNA dyes Hoechst 34580 and Hoechst 33342 as top-leads for the Alzheimer's disease. The binding properties of these ligands to amyloid beta (Aβ) fibril w...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1007/s10822-016-9932-1

    authors: Thai NQ,Tseng NH,Vu MT,Nguyen TT,Linh HQ,Hu CK,Chen YR,Li MS

    update_date:2016-08-01 00:00:00

  • Genetic algorithm for the design of molecules with desired properties.

    abstract::The design of molecules with desired properties is still a challenge because of the largely unpredictable end results. Computational methods can be used to assist and speed up this process. In particular, genetic algorithms have proved to be powerful tools with a wide range of applications, e.g. in the field of drug d...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1023/a:1021928016359

    authors: Kamphausen S,Höltge N,Wirsching F,Morys-Wortmann C,Riester D,Goetz R,Thürk M,Schwienhorst A

    update_date:2002-08-01 00:00:00

  • Design of chemical space networks using a Tanimoto similarity variant based upon maximum common substructures.

    abstract::Chemical space networks (CSNs) have recently been introduced as an alternative to other coordinate-free and coordinate-based chemical space representations. In CSNs, nodes represent compounds and edges pairwise similarity relationships. In addition, nodes are annotated with compound property information such as biolog...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1007/s10822-015-9872-1

    authors: Zhang B,Vogt M,Maggiora GM,Bajorath J

    update_date:2015-10-01 00:00:00

  • Classification of protein disulphide-bridge topologies.

    abstract::The preferential occurrence of certain disulphide-bridge topologies in proteins has prompted us to design a method and a program, KNOT-MATCH, for their classification. The program has been applied to a database of proteins with less than 65% homology and more than two disulphide bridges. We have investigated whether t...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1023/a:1011164224144

    authors: Mas JM,Aloy P,Martí-Renom MA,Oliva B,de Llorens R,Avilés FX,Querol E

    update_date:2001-05-01 00:00:00

  • A proposed common spatial pharmacophore and the corresponding active conformations of some peptide leukotriene receptor antagonists.

    abstract::Molecular modeling studies were carried out by a combined use of conformational analysis and 3D-QSAR methods to identify molecular features common to a series of hydroxyacetophenone (HAP) and non-hydroxyacetophenone (non-HAP) peptide leukotriene (pLT) receptor antagonists. In attempts to develop a ligand-binding model...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1007/BF00124498

    authors: Hariprasad V,Kulkarni VM

    update_date:1996-08-01 00:00:00

  • QSPR ensemble modelling of the 1:1 and 1:2 complexation of Co²⁺, Ni²⁺, and Cu²⁺ with organic ligands: relationships between stability constants.

    abstract::Quantitative structure-property relationship (QSPR) modeling of stability constants for the metal:ligand ratio 1:1 (logK) and 1:2 (logβ2) complexes of 3 transition metal ions with diverse organic ligands in aqueous solution was performed using ensemble multiple linear regression analysis and substructural molecular fr...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1007/s10822-014-9741-3

    authors: Solov'ev V,Varnek A,Tsivadze A

    update_date:2014-05-01 00:00:00

  • QSAR and classification models of a novel series of COX-2 selective inhibitors: 1,5-diarylimidazoles based on support vector machines.

    abstract::The support vector machine, which is a novel algorithm from the machine learning community, was used to develop quantitation and classification models which can be used as a potential screening mechanism for a novel series of COX-2 selective inhibitors. Each compound was represented by calculated structural descriptor...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1007/s10822-004-2722-1

    authors: Liu HX,Zhang RS,Yao XJ,Liu MC,Hu ZD,Fan BT

    update_date:2004-06-01 00:00:00

  • Ligand efficiency metrics considered harmful.

    abstract::Ligand efficiency metrics are used in drug discovery to normalize biological activity or affinity with respect to physicochemical properties such as lipophilicity and molecular size. This Perspective provides an overview of ligand efficiency metrics and summarizes thermodynamics of protein-ligand binding. Different cl...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1007/s10822-014-9757-8

    authors: Kenny PW,Leitão A,Montanari CA

    update_date:2014-07-01 00:00:00

  • Substrate recognition by norovirus polymerase: microsecond molecular dynamics study.

    abstract::Molecular dynamics simulations of complexes between Norwalk virus RNA dependent RNA polymerase and its natural CTP and 2dCTP (both containing the O5'-C5'-C4'-O4' sequence of atoms bridging the triphosphate and sugar moiety) or modified coCTP (C5'-O5'-C4'-O4'), cocCTP (C5'-O5'-C4'-C4'') substrates were produced by mean...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1007/s10822-013-9652-8

    authors: Maláč K,Barvík I

    update_date:2013-04-01 00:00:00

  • Modelling of carbohydrate-aromatic interactions: ab initio energetics and force field performance.

    abstract::Aromatic amino acid residues are often present in carbohydrate-binding sites of proteins. These binding sites are characterized by a placement of a carbohydrate moiety in a stacking orientation to an aromatic ring. This arrangement is an example of CH/pi interactions. Ab initio interaction energies for 20 carbohydrate...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1007/s10822-005-9033-z

    authors: Spiwok V,Lipovová P,Skálová T,Vondrácková E,Dohnálek J,Hasek J,Králová B

    update_date:2005-12-01 00:00:00

  • QXP: powerful, rapid computer algorithms for structure-based drug design.

    abstract::New methods for docking, template fitting and building pseudo-receptors are described. Full conformational searches are carried out for flexible cyclic and acyclic molecules. QXP (quick explore) search algorithms are derived from the method of Monte Carlo perturbation with energy minimization in Cartesian space. An ad...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1023/a:1007907728892

    authors: McMartin C,Bohacek RS

    update_date:1997-07-01 00:00:00

  • Rational creation and systematic analysis of cervical cancer kinase-inhibitor binding profile.

    abstract::The kinase-regulatory cell signaling networks play a central role in the pathogenesis of human cervical cancer (hCC). However, only few kinase inhibitors have been successfully developed for treatment of this cancer to date. Considering that the active sites of protein kinases are highly conserved and small-molecule i...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1007/s10822-019-00211-1

    authors: Han M,Sun D

    update_date:2019-07-01 00:00:00

  • Identification of novel inhibitors for Pim-1 kinase using pharmacophore modeling based on a novel method for selecting pharmacophore generation subsets.

    abstract::Targeting Proviral integration-site of murine Moloney leukemia virus 1 kinase, hereafter called Pim-1 kinase, is a promising strategy for treating different kinds of human cancer. To this end, a total of 328 previously reported Pim-1 kinase inhibitors have been explored and divided based on the pharmacophoric fea...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1007/s10822-015-9887-7

    authors: Shahin R,Swellmeen L,Shaheen O,Aboalhaija N,Habash M

    update_date:2016-01-01 00:00:00

  • Surflex-Dock: Docking benchmarks and real-world application.

    abstract::Benchmarks for molecular docking have historically focused on re-docking the cognate ligand of a well-determined protein-ligand complex to measure geometric pose prediction accuracy, and measurement of virtual screening performance has been focused on increasingly large and diverse sets of target protein structures, c...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1007/s10822-011-9533-y

    authors: Spitzer R,Jain AN

    update_date:2012-06-01 00:00:00

  • Visualisation and integration of G protein-coupled receptor related information help the modelling: description and applications of the Viseur program.

    abstract::G Protein-Coupled Receptors (GPCRs) constitute a superfamily of receptors that forms an important therapeutic target. The number of known GPCR sequences and related information increases rapidly. For these reasons, we are developing the Viseur program to integrate the available information related to GPCRs. The Viseur...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1023/a:1008170432484

    authors: Campagne F,Jestin R,Reversat JL,Bernassau JM,Maigret B

    update_date:1999-11-01 00:00:00

  • Binding free energy predictions of farnesoid X receptor (FXR) agonists using a linear interaction energy (LIE) approach with reliability estimation: application to the D3R Grand Challenge 2.

    abstract::Computational protein binding affinity prediction can play an important role in drug research but performing efficient and accurate binding free energy calculations is still challenging. In the context of phase 2 of the Drug Design Data Resource (D3R) Grand Challenge 2 we used our automated eTOX ALLIES approach to app...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1007/s10822-017-0055-0

    authors: Rifai EA,van Dijk M,Vermeulen NPE,Geerke DP

    update_date:2018-01-01 00:00:00

  • Fractional description of free energies of solvation.

    abstract::A new and rigorous method for the fractional description of solvation and transfer free energies is presented. The method is based on the use of the Miertus-Scrocco-Tomasi self-consistent reaction field method (MST-SCRF), and allows for a rigorous partition of the total solvation free energy into surface elements. The...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1023/a:1008036526741

    authors: Luque FJ,Barril X,Orozco M

    update_date:1999-03-01 00:00:00

  • Large-scale evaluation of cytochrome P450 2C9 mediated drug interaction potential with machine learning-based consensus modeling.

    abstract::Cytochrome P450 (CYP) enzymes play an important role in the metabolism of xenobiotics. Since they are connected to drug interactions, screening for potential inhibitors is of utmost importance in drug discovery settings. Our study provides an extensive classification model for P450-drug interactions with one of the mo...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1007/s10822-020-00308-y

    authors: Rácz A,Keserű GM

    update_date:2020-08-01 00:00:00

  • The IUPAC aqueous and non-aqueous experimental pKa data repositories of organic acids and bases.

    abstract::Accurate and well-curated experimental pKa data of organic acids and bases in both aqueous and non-aqueous media are invaluable in many areas of chemical research, including pharmaceutical, agrochemical, specialty chemical and property prediction research. In pharmaceutical research, pKa data are relevant in ligand de...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1007/s10822-014-9764-9

    authors: Slater AM

    update_date:2014-10-01 00:00:00

  • Antimalarial activity of synthetic 1,2,4-trioxanes and cyclic peroxy ketals, a quantum similarity study.

    abstract::In this work, the antimalarial activity of two series of 20 and 7 synthetic 1,2,4-trioxanes and a set of 20 cyclic peroxy ketals are tested for correlation search by means of Molecular Quantum Similarity Measures (MQSM). QSAR models, dealing with different biological responses (IC90, IC50 and ED90) of the parasite Pla...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1023/a:1015917510236

    authors: Gironés X,Gallegos A,Carbó-Dorca R

    update_date:2001-12-01 00:00:00

  • Can we separate active from inactive conformations?

    abstract::Molecular modeling methodologies such as molecular docking, pharmacophore modeling, and 3D-QSAR, rely on conformational searches of small molecules as a starting point. All of these methodologies seek conformations of the small molecules as they bind to target proteins, i.e., their active conformations. Thus the quest...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1023/a:1016320106741

    authors: Diller DJ,Merz KM Jr

    update_date:2002-02-01 00:00:00

  • Molecular dynamics study of peptide segments of the BH3 domain of the proapoptotic proteins Bak, Bax, Bid and Hrk bound to the Bcl-xL and Bcl-2 proteins.

    abstract::Overexpression of Bcl-2 and Bcl-xL proteins, both inhibitors of apoptosis or programmed cell death, is related to the generation and development of several types of cancer as well as to an elevated resistance to chemotherapeutic treatments. Given that synthetic peptide fragments of the BH3 domain are able to bind t...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1023/b:jcam.0000022559.72848.1c

    authors: Pinto M,Perez JJ,Rubio-Martinez J

    update_date:2004-01-01 00:00:00

  • New insights into the stereochemical requirements of the bradykinin B2 receptor antagonists binding.

    abstract::Bradykinin (BK) is a member of the kinin family, released in response to inflammation, trauma, burns, shock, allergy and some cardiovascular diseases, provoking vasodilatation and increased vascular permeability among other effects. Their actions are mediated through at least two G-protein coupled receptors, B1 a rece...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1007/s10822-015-9890-z

    authors: Lupala CS,Gomez-Gutierrez P,Perez JJ

    update_date:2016-01-01 00:00:00

  • Binding free energy calculations to rationalize the interactions of huprines with acetylcholinesterase.

    abstract::In the present study, the binding free energy of a family of huprines with acetylcholinesterase (AChE) is calculated by means of the free energy perturbation method, based on hybrid quantum mechanics and molecular mechanics potentials. Binding free energy calculations and the analysis of the geometrical parameters hig...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1007/s10822-018-0114-1

    authors: Nascimento ÉCM,Oliva M,Andrés J

    update_date:2018-05-01 00:00:00

  • ToGo-WF: prediction of RNA tertiary structures and RNA-RNA/protein interactions using the KNIME workflow.

    abstract::Recent progress in molecular biology has revealed that many non-coding RNAs regulate gene expression or catalyze biochemical reactions in tumors, viruses and several other diseases. The tertiary structure of RNA molecules and RNA-RNA/protein interaction sites are of increasing importance as potential targets for new m...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1007/s10822-019-00195-y

    authors: Yamasaki S,Amemiya T,Yabuki Y,Horimoto K,Fukui K

    update_date:2019-05-01 00:00:00

  • The impact of data integrity on decision making in early lead discovery.

    abstract::Data driven decision making is a key element of today's pharmaceutical research, including early drug discovery. It comprises questions like which target to pursue, which chemical series to pursue, which compound to make next, or which compound to select for advanced profiling and promotion to pre-clinical development...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1007/s10822-015-9871-2

    authors: Beck B,Seeliger D,Kriegl JM

    update_date:2015-09-01 00:00:00

  • Fine specificity of antigen binding to two class I major histocompatibility proteins (B*2705 and B*2703) differing in a single amino acid residue.

    abstract::Starting from the X-ray structure of a class I major histocompatibility complex (MHC)-encoded protein (HLA-B*2705), a naturally presented self-nonapeptide and two synthetic analogues were simulated in the binding groove of two human leukocyte antigen (HLA) alleles (B*2703 and B*2705) differing in a single amino acid r...

    journal_title:Journal of computer-aided molecular design

    pub_type: Journal Article

    doi:10.1023/a:1007963901092

    authors: Rognan D,Krebs S,Kuonen O,Lamas JR,López de Castro JA,Folkers G

    update_date:1997-09-01 00:00:00