Efficient Corrections for DFT Noncovalent Interactions Based on Ensemble Learning Models.

Abstract:

Machine learning has exhibited powerful capabilities in many areas. However, machine learning models are mostly database dependent, requiring a new model if the database changes. Therefore, a universal model is highly desired to accommodate the widest variety of databases. Fortunately, this universality may be achieved by ensemble learning, which can integrate multiple learners to meet the demands of diversified databases. Therefore, we propose a general procedure for learning ensemble establishment based on noncovalent interactions (NCIs) databases. Additionally, accurate NCI computation is quite demanding for first-principles methods, for which a competent machine learning model can be an efficient solution to obtain high NCI accuracy with minimal computational resources. In regard to these aspects, multiple schemes of ensemble learning models (Bagging, Boosting, and Stacking frameworks) are explored in this study. The models are based on various low levels of density functional theory (DFT) calculations for the benchmark databases S66, S22, and X40. All NCIs computed by the DFT calculations can be improved to high-level accuracy (root-mean-square error RMSE = 0.22 kcal/mol in contrast to CCSD(T)/CBS benchmark) by established ensemble learning models. Compared with single machine learning models, ensemble models show better accuracy (RMSE of the best model is further lowered by ∼25%), robustness and goodness-of-fit according to evaluation parameters suggested by the OECD. Among ensemble learning models, heterogeneous Stacking ensemble models show the most valuable application potential. The standardized procedure of constructing learning ensembles has been well utilized on several NCI data sets, and this procedure may also be applicable for other chemical databases.
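
As a rough illustration of the idea summarized in the abstract (learning a correction that maps low-level DFT interaction energies toward a high-level reference, then scoring it by RMSE), the sketch below trains a small Bagging ensemble of linear regressors. Everything in it is an assumption made for illustration: the data are synthetic random numbers rather than S66, S22, or X40 values; the single input feature, the linear base learner, and the function names (`rmse`, `fit_line`, `bagging_fit`, `bagging_predict`) are hypothetical. It is not the authors' model or feature set.

```python
import math
import random
import statistics


def rmse(pred, ref):
    """Root-mean-square error between predictions and reference values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def fit_line(x, y):
    """Ordinary least squares fit of y ~ a*x + b; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0.0:  # degenerate bootstrap sample: fall back to the mean
        return 0.0, my
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return a, my - a * mx


def bagging_fit(x, y, n_models=25, seed=0):
    """Fit one base learner per bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(x)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_line([x[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Average the predictions of all base learners."""
    return [statistics.fmean([a * xi + b for a, b in models]) for xi in x]


# Synthetic stand-in data (assumption): "reference" plays the role of a
# high-level benchmark energy, "low_level" a cheaper estimate with a
# systematic error plus noise. Units are nominally kcal/mol.
rng = random.Random(42)
reference = [rng.uniform(-20.0, -0.5) for _ in range(60)]
low_level = [0.9 * e + 0.5 + rng.gauss(0.0, 0.3) for e in reference]

# Hold out the last 15 points to evaluate the learned correction.
train_x, test_x = low_level[:45], low_level[45:]
train_y, test_y = reference[:45], reference[45:]

models = bagging_fit(train_x, train_y)
corrected = bagging_predict(models, test_x)

rmse_raw = rmse(test_x, test_y)
rmse_corrected = rmse(corrected, test_y)
print(f"RMSE before correction: {rmse_raw:.2f} kcal/mol (synthetic)")
print(f"RMSE after correction:  {rmse_corrected:.2f} kcal/mol (synthetic)")
```

Bagging was picked here only because it is the simplest of the three frameworks to write without external libraries. A Stacking variant would instead train a second-level learner on the outputs of several different base learners.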

journal_name

J Chem Inf Model

authors

Li W,Miao W,Cui J,Fang C,Su S,Li H,Hu L,Lu Y,Chen G

doi

10.1021/acs.jcim.8b00878

subject

Has Abstract

pub_date

2019-05-28 00:00:00

pages

1849-1857

issue

5

eissn

1549-960X

issn

1549-9596

journal_volume

59

pub_type

Journal Article
  • How Reactive are Druggable Cysteines in Protein Kinases?

    abstract::Targeted covalent inhibitors (TCIs) have been successfully developed as high-affinity and selective inhibitors of enzymes of the protein kinase family. These drugs typically act by undergoing an electrophilic addition with an active-site cysteine residue, so design of a TCI begins with the identification of a "druggab...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00454

    authors: Awoonor-Williams E,Rowley CN

    update_date:2018-09-24 00:00:00

  • Secondary structure characterization based on amino acid composition and availability in proteins.

    abstract::The importance of thorough analyses of the secondary structures in proteins as basic structural units cannot be overemphasized. Although recent computational methods have achieved reasonably high accuracy for predicting secondary structures from amino acid sequences, a simple and fundamental empirical approach to char...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci900452z

    authors: Otaki JM,Tsutsumi M,Gotoh T,Yamamoto H

    update_date:2010-04-26 00:00:00

  • Universal Activation Index for Class A GPCRs.

    abstract::An index of the activation of Class A G-protein-coupled receptors (GPCRs) has been trained using interhelix distances from a series of microsecond molecular-dynamics simulations and tested for 268 published X-ray structures. In a three-class model that includes intermediate structures, 63% of the active structures are...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00604

    authors: Ibrahim P,Wifling D,Clark T

    update_date:2019-09-23 00:00:00

  • Fragment-Based Computational Method for Designing GPCR Ligands.

    abstract::G protein-coupled receptors (GPCRs) are the largest family of cell surface receptors, which is arguably the most important family of drug targets. With the technology breakthroughs in X-ray crystallography and cryo-electron microscopy, more than 300 GPCR-ligand complex structures have been publicly reported since 2007,...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00699

    authors: Li Y,Sun Y,Song Y,Dai D,Zhao Z,Zhang Q,Zhong W,Hu LA,Ma Y,Li X,Wang R

    update_date:2020-09-28 00:00:00

  • Potent Human Telomerase Inhibitors: Molecular Dynamic Simulations, Multiple Pharmacophore-Based Virtual Screening, and Biochemical Assays.

    abstract::Telomere maintenance is a universal cancer hallmark, and small molecules that disrupt telomere maintenance generally have anticancer properties. Since the vast majority of cancer cells utilize telomerase activity for telomere maintenance, the enzyme has been considered as an anticancer drug target. Recently, rational ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00336

    authors: Shirgahi Talari F,Bagherzadeh K,Golestanian S,Jarstfer M,Amanlou M

    update_date:2015-12-28 00:00:00

  • Hidden active information in a random compound library: extraction using a pseudo-structure-activity relationship model.

    abstract::We propose a hypothesis that "a model of active compound can be provided by integrating information of compounds high-ranked by docking simulation of a random compound library". In our hypothesis, the inclusion of true active compounds in the high-ranked compound is not necessary. We regard the high-ranked compounds a...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci7003384

    authors: Fukunishi H,Teramoto R,Shimada J

    update_date:2008-03-01 00:00:00

  • Tautomer Standardization in Chemical Databases: Deriving Business Rules from Quantum Chemistry.

    abstract::Databases of small, potentially bioactive molecules are ubiquitous across the industry and academia. Designed such that each unique compound should appear only once, the multiplicity of ways in which many compounds can be represented means that these databases require methods for standardizing the representation of ch...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c00232

    authors: Baker CM,Kidley NJ,Papachristos K,Hotson M,Carson R,Gravestock D,Pouliot M,Harrison J,Dowling A

    update_date:2020-08-24 00:00:00

  • Molecular Dynamics Simulation of the Conformational Preferences of Pseudouridine Derivatives: Improving the Distribution in the Glycosidic Torsion Space.

    abstract::There are only four derivatives of pseudouridine (Ψ) that are known to occur naturally in RNA as post-transcriptional modifications. We have studied the conformational consequences of pseudouridylation and further modifications using replica exchange molecular dynamics simulations at the nucleoside level, and the simu...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c00369

    authors: Dutta N,Sarzynska J,Lahiri A

    update_date:2020-10-26 00:00:00

  • Dependence of QSAR models on the selection of trial descriptor sets: a demonstration using nanotoxicity endpoints of decorated nanotubes.

    abstract::Little attention has been given to the selection of trial descriptor sets when designing a QSAR analysis even though a great number of descriptor classes, and often a greater number of descriptors within a given class, are now available. This paper reports an effort to explore interrelationships between QSAR models an...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci3005308

    authors: Shao CY,Chen SZ,Su BH,Tseng YJ,Esposito EX,Hopfinger AJ

    update_date:2013-01-28 00:00:00

  • Metabotropic glutamate receptor-mediated currents at the climbing fiber to Purkinje cell synapse.

    abstract::Different forms of synaptic plasticity in the cerebellum expressed at the synapses onto Purkinje cells (PCs) are mediated by membrane metabotropic glutamate receptors (mGluRs). There are three main mGluR groups with a total of 8 subtypes. Although mGluRs are also found at the climbing fiber (CF) to PC synapses, the di...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci050161s

    authors: Andjus PR,Bajić A,Zhu L,Strata P

    update_date:2005-11-01 00:00:00

  • Assessing different classification methods for virtual screening.

    abstract::How well do different classification methods perform in selecting the ligands of a protein target out of large compound collections not used to train the model? Support vector machines, random forest, artificial neural networks, k-nearest-neighbor classification with genetic-algorithm-optimized feature selection, tren...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci050519k

    authors: Plewczynski D,Spieser SA,Koch U

    update_date:2006-05-01 00:00:00

  • Rigidity Strengthening: A Mechanism for Protein-Ligand Binding.

    abstract::Protein-ligand binding is essential to almost all life processes. The understanding of protein-ligand interactions is fundamentally important to rational drug and protein design. Based on large scale data sets, we show that protein rigidity strengthening or flexibility reduction is a mechanism in protein-ligand bindin...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00226

    authors: Nguyen DD,Xiao T,Wang M,Wei GW

    update_date:2017-07-24 00:00:00

  • Equally Weighted Multiscale Elastic Network Model and Its Comparison with Traditional and Parameter-Free Models.

    abstract::Dynamical properties of proteins play an essential role in their function exertion. The elastic network model (ENM) is an effective and efficient tool in characterizing the intrinsic dynamical properties encoded in biomacromolecule structures. The Gaussian network model (GNM) and anisotropic network model (ANM) are th...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c01178

    authors: Gong W,Liu Y,Zhao Y,Wang S,Han Z,Li C

    update_date:2021-01-26 00:00:00

  • Identification of novel potential antibiotics against Staphylococcus using structure-based drug screening targeting dihydrofolate reductase.

    abstract::The emergence of multidrug-resistant Staphylococcus aureus (S. aureus) makes the treatment of infectious diseases in hospitals more difficult and increases the mortality of the patients. In this study, we attempted to identify novel potent antibiotic candidate compounds against S. aureus dihydrofolate reductase (saDHF...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci400686d

    authors: Kobayashi M,Kinjo T,Koseki Y,Bourne CR,Barrow WW,Aoki S

    update_date:2014-04-28 00:00:00

  • Dynamics of noncovalent interactions in all-α and all-β class proteins: implications for the stability of amyloid aggregates.

    abstract::A fully folded functional protein is stabilized by several noncovalent interactions. When a protein undergoes conformational motions, the existing noncovalent interactions may be maintained. They may also break or new interactions may be formed. Knowledge of the dynamical nature of the different types of noncovalent i...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci200302q

    authors: Jain A,Sankararamakrishnan R

    update_date:2011-12-27 00:00:00

  • Probing the Binding Pathway of BRACO19 to a Parallel-Stranded Human Telomeric G-Quadruplex Using Molecular Dynamics Binding Simulation with AMBER DNA OL15 and Ligand GAFF2 Force Fields.

    abstract::Human telomeric DNA G-quadruplex has been identified as a good therapeutic target in cancer treatment. G-quadruplex-specific ligands that stabilize the G-quadruplex have great potential to be developed as anticancer agents. Two crystal structures (an apo form of parallel stranded human telomeric G-quadruplex and its h...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00287

    authors: Machireddy B,Kalra G,Jonnalagadda S,Ramanujachary K,Wu C

    update_date:2017-11-27 00:00:00

  • Assessing the Protective Activity of a Recently Discovered Phenolic Compound against Oxidative Stress Using Computational Chemistry.

    abstract::The protection exerted by 3,5-dihydroxy-4-methoxybenzyl alcohol (DHMBA), a phenolic compound recently isolated from the Pacific oyster, against oxidative stress (OS) is investigated using the density functional theory. Our results indicate that DHMBA is an outstanding peroxyl radical scavenger, being about 15 times an...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00513

    authors: Villuendas-Rey Y,Alvarez-Idaboy JR,Galano A

    update_date:2015-12-28 00:00:00

  • Combinatorial × computational × cheminformatics (C3) approach to characterization of congeneric libraries of organic pollutants.

    abstract::Congeners are molecules based on the same carbon skeleton but are different by the number of substituents and/or a substitution pattern. Examples are 1-chloronaphthalene, 1,4-dichloronaphthalene, and 1,3,8-trichloronaphthalene. Various persistent organic pollutants (POPs) exist in the environment as families of congen...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci300289b

    authors: Haranczyk M,Urbaszek P,Ng EG,Puzyn T

    update_date:2012-11-26 00:00:00

  • Simulation of 2D NMR Spectra of Carbohydrates Using GODESS Software.

    abstract::Glycan Optimized Dual Empirical Spectrum Simulation (GODESS) is a web service, which has been recently shown to be one of the most accurate tools for simulation of ¹H and ¹³C 1D NMR spectra of natural carbohydrates and their derivatives. The new version of GODESS supports visualization of the simulated ¹H and (1...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.6b00083

    authors: Kapaev RR,Toukach PV

    update_date:2016-06-27 00:00:00

  • Use of surface charges from DFT calculations to predict intestinal absorption.

    abstract::A model for prediction of percent intestinal absorption (%Abs) of neutral molecules was developed based upon surface charges of the molecule calculated by density functional theory (DFT). The surface charges are decomposed into sigma moments which are correlated to a partition coefficient representing transfer of the ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci049653f

    authors: Jones R,Connolly PC,Klamt A,Diedenhofen M

    update_date:2005-09-01 00:00:00

  • Search for novel aminoglycosides by combining fragment-based virtual screening and 3D-QSAR scoring.

    abstract::Aminoglycosides are antibiotics targeting the 16S RNA A site of the bacterial ribosome. There have been many efforts directed toward design of their synthetic derivatives, however with only few successes. As RNA binders, aminoglycosides are also a difficult target for computational drug design, since most of the exist...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci800361a

    authors: Setny P,Trylska J

    update_date:2009-02-01 00:00:00

  • Database of Nuclear Independent Chemical Shifts (NICS) versus NICSZZ of Polycyclic Aromatic Hydrocarbons (PAHs).

    abstract::In the present contribution, we have developed a database, called the FAR-database, where the acronym FAR stands for Fused Aromatic Rings, which presents the results of nuclear independent chemical shifts calculations, NICS(0), NICS(1), NICS(0)ZZ, and NICS(1)ZZ, of 660 neutral benzenoid-PAHs and cyclopenta-fused PAHs....

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00909

    authors: Alvarez-Ramírez F,Ruiz-Morales Y

    update_date:2020-02-24 00:00:00

  • Efficient Strategy for the Calculation of Solvation Free Energies in Water and Chloroform at the Quantum Mechanical/Molecular Mechanical Level.

    abstract::The partitioning of solute molecules between immiscible solvents with significantly different polarities is of great importance. The polarization between the solute and solvent molecules plays an essential role in determining the solubility of the solute, which makes computational studies utilizing molecular mechanics...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00001

    authors: Wang M,Li P,Jia X,Liu W,Shao Y,Hu W,Zheng J,Brooks BR,Mei Y

    update_date:2017-10-23 00:00:00

  • Ligand-Based Discovery of a New Scaffold for Allosteric Modulation of the μ-Opioid Receptor.

    abstract::With the hope of discovering effective analgesics with fewer side effects, attention has recently shifted to allosteric modulators of the opioid receptors. In the past two years, the first chemotypes of positive or silent allosteric modulators (PAMs or SAMs, respectively) of μ- and δ-opioid receptor types have been re...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00388

    authors: Bisignano P,Burford NT,Shang Y,Marlow B,Livingston KE,Fenton AM,Rockwell K,Budenholzer L,Traynor JR,Gerritz SW,Alt A,Filizola M

    update_date:2015-09-28 00:00:00

  • Benchmark Sets for Binding Hot Spot Identification in Fragment-Based Ligand Discovery.

    abstract::Binding hot spots are regions of proteins that, due to their potentially high contribution to the binding free energy, have high propensity to bind small molecules. We present benchmark sets for testing computational methods for the identification of binding hot spots with emphasis on fragment-based ligand discovery. ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c00877

    authors: Wakefield AE,Yueh C,Beglov D,Castilho MS,Kozakov D,Keserű GM,Whitty A,Vajda S

    update_date:2020-12-28 00:00:00

  • Tyrosine Regulates β-Sheet Structure Formation in Amyloid-β42: A New Clustering Algorithm for Disordered Proteins.

    abstract::Our recent studies show that the single Tyr residue in the sequence of amyloid-β42 (Aβ42) is reactive toward various ligands, including metals and adenosine triphosphate (see: Coskuner, O. J. Biol. Inorg. Chem. 2016, 21, 957-973 and Coskuner, O.; Murray, I. V. J. J. Alzheimer's Dis. 2014, 41, 561-574). Ho...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.6b00761

    authors: Coskuner O,Uversky VN

    update_date:2017-06-26 00:00:00

  • BCL::MolAlign: Three-Dimensional Small Molecule Alignment for Pharmacophore Mapping.

    abstract::Small molecule flexible alignment is a critical component of both ligand- and structure-based methods in computer-aided drug discovery. Despite its importance, the availability of high-quality flexible alignment software packages is limited. Here, we present BCL::MolAlign, a freely available property-based molecular a...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00020

    authors: Brown BP,Mendenhall J,Meiler J

    update_date:2019-02-25 00:00:00

  • Identifying biologically active compound classes using phenotypic screening data and sampling statistics.

    abstract::Scoring the activity of compounds in phenotypic high-throughput assays presents a unique challenge because of the limited resolution and inherent measurement error of these assays. Techniques that leverage the structural similarity of compounds within an assay can be used to improve the hit-recovery rate from screenin...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci050087d

    authors: Klekota J,Brauner E,Schreiber SL

    update_date:2005-11-01 00:00:00

  • In silico drug screening approach for the design of magic bullets: a successful example with anti-HIV fullerene derivatized amino acids.

    abstract::A database has been derived from recently reported [60]fullerene derivatives, and their binding scores with HIV-1 PR have been computed using docking techniques. Computational methods have been used to predict which derivatives may have high binding affinities, and for these compounds biological tests have been perfor...

    journal_title:Journal of chemical information and modeling

    pub_type: Letter

    doi:10.1021/ci900047s

    authors: Durdagi S,Supuran CT,Strom TA,Doostdar N,Kumar MK,Barron AR,Mavromoustakos T,Papadopoulos MG

    update_date:2009-05-01 00:00:00

  • Structural insight into the unique binding properties of pyridylethanol(phenylethyl)amine inhibitor in human CYP51.

    abstract::Sterol 14α-demethylase (CYP51) is the main drug target for the treatment of fungal infections. The discovery of new efficient fungal CYP51 inhibitors requires an understanding of the structural requirements for selectivity for the fungal over the human ortholog. In this study, a binding mode of the pyridylethanol(phen...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci500556k

    authors: Zelenko U,Hodošček M,Rozman D,Golič Grdadolnik S

    update_date:2014-12-22 00:00:00