Efficient Corrections for DFT Noncovalent Interactions Based on Ensemble Learning Models.

Abstract:

Machine learning has exhibited powerful capabilities in many areas. However, machine learning models are mostly database dependent, requiring a new model whenever the database changes. A universal model that accommodates the widest variety of databases is therefore highly desirable. This universality may be achieved by ensemble learning, which integrates multiple learners to meet the demands of diverse databases. We therefore propose a general procedure for establishing learning ensembles based on noncovalent interaction (NCI) databases. In addition, accurate NCI computation is quite demanding for first-principles methods, so a competent machine learning model can be an efficient way to obtain high NCI accuracy with minimal computational resources. With these aims, multiple schemes of ensemble learning models (Bagging, Boosting, and Stacking frameworks) are explored in this study. The models are based on various low levels of density functional theory (DFT) calculations for the benchmark databases S66, S22, and X40. All NCIs computed by the DFT calculations can be improved to high-level accuracy (root-mean-square error, RMSE = 0.22 kcal/mol relative to the CCSD(T)/CBS benchmark) by the established ensemble learning models. Compared with single machine learning models, ensemble models show better accuracy (the RMSE of the best model is further lowered by ∼25%), robustness, and goodness-of-fit according to evaluation parameters suggested by the OECD. Among the ensemble learning models, heterogeneous Stacking ensemble models show the most valuable application potential. The standardized procedure for constructing learning ensembles has been applied successfully to several NCI data sets, and it may also be applicable to other chemical databases.
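As a rough illustration of the Bagging idea mentioned in the abstract, the sketch below trains several simple learners on bootstrap resamples and averages them to correct low-level energies toward reference values, then compares the RMSE before and after. It is only a minimal sketch under stated assumptions: the data are synthetic, the base learner is a one-variable least-squares line, and the names `fit_line`, `bagged_fit`, and `bagged_predict` are hypothetical. It does not reproduce the descriptors, base learners, or Stacking models used in the paper.

```python
import math
import random


def rmse(pred, ref):
    """Root-mean-square error between predictions and reference values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def bagged_fit(x, y, n_models=25, seed=0):
    """Bagging: fit one base learner per bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(x)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_line([x[i] for i in idx], [y[i] for i in idx]))
    return models


def bagged_predict(models, x):
    """Average the predictions of all base learners."""
    return [sum(a * xi + b for a, b in models) / len(models) for xi in x]


# Synthetic stand-in data (hypothetical, NOT taken from S66, S22, or X40).
rng = random.Random(42)
e_ref = [rng.uniform(-20.0, -0.5) for _ in range(80)]          # "benchmark" energies, kcal/mol
e_dft = [0.85 * e + 0.5 + rng.gauss(0.0, 0.1) for e in e_ref]  # "low-level" energies

train, test = slice(0, 60), slice(60, 80)
models = bagged_fit(e_dft[train], e_ref[train])

rmse_raw = rmse(e_dft[test], e_ref[test])
rmse_corrected = rmse(bagged_predict(models, e_dft[test]), e_ref[test])
print(f"RMSE before correction: {rmse_raw:.2f} kcal/mol")
print(f"RMSE after correction:  {rmse_corrected:.2f} kcal/mol")
```

A heterogeneous Stacking ensemble would differ in that the base learners are of different types and a second-level model, rather than a plain average, combines their predictions.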

journal_name

J Chem Inf Model

authors

Li W,Miao W,Cui J,Fang C,Su S,Li H,Hu L,Lu Y,Chen G

doi

10.1021/acs.jcim.8b00878

subject

Has Abstract

pub_date

2019-05-28 00:00:00

pages

1849-1857

issue

5

eissn

1549-960X

issn

1549-9596

journal_volume

59

pub_type

Journal Article
  • Plant Metabolite Databases: From Herbal Medicines to Modern Drug Discovery.

    abstract::Traditional herbal medicine has been an inseparable part of the traditional medical science in many countries throughout history. Nowadays, the popularity of using herbal medicines in daily life, as well as clinical practices, has gradually expanded to numerous Western countries with positive impacts and acceptance. T...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00826

    authors: Nguyen-Vo TH,Nguyen L,Do N,Nguyen TN,Trinh K,Cao H,Le L

    update_date:2020-03-23 00:00:00

  • Assessing the Conformational Equilibrium of Carboxylic Acid via Quantum Mechanical and Molecular Dynamics Studies on Acetic Acid.

    abstract::Accurate hydrogen placement in molecular modeling is crucial for studying the interactions and dynamics of biomolecular systems. The carboxyl functional group is a prototypical example of a functional group that requires protonation during structure preparation. To our knowledge, when in their neutral form, carboxylic...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00835

    authors: Lim VT,Bayly CI,Fusti-Molnar L,Mobley DL

    update_date:2019-05-28 00:00:00

  • Interpretation of Quantitative Structure-Activity Relationship Models: Past, Present, and Future.

    abstract::This paper is an overview of the most significant and impactful interpretation approaches of quantitative structure-activity relationship (QSAR) models, their development, and application. The evolution of the interpretation paradigm from "model → descriptors → (structure)" to "model → structure" is indicated. The lat...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article, Review

    doi:10.1021/acs.jcim.7b00274

    authors: Polishchuk P

    update_date:2017-11-27 00:00:00

  • A Coarse-Grained Force Field Parameterized for MgCl2 and CaCl2 Aqueous Solutions.

    abstract::Calcium and magnesium ions play important roles in many physicochemical processes. To facilitate the investigation of phenomena related to these ions that occur over large length and time scales, a coarse-grained force field (CGFF) is developed for MgCl2 and CaCl2 aqueous solutions. The ions are modeled by CG beads wi...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00206

    authors: Gong Z,Sun H

    update_date:2017-07-24 00:00:00

  • Molecular Structure Extraction from Documents Using Deep Learning.

    abstract::Chemical structure extraction from documents remains a hard problem because of both false positive identification of structures during segmentation and errors in the predicted structures. Current approaches rely on handcrafted rules and subroutines that perform reasonably well generally but still routinely encounter s...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00669

    authors: Staker J,Marshall K,Abel R,McQuaw CM

    update_date:2019-03-25 00:00:00

  • Ensemble feature selection: consistent descriptor subsets for multiple QSAR models.

    abstract::Selecting a small subset of descriptors from a large pool to build a predictive quantitative structure-activity relationship (QSAR) model is an important step in the QSAR modeling process. In general, subset selection is very hard to solve, even approximately, with guaranteed performance bounds. Traditional approaches...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci600563w

    authors: Dutta D,Guha R,Wild D,Chen T

    update_date:2007-05-01 00:00:00

  • Linear and nonlinear 3D-QSAR approaches in tandem with ligand-based homology modeling as a computational strategy to depict the pyrazolo-triazolo-pyrimidine antagonists binding site of the human adenosine A2A receptor.

    abstract::The integration of ligand- and structure-based strategies might sensitively increase the success of drug discovery process. We have recently described the application of Molecular Electrostatic Potential autocorrelated vectors (autoMEPs) in generating both linear (Partial Least-Square, PLS) and nonlinear (Response Sur...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci700300w

    authors: Michielan L,Bacilieri M,Schiesaro A,Bolcato C,Pastorin G,Spalluto G,Cacciari B,Klotz KN,Kaseda C,Moro S

    update_date:2008-02-01 00:00:00

  • Discovery of wild-type and Y181C mutant non-nucleoside HIV-1 reverse transcriptase inhibitors using virtual screening with multiple protein structures.

    abstract::To discover non-nucleoside inhibitors of HIV-1 reverse transcriptase (NNRTIs) that are effective against both wild-type (WT) virus and variants that encode the clinically troublesome Tyr181Cys (Y181C) RT mutation, virtual screening by docking was carried out using three RT structures and more than 2 million commercial...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci900068k

    authors: Nichols SE,Domaoal RA,Thakur VV,Tirado-Rives J,Anderson KS,Jorgensen WL

    update_date:2009-05-01 00:00:00

  • Structural and Functional Characterization of Allatostatin Receptor Type-C of Thaumetopoea pityocampa, a Potential Target for Next-Generation Pest Control Agents.

    abstract::Insect neuropeptide receptors, including allatostatin receptor type C (AstR-C), a G protein-coupled receptor, are among the potential targets for designing next-generation pesticides that despite their importance in offering a new mode-of-action have been overlooked. Focusing on AstR-C of Thaumetopoea pityocampa, a co...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c00985

    authors: Shahraki A,Işbilir A,Dogan B,Lohse MJ,Durdagi S,Birgul-Iyison N

    update_date:2021-01-21 00:00:00

  • Probabilistic models for capturing more physicochemical properties on protein-protein interface.

    abstract::Protein-protein interactions play a key role in a multitude of biological processes, such as signal transduction, de novo drug design, immune responses, and enzymatic activities. It is of great interest to understand how proteins interact with each other. The general approach is to explore all possible poses and ident...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci5002372

    authors: Guo F,Li SC,Du P,Wang L

    update_date:2014-06-23 00:00:00

  • Underestimated Halogen Bonds Forming with Protein Backbone in Protein Data Bank.

    abstract::Halogen bonds (XBs) are attracting increasing attention in biological systems. Protein Data Bank (PDB) archives experimentally determined XBs in biological macromolecules. However, no software for structure refinement in X-ray crystallography takes into account XBs, which might result in the weakening or even vanishin...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00235

    authors: Zhang Q,Xu Z,Shi J,Zhu W

    update_date:2017-07-24 00:00:00

  • HLA-DM Stabilizes the Empty MHCII Binding Groove: A Model Using Customized Natural Move Monte Carlo.

    abstract::MHC class II molecules bind peptides derived from extracellular proteins that have been ingested by antigen-presenting cells and display them to the immune system. Peptide loading occurs within the antigen-presenting cell and is facilitated by HLA-DM. HLA-DM stabilizes the open conformation of the MHCII binding groove...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00104

    authors: Demharter S,Knapp B,Deane C,Minary P

    update_date:2019-06-24 00:00:00

  • The assembly-inducing laulimalide/peloruside A binding site on tubulin: molecular modeling and biochemical studies with [³H]peloruside A.

    abstract::We used synthetic peloruside A for the commercial preparation of [³H]peloruside A. The radiolabeled compound bound to preformed tubulin polymer in amounts stoichiometric with the polymer's tubulin content, with an apparent K(d) value of 0.35 μM. A less active peloruside A analogue, (11-R)-peloruside A and laulimalide ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci1002894

    authors: Nguyen TL,Xu X,Gussio R,Ghosh AK,Hamel E

    update_date:2010-11-22 00:00:00

  • FORTRAN interface for code interoperability in quantum chemistry: the Q5Cost library.

    abstract::Ab initio quantum-chemistry programs produce and use large amounts of data, which are usually stored on disk in the form of binary files. A FORTRAN library, named Q5Cost, has been designed and implemented in order to allow the storage of these data sets in a special data format built with the HDF5 technology. This dat...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci7000567

    authors: Borini S,Monari A,Rossi E,Tajti A,Angeli C,Bendazzoli GL,Cimiraglia R,Emerson A,Evangelisti S,Maynau D,Sanchez-Marin J,Szalay PG

    update_date:2007-05-01 00:00:00

  • Systematics of high-genus fullerenes.

    abstract::In this article, we present a systematic way to classify a family of high-genus fullerenes (HGFs) by decomposing them into two types of necklike structures, which are the negatively curved parts of parent toroidal carbon nanotubes. By replacing the faces of a uniform polyhedron with these necks, an HGF polyhedron corr...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci9001124

    authors: Chuang C,Jin BY

    update_date:2009-07-01 00:00:00

  • Assessment of the Cruzain Cysteine Protease Reversible and Irreversible Covalent Inhibition Mechanism.

    abstract::Reversible and irreversible covalent ligands are advanced cysteine protease inhibitors in the drug development pipeline. K777 is an irreversible inhibitor of cruzain, a necessary enzyme for the survival of the Trypanosoma cruzi (T. cruzi) parasite, the causative agent of Chagas disease. Despite their importance, irrev...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b01138

    authors: Silva JRA,Cianni L,Araujo D,Batista PHJ,de Vita D,Rosini F,Leitão A,Lameira J,Montanari CA

    update_date:2020-03-23 00:00:00

  • Simulation of 2D NMR Spectra of Carbohydrates Using GODESS Software.

    abstract::Glycan Optimized Dual Empirical Spectrum Simulation (GODESS) is a web service, which has been recently shown to be one of the most accurate tools for simulation of (1)H and (13)C 1D NMR spectra of natural carbohydrates and their derivatives. The new version of GODESS supports visualization of the simulated (1)H and (1...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.6b00083

    authors: Kapaev RR,Toukach PV

    update_date:2016-06-27 00:00:00

  • Structure-activity relationships in non-ligand binding pocket (non-LBP) diarylhydrazide antiandrogens.

    abstract::We report the synthesis and a study of the structure-activity relationships of a new series of diarylhydrazides as potential selective non-ligand binding pocket androgen receptor antagonists. Their biological activity as antiandrogens in the context of the development of treatments for castration resistant prostate ca...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci400189m

    authors: Caboni L,Egan B,Kelly B,Blanco F,Fayne D,Meegan MJ,Lloyd DG

    update_date:2013-08-26 00:00:00

  • Ligand coordinate analysis of SC-558 from the active site to the surface of COX-2: a molecular dynamics study.

    abstract::We have performed a ligand coordinate analysis to monitor the movement of the inhibitor SC-558 from the active site of the COX-2 protein to the exterior using molecular dynamics techniques. This study provides an insight into the intermolecular interactions formed by the ligand during this journey. The published cryst...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci050142i

    authors: Sai Ram KV,Rambabu G,Sarma JA,Desiraju GR

    update_date:2006-07-01 00:00:00

  • Polarizable Force Field for Molecular Ions Based on the Classical Drude Oscillator.

    abstract::Development of accurate force field parameters for molecular ions in the context of a polarizable energy function based on the classical Drude oscillator is a crucial step toward an accurate polarizable model for modeling and simulations of biological macromolecules. Toward this goal we have undertaken a hierarchical ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00132

    authors: Lin FY,Lopes PEM,Harder E,Roux B,MacKerell AD Jr

    update_date:2018-05-29 00:00:00

  • Identification of ligand templates using local structure alignment for structure-based drug design.

    abstract::With a rapid increase in the number of high-resolution protein-ligand structures, the known protein-ligand structures can be used to gain insight into ligand-binding modes in a target protein. On the basis of the fact that the structurally similar binding sites share information about their ligands, we have developed ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci300178e

    authors: Lee HS,Im W

    update_date:2012-10-22 00:00:00

  • Searching for New Leads To Treat Epilepsy: Target-Based Virtual Screening for the Discovery of Anticonvulsant Agents.

    abstract::The purpose of this investigation is to contribute to the development of new anticonvulsant drugs to treat patients with refractory epilepsy. We applied a virtual screening protocol that involved the search into molecular databases of new compounds and known drugs to find small molecules that interact with the open co...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00721

    authors: Palestro PH,Enrique N,Goicoechea S,Villalba ML,Sabatier LL,Martin P,Milesi V,Bruno Blanch LE,Gavernet L

    update_date:2018-07-23 00:00:00

  • Role of water in ligand binding to maltose-binding protein: insight from a new docking protocol based on the 3D-RISM-KH molecular theory of solvation.

    abstract::Maltose-binding protein is a periplasmic binding protein responsible for transport of maltooligosaccharides through the periplasmic space of Gram-negative bacteria, as a part of the ABC transport system. The molecular mechanisms of the initial ligand binding and induced large scale motion of the protein's domains still...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci500520q

    authors: Huang W,Blinov N,Wishart DS,Kovalenko A

    update_date:2015-02-23 00:00:00

  • Training a scoring function for the alignment of small molecules.

    abstract::A comprehensive data set of aligned ligands with highly similar binding pockets from the Protein Data Bank has been built. Based on this data set, a scoring function for recognizing good alignment poses for small molecules has been developed. This function is based on atoms and hydrogen-bond projected features. The co...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci100227h

    authors: Chan SL,Labute P

    update_date:2010-09-27 00:00:00

  • Benchmark performance of MultiCASE Inc. software in Ames mutagenicity set.

    abstract::The predictive performances of MC4PC were evaluated using its learning machine functionality. Its superior characteristics are demonstrated in this follow-up study using the newly published Ames mutagenicity benchmark set. ...

    journal_title:Journal of chemical information and modeling

    pub_type: Comment, Letter

    doi:10.1021/ci1000899

    authors: Saiakhov RD,Klopman G

    update_date:2010-09-27 00:00:00

  • In silico analysis of the thermodynamic stability changes of psychrophilic and mesophilic alpha-amylases upon exhaustive single-site mutations.

    abstract::Identifying sequence modifications that distinguish psychrophilic from mesophilic proteins is important for designing enzymes with different thermodynamic stabilities and to understand the underlying mechanisms. The PoPMuSiC algorithm is used to introduce, in silico, all the single-site mutations in four mesophilic an...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci050473v

    authors: Gilis D

    update_date:2006-05-01 00:00:00

  • RosENet: Improving Binding Affinity Prediction by Leveraging Molecular Mechanics Energies with an Ensemble of 3D Convolutional Neural Networks.

    abstract::The worldwide increase and proliferation of drug resistant microbes, coupled with the lag in new drug development, represents a major threat to human health. In order to reduce the time and cost for exploring the chemical search space, drug discovery increasingly relies on computational biology approaches. One key ste...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c00075

    authors: Hassan-Harrirou H,Zhang C,Lemmin T

    update_date:2020-06-22 00:00:00

  • Combinatorial × computational × cheminformatics (C3) approach to characterization of congeneric libraries of organic pollutants.

    abstract::Congeners are molecules based on the same carbon skeleton but are different by the number of substituents and/or a substitution pattern. Examples are 1-chloronaphthalene, 1,4-dichloronaphthalene, and 1,3,8-trichloronaphthalene. Various persistent organic pollutants (POPs) exist in the environment as families of congen...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci300289b

    authors: Haranczyk M,Urbaszek P,Ng EG,Puzyn T

    update_date:2012-11-26 00:00:00

  • Spatial sign preprocessing: a simple way to impart moderate robustness to multivariate estimators.

    abstract::The spatial sign is a multivariate extension of the concept of sign. Recently multivariate estimators of covariance structures based on spatial signs have been examined by various authors. These new estimators are found to be robust to outlying observations. From a computational point of view, estimators based on spat...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci050498u

    authors: Serneels S,De Nolf E,Van Espen PJ

    update_date:2006-05-01 00:00:00

  • Probing the Binding Pathway of BRACO19 to a Parallel-Stranded Human Telomeric G-Quadruplex Using Molecular Dynamics Binding Simulation with AMBER DNA OL15 and Ligand GAFF2 Force Fields.

    abstract::Human telomeric DNA G-quadruplex has been identified as a good therapeutic target in cancer treatment. G-quadruplex-specific ligands that stabilize the G-quadruplex have great potential to be developed as anticancer agents. Two crystal structures (an apo form of parallel stranded human telomeric G-quadruplex and its h...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00287

    authors: Machireddy B,Kalra G,Jonnalagadda S,Ramanujachary K,Wu C

    update_date:2017-11-27 00:00:00