What Does the Machine Learn? Knowledge Representations of Chemical Reactivity.

Abstract:

In a departure from conventional chemical approaches, data-driven models of chemical reactions have recently been shown to be statistically successful using machine learning. These models, however, are largely black box in character and have not provided the kind of chemical insights that historically advanced the field of chemistry. To examine the knowledge base of machine-learning models (what does the machine learn?), this article deconstructs black-box machine-learning models of a diverse chemical reaction data set. Through experimentation with chemical representations and modeling techniques, the analysis provides insights into how statistical accuracy can arise even when the model lacks informative physical principles. By peeling back the layers of these complicated models, we arrive at a minimal, chemically intuitive model that involves no machine learning. This model is based on systematic reaction-type classification and Evans-Polanyi relationships within reaction types, which are easily visualized and interpreted. Through exploring this simple model, we gain a deeper understanding of the data set and uncover a means for expert interactions to improve the model's reliability.
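The minimal model described in the abstract can be illustrated with a short sketch. An Evans-Polanyi relationship is the standard linear form Ea = E0 + alpha * dE, relating a reaction's activation energy to its reaction energy, fitted separately within each reaction type. The code below is only an illustration under stated assumptions: the function names, the reaction-type labels, and the numerical values are hypothetical and synthetic, and none of it is taken from the article or its data set.

```python
from collections import defaultdict


def fit_evans_polanyi(reactions):
    """Fit Ea = E0 + alpha * dE by least squares, one line per reaction type.

    `reactions` is an iterable of (reaction_type, dE, Ea) tuples.
    Returns a dict mapping reaction_type -> (alpha, E0).
    """
    groups = defaultdict(list)
    for rtype, d_e, e_a in reactions:
        groups[rtype].append((d_e, e_a))

    params = {}
    for rtype, points in groups.items():
        n = len(points)
        mean_x = sum(x for x, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in points)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
        # With no spread in dE the slope is undefined; fall back to the mean barrier.
        alpha = sxy / sxx if sxx > 0 else 0.0
        params[rtype] = (alpha, mean_y - alpha * mean_x)
    return params


def predict_barrier(params, rtype, d_e):
    """Predict an activation energy from the fitted line of the given reaction type."""
    alpha, e0 = params[rtype]
    return e0 + alpha * d_e


# Synthetic, made-up energies (arbitrary units), for illustration only.
toy_reactions = [
    ("H-transfer", -10.0, 15.0),
    ("H-transfer", 0.0, 20.0),
    ("H-transfer", 10.0, 25.0),
    ("ring-closure", -20.0, 30.0),
    ("ring-closure", 0.0, 36.0),
]
params = fit_evans_polanyi(toy_reactions)
```

Because each reaction type gets its own slope and intercept, the fitted lines can be plotted and inspected one type at a time, which is the kind of interpretability the abstract contrasts with black-box models.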

journal_name

J Chem Inf Model

authors

Kammeraad JA,Goetz J,Walker EA,Tewari A,Zimmerman PM

doi

10.1021/acs.jcim.9b00721

subject

Has Abstract

pub_date

2020-03-23 00:00:00

pages

1290-1301

issue

3

eissn

1549-960X

issn

1549-9596

journal_volume

60

pub_type

Journal Article
  • Ligand-based molecular modeling study on a chemically diverse series of cholecystokinin-B/gastrin receptor antagonists: generation of predictive model.

    abstract::Pharmacophore hypotheses were developed for six structurally diverse series of cholecystokinin-B/gastrin receptor (CCK-BR) antagonists. A training set consisting of 33 compounds was carefully selected. The activity spread of the training set molecules was from 0.1 to 2100 nM. The most predictive pharmacophore model (h...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci050257m

    authors: Chopra M,Mishra AK

    update_date:2005-11-01 00:00:00

  • Determining the validity of a QSAR model--a classification approach.

    abstract::The determination of the validity of a QSAR model when applied to new compounds is an important concern in the field of QSAR and QSPR modeling. Various scoring techniques can be applied to specific types of models. We present a technique with which we can state whether a new compound will be well predicted by a previo...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci0497511

    authors: Guha R,Jurs PC

    update_date:2005-01-01 00:00:00

  • Pharmacophore-based virtual screening and experimental validation of novel inhibitors against cyanobacterial fructose-1,6-/sedoheptulose-1,7-bisphosphatase.

    abstract::Cyanobacterial fructose-1,6-/sedoheptulose-1,7-bisphosphatase (cy-FBP/SBPase) is a potential enzymatic target for screening of novel inhibitors that can combat harmful algal blooms. In the present study, we targeted the substrate binding pocket of cy-FBP/SBPase. A series of novel hit compounds from the SPECs database w...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci4007529

    authors: Sun Y,Zhang R,Li D,Feng L,Wu D,Feng L,Huang P,Ren Y,Feng J,Xiao S,Wan J

    update_date:2014-03-24 00:00:00

  • Real external predictivity of QSAR models. Part 2. New intercomparable thresholds for different validation criteria and the need for scatter plot inspection.

    abstract::The evaluation of regression QSAR model performance, in fitting, robustness, and external prediction, is of pivotal importance. Over the past decade, different external validation parameters have been proposed: $Q^2_{F1}$, $Q^2_{F2}$, $Q^2_{F3}$, $r^2_m$, and the Golbraikh-Tropsha method. Recently, the concordance correlati...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci300084j

    authors: Chirico N,Gramatica P

    update_date:2012-08-27 00:00:00

  • De Novo Drug Design of Targeted Chemical Libraries Based on Artificial Intelligence and Pair-Based Multiobjective Optimization.

    abstract::Artificial intelligence and multiobjective optimization represent promising solutions to bridge chemical and biological landscapes by addressing the automated de novo design of compounds as a result of a humanlike creative process. In the present study, we conceived a novel pair-based multiobjective approach implement...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c00517

    authors: Domenico A,Nicola G,Daniela T,Fulvio C,Nicola A,Orazio N

    update_date:2020-10-26 00:00:00

  • Retrospect and Prospect of Single Particle Cryo-Electron Microscopy: The Class of Integral Membrane Proteins as an Example.

    abstract::A giant technological leap in the field of cryo-electron microscopy (cryo-EM) has assured the achievement of near-atomic resolution structures of biological macromolecules. As a recognition of this accomplishment, the Nobel Prize in Chemistry was awarded in 2017 to Jacques Dubochet, Joachim Frank, and Richard Henderso...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b01015

    authors: Akbar S,Mozumder S,Sengupta J

    update_date:2020-05-26 00:00:00

  • Physicochemical stereodescriptors of atomic chiral centers.

    abstract::Physicochemical atomic stereodescriptors (PAS) were implemented that represent the chirality of an atomic chiral center on the basis of empirical physicochemical properties of the ligands. The ligands are ranked according to a specific property, and the chiral center takes an S/R-like descriptor relative to that prope...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci600235w

    authors: Zhang QY,Aires-de-Sousa J

    update_date:2006-11-01 00:00:00

  • Ensemble feature selection: consistent descriptor subsets for multiple QSAR models.

    abstract::Selecting a small subset of descriptors from a large pool to build a predictive quantitative structure-activity relationship (QSAR) model is an important step in the QSAR modeling process. In general, subset selection is very hard to solve, even approximately, with guaranteed performance bounds. Traditional approaches...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci600563w

    authors: Dutta D,Guha R,Wild D,Chen T

    update_date:2007-05-01 00:00:00

  • Knowledge-based scoring functions in drug design: 2. Can the knowledge base be enriched?

    abstract::Fast and accurate predicting of the binding affinities of large sets of diverse protein−ligand complexes is an important, yet extremely challenging, task in drug discovery. The development of knowledge-based scoring functions exploiting structural information of known protein−ligand complexes represents a valuable con...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci100343j

    authors: Shen Q,Xiong B,Zheng M,Luo X,Luo C,Liu X,Du Y,Li J,Zhu W,Shen J,Jiang H

    update_date:2011-02-28 00:00:00

  • Ligand- and structure-based virtual screening for clathrodin-derived human voltage-gated sodium channel modulators.

    abstract::Voltage-gated sodium channels (VGSC) are attractive targets for drug discovery because of the broad therapeutic potential of their modulators. On the basis of the structure of marine alkaloid clathrodin, we have recently discovered novel subtype-selective VGSC modulators I and II that were used as starting points for ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci400505e

    authors: Tomašić T,Hartzoulakis B,Zidar N,Chan F,Kirby RW,Madge DJ,Peigneur S,Tytgat J,Kikelj D

    update_date:2013-12-23 00:00:00

  • FAME 3: Predicting the Sites of Metabolism in Synthetic Compounds and Natural Products for Phase 1 and Phase 2 Metabolic Enzymes.

    abstract::In this work we present the third generation of FAst MEtabolizer (FAME 3), a collection of extra trees classifiers for the prediction of sites of metabolism (SoMs) in small molecules such as drugs, druglike compounds, natural products, agrochemicals, and cosmetics. FAME 3 was derived from the MetaQSAR database ( Pedre...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00376

    authors: Šícho M,Stork C,Mazzolari A,de Bruyn Kops C,Pedretti A,Testa B,Vistoli G,Svozil D,Kirchmair J

    update_date:2019-08-26 00:00:00

  • Customizable Generation of Synthetically Accessible, Local Chemical Subspaces.

    abstract::Screening large libraries of chemicals has been an efficient strategy to discover bioactive compounds; however a portion of the potential for success is limited to the available libraries. Synergizing combinatorial and computational chemistries has emerged as a time-efficient strategy to explore the chemical space mor...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.6b00648

    authors: Pottel J,Moitessier N

    update_date:2017-03-27 00:00:00

  • New combined model for the prediction of regioselectivity in cytochrome P450/3A4 mediated metabolism.

    abstract::Cytochrome P450 3A4 metabolizes nearly 50% of the drugs currently in clinical use with a broad range of substrate specificity. Early prediction of metabolites of xenobiotic compounds is crucial for cost efficient drug discovery and development. We developed a new combined model, MLite, for the prediction of regioselec...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci7003576

    authors: Oh WS,Kim DN,Jung J,Cho KH,No KT

    update_date:2008-03-01 00:00:00

  • Modeling Boronic Acid Based Fluorescent Saccharide Sensors: Computational Investigation of d-Fructose Binding to Dimethylaminomethylphenylboronic Acid.

    abstract::Designing organic saccharide sensors for use in aqueous solution is a nontrivial endeavor. Incorporation of hydrogen bonding groups on a sensor's receptor unit to target saccharides is an obvious strategy but not one that is likely to ensure analyte-receptor interactions over analyte-solvent or receptor-solvent intera...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00987

    authors: Kearns FL,Robart C,Kemp MT,Vankayala SL,Chapin BM,Anslyn EV,Woodcock HL,Larkin JD

    update_date:2019-05-28 00:00:00

  • Determination of Structural Ensembles of Flexible Molecules in Solution from NMR Data Undergoing Spin Diffusion.

    abstract::Spin diffusion is a formidable problem when interpreting NMR data of chemical compounds. We developed a method to reconstruct the conformational ensemble of flexible molecules displaying spin diffusion, which minimizes the subjective bias in the interpretation of experimental data and which can be used routinely to ob...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00259

    authors: Vasile F,Tiana G

    update_date:2019-06-24 00:00:00

  • Probabilistic models for capturing more physicochemical properties on protein-protein interface.

    abstract::Protein-protein interactions play a key role in a multitude of biological processes, such as signal transduction, de novo drug design, immune responses, and enzymatic activities. It is of great interest to understand how proteins interact with each other. The general approach is to explore all possible poses and ident...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci5002372

    authors: Guo F,Li SC,Du P,Wang L

    update_date:2014-06-23 00:00:00

  • Probing fragment complementation by rigid-body docking: in silico reconstitution of calbindin D9k.

    abstract::Fragment complementation is gaining an increasing impact as a nonperturbing method to probe noncovalent interactions within protein supersecondary structures. In this study, the fast Fourier transform rigid-body docking algorithm ZDOCK has been employed for in silico reconstitution of the calcium binding protein calbi...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci0501995

    authors: Dell'Orco D,Seeber M,De Benedetti PG,Fanelli F

    update_date:2005-09-01 00:00:00

  • Structure-activity relationships in non-ligand binding pocket (non-LBP) diarylhydrazide antiandrogens.

    abstract::We report the synthesis and a study of the structure-activity relationships of a new series of diarylhydrazides as potential selective non-ligand binding pocket androgen receptor antagonists. Their biological activity as antiandrogens in the context of the development of treatments for castration resistant prostate ca...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci400189m

    authors: Caboni L,Egan B,Kelly B,Blanco F,Fayne D,Meegan MJ,Lloyd DG

    update_date:2013-08-26 00:00:00

  • Structural basis for the mutation-induced dysfunction of human CYP2J2: a computational study.

    abstract::Arachidonic acid is an essential fatty acid in cells, acting as a key inflammatory intermediate in inflammatory reactions. In cardiac tissues, CYP2J2 can adopt arachidonic acid as a major substrate to produce epoxyeicosatrienoic acids (EETs), which can protect endothelial cells from ischemic or hypoxic injuries and ha...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci400003p

    authors: Cong S,Ma XT,Li YX,Wang JF

    update_date:2013-06-24 00:00:00

  • Characterization of DNA primary sequences by a new similarity/diversity measure based on the partial ordering.

    abstract::The similarity/diversity measures play a fundamental role in library searching, virtual screening, and quantitative structure-activity relationship/quantitative structure-property relationship modeling as well as in genomics and proteomics. In this paper, a new similarity/diversity measure is proposed as a new approac...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci060099e

    authors: Todeschini R,Consonni V,Mauri A,Ballabio D

    update_date:2006-09-01 00:00:00

  • Loop Grafting between Similar Local Environments for Fc-Silent Antibodies.

    abstract::Reduction of the affinity of the fragment crystallizable (Fc) region with immune receptors by substitution of one or a few amino acids, known as Fc-silencing, is an established approach to reduce the immune effector functions of monoclonal antibody therapeutics. This approach to Fc-silencing, however, is problematic a...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b01198

    authors: Lešnik S,Hodošček M,Podobnik B,Konc J

    update_date:2020-11-23 00:00:00

  • Tautomer Standardization in Chemical Databases: Deriving Business Rules from Quantum Chemistry.

    abstract::Databases of small, potentially bioactive molecules are ubiquitous across the industry and academia. Designed such that each unique compound should appear only once, the multiplicity of ways in which many compounds can be represented means that these databases require methods for standardizing the representation of ch...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c00232

    authors: Baker CM,Kidley NJ,Papachristos K,Hotson M,Carson R,Gravestock D,Pouliot M,Harrison J,Dowling A

    update_date:2020-08-24 00:00:00

  • Rigorous Computational Study Reveals What Docking Overlooks: Double Trouble from Membrane Association in Protein Kinase C Modulators.

    abstract::Increasing protein kinase C (PKC) activity is of potential therapeutic value. Its activation involves an interaction between the C1 domain and diacylglycerol (DAG) at intracellular membrane surfaces; DAG mimetics hold promise as new drugs. We previously developed the isophthalate derivative HMI-1a3, an effective but h...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c00624

    authors: Lautala S,Provenzani R,Koivuniemi A,Kulig W,Talman V,Róg T,Tuominen RK,Yli-Kauhaluoma J,Bunker A

    update_date:2020-11-23 00:00:00

  • Pharmacophore identification, in silico screening, and virtual library design for inhibitors of the human factor Xa.

    abstract::Factor Xa inhibitors are innovative anticoagulant agents that provide a better safety/efficacy profile compared to other anticoagulative drugs. A chemical feature-based modeling approach was applied to identify crucial pharmacophore patterns from 3D crystal structures of inhibitors bound to human factor Xa (Pdb entrie...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci049778k

    authors: Krovat EM,Frühwirth KH,Langer T

    update_date:2005-01-01 00:00:00

  • Allosteric Response of DNA Recognition Helices of Catabolite Activator Protein to cAMP and DNA Binding.

    abstract::The homodimeric catabolite activator protein (CAP) regulates the transcription of several bacterial genes based on the cellular concentration of cyclic adenosine monophosphate (cAMP). The binding of cAMP to CAP triggers allosteric communication between the cAMP binding domains (CBD) and DNA binding domains (DBD) of CA...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c00617

    authors: Prabhakant A,Panigrahi A,Krishnan M

    update_date:2020-12-28 00:00:00

  • Exploring Tunable Hyperparameters for Deep Neural Networks with Industrial ADME Data Sets.

    abstract::Deep learning has drawn significant attention in different areas including drug discovery. It has been proposed that it could outperform other machine learning algorithms, especially with big data sets. In the field of pharmaceutical industry, machine learning models are built to understand quantitative structure-acti...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00671

    authors: Zhou Y,Cahya S,Combs SA,Nicolaou CA,Wang J,Desai PV,Shen J

    update_date:2019-03-25 00:00:00

  • Two model system of the alpha1A-adrenoceptor docked with selected ligands.

    abstract::In this study, we have developed a two model system to mimic the active and inactive states of a G-protein coupled receptor specifically the alpha1A adrenergic receptor. We have docked two agonists, epinephrine (phenylamine type) and oxymetazoline (imidazoline type), as well as two antagonists, prazosin and 5-methylur...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci700026v

    authors: Asher WB,Hoskins SN,Slasor LA,Morris DH,Cook EM,Bautista DL

    update_date:2007-09-01 00:00:00

  • Comparison of several molecular docking programs: pose prediction and virtual screening accuracy.

    abstract::Molecular docking programs are widely used modeling tools for predicting ligand binding modes and structure based virtual screening. In this study, six molecular docking programs (DOCK, FlexX, GLIDE, ICM, PhDOCK, and Surflex) were evaluated using metrics intended to assess docking pose and virtual screening accuracy. ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci900056c

    authors: Cross JB,Thompson DC,Rai BK,Baber JC,Fan KY,Hu Y,Humblet C

    update_date:2009-06-01 00:00:00

  • Hidden active information in a random compound library: extraction using a pseudo-structure-activity relationship model.

    abstract::We propose a hypothesis that "a model of active compound can be provided by integrating information of compounds high-ranked by docking simulation of a random compound library". In our hypothesis, the inclusion of true active compounds in the high-ranked compound is not necessary. We regard the high-ranked compounds a...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci7003384

    authors: Fukunishi H,Teramoto R,Shimada J

    update_date:2008-03-01 00:00:00

  • Direct Observation of β-Barrel Intermediates in the Self-Assembly of Toxic SOD1(28-38) and Absence in Nontoxic Glycine Mutants.

    abstract::Soluble low-molecular-weight oligomers formed during the early stage of amyloid aggregation are considered the major toxic species in amyloidosis. The structure-function relationship between oligomeric assemblies and the cytotoxicity in amyloid diseases are still elusive due to the heterogeneous and transient nature o...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c01319

    authors: Sun Y,Huang J,Duan X,Ding F

    update_date:2021-01-14 00:00:00