Tautomer Standardization in Chemical Databases: Deriving Business Rules from Quantum Chemistry.

Abstract:

Databases of small, potentially bioactive molecules are ubiquitous across the industry and academia. Designed such that each unique compound should appear only once, the multiplicity of ways in which many compounds can be represented means that these databases require methods for standardizing the representation of chemistry. This is commonly achieved through the use of "Chemistry Business Rules", sets of predefined rules that describe the "house style" of the database in question. At Syngenta, the historical approach to the design of chemistry business rules has been to focus on consistency of representation, with chemical relevance given secondary consideration. In this work, we overturn that convention. Through the use of quantum chemistry calculations, we define a set of chemistry business rules for tautomer standardization that reproduces gas-phase energetic preferences. We go on to show that, compared to our historic approach, this method yields tautomers that are in better agreement with those observed experimentally in condensed phases and that are better suited for use in predictive models.

journal_name

J Chem Inf Model

authors

Baker CM,Kidley NJ,Papachristos K,Hotson M,Carson R,Gravestock D,Pouliot M,Harrison J,Dowling A

doi

10.1021/acs.jcim.0c00232

subject

Has Abstract

pub_date

2020-08-24 00:00:00

pages

3781-3791

issue

8

eissn

1549-960X

issn

1549-9596

journal_volume

60

pub_type

Journal Article
  • Characterization of DNA primary sequences by a new similarity/diversity measure based on the partial ordering.

    abstract::The similarity/diversity measures play a fundamental role in library searching, virtual screening, and quantitative structure-activity relationship/quantitative structure-property relationship modeling as well as in genomics and proteomics. In this paper, a new similarity/diversity measure is proposed as a new approac...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci060099e

    authors: Todeschini R,Consonni V,Mauri A,Ballabio D

    update_date:2006-09-01 00:00:00

  • Consensus QSAR models: do the benefits outweigh the complexity?

    abstract::This study has assessed the use of consensus regression, as compared to single multiple linear regression, models for the development of quantitative structure-activity relationships (QSARs). To provide a comparison, four data sets of varying size and complexity were analyzed: silastic membrane flux, toxicity of pheno...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci700016d

    authors: Hewitt M,Cronin MT,Madden JC,Rowe PH,Johnson C,Obi A,Enoch SJ

    update_date:2007-07-01 00:00:00

  • Benchmark data set for in silico prediction of Ames mutagenicity.

    abstract::Up to now, publicly available data sets to build and evaluate Ames mutagenicity prediction tools have been very limited in terms of size and chemical space covered. In this report we describe a new unique public Ames mutagenicity data set comprising about 6500 nonconfidential compounds (available as SMILES strings and...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci900161g

    authors: Hansen K,Mika S,Schroeter T,Sutter A,ter Laak A,Steger-Hartmann T,Heinrich N,Müller KR

    update_date:2009-09-01 00:00:00

  • Discovery and Evaluation of Anti-Fibrinolytic Plasmin Inhibitors Derived from 5-(4-Piperidyl)isoxazol-3-ol (4-PIOL).

    abstract::Inhibition of plasmin has been found to effectively reduce fibrinolysis and to avoid hemorrhage. This can be achieved by addressing its kringle 1 domain with the known drug and lysine analogue tranexamic acid. Guided by shape similarities toward a previously discovered lead compound, 5-(4-piperidyl)isoxazol-3-ol, a se...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00255

    authors: Schmidt TC,Eriksson PO,Gustafsson D,Cosgrove D,Frølund B,Boström J

    update_date:2017-07-24 00:00:00

  • Multidimensional Drift of Sequence Attributes and Functional Profiles in the Superfamily of the Three-Finger Proteins and Their Structural Homologues.

    abstract::Functional diversity of the three-finger-protein domain (TFPD) had been acquired via hypervariability of some sequence positions and extensive insertion/deletion of short AA-segments that caused multidimensional drift of several sequence attributes such as the overall (HI) and local hydrophobicity levels, the isoelect...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00322

    authors: Galat A

    update_date:2015-09-28 00:00:00

  • Searching for recursively defined generic chemical patterns in nonenumerated fragment spaces.

    abstract::Retrieving molecules with specific structural features is a fundamental requirement of today's molecular database technologies. Estimates claim the chemical space relevant for drug discovery to be around 10⁶⁰ molecules. This figure is many orders of magnitude larger than the amount of molecules conventional databases ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci400107k

    authors: Ehrlich HC,Henzler AM,Rarey M

    update_date:2013-07-22 00:00:00

  • Hidden active information in a random compound library: extraction using a pseudo-structure-activity relationship model.

    abstract::We propose a hypothesis that "a model of active compound can be provided by integrating information of compounds high-ranked by docking simulation of a random compound library". In our hypothesis, the inclusion of true active compounds in the high-ranked compound is not necessary. We regard the high-ranked compounds a...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci7003384

    authors: Fukunishi H,Teramoto R,Shimada J

    update_date:2008-03-01 00:00:00

  • Molecular Structure-Based Large-Scale Prediction of Chemical-Induced Gene Expression Changes.

    abstract::The quantitative structure-activity relationship (QSAR) approach has been used to model a wide range of chemical-induced biological responses. However, it had not been utilized to model chemical-induced genomewide gene expression changes until very recently, owing to the complexity of training and evaluating a very la...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00281

    authors: Liu R,AbdulHameed MDM,Wallqvist A

    update_date:2017-09-25 00:00:00

  • In silico target predictions: defining a benchmarking data set and comparison of performance of the multiclass Naïve Bayes and Parzen-Rosenblatt window.

    abstract::In this study, two probabilistic machine-learning algorithms were compared for in silico target prediction of bioactive molecules, namely the well-established Laplacian-modified Naïve Bayes classifier (NB) and the more recently introduced (to Cheminformatics) Parzen-Rosenblatt Window. Both classifiers were trained in ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci300435j

    authors: Koutsoukas A,Lowe R,Kalantarmotamedi Y,Mussa HY,Klaffke W,Mitchell JB,Glen RC,Bender A

    update_date:2013-08-26 00:00:00

  • Relationships between Molecular Complexity, Biological Activity, and Structural Diversity.

    abstract::Following the theoretical model by Hann et al. moderately complex structures are preferable lead compounds since they lead to specific binding events involving the complete ligand molecule. To make this concept usable in practice for library design, we studied several complexity measures on the biological activity of ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci0503558

    authors: Schuffenhauer A,Brown N,Selzer P,Ertl P,Jacoby E

    update_date:2006-03-01 00:00:00

  • Systematic analysis of enzyme-catalyzed reaction patterns and prediction of microbial biodegradation pathways.

    abstract::The roles of chemical compounds in biological systems are now systematically analyzed by high-throughput experimental technologies. To automate the processing and interpretation of large-scale data it is necessary to develop bioinformatics methods to extract information from the chemical structures of these small mole...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci700006f

    authors: Oh M,Yamada T,Hattori M,Goto S,Kanehisa M

    update_date:2007-07-01 00:00:00

  • Similarity searching in databases of flexible 3D structures using autocorrelation vectors derived from smoothed bounded distance matrices.

    abstract::This paper presents an exploratory study of a novel method for flexible 3-D similarity searching based on autocorrelation vectors and smoothed bounded distance matrices. Although the new approach is unable to outperform an existing 2-D similarity searching in terms of enrichment factors, it is able to retrieve differe...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci0503863

    authors: Rhodes N,Clark DE,Willett P

    update_date:2006-03-01 00:00:00

  • Modeling Boronic Acid Based Fluorescent Saccharide Sensors: Computational Investigation of d-Fructose Binding to Dimethylaminomethylphenylboronic Acid.

    abstract::Designing organic saccharide sensors for use in aqueous solution is a nontrivial endeavor. Incorporation of hydrogen bonding groups on a sensor's receptor unit to target saccharides is an obvious strategy but not one that is likely to ensure analyte-receptor interactions over analyte-solvent or receptor-solvent intera...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00987

    authors: Kearns FL,Robart C,Kemp MT,Vankayala SL,Chapin BM,Anslyn EV,Woodcock HL,Larkin JD

    update_date:2019-05-28 00:00:00

  • Comparative modeling and benchmarking data sets for human histone deacetylases and sirtuin families.

    abstract::Histone deacetylases (HDACs) are an important class of drug targets for the treatment of cancers, neurodegenerative diseases, and other types of diseases. Virtual screening (VS) has become fairly effective approaches for drug discovery of novel and highly selective histone deacetylase inhibitors (HDACIs). To facilitat...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci5005515

    authors: Xia J,Tilahun EL,Kebede EH,Reid TE,Zhang L,Wang XS

    update_date:2015-02-23 00:00:00

  • Effects of Ligand Environment in Zr(IV) Assisted Peptide Hydrolysis.

    abstract::In this DFT study, activities of 11 different N2O4, N2O3, and NO2 core containing Zr(IV) complexes, 4,13-diaza-18-crown-6 (I'N2O4), 1,4,10-trioxa-7,13-diazacyclopentadecane (I'N2O3), and 2-(2-methoxy)ethanol (I'NO2), respectively, and their analogues in peptide hydrolysis have been investigated. Based on the experimen...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.6b00781

    authors: Zhang T,Sharma G,Paul TJ,Hoffmann Z,Prabhakar R

    update_date:2017-05-22 00:00:00

  • iUmami-SCM: A Novel Sequence-Based Predictor for Prediction and Analysis of Umami Peptides Using a Scoring Card Method with Propensity Scores of Dipeptides.

    abstract::Umami or the taste of monosodium glutamate represents one of the major attractive taste modalities in humans. Therefore, knowledge about biophysical and biochemical properties of the umami taste is important for both scientific research and the food industry. Experimental approaches for predicting umami peptides are l...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c00707

    authors: Charoenkwan P,Yana J,Nantasenamat C,Hasan MM,Shoombuatong W

    update_date:2020-12-28 00:00:00

  • The valence state combination model: a generic framework for handling tautomers and protonation states.

    abstract::The consistent handling of molecules is probably the most basic and important requirement in the field of cheminformatics. Reliable results can only be obtained if the underlying calculations are independent of the specific way molecules are represented in the input data. However, ensuring consistency is a complex tas...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci400724v

    authors: Urbaczek S,Kolodzik A,Rarey M

    update_date:2014-03-24 00:00:00

  • Identification of Enzyme Genes Using Chemical Structure Alignments of Substrate-Product Pairs.

    abstract::Although there are several databases that contain data on many metabolites and reactions in biochemical pathways, there is still a big gap in the numbers between experimentally identified enzymes and metabolites. It is supposed that many catalytic enzyme genes are still unknown. Although there are previous studies tha...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00216

    authors: Moriya Y,Yamada T,Okuda S,Nakagawa Z,Kotera M,Tokimatsu T,Kanehisa M,Goto S

    update_date:2016-03-28 00:00:00

  • PyPLIF HIPPOS: A Molecular Interaction Fingerprinting Tool for Docking Results of AutoDock Vina and PLANTS.

    abstract::We describe here our tool named PyPLIF HIPPOS, which was newly developed to analyze the docking results of AutoDock Vina and PLANTS. Its predecessor, PyPLIF (https://github.com/radifar/pyplif), is a molecular interaction fingerprinting tool for the docking results of PLANTS, exclusively. Unlike its predecessor, PyPLIF...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c00305

    authors: Istyastono EP,Radifar M,Yuniarti N,Prasasty VD,Mungkasi S

    update_date:2020-08-24 00:00:00

  • H274Y's Effect on Oseltamivir Resistance: What Happens Before the Drug Enters the Binding Site.

    abstract::Increased reports of oseltamivir (OTV)-resistant strains of the influenza virus, such as the H274Y mutation on its neuraminidase (NA), have created some cause for concern. Many studies have been conducted in the attempt to uncover the mechanism of OTV resistance in H274Y NA. However, most of the reported studies on H2...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00331

    authors: Yusuf M,Mohamed N,Mohamad S,Janezic D,Damodaran KV,Wahab HA

    update_date:2016-01-25 00:00:00

  • Geometric accuracy of three-dimensional molecular overlays.

    abstract::This study examines the dependence of molecular alignment accuracy on a variety of factors including the choice of molecular template, alignment method, conformational flexibility, and type of protein target. We used eight test systems for which X-ray data on 145 ligand-protein complexes were available. The use of X-r...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci060134h

    authors: Chen Q,Higgs RE,Vieth M

    update_date:2006-09-01 00:00:00

  • Adaptive BP-Dock: An Induced Fit Docking Approach for Full Receptor Flexibility.

    abstract::We present an induced fit docking approach called Adaptive BP-Dock that integrates perturbation response scanning (PRS) with the flexible docking protocol of RosettaLigand in an adaptive manner. We first perturb the binding pocket residues of a receptor and obtain a new conformation based on the residue response fluct...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00587

    authors: Bolia A,Ozkan SB

    update_date:2016-04-25 00:00:00

  • Affinity and Selectivity Assessment of Covalent Inhibitors by Free Energy Calculations.

    abstract::Covalent inhibitors have been gaining increased attention in drug discovery due to their beneficial properties such as long residence time, high biochemical efficiency, and specificity. Optimization of covalent inhibitors is a complex task that involves parallel monitoring of the noncovalent recognition elements and t...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c00834

    authors: Mihalovits LM,Ferenczy GG,Keserű GM

    update_date:2020-12-28 00:00:00

  • Use of 3D QSAR models for database screening: a feasibility study.

    abstract::The applicability and scope of 3D QSAR methods (CoMFA, CoMSIA) to screen databases are examined. A protocol requiring minimal user intervention has been established to align training and test set molecules using FlexS. As model system isozymes of human carbonic anhydrase (hCA) are used, all results are exemplified stu...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci7002945

    authors: Hillebrecht A,Klebe G

    update_date:2008-02-01 00:00:00

  • Flux (1): a virtual synthesis scheme for fragment-based de novo design.

    abstract::It is demonstrated that the fragmentation of druglike molecules by applying simplistic pseudo-retrosynthesis results in a stock of chemically meaningful building blocks for de novo molecule generation. A stochastic search algorithm in conjunction with ligand-based similarity scoring (Flux: fragment-based ligand builde...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci0503560

    authors: Fechner U,Schneider G

    update_date:2006-03-01 00:00:00

  • Prediction of the Favorable Hydration Sites in a Protein Binding Pocket and Its Application to Scoring Function Formulation.

    abstract::The important role of water molecules in protein-ligand binding energetics has attracted wide attention in recent years. A range of computational methods has been developed to predict the favorable locations of water molecules in a protein binding pocket. Most of the current methods are based on extensive molecular dy...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00619

    authors: Li Y,Gao Y,Holloway MK,Wang R

    update_date:2020-09-28 00:00:00

  • In silico drug screening approach for the design of magic bullets: a successful example with anti-HIV fullerene derivatized amino acids.

    abstract::A database has been derived from recently reported [60]fullerene derivatives, and their binding scores with HIV-1 PR have been computed using docking techniques. Computational methods have been used to predict which derivatives may have high binding affinities, and for these compounds biological tests have been perfor...

    journal_title:Journal of chemical information and modeling

    pub_type: Letter

    doi:10.1021/ci900047s

    authors: Durdagi S,Supuran CT,Strom TA,Doostdar N,Kumar MK,Barron AR,Mavromoustakos T,Papadopoulos MG

    update_date:2009-05-01 00:00:00

  • Effect of data standardization on chemical clustering and similarity searching.

    abstract::Standardization is used to ensure that the variables in a similarity calculation make an equal contribution to the computed similarity value. This paper compares the use of seven different methods that have been suggested previously for the standardization of integer-valued or real-valued data, comparing the results w...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci800224h

    authors: Chu CW,Holliday JD,Willett P

    update_date:2009-02-01 00:00:00

  • Jaqpot Quattro: A Novel Computational Web Platform for Modeling and Analysis in Nanoinformatics.

    abstract::Engineered nanomaterials (ENMs) are increasingly infiltrating our lives as a result of their applications across multiple fields. However, ENM formulations may result in the modulation of pathways and mechanisms of toxic action that endanger human health and the environment. Alternative testing methods such as in sili...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00223

    authors: Chomenidis C,Drakakis G,Tsiliki G,Anagnostopoulou E,Valsamis A,Doganis P,Sopasakis P,Sarimveis H

    update_date:2017-09-25 00:00:00

  • Predicted Biological Activity of Purchasable Chemical Space.

    abstract::Whereas 400 million distinct compounds are now purchasable within the span of a few weeks, the biological activities of most are unknown. To facilitate access to new chemistry for biology, we have combined the Similarity Ensemble Approach (SEA) with the maximum Tanimoto similarity to the nearest bioactive to predict a...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00316

    authors: Irwin JJ,Gaskins G,Sterling T,Mysinger MM,Keiser MJ

    update_date:2018-01-22 00:00:00