Abstract:
Big data is one of the key transformative factors that increasingly influence all aspects of modern life. Although this transformation brings vast opportunities, it also generates novel challenges, not the least of which is organizing and searching this data deluge. The field of medicinal chemistry is no different: more and more data are being generated, for instance, by technologies such as DNA encoded libraries, peptide libraries, text mining of large literature corpora, and new in silico enumeration methods. Handling those huge sets of molecules effectively is quite challenging and requires compromises that often come at the expense of the interpretability of the results. In order to find an intuitive and meaningful approach to organizing large molecular data sets, we adopted a probabilistic framework called "topic modeling" from the text-mining field. Here we present the first chemistry-related implementation of this method, which allows large sets of molecules to be assigned to "chemical topics" and the relationships between those topics to be investigated. In this first study, we thoroughly evaluate this novel method in different experiments and discuss both its disadvantages and advantages. We show very promising results in reproducing human-assigned concepts using the approach to identify and retrieve chemical series from sets of molecules. We have also created an intuitive visualization of the chemical topics output by the algorithm. This is a huge benefit compared to other unsupervised machine-learning methods, like clustering, which are commonly used to group sets of molecules. Finally, we applied the new method to the 1.6 million molecules of the ChEMBL22 data set to test its robustness and efficiency. In about 1 h we built a 100-topic model of this large data set in which we could identify interesting topics like "proteins", "DNA", or "steroids".
Along with this publication we provide our data sets and an open-source implementation of the new method (CheTo) which will be part of an upcoming version of the open-source cheminformatics toolkit RDKit.
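As a rough illustration of how topic modeling can be carried over from text to molecules, the sketch below fits a small latent Dirichlet allocation (LDA) model with collapsed Gibbs sampling. It assumes that each molecule plays the role of a document and that its substructure fragments play the role of words. The choice of LDA, the fragment labels, the molecule names, and the function `fit_lda` are all assumptions made for this example and are not taken from CheTo or RDKit.

```python
import random
from collections import Counter


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA on tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]

                # Resample the topic and restore the counts.
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed topic probabilities for each document (each row sums to 1).
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(doc_topic[d][k] + alpha) / denom for k in range(n_topics)])
    return theta, topic_word


# Hypothetical molecules described by made-up fragment labels.
molecules = {
    "mol_1": ["steroid_core", "hydroxyl", "ketone", "steroid_core"],
    "mol_2": ["steroid_core", "ketone", "methyl"],
    "mol_3": ["peptide_bond", "amine", "carboxyl", "peptide_bond"],
    "mol_4": ["peptide_bond", "amine", "peptide_bond", "hydroxyl"],
}

theta, topic_word = fit_lda(list(molecules.values()), n_topics=2)

for name, row in zip(molecules, theta):
    print(name, [round(p, 2) for p in row])
```

In practice the fragment lists would come from a cheminformatics toolkit rather than hand-written labels, and a library LDA implementation would replace this toy sampler.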
journal_name: J Chem Inf Model
journal_title: Journal of chemical information and modeling
authors: Schneider N, Fechner N, Landrum GA, Stiefl N
doi: 10.1021/acs.jcim.7b00249
subject: Has Abstract
pub_date: 2017-08-28 00:00:00
pages: 1816-1831
issue: 8
eissn: 1549-960X
issn: 1549-9596
journal_volume: 57
pub_type: journal article
abstract::Virtual screening is a powerful methodology to search for new small molecule inhibitors against a desired molecular target. Usually, it involves evaluating thousands of compounds (derived from large databases) in order to select a set of potential binders that will be tested in the wet-lab. The number of tested compou...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/acs.jcim.7b00241
updated: 2017-08-28 00:00:00
abstract::Universal generative topographic maps (GTMs) provide two-dimensional representations of chemical space selected for their "polypharmacological competence", that is, the ability to simultaneously represent meaningful activity and property landscapes, associated with many distinct targets and properties. Several such GT...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/acs.jcim.8b00650
updated: 2019-01-28 00:00:00
abstract::Binding hot spots are regions of proteins that, due to their potentially high contribution to the binding free energy, have high propensity to bind small molecules. We present benchmark sets for testing computational methods for the identification of binding hot spots with emphasis on fragment-based ligand discovery. ...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/acs.jcim.0c00877
updated: 2020-12-28 00:00:00
abstract::The [H2X2]+ (X = Cl, Br) formula could refer to two possible stable structures, namely, the hydrogen-bonded complex and the three-electron-bonded one. Contrary to the results published by other authors, we claim that for the F-type structures the hydrogen-bonded form is the only possible one and the [HFFH]+ complex...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/ci600355g
updated: 2007-05-01 00:00:00
abstract::Metabolism of xenobiotic and endogenous compounds is frequently complex, not completely elucidated, and therefore often ambiguous. The prediction of sites of metabolism (SoM) can be particularly helpful as a first step toward the identification of metabolites, a process especially relevant to drug discovery. This pape...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/ci400058s
updated: 2013-06-24 00:00:00
abstract::Computer-aided translation technology has made tremendous progress in accuracy in the past few years. Chemical Abstracts Service of the American Chemical Society summarizes scientific works from more than 50 languages and allows the users to search papers in nine selected languages. Currently, only the ab...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/acs.jcim.0c00274
updated: 2020-07-27 00:00:00
abstract::DNA damage alters genetic information and adversely affects gene expression pathways leading to various complex genetic disorders and cancers. DNA repair proteins recognize and rectify DNA damage and mismatches with high fidelity. A critical molecular event that occurs during most protein-mediated DNA repair processes...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/acs.jcim.7b00636
updated: 2018-03-26 00:00:00
abstract::Reversible covalent inhibitors have drawn increasing attention in drug design, as they are likely more potent than noncovalent inhibitors and less toxic than covalent inhibitors. Despite those advantages, the computational prediction of reversible covalent binding presents a formidable challenge because the binding pr...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/acs.jcim.8b00959
updated: 2019-05-28 00:00:00
abstract::The early stages of drug discovery rely on hit-to-lead programs, where initial hits undergo partial optimization to improve binding affinities for their biological target. This is an expensive and time-consuming process, requiring multiple iterations of trial and error designs, an ideal scenario for applying computer ...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/acs.jcim.9b00938
updated: 2020-03-23 00:00:00
abstract::A homology model of the Arabidopsis thaliana UV resistance locus 8 (UVR8) protein is presented herein, showing a seven-bladed β-propeller conformation similar to the globular structure of RCC1. The UVR8 amino acid sequence contains a very high number of conserved tryptophans, and the homology model shows that seven of...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/ci200017f
updated: 2011-06-27 00:00:00
abstract::Porcupine is a component of the Wnt pathway which regulates cell proliferation, migration, stem cell self-renewal, and differentiation. The Wnt pathway has been shown to be dysregulated in a variety of cancers. Porcupine is a membrane bound O-acyltransferase that palmitoylates Wnt. Inhibiting porcupine blocks the secr...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/acs.jcim.5b00159
updated: 2015-07-27 00:00:00
abstract::Histone deacetylases (HDACs) are an important class of drug targets for the treatment of cancers, neurodegenerative diseases, and other types of diseases. Virtual screening (VS) has become a fairly effective approach for the discovery of novel and highly selective histone deacetylase inhibitors (HDACIs). To facilitat...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/ci5005515
updated: 2015-02-23 00:00:00
abstract::New molecular descriptors, RED (Renyi entropy descriptors), based on the generalized entropies introduced by Renyi are presented. Topological descriptors based on molecular features have proven to be useful for describing molecular profiles. Renyi entropy is used as a variability measure to contract a feature-pair dis...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/ci900275w
updated: 2009-11-01 00:00:00
abstract::Engineering shape-controlled bionanomaterials requires comprehensive understanding of interactions between biomolecules and inorganic surfaces. We explore the origin of facet-selective binding of peptides adsorbed onto Pt(100) and Pt(111) crystallographic planes. Using molecular dynamics simulations, we show that upon...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/ci400630d
updated: 2013-12-23 00:00:00
abstract::An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function's ability to identify the most stable structure might be improved by emphasizing the most relevant atom pairwise interactions. However, it is hard to evaluat...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/acs.jcim.9b00356
updated: 2019-07-22 00:00:00
abstract::In this study, we used the Martini Coarse-Grained model with no applied restraints to predict the binding mode of some peptides to G-Protein Coupled Receptors (GPCRs). Both the Neurotensin-1 and the chemokine CXCR4 receptors were used as test cases. Their ligands, NTS8-13 and CVX15 peptides, respectively, were initial...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/acs.jcim.6b00503
updated: 2017-03-27 00:00:00
abstract::Partial covalent interactions (PCIs) in proteins, which include hydrogen bonds, salt bridges, cation-π, and π-π interactions, contribute to thermodynamic stability and facilitate interactions with other biomolecules. Several score functions have been developed within the Rosetta protein modeling framework that identif...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/acs.jcim.7b00398
updated: 2018-05-29 00:00:00
abstract::3-Hydroxy-3-methylglutaryl coenzyme A reductase (HMGR) is a primary target in the current clinical treatment of hypercholesterolemia with specific inhibitors of the "statin" family. Statins are excellent inhibitors of the class I (human) enzyme but relatively poor inhibitors of the class II enzyme, which are well-known as...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/ci300163v
updated: 2012-07-23 00:00:00
abstract::Membrane-bound protein receptors are a primary biological drug target, but the computational analysis of membrane proteins has been limited. In order to improve molecular mechanics Poisson-Boltzmann surface area (MMPBSA) binding free energy calculations for membrane protein-ligand systems, we have optimized a new hete...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/acs.jcim.9b00363
updated: 2019-06-24 00:00:00
abstract::A comprehensive data set of aligned ligands with highly similar binding pockets from the Protein Data Bank has been built. Based on this data set, a scoring function for recognizing good alignment poses for small molecules has been developed. This function is based on atoms and hydrogen-bond projected features. The co...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/ci100227h
updated: 2010-09-27 00:00:00
abstract::The membrane permeability of cyclic peptides and peptidomimetics, which are generally larger and more complex than typical drug molecules, is likely strongly influenced by the conformational behavior of these compounds in polar and apolar environments. The size and complexity of peptides often limit their bioavailabil...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/acs.jcim.6b00251
updated: 2016-08-22 00:00:00
abstract::Template CoMFA, a novel alignment methodology for training or test set structures in 3D-QSAR, is introduced. Its two most significant advantages are its complete automation and its ability to derive a single combined model from multiple structural series affecting a biological target. Its only two inputs are one or mo...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/ci400696v
updated: 2014-02-24 00:00:00
abstract::Acetohydroxyacid synthase (AHAS) is a thiamin diphosphate-dependent enzyme involved in the biosynthesis of valine, leucine, isoleucine, and lysine. Experimental evidence has shown that mutation of the Gln202 residue results in a decrease in the enzymatic activity, thus suggesting the main role of the carboligation cat...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/acs.jcim.9b00863
updated: 2020-02-24 00:00:00
abstract::We have applied the two most commonly used methods for automatic matched pair identification, obtained the optimum settings, and discovered that the two methods are synergistic. A turbocharging approach to matched pair analysis is advocated in which a first round (a conservative categorical approach that uses an analo...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/acs.jcim.7b00335
updated: 2017-10-23 00:00:00
abstract::3-Phosphoinositide-dependent protein kinase-1 (PDK1) is a promising target for developing novel anticancer drugs. In order to understand the structure-activity correlation of indolinone-based PDK1 inhibitors, we have carried out a combined molecular docking and three-dimensional quantitative structure-activity relatio...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/ci800147v
updated: 2008-09-01 00:00:00
abstract::The Torsion Library contains hundreds of rules for small molecule conformations which have been derived from the Cambridge Structural Database (CSD) and are curated by molecular design experts. The torsion rules are encoded as SMARTS patterns and categorize rotatable bonds via a traffic light coloring scheme. We have ...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/acs.jcim.5b00522
updated: 2016-01-25 00:00:00
abstract::Given the need for modern researchers to produce open, reproducible scientific output, the lack of standards and best practices for sharing data and workflows used to produce and analyze molecular dynamics (MD) simulations has become an important issue in the field. There are now multiple well-established packages to ...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/acs.jcim.9b00665
updated: 2019-10-28 00:00:00
abstract::In this study, we have developed a two-model system to mimic the active and inactive states of a G-protein coupled receptor, specifically the alpha1A adrenergic receptor. We have docked two agonists, epinephrine (phenylamine type) and oxymetazoline (imidazoline type), as well as two antagonists, prazosin and 5-methylur...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/ci700026v
updated: 2007-09-01 00:00:00
abstract::The primary goal of this project was to evaluate the performance of the Standard and Enforced Geometry Optimization (SEGO) method which we have recently developed. The SEGO method has been designed for an automatic location of multiple minima on the molecular Potential Energy Surface (PES), and its usefulness has been...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/acs.jcim.9b00352
updated: 2019-08-26 00:00:00
abstract::Human telomeric DNA G-quadruplex has been identified as a good therapeutic target in cancer treatment. G-quadruplex-specific ligands that stabilize the G-quadruplex have great potential to be developed as anticancer agents. Two crystal structures (an apo form of parallel stranded human telomeric G-quadruplex and its h...
journal_title: Journal of chemical information and modeling
pub_type: journal article
doi: 10.1021/acs.jcim.7b00287
updated: 2017-11-27 00:00:00