Improved Scaffold Hopping in Ligand-Based Virtual Screening Using Neural Representation Learning.

Abstract:

Deep learning has demonstrated significant potential in advancing the state of the art in many problem domains, especially those benefiting from automated feature extraction. Yet, the methodology has seen limited adoption in the field of ligand-based virtual screening (LBVS) as traditional approaches typically require large, target-specific training sets, which limits their value in most prospective applications. Here, we report the development of a neural network architecture and a learning framework designed to yield a generally applicable tool for LBVS. Our approach uses the molecular graph as input and involves learning a representation that places compounds of similar biological profiles in close proximity within a hyperdimensional feature space; this is achieved by simultaneously leveraging historical screening data against a multitude of targets during training. Cosine distance between molecules in this space becomes a general similarity metric and can readily be used to rank order database compounds in LBVS workflows. We demonstrate that the resulting model generalizes exceptionally well to compounds and targets not used in its training. In three commonly employed LBVS benchmarks, our method outperforms popular fingerprinting algorithms without the need for any target-specific training. Moreover, we show that the learned representation yields superior performance in scaffold hopping tasks and is largely orthogonal to existing fingerprints. In summary, we have developed and validated a framework for learning a molecular representation that is applicable to LBVS in a target-agnostic fashion, with as few as one query compound. Our approach can also enable organizations to generate additional value from large screening data repositories, and to this end we are making its implementation freely available at https://github.com/totient-bio/gatnn-vs.
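The ranking step described in the abstract can be illustrated with a minimal sketch. It assumes each compound has already been mapped to an embedding vector by some trained model; the three-dimensional vectors, the compound IDs, and the function names `cosine_distance` and `rank_by_query` are hypothetical and are not taken from the authors' implementation.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def rank_by_query(query, database):
    """Return database compound IDs sorted by increasing distance to the query."""
    scored = [(cosine_distance(query, emb), cid) for cid, emb in database.items()]
    scored.sort()  # smallest distance (most similar) first
    return [cid for _, cid in scored]


# Toy embeddings, made up for illustration only (assumption, not real model output).
query = [1.0, 0.0, 0.0]
database = {
    "cmpd_A": [0.9, 0.1, 0.0],   # nearly parallel to the query
    "cmpd_B": [0.0, 1.0, 0.0],   # orthogonal to the query
    "cmpd_C": [-1.0, 0.0, 0.0],  # opposite to the query
}
ranking = rank_by_query(query, database)
```

With a single query compound, the compounds at the top of `ranking` would be the ones prioritized for screening; no target-specific training enters this step, which is the property the abstract emphasizes.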

journal_name

J Chem Inf Model

authors

Stojanović L,Popović M,Tijanić N,Rakočević G,Kalinić M

doi

10.1021/acs.jcim.0c00622

pub_date

2020-10-26 00:00:00

pages

4629-4639

issue

10

eissn

1549-960X

issn

1549-9596

journal_volume

60

pub_type

Journal Article
  • Underestimated Halogen Bonds Forming with Protein Backbone in Protein Data Bank.

    abstract::Halogen bonds (XBs) are attracting increasing attention in biological systems. Protein Data Bank (PDB) archives experimentally determined XBs in biological macromolecules. However, no software for structure refinement in X-ray crystallography takes into account XBs, which might result in the weakening or even vanishin...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00235

    authors: Zhang Q,Xu Z,Shi J,Zhu W

    update_date:2017-07-24 00:00:00

  • Tuning Interaction Parameters of Thermoplastic Polyurethanes in a Binary Solvent To Achieve Precise Control over Microphase Separation.

    abstract::Thermoplastic polyurethanes (TPUs) are designed using a large variety of basic building blocks but are only synthesized in a limited number of solvent systems. Understanding the behavior of the copolymers in a selected solvent system is of particular interest to tune the intricate balance of microphase separation/mixi...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00781

    authors: Avaz Seven S,Oguz O,Menceloglu YZ,Atilgan C

    update_date:2019-05-28 00:00:00

  • Rapid evaluation of synthetic and molecular complexity for in silico chemistry.

    abstract::Methods that rapidly evaluate molecular complexity and synthetic feasibility are becoming increasingly important for in silico chemistry. We propose a new metric based on relative atomic electronegativities and bond parameters that evaluate both synthetic and molecular complexity (SMCM) starting from chemical structur...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci0501387

    authors: Allu TK,Oprea TI

    update_date:2005-09-01 00:00:00

  • In silico deconstruction of ATP-competitive inhibitors of glycogen synthase kinase-3β.

    abstract::Fragment-based methods have emerged in the last two decades as alternatives to traditional high throughput screenings for the identification of chemical starting points in drug discovery. One arguable yet popular assumption about fragment-based design is that the fragment binding mode remains conserved upon chemical e...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci300355p

    authors: Bisignano P,Lambruschini C,Bicego M,Murino V,Favia AD,Cavalli A

    update_date:2012-12-21 00:00:00

  • Ligand-Based Discovery of a New Scaffold for Allosteric Modulation of the μ-Opioid Receptor.

    abstract::With the hope of discovering effective analgesics with fewer side effects, attention has recently shifted to allosteric modulators of the opioid receptors. In the past two years, the first chemotypes of positive or silent allosteric modulators (PAMs or SAMs, respectively) of μ- and δ-opioid receptor types have been re...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00388

    authors: Bisignano P,Burford NT,Shang Y,Marlow B,Livingston KE,Fenton AM,Rockwell K,Budenholzer L,Traynor JR,Gerritz SW,Alt A,Filizola M

    update_date:2015-09-28 00:00:00

  • Geometric accuracy of three-dimensional molecular overlays.

    abstract::This study examines the dependence of molecular alignment accuracy on a variety of factors including the choice of molecular template, alignment method, conformational flexibility, and type of protein target. We used eight test systems for which X-ray data on 145 ligand-protein complexes were available. The use of X-r...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci060134h

    authors: Chen Q,Higgs RE,Vieth M

    update_date:2006-09-01 00:00:00

  • Multifingerprint based similarity searches for targeted class compound selection.

    abstract::Molecular fingerprints are widely used for similarity-based virtual screening in drug discovery projects. In this paper we discuss the performance and the complementarity of nine two-dimensional fingerprints (Daylight, Unity, AlFi, Hologram, CATS, TRUST, Molprint 2D, ChemGPS, and ALOGP) in retrieving active molecules ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci0504723

    authors: Kogej T,Engkvist O,Blomberg N,Muresan S

    update_date:2006-05-01 00:00:00

  • FlexAID: Revisiting Docking on Non-Native-Complex Structures.

    abstract::Small-molecule protein docking is an essential tool in drug design and to understand molecular recognition. In the present work we introduce FlexAID, a small-molecule docking algorithm that accounts for target side-chain flexibility and utilizes a soft scoring function, i.e. one that is not highly dependent on specifi...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00078

    authors: Gaudreault F,Najmanovich RJ

    update_date:2015-07-27 00:00:00

  • Aggregation properties of a polymeric anticancer therapeutic: a coarse-grained modeling study.

    abstract::The effects of paclitaxel (PTX) loading fraction and spatial PTX arrangement on poly(γ-glutamyl-glutamate) paclitaxel (PGG-PTX) aggregation were explored using coarse-grained molecular dynamics. Results show that the PTX loading fraction does not significantly impact aggregation, and the spatial PTX arrangement only a...

    journal_title:Journal of chemical information and modeling

    pub_type: Letter

    doi:10.1021/ci200214m

    authors: Peng LX,Yu L,Howell SB,Gough DA

    update_date:2011-12-27 00:00:00

  • Searching for coordinated activity cliffs using particle swarm optimization.

    abstract::Activity cliffs are formed by structurally similar compounds having large potency differences. Coordinated activity cliffs evolve when compounds within groups of structural neighbors form multiple cliffs with different partners, giving rise to local networks of cliffs in a data set. Using particle swarm optimization, ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci3000503

    authors: Namasivayam V,Bajorath J

    update_date:2012-04-23 00:00:00

  • SARANEA: a freely available program to mine structure-activity and structure-selectivity relationship information in compound data sets.

    abstract::We introduce SARANEA, an open-source Java application for interactive exploration of structure-activity relationship (SAR) and structure-selectivity relationship (SSR) information in compound sets of any source. SARANEA integrates various SAR and SSR analysis functions and utilizes a network-like similarity graph data...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci900416a

    authors: Lounkine E,Wawer M,Wassermann AM,Bajorath J

    update_date:2010-01-01 00:00:00

  • Prediction of synthetic accessibility based on commercially available compound databases.

    abstract::A compound's synthetic accessibility (SA) is an important aspect of drug design, since in some cases computer-designed compounds cannot be synthesized. There have been several reports on SA prediction, most of which have focused on the difficulties of synthetic reactions based on retro-synthesis analyses, reaction dat...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci500568d

    authors: Fukunishi Y,Kurosawa T,Mikami Y,Nakamura H

    update_date:2014-12-22 00:00:00

  • Computational Insight Into the Mechanism of SARS-CoV-2 Membrane Fusion.

    abstract::Membrane fusion, a key step in the early stages of virus propagation, allows the release of the viral genome in the host cell cytoplasm. The process is initiated by fusion peptides that are small, hydrophobic components of viral membrane-embedded glycoproteins and are typically conserved within virus families. Here, w...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c01231

    authors: Borkotoky S,Dey D,Banerjee M

    update_date:2021-01-25 00:00:00

  • Residue preference mapping of ligand fragments in the Protein Data Bank.

    abstract::The interaction between small molecules and proteins is one of the major concerns for structure-based drug design because the principles of protein-ligand interactions and molecular recognition are not thoroughly understood. Fortunately, the analysis of protein-ligand complexes in the Protein Data Bank (PDB) enables u...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci100386y

    authors: Wang L,Xie Z,Wipf P,Xie XQ

    update_date:2011-04-25 00:00:00

  • Turbocharging Matched Molecular Pair Analysis: Optimizing the Identification and Analysis of Pairs.

    abstract::We have applied the two most commonly used methods for automatic matched pair identification, obtained the optimum settings, and discovered that the two methods are synergistic. A turbocharging approach to matched pair analysis is advocated in which a first round (a conservative categorical approach that uses an analo...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00335

    authors: Lukac I,Zarnecka J,Griffen EJ,Dossetter AG,St-Gallay SA,Enoch SJ,Madden JC,Leach AG

    update_date:2017-10-23 00:00:00

  • Isomerization and Decomposition of 2-Methylfuran with External Forces.

    abstract::The primary goal of this project was to evaluate the performance of the Standard and Enforced Geometry Optimization (SEGO) method which we have recently developed. The SEGO method has been designed for an automatic location of multiple minima on the molecular Potential Energy Surface (PES), and its usefulness has been...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00352

    authors: Brzyska A,Woliński K

    update_date:2019-08-26 00:00:00

  • Novel Consensus Architecture To Improve Performance of Large-Scale Multitask Deep Learning QSAR Models.

    abstract::Advances in the development of high-throughput screening and automated chemistry have rapidly accelerated the production of chemical and biological data, much of them freely accessible through literature aggregator services such as ChEMBL and PubChem. Here, we explore how to use this comprehensive mapping of chemical ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00526

    authors: Zakharov AV,Zhao T,Nguyen DT,Peryea T,Sheils T,Yasgar A,Huang R,Southall N,Simeonov A

    update_date:2019-11-25 00:00:00

  • Protein-protein binding site prediction by local structural alignment.

    abstract::Generalization of an earlier algorithm has led to the development of new local structural alignment algorithms for prediction of protein-protein binding sites. The algorithms use maximum cliques on protein graphs to define structurally similar protein regions. The search for structural neighbors in the new algorithms ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci100265x

    authors: Carl N,Konc J,Vehar B,Janezic D

    update_date:2010-10-25 00:00:00

  • New combined model for the prediction of regioselectivity in cytochrome P450/3A4 mediated metabolism.

    abstract::Cytochrome P450 3A4 metabolizes nearly 50% of the drugs currently in clinical use with a broad range of substrate specificity. Early prediction of metabolites of xenobiotic compounds is crucial for cost efficient drug discovery and development. We developed a new combined model, MLite, for the prediction of regioselec...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci7003576

    authors: Oh WS,Kim DN,Jung J,Cho KH,No KT

    update_date:2008-03-01 00:00:00

  • Identifying promising compounds in drug discovery: genetic algorithms and some new statistical techniques.

    abstract::Throughout the drug discovery process, discovery teams are compelled to use statistics for making decisions using data from a variety of inputs. For instance, teams are asked to prioritize compounds for subsequent stages of the drug discovery process, given results from multiple screens. To assist in the prioritizatio...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci600556v

    authors: Mandal A,Johnson K,Wu CF,Bornemeier D

    update_date:2007-05-01 00:00:00

  • Torsion Library Reloaded: A New Version of Expert-Derived SMARTS Rules for Assessing Conformations of Small Molecules.

    abstract::The Torsion Library contains hundreds of rules for small molecule conformations which have been derived from the Cambridge Structural Database (CSD) and are curated by molecular design experts. The torsion rules are encoded as SMARTS patterns and categorize rotatable bonds via a traffic light coloring scheme. We have ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00522

    authors: Guba W,Meyder A,Rarey M,Hert J

    update_date:2016-01-25 00:00:00

  • Phosphorylation of Fibronectin Influences the Structural Stability of the Predicted Interchain Domain.

    abstract::As a key player in cell adhesion, the glycoprotein fibronectin is involved in the complex mechanobiology of the extracellular matrix. Although the function of many modules in the fibronectin molecule has already been understood, the structure and biological relevance of the C-terminal cross-linked region (CTXL) still ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00555

    authors: Kulke M,Uhrhan M,Geist N,Brüggemann D,Ohler B,Langel W,Köppen S

    update_date:2019-10-28 00:00:00

  • Chemical Topic Modeling: Exploring Molecular Data Sets Using a Common Text-Mining Approach.

    abstract::Big data is one of the key transformative factors which increasingly influences all aspects of modern life. Although this transformation brings vast opportunities it also generates novel challenges, not the least of which is organizing and searching this data deluge. The field of medicinal chemistry is not different: ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00249

    authors: Schneider N,Fechner N,Landrum GA,Stiefl N

    update_date:2017-08-28 00:00:00

  • Conformational determinants of the activity of antiproliferative factor glycopeptide.

    abstract::The antiproliferative factor (APF) involved in interstitial cystitis is a glycosylated nonapeptide (TVPAAVVVA) containing a sialylated core 1 α-O-disaccharide linked to the N-terminal threonine. The chemical structure of APF was deduced using spectroscopic techniques and confirmed using total synthesis. The synthetic ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci400147s

    authors: Mallajosyula SS,Adams KM,Barchi JJ,MacKerell AD

    update_date:2013-05-24 00:00:00

  • Protein Solvent Shell Structure Provides Rapid Analysis of Hydration Dynamics.

    abstract::The solvation layer surrounding a protein is clearly an intrinsic part of protein structure-dynamics-function, and our understanding of how the hydration dynamics influences protein function is emerging. We have recently reported simulations indicating a correlation between regional hydration dynamics and the structur...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00009

    authors: Dahanayake JN,Shahryari E,Roberts KM,Heikes ME,Kasireddy C,Mitchell-Koch KR

    update_date:2019-05-28 00:00:00

  • Adaptive configuring of radial basis function network by hybrid particle swarm algorithm for QSAR studies of organic compounds.

    abstract::The configuring of a radial basis function network (RBFN) consists of selecting the network parameters (centers and widths in RBF units and weights between the hidden and output layers) and network architecture. The issues of suboptimum and overfitting, however, often occur in RBFN configuring. This paper presented a ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci600218d

    authors: Zhou YP,Jiang JH,Lin WQ,Zou HY,Wu HL,Shen GL,Yu RQ

    update_date:2006-11-01 00:00:00

  • Evaluating Free Energies of Binding and Conservation of Crystallographic Waters Using SZMAP.

    abstract::The SZMAP method computes binding free energies and the corresponding thermodynamic components for water molecules in the binding site of a protein structure [SZMAP, 1.0.0; OpenEye Scientific Software Inc.: Santa Fe, NM, USA, 2011]. In this work, the ability of SZMAP to predict water structure and thermodynamic s...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci500746d

    authors: Bayden AS,Moustakas DT,Joseph-McCarthy D,Lamb ML

    update_date:2015-08-24 00:00:00

  • Development of an informatics platform for therapeutic protein and peptide analytics.

    abstract::The momentum gained by research on biologics has not been met yet with equal thrust on the informatics side. There is a noticeable lack of software for data management that empowers the bench scientists working on the development of biologic therapeutics. SARvision|Biologics is a tool to analyze data associated with b...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci400333x

    authors: Hansen MR,Villar HO,Feyfant E

    update_date:2013-10-28 00:00:00

  • Transplant-insert-constrain-relax-assemble (TICRA): protein-ligand complex structure modeling and application to kinases.

    abstract::We introduce TICRA (transplant-insert-constrain-relax-assemble), a method for modeling the structure of unknown protein-ligand complexes using the X-ray crystal structures of homologous proteins and ligands with known activity. We present results from modeling the structures of protein kinase-inhibitor complexes using...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci100256u

    authors: Meshkat S,Klon AE,Zou J,Wiseman JS,Konteatis Z

    update_date:2011-01-24 00:00:00

  • Tyrosine Regulates β-Sheet Structure Formation in Amyloid-β42: A New Clustering Algorithm for Disordered Proteins.

    abstract::Our recent studies show that the single Tyr residue in the sequence of amyloid-β42 (Aβ42) is reactive toward various ligands, including metals and adenosine triphosphate (see: Coskuner, O. J. Biol. Inorg. Chem. 2016, 21, 957-973 and Coskuner, O.; Murray, I. V. J. J. Alzheimer's Dis. 2014, 41, 561-574). Ho...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.6b00761

    authors: Coskuner O,Uversky VN

    update_date:2017-06-26 00:00:00