Molecular Structure Extraction from Documents Using Deep Learning.

Abstract:

Chemical structure extraction from documents remains a hard problem because of both false positive identification of structures during segmentation and errors in the predicted structures. Current approaches rely on handcrafted rules and subroutines that perform reasonably well generally but still routinely encounter situations where recognition rates are not yet satisfactory and systematic improvement is challenging. Complications impacting the performance of current approaches include the diversity in visual styles used by various software to render structures, the frequent use of ad hoc annotations, and other challenges related to image quality, including resolution and noise. We present end-to-end deep learning solutions for both segmenting molecular structures from documents and predicting chemical structures from the segmented images. This deep-learning-based approach does not require any handcrafted features, is learned directly from data, and is robust against variations in image quality and style. Using the deep learning approach described herein, we show that it is possible to perform well on both segmentation and prediction of low-resolution images containing moderately sized molecules found in journal articles and patents.

journal_name

J Chem Inf Model

authors

Staker J,Marshall K,Abel R,McQuaw CM

doi

10.1021/acs.jcim.8b00669

subject

Has Abstract

pub_date

2019-03-25 00:00:00

pages

1017-1029

issue

3

eissn

1549-960X

issn

1549-9596

journal_volume

59

pub_type

Journal Article
  • Delineation of agonist binding to the human histamine H4 receptor using mutational analysis, homology modeling, and ab initio calculations.

    abstract::A three-dimensional homology model of the human histamine H4 receptor was developed to investigate the binding mode of a series of structurally diverse H4-agonists, i.e. histamine, clozapine, and the recently described selective, nonimidazole agonist VUF 8430. Mutagenesis studies and docking of these ligands in a rh...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci700474a

    authors: Jongejan A,Lim HD,Smits RA,de Esch IJ,Haaksma E,Leurs R

    update_date:2008-07-01 00:00:00

  • Virtual Screening with Generative Topographic Maps: How Many Maps Are Required?

    abstract::Universal generative topographic maps (GTMs) provide two-dimensional representations of chemical space selected for their "polypharmacological competence", that is, the ability to simultaneously represent meaningful activity and property landscapes, associated with many distinct targets and properties. Several such GT...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00650

    authors: Casciuc I,Zabolotna Y,Horvath D,Marcou G,Bajorath J,Varnek A

    update_date:2019-01-28 00:00:00

  • Modeling compound-target interaction network of traditional Chinese medicines for type II diabetes mellitus: insight for polypharmacology and drug design.

    abstract::In this study, in order to elucidate the action mechanism of traditional Chinese medicines (TCMs) that exhibit clinical efficacy for type II diabetes mellitus (T2DM), an integrated protocol that combines molecular docking and pharmacophore mapping was employed to find the potential inhibitors from TCM for the T2DM-rel...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci400146u

    authors: Tian S,Li Y,Li D,Xu X,Wang J,Zhang Q,Hou T

    update_date:2013-07-22 00:00:00

  • In silico target predictions: defining a benchmarking data set and comparison of performance of the multiclass Naïve Bayes and Parzen-Rosenblatt window.

    abstract::In this study, two probabilistic machine-learning algorithms were compared for in silico target prediction of bioactive molecules, namely the well-established Laplacian-modified Naïve Bayes classifier (NB) and the more recently introduced (to Cheminformatics) Parzen-Rosenblatt Window. Both classifiers were trained in ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci300435j

    authors: Koutsoukas A,Lowe R,Kalantarmotamedi Y,Mussa HY,Klaffke W,Mitchell JB,Glen RC,Bender A

    update_date:2013-08-26 00:00:00

  • Two model system of the alpha1A-adrenoceptor docked with selected ligands.

    abstract::In this study, we have developed a two model system to mimic the active and inactive states of a G-protein coupled receptor specifically the alpha1A adrenergic receptor. We have docked two agonists, epinephrine (phenylamine type) and oxymetazoline (imidazoline type), as well as two antagonists, prazosin and 5-methylur...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci700026v

    authors: Asher WB,Hoskins SN,Slasor LA,Morris DH,Cook EM,Bautista DL

    update_date:2007-09-01 00:00:00

  • Binding Interactions of Ergotamine and Dihydroergotamine to 5-Hydroxytryptamine Receptor 1B (5-HT1b) Using Molecular Dynamics Simulations and Dynamic Network Analysis.

    abstract::Ergotamine (ERG) and dihydroergotamine (DHE), common migraine drugs, have small structural differences but lead to clinically important distinctions in their pharmacological profiles. For example, DHE is less potent than ERG by about 10-fold at the 5-hydroxytryptamine receptor 1B (5-HT1B). Although the high-resolution ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b01082

    authors: Sullivan HJ,Tursi A,Moore K,Campbell A,Floyd C,Wu C

    update_date:2020-03-23 00:00:00

  • GPCR-Bench: A Benchmarking Set and Practitioners' Guide for G Protein-Coupled Receptor Docking.

    abstract::Virtual screening is routinely used to discover new ligands and in particular new ligand chemotypes for G protein-coupled receptors (GPCRs). To prepare for a virtual screen, we often tailor a docking protocol that will enable us to select the best candidates for further screening. To aid this, we created GPCR-Bench, a...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00660

    authors: Weiss DR,Bortolato A,Tehan B,Mason JS

    update_date:2016-04-25 00:00:00

  • Interpretation of Quantitative Structure-Activity Relationship Models: Past, Present, and Future.

    abstract::This paper is an overview of the most significant and impactful interpretation approaches of quantitative structure-activity relationship (QSAR) models, their development, and application. The evolution of the interpretation paradigm from "model → descriptors → (structure)" to "model → structure" is indicated. The lat...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article, Review

    doi:10.1021/acs.jcim.7b00274

    authors: Polishchuk P

    update_date:2017-11-27 00:00:00

  • Discovery of New SIRT2 Inhibitors by Utilizing a Consensus Docking/Scoring Strategy and Structure-Activity Relationship Analysis.

    abstract::SIRT2, which is a NAD+ (nicotinamide adenine dinucleotide) dependent deacetylase, has been demonstrated to play an important role in the occurrence and development of a variety of diseases such as cancer, ischemia-reperfusion, and neurodegenerative diseases. Small molecule inhibitors of SIRT2 are thought to be potenti...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.6b00714

    authors: Huang S,Song C,Wang X,Zhang G,Wang Y,Jiang X,Sun Q,Huang L,Xiang R,Hu Y,Li L,Yang S

    update_date:2017-04-24 00:00:00

  • Comparison Study of Polar and Nonpolar Contributions to Solvation Free Energy.

    abstract::In this study, we compared the contributions of polar and nonpolar interactions to the solvation free energy of a solute in solvent, which is decomposed into four different terms based on the nature of interactions: (i) electrostatic solvation free energy term counting for the work done to move solute charges from fix...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00368

    authors: Izairi R,Kamberaj H

    update_date:2017-10-23 00:00:00

  • Get Your Atoms in Order--An Open-Source Implementation of a Novel and Robust Molecular Canonicalization Algorithm.

    abstract::Finding a canonical ordering of the atoms in a molecule is a prerequisite for generating a unique representation of the molecule. The canonicalization of a molecule is usually accomplished by applying some sort of graph relaxation algorithm, the most common of which is the Morgan algorithm. There are known issues with...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00543

    authors: Schneider N,Sayle RA,Landrum GA

    update_date:2015-10-26 00:00:00

  • Comparative Dynamics and Functional Mechanisms of the CYP17A1 Tunnels Regulated by Ligand Binding.

    abstract::As an important member of cytochrome P450 (CYP) enzymes, CYP17A1 is a dual-function monooxygenase with a critical role in the synthesis of many human steroid hormones, making it an attractive therapeutic target. The emerging structural information about CYP17A1 and the growing number of inhibitors for these enzymes ca...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c00447

    authors: Xiao F,Song X,Tian P,Gan M,Verkhivker GM,Hu G

    update_date:2020-07-27 00:00:00

  • Receptor-based virtual ligand screening for the identification of novel CDC25 phosphatase inhibitors.

    abstract::CDC25 phosphatases play critical roles in cell cycle regulation and are attractive targets for anticancer therapies. Several small non-peptide molecules are known to inhibit CDC25, but many of them appear to form a covalent bond with the enzyme or act through oxidation of the thiolate group of the catalytic cysteine. ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci700313e

    authors: Montes M,Braud E,Miteva MA,Goddard ML,Mondésert O,Kolb S,Brun MP,Ducommun B,Garbay C,Villoutreix BO

    update_date:2008-01-01 00:00:00

  • Structure-activity relationships in non-ligand binding pocket (non-LBP) diarylhydrazide antiandrogens.

    abstract::We report the synthesis and a study of the structure-activity relationships of a new series of diarylhydrazides as potential selective non-ligand binding pocket androgen receptor antagonists. Their biological activity as antiandrogens in the context of the development of treatments for castration resistant prostate ca...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci400189m

    authors: Caboni L,Egan B,Kelly B,Blanco F,Fayne D,Meegan MJ,Lloyd DG

    update_date:2013-08-26 00:00:00

  • In silico analysis of the thermodynamic stability changes of psychrophilic and mesophilic alpha-amylases upon exhaustive single-site mutations.

    abstract::Identifying sequence modifications that distinguish psychrophilic from mesophilic proteins is important for designing enzymes with different thermodynamic stabilities and to understand the underlying mechanisms. The PoPMuSiC algorithm is used to introduce, in silico, all the single-site mutations in four mesophilic an...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci050473v

    authors: Gilis D

    update_date:2006-05-01 00:00:00

  • Cyclohexane-Based Scaffold Molecules Acting as Anion Transport, Anionophores, via Noncovalent Interactions.

    abstract::A theoretical study of a variety of cyclohexane-based anion transporters interacting with the chloride anion has been conducted using density functional theory. The calculations have been performed in the gas phase but also, in order to describe the solvation effects on the interaction, two different solvents-chlorofo...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00154

    authors: Sánchez-Sanz G,Trujillo C

    update_date:2019-05-28 00:00:00

  • Ranking Reversible Covalent Drugs: From Free Energy Perturbation to Fragment Docking.

    abstract::Reversible covalent inhibitors have drawn increasing attention in drug design, as they are likely more potent than noncovalent inhibitors and less toxic than covalent inhibitors. Despite those advantages, the computational prediction of reversible covalent binding presents a formidable challenge because the binding pr...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00959

    authors: Zhang H,Jiang W,Chatterjee P,Luo Y

    update_date:2019-05-28 00:00:00

  • GalaxyDock: protein-ligand docking with flexible protein side-chains.

    abstract::An important issue in developing protein-ligand docking methods is how to incorporate receptor flexibility. Consideration of receptor flexibility using an ensemble of precompiled receptor conformations or by employing an effectively enlarged binding pocket has been reported to be useful. However, direct consideration ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci300342z

    authors: Shin WH,Seok C

    update_date:2012-12-21 00:00:00

  • Tyrosine Regulates β-Sheet Structure Formation in Amyloid-β42: A New Clustering Algorithm for Disordered Proteins.

    abstract::Our recent studies show that the single Tyr residue in the sequence of amyloid-β42 (Aβ42) is reactive toward various ligands, including metals and adenosine triphosphate (see: Coskuner, O. J. Biol. Inorg. Chem. 2016, 21, 957-973 and Coskuner, O.; Murray, I. V. J. J. Alzheimer's Dis. 2014, 41, 561-574). Ho...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.6b00761

    authors: Coskuner O,Uversky VN

    update_date:2017-06-26 00:00:00

  • Structure-Based Rational Design of Novel Inhibitors Against Fructose-1,6-Bisphosphate Aldolase from Candida albicans.

    abstract::Class II fructose-1,6-bisphosphate aldolases (FBA-II) are attractive new targets for the discovery of drugs to combat invasive fungal infection, because they are absent in animals and higher plants. Although several FBA-II inhibitors have been reported, none of these inhibitors exhibit antifungal effect so far. In thi...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.6b00763

    authors: Han X,Zhu X,Hong Z,Wei L,Ren Y,Wan F,Zhu S,Peng H,Guo L,Rao L,Feng L,Wan J

    update_date:2017-06-26 00:00:00

  • Machine Learning Enhanced Spectrum Recognition Based on Computer Vision (SRCV) for Intelligent NMR Data Extraction.

    abstract::A machine learning enhanced spectrum recognition system called spectrum recognition based on computer vision (SRCV) for data extraction from previously analyzed 13C and 1H NMR spectra has been developed. The intelligent system was designed with four function modules to extract data from three areas of NMR images, incl...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c01046

    authors: Jia W,Yang Z,Yang M,Cheng L,Lei Z,Wang X

    update_date:2021-01-25 00:00:00

  • Virtual exploration of the chemical universe up to 11 atoms of C, N, O, F: assembly of 26.4 million structures (110.9 million stereoisomers) and analysis for new ring systems, stereochemistry, physicochemical properties, compound classes, and drug discove

    abstract::All molecules of up to 11 atoms of C, N, O, and F possible under consideration of simple valency, chemical stability, and synthetic feasibility rules were generated and collected in a database (GDB). GDB contains 26.4 million molecules (110.9 million stereoisomers), including three- and four-membered rings and triple ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci600423u

    authors: Fink T,Reymond JL

    update_date:2007-03-01 00:00:00

  • Chemoisosterism in the proteome.

    abstract::The concept of chemoisosterism of protein environments is introduced as the complementary property to bioisosterism of chemical fragments. In the same way that two chemical fragments are considered bioisosteric if they can bind to the same protein environment, two protein environments will be considered chemoisosteric...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci3002974

    authors: Jalencas X,Mestres J

    update_date:2013-02-25 00:00:00

  • OPUS-Rota3: Improving Protein Side-Chain Modeling by Deep Neural Networks and Ensemble Methods.

    abstract::Side-chain modeling is critical for protein structure prediction since the uniqueness of the protein structure is largely determined by its side-chain packing conformation. In this paper, differing from most approaches that rely on rotamer library sampling, we first propose a novel side-chain rotamer prediction method...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c00951

    authors: Xu G,Wang Q,Ma J

    update_date:2020-12-28 00:00:00

  • Protein kinases: docking and homology modeling reliability.

    abstract::A database of about 700 high-resolution kinase structures was used to test the reliability of 17 docking procedures (using six docking software packages) by means of self- and cross-docking studies. The analysis of about 80 000 docking calculations suggests that the docking of an unknown ligand into a kinase has a pro...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci100161z

    authors: Tuccinardi T,Botta M,Giordano A,Martinelli A

    update_date:2010-08-23 00:00:00

  • Combined 3D-QSAR modeling and molecular docking study on indolinone derivatives as inhibitors of 3-phosphoinositide-dependent protein kinase-1.

    abstract::3-Phosphoinositide-dependent protein kinase-1 (PDK1) is a promising target for developing novel anticancer drugs. In order to understand the structure-activity correlation of indolinone-based PDK1 inhibitors, we have carried out a combined molecular docking and three-dimensional quantitative structure-activity relatio...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci800147v

    authors: AbdulHameed MD,Hamza A,Liu J,Zhan CG

    update_date:2008-09-01 00:00:00

  • Effects of Ligand Environment in Zr(IV) Assisted Peptide Hydrolysis.

    abstract::In this DFT study, activities of 11 different N2O4, N2O3, and NO2 core containing Zr(IV) complexes, 4,13-diaza-18-crown-6 (I'N2O4), 1,4,10-trioxa-7,13-diazacyclopentadecane (I'N2O3), and 2-(2-methoxy)ethanol (I'NO2), respectively, and their analogues in peptide hydrolysis have been investigated. Based on the experimen...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.6b00781

    authors: Zhang T,Sharma G,Paul TJ,Hoffmann Z,Prabhakar R

    update_date:2017-05-22 00:00:00

  • Full and partial agonism of ionotropic glutamate receptors indicated by molecular dynamics simulations.

    abstract::Ionotropic glutamate receptors (iGluRs) are synaptic proteins that facilitate signal transmission in the central nervous system. Extracellular iGluR cleft closure is linked to receptor activation; however, the mechanism underlying partial agonism is not entirely understood. Full agonists close the bilobed ligand-bindi...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci2000055

    authors: Postila PA,Ylilauri M,Pentikäinen OT

    update_date:2011-05-23 00:00:00

  • Molecular Dynamics Simulations of Membrane-Bound STIM1 to Investigate Conformational Changes during STIM1 Activation upon Calcium Release.

    abstract::Calcium is involved in important intracellular processes, such as intracellular signaling from cell membrane receptors to the nucleus. Typically, calcium levels are kept at less than 100 nM in the nucleus and cytosol, but some calcium is stored in the endoplasmic reticulum (ER) lumen for rapid release to activate intr...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.6b00475

    authors: Mukherjee S,Karolak A,Debant M,Buscaglia P,Renaudineau Y,Mignen O,Guida WC,Brooks WH

    update_date:2017-02-27 00:00:00

  • AntiBac-Pred: A Web Application for Predicting Antibacterial Activity of Chemical Compounds.

    abstract::Discovery of new antibacterial agents is a never-ending task of medicinal chemistry. Every new drug brings significant improvement to patients with bacterial infections, but prolonged usage of antibacterials leads to the emergence of resistant strains. Therefore, novel active structures with new modes of action are re...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00436

    authors: Pogodin PV,Lagunin AA,Rudik AV,Druzhilovskiy DS,Filimonov DA,Poroikov VV

    update_date:2019-11-25 00:00:00