Molecular Structure Extraction from Documents Using Deep Learning.

Abstract:

:Chemical structure extraction from documents remains a hard problem because of both false positive identification of structures during segmentation and errors in the predicted structures. Current approaches rely on handcrafted rules and subroutines that perform reasonably well generally but still routinely encounter situations where recognition rates are not yet satisfactory and systematic improvement is challenging. Complications impacting the performance of current approaches include the diversity in visual styles used by various software to render structures, the frequent use of ad hoc annotations, and other challenges related to image quality, including resolution and noise. We present end-to-end deep learning solutions for both segmenting molecular structures from documents and predicting chemical structures from the segmented images. This deep-learning-based approach does not require any handcrafted features, is learned directly from data, and is robust against variations in image quality and style. Using the deep learning approach described herein, we show that it is possible to perform well on both segmentation and prediction of low-resolution images containing moderately sized molecules found in journal articles and patents.

journal_name

J Chem Inf Model

authors

Staker J,Marshall K,Abel R,McQuaw CM

doi

10.1021/acs.jcim.8b00669

subject

Has Abstract

pub_date

2019-03-25 00:00:00

pages

1017-1029

issue

3

eissn

1549-9596

issn

1549-960X

journal_volume

59

pub_type

杂志文章
  • Plant Metabolite Databases: From Herbal Medicines to Modern Drug Discovery.

    abstract::Traditional herbal medicine has been an inseparable part of the traditional medical science in many countries throughout history. Nowadays, the popularity of using herbal medicines in daily life, as well as clinical practices, has gradually expanded to numerous Western countries with positive impacts and acceptance. T...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/acs.jcim.9b00826

    authors: Nguyen-Vo TH,Nguyen L,Do N,Nguyen TN,Trinh K,Cao H,Le L

    更新日期:2020-03-23 00:00:00

  • Identification of Enzyme Genes Using Chemical Structure Alignments of Substrate-Product Pairs.

    abstract::Although there are several databases that contain data on many metabolites and reactions in biochemical pathways, there is still a big gap in the numbers between experimentally identified enzymes and metabolites. It is supposed that many catalytic enzyme genes are still unknown. Although there are previous studies tha...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/acs.jcim.5b00216

    authors: Moriya Y,Yamada T,Okuda S,Nakagawa Z,Kotera M,Tokimatsu T,Kanehisa M,Goto S

    更新日期:2016-03-28 00:00:00

  • Regioselectivity prediction of CYP1A2-mediated phase I metabolism.

    abstract::A kinetic, reactivity-binding model has been proposed to predict the regioselectivity of substrates meditated by the CYP1A2 enzyme, which is responsible for the metabolism of planar-conjugated compounds such as caffeine. This model consists of a docking simulation for binding energy and a semiempirical molecular orbit...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/ci800001m

    authors: Jung J,Kim ND,Kim SY,Choi I,Cho KH,Oh WS,Kim DN,No KT

    更新日期:2008-05-01 00:00:00

  • Systematics of high-genus fullerenes.

    abstract::In this article, we present a systematic way to classify a family of high-genus fullerenes (HGFs) by decomposing them into two types of necklike structures, which are the negatively curved parts of parent toroidal carbon nanotubes. By replacing the faces of a uniform polyhedron with these necks, an HGF polyhedron corr...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/ci9001124

    authors: Chuang C,Jin BY

    更新日期:2009-07-01 00:00:00

  • Rigidity Strengthening: A Mechanism for Protein-Ligand Binding.

    abstract::Protein-ligand binding is essential to almost all life processes. The understanding of protein-ligand interactions is fundamentally important to rational drug and protein design. Based on large scale data sets, we show that protein rigidity strengthening or flexibility reduction is a mechanism in protein-ligand bindin...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/acs.jcim.7b00226

    authors: Nguyen DD,Xiao T,Wang M,Wei GW

    更新日期:2017-07-24 00:00:00

  • Search for novel aminoglycosides by combining fragment-based virtual screening and 3D-QSAR scoring.

    abstract::Aminoglycosides are antibiotics targeting the 16S RNA A site of the bacterial ribosome. There have been many efforts directed toward design of their synthetic derivatives, however with only few successes. As RNA binders, aminoglycosides are also a difficult target for computational drug design, since most of the exist...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/ci800361a

    authors: Setny P,Trylska J

    更新日期:2009-02-01 00:00:00

  • FAME 3: Predicting the Sites of Metabolism in Synthetic Compounds and Natural Products for Phase 1 and Phase 2 Metabolic Enzymes.

    abstract::In this work we present the third generation of FAst MEtabolizer (FAME 3), a collection of extra trees classifiers for the prediction of sites of metabolism (SoMs) in small molecules such as drugs, druglike compounds, natural products, agrochemicals, and cosmetics. FAME 3 was derived from the MetaQSAR database ( Pedre...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/acs.jcim.9b00376

    authors: Šícho M,Stork C,Mazzolari A,de Bruyn Kops C,Pedretti A,Testa B,Vistoli G,Svozil D,Kirchmair J

    更新日期:2019-08-26 00:00:00

  • Heteroaromatic π-stacking energy landscapes.

    abstract::In this study we investigate π-stacking interactions of a variety of aromatic heterocycles with benzene using dispersion corrected density functional theory. We calculate extensive potential energy surfaces for parallel-displaced interaction geometries. We find that dispersion contributes significantly to the interact...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/ci500183u

    authors: Huber RG,Margreiter MA,Fuchs JE,von Grafenstein S,Tautermann CS,Liedl KR,Fox T

    更新日期:2014-05-27 00:00:00

  • GESSE: Predicting Drug Side Effects from Drug-Target Relationships.

    abstract::The in silico prediction of unwanted side effects (SEs) caused by the promiscuous behavior of drugs and their targets is highly relevant to the pharmaceutical industry. Considerable effort is now being put into computational and experimental screening of several suspected off-target proteins in the hope that SEs might...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/acs.jcim.5b00120

    authors: Pérez-Nueno VI,Souchet M,Karaboga AS,Ritchie DW

    更新日期:2015-09-28 00:00:00

  • CoNTub v2.0--algorithms for constructing C3-symmetric models of three-nanotube junctions.

    abstract::Here, a method is described for easily building three-carbon nanotube junctions. It allows the geometry to be found and bond connectivity of C(3) symmetric nanotube junctions to be established. Such junctions may present a variable degree of pyramidalization and are composed of three identical carbon nanotubes with ar...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/ci200056p

    authors: Melchor S,Martin-Martinez FJ,Dobado JA

    更新日期:2011-06-27 00:00:00

  • Exploring inhibitor release pathways in histone deacetylases using random acceleration molecular dynamics simulations.

    abstract::Molecular channel exploration perseveres to be the prominent solution for eliciting structure and accessibility of active site and other internal spaces of macromolecules. The volume and silhouette characterization of these channels provides answers for the issues of substrate access and ligand swapping between the ob...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/ci200584f

    authors: Kalyaanamoorthy S,Chen YP

    更新日期:2012-02-27 00:00:00

  • Delineation of agonist binding to the human histamine H4 receptor using mutational analysis, homology modeling, and ab initio calculations.

    abstract::A three-dimensional homology model of the human histamine H 4 receptor was developed to investigate the binding mode of a series of structurally diverse H 4-agonists, i.e. histamine, clozapine, and the recently described selective, nonimidazole agonist VUF 8430. Mutagenesis studies and docking of these ligands in a rh...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/ci700474a

    authors: Jongejan A,Lim HD,Smits RA,de Esch IJ,Haaksma E,Leurs R

    更新日期:2008-07-01 00:00:00

  • H274Y's Effect on Oseltamivir Resistance: What Happens Before the Drug Enters the Binding Site.

    abstract::Increased reports of oseltamivir (OTV)-resistant strains of the influenza virus, such as the H274Y mutation on its neuraminidase (NA), have created some cause for concern. Many studies have been conducted in the attempt to uncover the mechanism of OTV resistance in H274Y NA. However, most of the reported studies on H2...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/acs.jcim.5b00331

    authors: Yusuf M,Mohamed N,Mohamad S,Janezic D,Damodaran KV,Wahab HA

    更新日期:2016-01-25 00:00:00

  • Inner and Outer Recursive Neural Networks for Chemoinformatics Applications.

    abstract::Deep learning methods applied to problems in chemoinformatics often require the use of recursive neural networks to handle data with graphical structure and variable size. We present a useful classification of recursive neural network approaches into two classes, the inner and outer approach. The inner approach uses r...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/acs.jcim.7b00384

    authors: Urban G,Subrahmanya N,Baldi P

    更新日期:2018-02-26 00:00:00

  • Structural insight into the unique binding properties of pyridylethanol(phenylethyl)amine inhibitor in human CYP51.

    abstract::Sterol 14α-demethylase (CYP51) is the main drug target for the treatment of fungal infections. The discovery of new efficient fungal CYP51 inhibitors requires an understanding of the structural requirements for selectivity for the fungal over the human ortholog. In this study, a binding mode of the pyridylethanol(phen...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/ci500556k

    authors: Zelenko U,Hodošček M,Rozman D,Golič Grdadolnik S

    更新日期:2014-12-22 00:00:00

  • Ab Initio Investigation of CO2 Adsorption on 13-Atom 4d Clusters.

    abstract::In this work, we report an ab initio investigation based on density functional theory calculations within van der Waals D3 corrections to investigate the adsorption properties and activation of CO2 on transition-metal (TM) 13-atom clusters (TM = Ru, Rh, Pd, Ag), which is a key step for the development of subnano catal...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/acs.jcim.9b00792

    authors: Batista KEA,Ocampo-Restrepo VK,Soares MD,Quiles MG,Piotrowski MJ,Da Silva JLF

    更新日期:2020-02-24 00:00:00

  • Underestimated Halogen Bonds Forming with Protein Backbone in Protein Data Bank.

    abstract::Halogen bonds (XBs) are attracting increasing attention in biological systems. Protein Data Bank (PDB) archives experimentally determined XBs in biological macromolecules. However, no software for structure refinement in X-ray crystallography takes into account XBs, which might result in the weakening or even vanishin...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/acs.jcim.7b00235

    authors: Zhang Q,Xu Z,Shi J,Zhu W

    更新日期:2017-07-24 00:00:00

  • Improved Prediction of Drug-Target Interactions Using Self-Paced Learning with Collaborative Matrix Factorization.

    abstract::Identifying drug-target interactions (DTIs) plays an important role in the field of drug discovery, drug side-effects, and drug repositioning. However, in vivo or biochemical experimental methods for identifying new DTIs are extremely expensive and time-consuming. Recently, in silico or various computational methods h...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/acs.jcim.9b00408

    authors: Xia LY,Yang ZY,Zhang H,Liang Y

    更新日期:2019-07-22 00:00:00

  • ColBioS-FlavRC: a collection of bioselective flavonoids and related compounds filtered from high-throughput screening outcomes.

    abstract::Flavonoids, the vastest class of natural polyphenols, are extensively investigated for their multiple benefits on human health. Due to their physicochemical or biological properties, many representatives are considered to exhibit low selectivity among various protein targets or to plague high-throughput screening (HTS...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/ci5002668

    authors: Avram SI,Pacureanu LM,Bora A,Crisan L,Avram S,Kurunczi L

    更新日期:2014-08-25 00:00:00

  • Assessing the Protective Activity of a Recently Discovered Phenolic Compound against Oxidative Stress Using Computational Chemistry.

    abstract::The protection exerted by 3,5-dihydroxy-4-methoxybenzyl alcohol (DHMBA), a phenolic compound recently isolated from the Pacific oyster, against oxidative stress (OS) is investigated using the density functional theory. Our results indicate that DHMBA is an outstanding peroxyl radical scavenger, being about 15 times an...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/acs.jcim.5b00513

    authors: Villuendas-Rey Y,Alvarez-Idaboy JR,Galano A

    更新日期:2015-12-28 00:00:00

  • Hidden active information in a random compound library: extraction using a pseudo-structure-activity relationship model.

    abstract::We propose a hypothesis that "a model of active compound can be provided by integrating information of compounds high-ranked by docking simulation of a random compound library". In our hypothesis, the inclusion of true active compounds in the high-ranked compound is not necessary. We regard the high-ranked compounds a...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/ci7003384

    authors: Fukunishi H,Teramoto R,Shimada J

    更新日期:2008-03-01 00:00:00

  • Tyrosine Regulates β-Sheet Structure Formation in Amyloid-β42: A New Clustering Algorithm for Disordered Proteins.

    abstract::Our recent studies show that the single Tyr residue in the sequence of amyloid-β42 (Aβ42) is reactive toward various ligands, including metals and adenosine trisphospate (see: Coskuner , O. J. Biol. Inorg. Chem. 2016 , 21 , 957 - 973 and Coskuner , O. ; Murray , I. V. J. J. Alzheimer's Dis. 2014 , 41 , 561 - 574 ). Ho...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/acs.jcim.6b00761

    authors: Coskuner O,Uversky VN

    更新日期:2017-06-26 00:00:00

  • Efficiency of Stratification for Ensemble Docking Using Reduced Ensembles.

    abstract::Molecular docking can account for receptor flexibility by combining the docking score over multiple rigid receptor conformations, such as snapshots from a molecular dynamics simulation. Here, we evaluate a number of common snapshot selection strategies using a quality metric from stratified sampling, the efficiency of...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/acs.jcim.8b00314

    authors: Xie B,Clark JD,Minh DDL

    更新日期:2018-09-24 00:00:00

  • Accurate prediction of adsorption energies on graphene, using a dispersion-corrected semiempirical method including solvation.

    abstract::The accurate prediction of the adsorption energies of unsaturated molecules on graphene in the presence of water is essential for the design of molecules that can modify its properties and that can aid its processability. We here show that a semiempirical MO method corrected for dispersive interactions (PM6-DH2) can p...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/ci5003729

    authors: Vincent MA,Hillier IH

    更新日期:2014-08-25 00:00:00

  • Jaqpot Quattro: A Novel Computational Web Platform for Modeling and Analysis in Nanoinformatics.

    abstract::Engineered nanomaterials (ENMs) are increasingly infiltrating our lives as a result of their applications across multiple fields. However, ENM formulations may result in the modulation of pathways and mechanisms of toxic action that endanger human health and the environment. Alternative testing methods such as in sili...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/acs.jcim.7b00223

    authors: Chomenidis C,Drakakis G,Tsiliki G,Anagnostopoulou E,Valsamis A,Doganis P,Sopasakis P,Sarimveis H

    更新日期:2017-09-25 00:00:00

  • Multitarget structure-activity relationships characterized by activity-difference maps and consensus similarity measure.

    abstract::Dual and triple activity-difference (DAD/TAD) maps are tools for the systematic characterization of structure-activity relationships (SAR) of compound data sets screened against two or three targets. DAD and TAD maps are two- and three- dimensional representations of the pairwise activity differences of compound data ...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/ci200281v

    authors: Medina-Franco JL,Yongye AB,Pérez-Villanueva J,Houghten RA,Martínez-Mayorga K

    更新日期:2011-09-26 00:00:00

  • Cross-docking of inhibitors into CDK2 structures. 2.

    abstract::In the preceding paper (Duca, J. S.; Madison, V. S.; Voigt, J. H. J. Chem. Inf. Model. 2008, 48, 659-668), the accuracy of docking and affinity predictions of the Gold and Glide programs were investigated using single protein conformations spanning 150 CDK2/inhibitor crystallographic complexes. High docking accuracy w...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/ci700428d

    authors: Voigt JH,Elkin C,Madison VS,Duca JS

    更新日期:2008-03-01 00:00:00

  • BiKi Life Sciences: A New Suite for Molecular Dynamics and Related Methods in Drug Discovery.

    abstract::In this paper, we introduce the BiKi Life Sciences suite. This software makes it easy for computational medicinal chemists to run ad hoc molecular dynamics protocols in a novel and task-oriented environment; as a notebook, BiKi (acronym of Binding Kinetics) keeps memory of any activity together with dependencies among...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/acs.jcim.7b00680

    authors: Decherchi S,Bottegoni G,Spitaleri A,Rocchia W,Cavalli A

    更新日期:2018-02-26 00:00:00

  • Prediction of pH-dependent aqueous solubility of druglike molecules.

    abstract::In the present work, the Henderson-Hasselbalch (HH) equation has been employed for the development of a tool for the prediction of pH-dependent aqueous solubility of drugs and drug candidates. A new prediction method for the intrinsic solubility was developed, based on artificial neural networks that have been trained...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/ci600292q

    authors: Hansen NT,Kouskoumvekaki I,Jørgensen FS,Brunak S,Jónsdóttir SO

    更新日期:2006-11-01 00:00:00

  • Conformational analysis of macrocycles: finding what common search methods miss.

    abstract::As computational drug design becomes increasingly reliant on virtual screening and on high-throughput 3D modeling, the need for fast, robust, and reliable methods for sampling molecular conformations has become greater than ever. Furthermore, chemical novelty is at a premium, forcing medicinal chemists to explore more...

    journal_title:Journal of chemical information and modeling

    pub_type: 杂志文章

    doi:10.1021/ci900238a

    authors: Bonnet P,Agrafiotis DK,Zhu F,Martin E

    更新日期:2009-10-01 00:00:00