ChemSchematicResolver: A Toolkit to Decode 2D Chemical Diagrams with Labels and R-Groups into Annotated Chemical Named Entities.

Abstract:

The number of journal articles in the scientific domain has grown to the point where it has become impossible for researchers to capitalize on all findings in their relevant discipline. Information is stored in these articles in a number of ways, including figures that describe important results. In organic chemistry, these figures often present chemical schematic diagrams that graphically define the structures of carbon-based compounds. These diagrams are intuitive for an expert to comprehend, but they are not designed for machines. This work presents ChemSchematicResolver, a software tool that can identify chemical schematic diagrams within the figures of a document, resolve any R-group substituents within them, and convert the resulting diagrams to a machine-readable format in a high-throughput, autonomous fashion. The tool includes a new algorithm for identifying relevant diagrams and a mechanism that combines these data with contextual information from the rest of the document to create highly relational databases. It supports a variety of general R-group structures, the first time such support has been available in an open-source chemical schematic diagram extraction tool. The tool is presented alongside a self-generated evaluation set, on which precision, the most important assessment metric, reached 83-100% across all assessed areas. ChemSchematicResolver is released under the MIT license and is available to download from www.chemschematicresolver.org.
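The abstract mentions resolving R-group substituents into concrete structures. The Python sketch below illustrates that idea in miniature, assuming a diagram has already been converted to a core SMILES string with a placeholder and that an R-group definition line such as "R = Me, Et, OMe" has been read from the figure. It is not ChemSchematicResolver's implementation or API: the bracketed `[R]` placeholder convention, the `ABBREVIATIONS` table, and the functions `parse_r_group_text` and `resolve_r_groups` are all hypothetical, and plain string substitution is a simplification that only works when each substituent's SMILES fragment begins with its attachment atom.

```python
import re

# Hypothetical, deliberately tiny abbreviation table mapping substituent
# labels to SMILES fragments. Each fragment must start with the atom that
# bonds to the core, because substitution below is plain text replacement.
ABBREVIATIONS = {
    "H": "[H]",
    "Me": "C",
    "Et": "CC",
    "OMe": "OC",
    "Cl": "Cl",
}

# Matches definition lines such as "R = Me, Et" or "R1 = H, Cl".
R_GROUP_PATTERN = re.compile(r"\s*(R\d*)\s*=\s*(.+?)\s*")


def parse_r_group_text(text):
    """Split an R-group definition line into (label, [values])."""
    match = R_GROUP_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not an R-group definition: {text!r}")
    label = match.group(1)
    values = [v.strip() for v in match.group(2).split(",") if v.strip()]
    return label, values


def resolve_r_groups(core_smiles, text):
    """Return {value: SMILES} with each R-group value substituted into the core.

    The core marks the attachment point with a bracketed label such as "[R]"
    or "[R1]"; this is a convention of this sketch, not valid SMILES by itself.
    """
    label, values = parse_r_group_text(text)
    placeholder = f"[{label}]"
    if placeholder not in core_smiles:
        raise ValueError(f"{placeholder} not found in {core_smiles!r}")
    resolved = {}
    for value in values:
        if value not in ABBREVIATIONS:
            raise ValueError(f"unknown substituent abbreviation: {value!r}")
        resolved[value] = core_smiles.replace(placeholder, ABBREVIATIONS[value])
    return resolved


if __name__ == "__main__":
    # A benzene core with one R-group site, resolved for three substituents.
    for name, smiles in resolve_r_groups("c1ccc(cc1)[R]", "R = Me, Et, OMe").items():
        print(name, smiles)
```

Running the script prints `c1ccc(cc1)C`, `c1ccc(cc1)CC` and `c1ccc(cc1)OC` (toluene, ethylbenzene and anisole). A practical pipeline would likely also need to link each resolved structure to its diagram label and validate the output with a cheminformatics toolkit, both of which this sketch omits.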

Journal: J Chem Inf Model
Authors: Beard EJ, Cole JM
DOI: 10.1021/acs.jcim.0c00042
Publication date: 2020-04-27
Volume: 60
Issue: 4
Pages: 2059-2072
ISSN: 1549-9596 (print), 1549-960X (electronic)
Publication type: Journal Article