Abstract:
Chemical structure extraction from documents remains a hard problem because of both false positive identification of structures during segmentation and errors in the predicted structures. Current approaches rely on handcrafted rules and subroutines that perform reasonably well generally but still routinely encounter situations where recognition rates are not yet satisfactory and systematic improvement is challenging. Complications impacting the performance of current approaches include the diversity in visual styles used by various software to render structures, the frequent use of ad hoc annotations, and other challenges related to image quality, including resolution and noise. We present end-to-end deep learning solutions for both segmenting molecular structures from documents and predicting chemical structures from the segmented images. This deep-learning-based approach does not require any handcrafted features, is learned directly from data, and is robust against variations in image quality and style. Using the deep learning approach described herein, we show that it is possible to perform well on both segmentation and prediction of low-resolution images containing moderately sized molecules found in journal articles and patents.
journal_name: J Chem Inf Model
journal_title: Journal of chemical information and modeling
authors: Staker J, Marshall K, Abel R, McQuaw CM
doi: 10.1021/acs.jcim.8b00669
subject: Has Abstract
pub_date: 2019-03-25 00:00:00
pages: 1017-1029
issue: 3
eissn: 1549-9596
issn: 1549-960X
journal_volume: 59
pub_type: Journal article
abstract::Flavonoids, the vastest class of natural polyphenols, are extensively investigated for their multiple benefits on human health. Due to their physicochemical or biological properties, many representatives are considered to exhibit low selectivity among various protein targets or to plague high-throughput screening (HTS...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/ci5002668
update_date: 2014-08-25 00:00:00
abstract::Spin diffusion is a formidable problem when interpreting NMR data of chemical compounds. We developed a method to reconstruct the conformational ensemble of flexible molecules displaying spin diffusion, which minimizes the subjective bias in the interpretation of experimental data and which can be used routinely to ob...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/acs.jcim.9b00259
update_date: 2019-06-24 00:00:00
abstract::The determination of the validity of a QSAR model when applied to new compounds is an important concern in the field of QSAR and QSPR modeling. Various scoring techniques can be applied to specific types of models. We present a technique with which we can state whether a new compound will be well predicted by a previo...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/ci0497511
update_date: 2005-01-01 00:00:00
abstract::ThermoData Engine (TDE) is the first full-scale software implementation of the dynamic data evaluation concept, as reported in this journal. The present article describes the background and implementation for new additions in latest release of TDE. Advances are in the areas of program architecture and quality improvem...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/ci4005699
update_date: 2013-12-23 00:00:00
abstract::In this work we present the third generation of FAst MEtabolizer (FAME 3), a collection of extra trees classifiers for the prediction of sites of metabolism (SoMs) in small molecules such as drugs, druglike compounds, natural products, agrochemicals, and cosmetics. FAME 3 was derived from the MetaQSAR database ( Pedre...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/acs.jcim.9b00376
update_date: 2019-08-26 00:00:00
abstract::Protein dynamics play a critical role in ligand binding, and different models have been proposed to explain the relationships between protein motion and molecular recognition. Here, we present a study of ligand-binding processes associated with large conformational changes of a protein to elucidate the critical choice...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/acs.jcim.7b00125
update_date: 2017-07-24 00:00:00
abstract::3-Pyridyl ethers are excellent nAChRs ligands, which show high subtype selectivity and binding affinity to alpha4beta2 nAChR. Although the quantitative structure-activity relationship (QSAR) of nAChRs ligands has been widely investigated using various classes of compounds, the open ring analogues of 3-pyridyl ethers h...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/ci0498113
update_date: 2005-03-01 00:00:00
abstract::A comprehensive data set of aligned ligands with highly similar binding pockets from the Protein Data Bank has been built. Based on this data set, a scoring function for recognizing good alignment poses for small molecules has been developed. This function is based on atoms and hydrogen-bond projected features. The co...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/ci100227h
update_date: 2010-09-27 00:00:00
abstract::How well do different classification methods perform in selecting the ligands of a protein target out of large compound collections not used to train the model? Support vector machines, random forest, artificial neural networks, k-nearest-neighbor classification with genetic-algorithm-optimized feature selection, tren...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/ci050519k
update_date: 2006-05-01 00:00:00
abstract::Structure-based computational target prediction methods identify potential targets for a bioactive compound. Methods based on protein-ligand docking so far face many challenges, where the greatest probably is the ranking of true targets in a large data set of protein structures. Currently, no standard data sets for ev...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/ci500131x
update_date: 2014-08-25 00:00:00
abstract::Blockade of human ether-à-go-go related gene (hERG) channel prolongs the duration of the cardiac action potential and is a common reason for drug failure in preclinical safety trials. Therefore, it is of great importance to develop robust in silico tools to predict potential hERG blockers in the early stages of drug d...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/ci200271d
update_date: 2011-11-28 00:00:00
abstract::The ONIOM scheme is one of the most popular QM/MM approaches, but its extended application has been so far hindered by the limited availability of force fields in most practical implementations. This paper describes a simple software code to overcome this limitation, and its application to three representative chemica...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/acs.jcim.8b00332
update_date: 2018-09-24 00:00:00
abstract::There are only four derivatives of pseudouridine (Ψ) that are known to occur naturally in RNA as post-transcriptional modifications. We have studied the conformational consequences of pseudouridylation and further modifications using replica exchange molecular dynamics simulations at the nucleoside level, and the simu...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/acs.jcim.0c00369
update_date: 2020-10-26 00:00:00
abstract::Development of coarse-grained (CG) molecular dynamics models is often a laborious process which commonly relies upon approximations to similar models, rather than systematic parametrization. PyCGTOOL automates much of the construction of CG models via calculation of both equilibrium values and force constants of inter...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/acs.jcim.7b00096
update_date: 2017-04-24 00:00:00
abstract::It is of great interest in modern drug design to accurately calculate the free energies of protein-ligand or nucleic acid-ligand binding. MM-PBSA (molecular mechanics Poisson-Boltzmann surface area) and MM-GBSA (molecular mechanics generalized Born surface area) have gained popularity in this field. For both methods, ...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/ci300064d
update_date: 2012-05-25 00:00:00
abstract::The SZMAP method computes binding free energies and the corresponding thermodynamic components for water molecules in the binding site of a protein structure [ SZMAP, 1.0.0 ; OpenEye Scientific Software Inc. : Santa Fe, NM, USA , 2011 ]. In this work, the ability of SZMAP to predict water structure and thermodynamic s...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/ci500746d
update_date: 2015-08-24 00:00:00
abstract::Novel statistical potentials derived from known protein structures are presented. They are designed to describe cation-pi and amino-pi interactions between a positively charged amino acid or an amino acid carrying a partially charged amino group and an aromatic moiety. These potentials are based on the propensity of r...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/ci050395b
update_date: 2006-03-01 00:00:00
abstract::G protein-coupled receptors (GPCRs) are the largest family of cell surface receptors, which is arguably the most important family of drug target. With the technology breakthroughs in X-ray crystallography and cryo-electron microscopy, more than 300 GPCR-ligand complex structures have been publicly reported since 2007,...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/acs.jcim.9b00699
update_date: 2020-09-28 00:00:00
abstract::Fragment-based methods have emerged in the last two decades as alternatives to traditional high throughput screenings for the identification of chemical starting points in drug discovery. One arguable yet popular assumption about fragment-based design is that the fragment binding mode remains conserved upon chemical e...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/ci300355p
update_date: 2012-12-21 00:00:00
abstract::Structure-based virtual screening is highly used in the early stages of drug discovery to identify new putative lead compounds for a given target. However, when a small molecule elicits a biological effect, but its target is unknown, or the side effects it causes arise from its undesired interaction with unknown count...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/acs.jcim.9b00428
update_date: 2019-11-25 00:00:00
abstract::Using an all-atom explicit water model and replica exchange umbrella sampling simulations, we investigated the molecular mechanisms of benzoic acid partitioning into two model lipid bilayers. The first was formed of 1,2-dimyristoyl-sn-glycero-3-phosphocholine (DMPC) lipids, whereas the second was composed of an equimo...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/acs.jcim.0c00590
update_date: 2020-08-24 00:00:00
abstract::Small-molecule protein docking is an essential tool in drug design and to understand molecular recognition. In the present work we introduce FlexAID, a small-molecule docking algorithm that accounts for target side-chain flexibility and utilizes a soft scoring function, i.e. one that is not highly dependent on specifi...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/acs.jcim.5b00078
update_date: 2015-07-27 00:00:00
abstract::The balance between structural stability and functional plasticity in proteins that share common three-dimensional folds is the key factor that drives protein evolvability. The ability to distinguish the parts of homologous proteins that underlie common structural organization patterns from the parts acting as regulat...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/acs.jcim.6b00504
update_date: 2017-04-24 00:00:00
abstract::The ligand binding determinants for the angiotensin II type 1 receptor (AT1R), a G protein-coupled receptor (GPCR), have been characterized by means of computer simulations. As a first step, a pharmacophore model of various known AT1R ligands exhibiting a wide range of binding affinities was generated. Second, a struc...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/ci400400m
update_date: 2013-11-25 00:00:00
abstract::All molecules of up to 11 atoms of C, N, O, and F possible under consideration of simple valency, chemical stability, and synthetic feasibility rules were generated and collected in a database (GDB). GDB contains 26.4 million molecules (110.9 million stereoisomers), including three- and four-membered rings and triple ...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/ci600423u
update_date: 2007-03-01 00:00:00
abstract::Up to now, publicly available data sets to build and evaluate Ames mutagenicity prediction tools have been very limited in terms of size and chemical space covered. In this report we describe a new unique public Ames mutagenicity data set comprising about 6500 nonconfidential compounds (available as SMILES strings and...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/ci900161g
update_date: 2009-09-01 00:00:00
abstract::Growing data sets with increased time for analysis is hampering predictive modeling in drug discovery. Model building can be carried out on high-performance computer clusters, but these can be expensive to purchase and maintain. We have evaluated ligand-based modeling on cloud computing resources where computations ar...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/ci500580y
update_date: 2015-01-26 00:00:00
abstract::Calcium is involved in important intracellular processes, such as intracellular signaling from cell membrane receptors to the nucleus. Typically, calcium levels are kept at less than 100 nM in the nucleus and cytosol, but some calcium is stored in the endoplasmic reticulum (ER) lumen for rapid release to activate intr...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/acs.jcim.6b00475
update_date: 2017-02-27 00:00:00
abstract::Methods that rapidly evaluate molecular complexity and synthetic feasibility are becoming increasingly important for in silico chemistry. We propose a new metric based on relative atomic electronegativities and bond parameters that evaluate both synthetic and molecular complexity (SMCM) starting from chemical structur...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/ci0501387
update_date: 2005-09-01 00:00:00
abstract::Resonance structures of polycyclic aromatic hydrocarbons can be associated with numerical formulas by assigning pi-electrons of C=C double bonds to individual benzenoid rings. Each C=C double bond in a resonance structure assigns two pi-electrons to a ring in a fused-benzenoid system if it is not shared by adjacent ri...
journal_title: Journal of chemical information and modeling
pub_type: Journal article
doi: 10.1021/ci050196s
update_date: 2006-01-01 00:00:00