Get Your Atoms in Order--An Open-Source Implementation of a Novel and Robust Molecular Canonicalization Algorithm.

Abstract:

Finding a canonical ordering of the atoms in a molecule is a prerequisite for generating a unique representation of the molecule. The canonicalization of a molecule is usually accomplished by applying some sort of graph relaxation algorithm, the most common of which is the Morgan algorithm. There are known issues with that algorithm that lead to noncanonical atom orderings as well as problems when it is applied to large molecules like proteins. Furthermore, each cheminformatics toolkit or software provides its own version of a canonical ordering, most based on unpublished algorithms, which also complicates the generation of a universal unique identifier for molecules. We present an alternative canonicalization approach that uses a standard stable-sorting algorithm instead of a Morgan-like index. Two new invariants that allow canonical ordering of molecules with dependent chirality as well as those with highly symmetrical cyclic graphs have been developed. The new approach proved to be robust and fast when tested on the 1.45 million compounds of the ChEMBL 20 data set in different scenarios like random renumbering of input atoms or SMILES round tripping. Our new algorithm is able to generate a canonical order of the atoms of protein molecules within a few milliseconds. The novel algorithm is implemented in the open-source cheminformatics toolkit RDKit. With this paper, we provide a reference Python implementation of the algorithm that could easily be integrated in any cheminformatics toolkit. This provides a first step toward a common standard for canonical atom ordering to generate a universal unique identifier for molecules other than InChI.
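
The abstract contrasts a Morgan-like index with repeated stable sorting. The sketch below illustrates only that general idea (iterative partition refinement driven by stable sorts, followed by tie breaking); it is not the published RDKit or reference implementation. Assumptions, all hypothetical: atoms are described by caller-supplied invariant tuples such as (atomic number, heavy-atom degree), bonds by an adjacency list, and the function names `dense_ranks`, `refine`, `canonical_ranks` and `canonical_form` are illustrative. It omits the paper's chirality and ring-symmetry invariants, so for highly symmetric graphs where refinement leaves non-equivalent atoms tied, its output is not guaranteed to be canonical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def dense_ranks(keys: Sequence[tuple]) -> List[int]:
    """Rank atoms by key: equal keys share a rank; ranks are 0, 1, 2, ... in key order."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])  # Python's sort is stable
    ranks = [0] * len(keys)
    current = 0
    for pos, idx in enumerate(order):
        if pos > 0 and keys[idx] != keys[order[pos - 1]]:
            current += 1  # a new, larger key starts a new rank class
        ranks[idx] = current
    return ranks


def refine(ranks: List[int], neighbors: Sequence[Sequence[int]]) -> List[int]:
    """Split rank classes by the sorted ranks of each atom's neighbors until stable."""
    while True:
        keys = [(ranks[i], tuple(sorted(ranks[j] for j in nbrs)))
                for i, nbrs in enumerate(neighbors)]
        new_ranks = dense_ranks(keys)
        if len(set(new_ranks)) == len(set(ranks)):
            return new_ranks  # no class was split: the partition is stable
        ranks = new_ranks


def canonical_ranks(invariants: Sequence[tuple],
                    neighbors: Sequence[Sequence[int]]) -> List[int]:
    """Return a distinct rank 0..n-1 for every atom (simplified sketch, see caveats)."""
    ranks = refine(dense_ranks(invariants), neighbors)
    n = len(ranks)
    while len(set(ranks)) < n:
        counts = Counter(ranks)
        tied = min(r for r, c in counts.items() if c > 1)  # lowest tied class
        chosen = ranks.index(tied)  # assumption: break the tie at its first atom
        keys = [(r, 0 if i == chosen else 1) for i, r in enumerate(ranks)]
        ranks = refine(dense_ranks(keys), neighbors)
    return ranks


def canonical_form(invariants: Sequence[tuple],
                   neighbors: Sequence[Sequence[int]]) -> Tuple[tuple, tuple]:
    """Atom invariants and bonds rewritten in canonical-rank order."""
    ranks = canonical_ranks(invariants, neighbors)
    atoms = tuple(inv for _, inv in sorted(zip(ranks, invariants)))
    bonds = tuple(sorted((min(ranks[i], ranks[j]), max(ranks[i], ranks[j]))
                         for i, nbrs in enumerate(neighbors) for j in nbrs if i < j))
    return atoms, bonds


if __name__ == "__main__":
    # Isobutane heavy atoms: central carbon 0 bonded to methyl carbons 1, 2, 3.
    # Invariants are (atomic number, heavy-atom degree).
    invariants = [(6, 3), (6, 1), (6, 1), (6, 1)]
    neighbors = [[1, 2, 3], [0], [0], [0]]
    print(canonical_ranks(invariants, neighbors))  # [3, 0, 1, 2]
```

Because every refinement key starts with the atom's current rank, a pass can only split classes, never merge them, so each `refine` call stops after at most n passes. Renumbering the input atoms should leave `canonical_form` unchanged whenever the tied atoms chosen during tie breaking are truly symmetry-equivalent, as in isobutane or a benzene ring.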

journal_name

J Chem Inf Model

authors

Schneider N,Sayle RA,Landrum GA

doi

10.1021/acs.jcim.5b00543

subject

Has Abstract

pub_date

2015-10-26 00:00:00

pages

2111-20

issue

10

eissn

1549-960X

issn

1549-9596

journal_volume

55

pub_type

Journal Article

  • Chemoisosterism in the proteome.

    abstract::The concept of chemoisosterism of protein environments is introduced as the complementary property to bioisosterism of chemical fragments. In the same way that two chemical fragments are considered bioisosteric if they can bind to the same protein environment, two protein environments will be considered chemoisosteric...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci3002974

    authors: Jalencas X,Mestres J

    update_date: 2013-02-25 00:00:00

  • Molecular Modeling Investigation of the Interaction between Humicola insolens Cutinase and SDS Surfactant Suggests a Mechanism for Enzyme Inactivation.

    abstract::One of the largest commercial applications of enzymes and surfactants is as main components in modern detergents. The high concentration of surfactant compounds usually present in detergents can, however, negatively affect the enzymatic activity. To remedy this drawback, it is of great importance to characterize the i...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00857

    authors: Kjølbye LR,Laustsen A,Vestergaard M,Periole X,De Maria L,Svendsen A,Coletta A,Schiøtt B

    update_date: 2019-05-28 00:00:00

  • Chemoinformatics-based classification of prohibited substances employed for doping in sport.

    abstract::Representative molecules from 10 classes of prohibited substances were taken from the World Anti-Doping Agency (WADA) list, augmented by molecules from corresponding activity classes found in the MDDR database. Together with some explicitly allowed compounds, these formed a set of 5245 molecules. Five types of fingerp...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci0601160

    authors: Cannon EO,Bender A,Palmer DS,Mitchell JB

    update_date: 2006-11-01 00:00:00

  • An Efficient Lossless Compression Algorithm for Trajectories of Atom Positions and Volumetric Data.

    abstract::We present our newly developed and highly efficient lossless compression algorithm for trajectories of atom positions and volumetric data. The algorithm is designed as a two-step approach. In the first step, efficient polynomial extrapolation schemes reduce the information entropy of the data by exploiting both spatia...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00501

    authors: Brehm M,Thomas M

    update_date: 2018-10-22 00:00:00

  • Structural insight into the unique binding properties of pyridylethanol(phenylethyl)amine inhibitor in human CYP51.

    abstract::Sterol 14α-demethylase (CYP51) is the main drug target for the treatment of fungal infections. The discovery of new efficient fungal CYP51 inhibitors requires an understanding of the structural requirements for selectivity for the fungal over the human ortholog. In this study, a binding mode of the pyridylethanol(phen...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci500556k

    authors: Zelenko U,Hodošček M,Rozman D,Golič Grdadolnik S

    update_date: 2014-12-22 00:00:00

  • iUmami-SCM: A Novel Sequence-Based Predictor for Prediction and Analysis of Umami Peptides Using a Scoring Card Method with Propensity Scores of Dipeptides.

    abstract::Umami or the taste of monosodium glutamate represents one of the major attractive taste modalities in humans. Therefore, knowledge about biophysical and biochemical properties of the umami taste is important for both scientific research and the food industry. Experimental approaches for predicting umami peptides are l...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c00707

    authors: Charoenkwan P,Yana J,Nantasenamat C,Hasan MM,Shoombuatong W

    update_date: 2020-12-28 00:00:00

  • Two model system of the alpha1A-adrenoceptor docked with selected ligands.

    abstract::In this study, we have developed a two model system to mimic the active and inactive states of a G-protein coupled receptor specifically the alpha1A adrenergic receptor. We have docked two agonists, epinephrine (phenylamine type) and oxymetazoline (imidazoline type), as well as two antagonists, prazosin and 5-methylur...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci700026v

    authors: Asher WB,Hoskins SN,Slasor LA,Morris DH,Cook EM,Bautista DL

    update_date: 2007-09-01 00:00:00

  • Development of novel statistical potentials describing cation-pi interactions in proteins and comparison with semiempirical and quantum chemistry approaches.

    abstract::Novel statistical potentials derived from known protein structures are presented. They are designed to describe cation-pi and amino-pi interactions between a positively charged amino acid or an amino acid carrying a partially charged amino group and an aromatic moiety. These potentials are based on the propensity of r...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci050395b

    authors: Gilis D,Biot C,Buisine E,Dehouck Y,Rooman M

    update_date: 2006-03-01 00:00:00

  • Nonadditivity Analysis.

    abstract::We introduce the statistics behind a novel type of SAR analysis named "nonadditivity analysis". On the basis of all pairs of matched pairs within a given data set, the approach analyzes whether the same transformations between related molecules have the same effect, i.e., whether they are additive. Assuming that the e...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00631

    authors: Kramer C

    update_date: 2019-09-23 00:00:00

  • Reliable and Performant Identification of Low-Energy Conformers in the Gas Phase and Water.

    abstract::Prediction of compound properties from structure via quantitative structure-activity relationship and machine-learning approaches is an important computational chemistry task in small-molecule drug research. Though many such properties are dependent on three-dimensional structures or even conformer ensembles, the majo...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00151

    authors: Cavasin AT,Hillisch A,Uellendahl F,Schneckener S,Göller AH

    update_date: 2018-05-29 00:00:00

  • Enrichment factor analyses on G-protein coupled receptors with known crystal structure.

    abstract::G-protein coupled receptors (GPCRs) are highly relevant drug targets. Four GPCRs with known crystal structure were analyzed with docking (AutoDock4) and postdocking (MM-PBSA) in order to evaluate the ability to recognize known antagonists from a larger database of molecular decoys and to predict correct binding modes....

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci4000745

    authors: Anighoro A,Rastelli G

    update_date: 2013-04-22 00:00:00

  • Development of a computational tool to rival experts in the prediction of sites of metabolism of xenobiotics by p450s.

    abstract::The metabolism of xenobiotics--and more specifically drugs--in the liver is a critical process controlling their half-life. Although there exist experimental methods, which measure the metabolic stability of xenobiotics and identify their metabolites, developing higher throughput predictive methods is an avenue of res...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci3003073

    authors: Campagna-Slater V,Pottel J,Therrien E,Cantin LD,Moitessier N

    update_date: 2012-09-24 00:00:00

  • Mechanism of Hormone Peptide Activation of a GPCR: Angiotensin II Activated State of AT1R Initiated by van der Waals Attraction.

    abstract::We present a succession of structural changes involved in hormone peptide activation of a prototypical GPCR. Microsecond molecular dynamics simulation generated conformational ensembles reveal propagation of structural changes through key "microswitches" within human AT1R bound to native hormone. The endocrine octa-pe...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00583

    authors: Singh KD,Unal H,Desnoyer R,Karnik SS

    update_date: 2019-01-28 00:00:00

  • Three-dimensional quantitative structure-activity relationship of nucleosides acting at the A3 adenosine receptor: analysis of binding and relative efficacy.

    abstract::The binding affinity and relative maximal efficacy of human A3 adenosine receptor (AR) agonists were each subjected to ligand-based three-dimensional quantitative structure-activity relationship analysis. Comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) used a...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci600501z

    authors: Kim SK,Jacobson KA

    update_date: 2007-05-01 00:00:00

  • HLA-DM Stabilizes the Empty MHCII Binding Groove: A Model Using Customized Natural Move Monte Carlo.

    abstract::MHC class II molecules bind peptides derived from extracellular proteins that have been ingested by antigen-presenting cells and display them to the immune system. Peptide loading occurs within the antigen-presenting cell and is facilitated by HLA-DM. HLA-DM stabilizes the open conformation of the MHCII binding groove...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00104

    authors: Demharter S,Knapp B,Deane C,Minary P

    update_date: 2019-06-24 00:00:00

  • Heuristics from Modeling of Spectral Overlap in Förster Resonance Energy Transfer (FRET).

    abstract::Among the photophysical parameters that underpin Förster resonance energy transfer (FRET), perhaps the least explored is the spectral overlap term ( J). While by definition J increases linearly with acceptor molar absorption coefficient (ε(A) in M-1 cm-1), is proportional to wavelength (λ4), and depends on the degree ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00753

    authors: Qi Q,Taniguchi M,Lindsey JS

    update_date: 2019-02-25 00:00:00

  • Efficient Strategy for the Calculation of Solvation Free Energies in Water and Chloroform at the Quantum Mechanical/Molecular Mechanical Level.

    abstract::The partitioning of solute molecules between immiscible solvents with significantly different polarities is of great importance. The polarization between the solute and solvent molecules plays an essential role in determining the solubility of the solute, which makes computational studies utilizing molecular mechanics...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00001

    authors: Wang M,Li P,Jia X,Liu W,Shao Y,Hu W,Zheng J,Brooks BR,Mei Y

    update_date: 2017-10-23 00:00:00

  • Plant Metabolite Databases: From Herbal Medicines to Modern Drug Discovery.

    abstract::Traditional herbal medicine has been an inseparable part of the traditional medical science in many countries throughout history. Nowadays, the popularity of using herbal medicines in daily life, as well as clinical practices, has gradually expanded to numerous Western countries with positive impacts and acceptance. T...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00826

    authors: Nguyen-Vo TH,Nguyen L,Do N,Nguyen TN,Trinh K,Cao H,Le L

    update_date: 2020-03-23 00:00:00

  • Metabotropic glutamate receptor-mediated currents at the climbing fiber to Purkinje cell synapse.

    abstract::Different forms of synaptic plasticity in the cerebellum expressed at the synapses onto Purkinje cells (PCs) are mediated by membrane metabotropic glutamate receptors (mGluRs). There are three main mGluR groups with a total of 8 subtypes. Although mGluRs are also found at the climbing fiber (CF) to PC synapses, the di...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci050161s

    authors: Andjus PR,Bajić A,Zhu L,Strata P

    update_date: 2005-11-01 00:00:00

  • Molecular Structure-Based Large-Scale Prediction of Chemical-Induced Gene Expression Changes.

    abstract::The quantitative structure-activity relationship (QSAR) approach has been used to model a wide range of chemical-induced biological responses. However, it had not been utilized to model chemical-induced genomewide gene expression changes until very recently, owing to the complexity of training and evaluating a very la...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00281

    authors: Liu R,AbdulHameed MDM,Wallqvist A

    update_date: 2017-09-25 00:00:00

  • Structural characterizations of oligopyridyl foldamers, α-helix mimetics.

    abstract::Protein-protein interactions are central to many biological processes, from intracellular communication to cytoskeleton assembly, and therefore represent an important class of targets for new therapeutics. The most common secondary structure in natural proteins is an α-helix. Small molecules seem to be attractive cand...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci200424a

    authors: Santos JS,Voisin-Chiret AS,Burzicki G,Sebaoun L,Sebban M,Lohier JF,Legay R,Oulyadi H,Bureau R,Rault S

    update_date: 2012-02-27 00:00:00

  • FOG: Fragment Optimized Growth algorithm for the de novo generation of molecules occupying druglike chemical space.

    abstract::An essential feature of all practical de novo molecule generating programs is the ability to focus the potential combinatorial explosion of grown molecules on a desired chemical space. It is a daunting task to balance the generation of new molecules with limitations on growth that produce desired features such as stab...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci9000458

    authors: Kutchukian PS,Lou D,Shakhnovich EI

    update_date: 2009-07-01 00:00:00

  • Identification of ligand templates using local structure alignment for structure-based drug design.

    abstract::With a rapid increase in the number of high-resolution protein-ligand structures, the known protein-ligand structures can be used to gain insight into ligand-binding modes in a target protein. On the basis of the fact that the structurally similar binding sites share information about their ligands, we have developed ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci300178e

    authors: Lee HS,Im W

    update_date: 2012-10-22 00:00:00

  • Pharmacophore identification, in silico screening, and virtual library design for inhibitors of the human factor Xa.

    abstract::Factor Xa inhibitors are innovative anticoagulant agents that provide a better safety/efficacy profile compared to other anticoagulative drugs. A chemical feature-based modeling approach was applied to identify crucial pharmacophore patterns from 3D crystal structures of inhibitors bound to human factor Xa (Pdb entrie...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci049778k

    authors: Krovat EM,Frühwirth KH,Langer T

    update_date: 2005-01-01 00:00:00

  • Determining the validity of a QSAR model--a classification approach.

    abstract::The determination of the validity of a QSAR model when applied to new compounds is an important concern in the field of QSAR and QSPR modeling. Various scoring techniques can be applied to specific types of models. We present a technique with which we can state whether a new compound will be well predicted by a previo...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci0497511

    authors: Guha R,Jurs PC

    update_date: 2005-01-01 00:00:00

  • BFMP: a method for discretizing and visualizing pyranose conformations.

    abstract::We report a new classification method for pyranose ring conformations called Best-fit, Four-Membered Plane (BFMP), which describes pyranose ring conformations based on reference planes defined by four atoms. The method is able to characterize all asymmetrical and symmetrical shapes of a pyran ring, is readily automate...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci500325b

    authors: Makeneni S,Foley BL,Woods RJ

    update_date: 2014-10-27 00:00:00

  • Combining 3-D quantitative structure-activity relationship with ligand based and structure based alignment procedures for in silico screening of new hepatitis C virus NS5B polymerase inhibitors.

    abstract::The viral NS5B RNA-dependent RNA-polymerase (RdRp) is one of the best-studied and promising targets for the development of novel therapeutics against hepatitis C virus (HCV). Allosteric inhibition of this enzyme has emerged as a viable strategy toward blocking replication of viral RNA in cell based systems. Herein, we...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci9004749

    authors: Musmuca I,Caroli A,Mai A,Kaushik-Basu N,Arora P,Ragno R

    update_date: 2010-04-26 00:00:00

  • FragPELE: Dynamic Ligand Growing within a Binding Site. A Novel Tool for Hit-To-Lead Drug Design.

    abstract::The early stages of drug discovery rely on hit-to-lead programs, where initial hits undergo partial optimization to improve binding affinities for their biological target. This is an expensive and time-consuming process, requiring multiple iterations of trial and error designs, an ideal scenario for applying computer ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00938

    authors: Perez C,Soler D,Soliva R,Guallar V

    update_date: 2020-03-23 00:00:00

  • Radial clustergrams: visualizing the aggregate properties of hierarchical clusters.

    abstract::A new radial space-filling method for visualizing cluster hierarchies is presented. The method, referred to as a radial clustergram, arranges the clusters into a series of layers, each representing a different level of the tree. It uses adjacency of nodes instead of links to represent parent-child relationships and al...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci600427x

    authors: Agrafiotis DK,Bandyopadhyay D,Farnum M

    update_date: 2007-01-01 00:00:00

  • Improved Scaffold Hopping in Ligand-Based Virtual Screening Using Neural Representation Learning.

    abstract::Deep learning has demonstrated significant potential in advancing state of the art in many problem domains, especially those benefiting from automated feature extraction. Yet, the methodology has seen limited adoption in the field of ligand-based virtual screening (LBVS) as traditional approaches typically require lar...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c00622

    authors: Stojanović L,Popović M,Tijanić N,Rakočević G,Kalinić M

    update_date: 2020-10-26 00:00:00