Study of Data Set Modelability: Modelability, Rivality, and Weighted Modelability Indexes.

Abstract:

The knowledge of the capacity of a data set to be modeled in the first stages of the building of quantitative structure-activity relationship (QSAR) prediction models is an important issue because it might reduce the effort and time necessary to select or reject data sets and in refining the data set's composition. The modelability index (MODI) is based on the counting of the first nearest neighbor belonging to the molecules of the data set and is a standardized measurement assumed in the QSAR community. In this paper, we revisit the calculation of the modelability index, proposing a more formal formulation that extends the calculation to the first nearest neighbors that belong to each existing class in the data set. In addition, this new formulation allows the calculation of the rivality index, as a measurement of the presence of correctly classifiable molecules and activity cliffs. By weighting the rivality index considering the cardinality of the neighborhood of each molecule of the data set, the calculated weighted modelability index is highly correlated with the correct classification rate (QSAR_CCR) obtained in the building of QSAR models using different classification algorithms. The results obtained with the weighted modelability index show correlations of r² higher than 0.9, slopes close to 1, and bias close to zero for different algorithms.
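
The nearest-neighbor counting behind MODI can be illustrated with a minimal sketch. It assumes the commonly used definition (for each class, the fraction of compounds whose first nearest neighbor belongs to the same class, averaged over classes) and Euclidean distance on numeric descriptors. The function name `modi`, the toy descriptors, and the labels are hypothetical and are not taken from the paper. The rivality and weighted modelability indexes are not sketched here because the abstract does not give their formulas.

```python
import math

def modi(points, labels):
    """Modelability index under the assumed definition: the mean, over
    classes, of the fraction of compounds whose first nearest neighbor
    shares their class."""
    classes = sorted(set(labels))
    same = {c: 0 for c in classes}    # compounds whose nearest neighbor has the same class
    total = {c: 0 for c in classes}   # compounds per class
    for i, (p, c) in enumerate(zip(points, labels)):
        # First nearest neighbor of compound i, excluding i itself
        # (Euclidean distance is an assumption of this sketch).
        j = min((k for k in range(len(points)) if k != i),
                key=lambda k: math.dist(p, points[k]))
        total[c] += 1
        if labels[j] == c:
            same[c] += 1
    return sum(same[c] / total[c] for c in classes) / len(classes)

# Hypothetical toy data: two descriptors per compound, two activity classes.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (0.25, 0.0)]
labels = [0, 0, 1, 1, 1]
print(modi(points, labels))
```

In this toy set the last compound is labeled class 1 but sits next to class 0 compounds, so it lowers the class 1 fraction to 2/3 while class 0 stays at 1. Such a compound is the kind of neighbor disagreement that the abstract associates with activity cliffs.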

journal_name

J Chem Inf Model

authors

Luque Ruiz I,Gómez-Nieto MÁ

doi

10.1021/acs.jcim.8b00188

subject

Has Abstract

pub_date

2018-09-24 00:00:00

pages

1798-1814

issue

9

eissn

1549-960X

issn

1549-9596

journal_volume

58

pub_type

Journal Article
  • Molecular Dynamics Simulation of the Conformational Preferences of Pseudouridine Derivatives: Improving the Distribution in the Glycosidic Torsion Space.

    abstract::There are only four derivatives of pseudouridine (Ψ) that are known to occur naturally in RNA as post-transcriptional modifications. We have studied the conformational consequences of pseudouridylation and further modifications using replica exchange molecular dynamics simulations at the nucleoside level, and the simu...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c00369

    authors: Dutta N,Sarzynska J,Lahiri A

    update date:2020-10-26 00:00:00

  • Structure-Based Discovery of 1H-Indazole-3-carboxamides as a Novel Structural Class of Human GSK-3 Inhibitors.

    abstract::An in silico screening procedure was performed to select new inhibitors of glycogen synthase kinase 3β (GSK-3β), a serine/threonine protein kinase that in the last two decades has emerged as a key target in drug discovery, having been implicated in multiple cellular processes and linked with the pathogenesis of severa...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00486

    authors: Ombrato R,Cazzolla N,Mancini F,Mangano G

    update date:2015-12-28 00:00:00

  • Leave-cluster-out cross-validation is appropriate for scoring functions derived from diverse protein data sets.

    abstract::With the emergence of large collections of protein-ligand complexes complemented by binding data, as found in PDBbind or BindingMOAD, new opportunities for parametrizing and evaluating scoring functions have arisen. With huge data collections available, it becomes feasible to fit scoring functions in a QSAR style, i.e...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci100264e

    authors: Kramer C,Gedeck P

    update date:2010-11-22 00:00:00

  • Machine Learning Enhanced Spectrum Recognition Based on Computer Vision (SRCV) for Intelligent NMR Data Extraction.

    abstract::A machine learning enhanced spectrum recognition system called spectrum recognition based on computer vision (SRCV) for data extraction from previously analyzed 13C and 1H NMR spectra has been developed. The intelligent system was designed with four function modules to extract data from three areas of NMR images, incl...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c01046

    authors: Jia W,Yang Z,Yang M,Cheng L,Lei Z,Wang X

    update date:2021-01-25 00:00:00

  • Computational Prediction and Biochemical Analyses of New Inverse Agonists for the CB1 Receptor.

    abstract::Human cannabinoid type 1 (CB1) G-protein coupled receptor is a potential therapeutic target for obesity. The previously predicted and experimentally validated ensemble of ligand-free conformations of CB1 [Scott, C. E. et al. Protein Sci. 2013 , 22 , 101 - 113 ; Ahn, K. H. et al. Proteins 2013 , 81 , 1304 - 1317] are u...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00581

    authors: Scott CE,Ahn KH,Graf ST,Goddard WA 3rd,Kendall DA,Abrol R

    update date:2016-01-25 00:00:00

  • Comments on the article "Evaluation of pK(a) estimation methods on 211 druglike compounds".

    abstract::The recent article "Evaluation of pK(a) Estimation Methods on 211 Druglike Compounds" ( Manchester, J.; et al. J. Chem Inf. Model. 2010, 50, 565-571 ) reports poor results for the program Epik. Here, we highlight likely sources for the poor performance and describe work done to improve the performance. Running Epik in...

    journal_title:Journal of chemical information and modeling

    pub_type: Comment, Journal Article

    doi:10.1021/ci100332m

    authors: Shelley JC,Calkins D,Sullivan AP

    update date:2011-01-24 00:00:00

  • Searching for coordinated activity cliffs using particle swarm optimization.

    abstract::Activity cliffs are formed by structurally similar compounds having large potency differences. Coordinated activity cliffs evolve when compounds within groups of structural neighbors form multiple cliffs with different partners, giving rise to local networks of cliffs in a data set. Using particle swarm optimization, ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci3000503

    authors: Namasivayam V,Bajorath J

    update date:2012-04-23 00:00:00

  • Underestimated Halogen Bonds Forming with Protein Backbone in Protein Data Bank.

    abstract::Halogen bonds (XBs) are attracting increasing attention in biological systems. Protein Data Bank (PDB) archives experimentally determined XBs in biological macromolecules. However, no software for structure refinement in X-ray crystallography takes into account XBs, which might result in the weakening or even vanishin...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00235

    authors: Zhang Q,Xu Z,Shi J,Zhu W

    update date:2017-07-24 00:00:00

  • Predictive models for cytochrome p450 isozymes based on quantitative high throughput screening data.

    abstract::The human cytochrome P450 (CYP450) isozymes are the most important enzymes in the body to metabolize many endogenous and exogenous substances including environmental toxins and therapeutic drugs. Any unnecessary interactions between a small molecule and CYP450 isozymes may raise a potential to disarm the integrity of ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci200311w

    authors: Sun H,Veith H,Xia M,Austin CP,Huang R

    update date:2011-10-24 00:00:00

  • Evaluation of different virtual screening programs for docking in a charged binding pocket.

    abstract::Virtual screening of small molecules against a protein target often identifies the correct pose, but the ranking in terms of binding energy remains a difficult problem, resulting in unacceptable numbers of false positives and negatives. To investigate this problem, the performance of three docking programs, FRED, QXP/...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci800154w

    authors: Deng W,Verlinde CL

    update date:2008-10-01 00:00:00

  • Holistic Approach to Partial Covalent Interactions in Protein Structure Prediction and Design with Rosetta.

    abstract::Partial covalent interactions (PCIs) in proteins, which include hydrogen bonds, salt bridges, cation-π, and π-π interactions, contribute to thermodynamic stability and facilitate interactions with other biomolecules. Several score functions have been developed within the Rosetta protein modeling framework that identif...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00398

    authors: Combs SA,Mueller BK,Meiler J

    update date:2018-05-29 00:00:00

  • Molecular Modeling Investigation of the Interaction between Humicola insolens Cutinase and SDS Surfactant Suggests a Mechanism for Enzyme Inactivation.

    abstract::One of the largest commercial applications of enzymes and surfactants is as main components in modern detergents. The high concentration of surfactant compounds usually present in detergents can, however, negatively affect the enzymatic activity. To remedy this drawback, it is of great importance to characterize the i...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00857

    authors: Kjølbye LR,Laustsen A,Vestergaard M,Periole X,De Maria L,Svendsen A,Coletta A,Schiøtt B

    update date:2019-05-28 00:00:00

  • Effect of input differences on the results of docking calculations.

    abstract::The sensitivity of docking calculations to the geometry of the input ligand was studied. It was found that even small changes in the ligand input conformation can lead to large differences in the geometries and scores of the resulting docked poses. The accuracy of docked poses produced from different ligand input stru...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci9000629

    authors: Feher M,Williams CI

    update date:2009-07-01 00:00:00

  • Sharing Data from Molecular Simulations.

    abstract::Given the need for modern researchers to produce open, reproducible scientific output, the lack of standards and best practices for sharing data and workflows used to produce and analyze molecular dynamics (MD) simulations has become an important issue in the field. There are now multiple well-established packages to ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00665

    authors: Abraham M,Apostolov R,Barnoud J,Bauer P,Blau C,Bonvin AMJJ,Chavent M,Chodera J,Čondić-Jurkić K,Delemotte L,Grubmüller H,Howard RJ,Jordan EJ,Lindahl E,Ollila OHS,Selent J,Smith DGA,Stansfeld PJ,Tiemann JKS,Trellet M

    update date:2019-10-28 00:00:00

  • Linear and nonlinear 3D-QSAR approaches in tandem with ligand-based homology modeling as a computational strategy to depict the pyrazolo-triazolo-pyrimidine antagonists binding site of the human adenosine A2A receptor.

    abstract::The integration of ligand- and structure-based strategies might sensitively increase the success of drug discovery process. We have recently described the application of Molecular Electrostatic Potential autocorrelated vectors (autoMEPs) in generating both linear (Partial Least-Square, PLS) and nonlinear (Response Sur...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci700300w

    authors: Michielan L,Bacilieri M,Schiesaro A,Bolcato C,Pastorin G,Spalluto G,Cacciari B,Klotz KN,Kaseda C,Moro S

    update date:2008-02-01 00:00:00

  • Effect of data standardization on chemical clustering and similarity searching.

    abstract::Standardization is used to ensure that the variables in a similarity calculation make an equal contribution to the computed similarity value. This paper compares the use of seven different methods that have been suggested previously for the standardization of integer-valued or real-valued data, comparing the results w...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci800224h

    authors: Chu CW,Holliday JD,Willett P

    update date:2009-02-01 00:00:00

  • Multidimensional Drift of Sequence Attributes and Functional Profiles in the Superfamily of the Three-Finger Proteins and Their Structural Homologues.

    abstract::Functional diversity of the three-finger-protein domain (TFPD) had been acquired via hypervariability of some sequence positions and extensive insertion/deletion of short AA-segments that caused multidimensional drift of several sequence attributes such as the overall (HI) and local hydrophobicity levels, the isoelect...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00322

    authors: Galat A

    update date:2015-09-28 00:00:00

  • Spatial sign preprocessing: a simple way to impart moderate robustness to multivariate estimators.

    abstract::The spatial sign is a multivariate extension of the concept of sign. Recently multivariate estimators of covariance structures based on spatial signs have been examined by various authors. These new estimators are found to be robust to outlying observations. From a computational point of view, estimators based on spat...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci050498u

    authors: Serneels S,De Nolf E,Van Espen PJ

    update date:2006-05-01 00:00:00

  • Computational Insight Into the Mechanism of SARS-CoV-2 Membrane Fusion.

    abstract::Membrane fusion, a key step in the early stages of virus propagation, allows the release of the viral genome in the host cell cytoplasm. The process is initiated by fusion peptides that are small, hydrophobic components of viral membrane-embedded glycoproteins and are typically conserved within virus families. Here, w...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c01231

    authors: Borkotoky S,Dey D,Banerjee M

    update date:2021-01-25 00:00:00

  • Visualization of Solar Cell Library Space by Dimensionality Reduction Methods.

    abstract::Visualizing high-dimensional data by projecting them into a two- or three-dimensional space is a popular approach in many scientific fields, including computer-aided drug design and cheminformatics. In contrast, dimensionality reduction techniques have been far less explored for materials informatics. Nevertheless, si...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00552

    authors: Kaspi O,Yosipof A,Senderowitz H

    update date:2018-12-24 00:00:00

  • Homology model-guided 3D-QSAR studies of HIV-1 integrase inhibitors.

    abstract::In the present study, we report the exploration of binding modes of potent HIV-1 integrase (IN) inhibitors MK-0518 (raltegravir) and GS-9137 (elvitegravir) as well as chalcone and related amide IN inhibitors we recently synthesized and the development of 3D-QSAR models for integrase inhibition. Homology models of DNA-...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci200485a

    authors: Sharma H,Cheng X,Buolamwini JK

    update date:2012-02-27 00:00:00

  • Transplant-insert-constrain-relax-assemble (TICRA): protein-ligand complex structure modeling and application to kinases.

    abstract::We introduce TICRA (transplant-insert-constrain-relax-assemble), a method for modeling the structure of unknown protein-ligand complexes using the X-ray crystal structures of homologous proteins and ligands with known activity. We present results from modeling the structures of protein kinase-inhibitor complexes using...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci100256u

    authors: Meshkat S,Klon AE,Zou J,Wiseman JS,Konteatis Z

    update date:2011-01-24 00:00:00

  • Trans and Cis Conformations of the Antihypertensive Drug Valsartan Respectively Lock the Inactive and Active-like States of Angiotensin II Type 1 Receptor: A Molecular Dynamics Study.

    abstract::Angiotensin II type 1 receptor (AT1R) is the principal regulator of blood pressure in humans. The overactivation of AT1R by the stimulation of angiotensin II would result in high blood pressure. To prevent hypertension, nonpeptide "sartan" drugs, such as valsartan (VST), have been developed to competitively block the ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00364

    authors: Wang L,Yan F

    update date:2018-10-22 00:00:00

  • Partitioning of Benzoic Acid into 1,2-Dimyristoyl-sn-glycero-3-phosphocholine and Blood-Brain Barrier Mimetic Bilayers.

    abstract::Using an all-atom explicit water model and replica exchange umbrella sampling simulations, we investigated the molecular mechanisms of benzoic acid partitioning into two model lipid bilayers. The first was formed of 1,2-dimyristoyl-sn-glycero-3-phosphocholine (DMPC) lipids, whereas the second was composed of an equimo...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c00590

    authors: Siwy CM,Delfing BM,Smith AK,Klimov DK

    update date:2020-08-24 00:00:00

  • Structure-activity relationships in non-ligand binding pocket (non-LBP) diarylhydrazide antiandrogens.

    abstract::We report the synthesis and a study of the structure-activity relationships of a new series of diarylhydrazides as potential selective non-ligand binding pocket androgen receptor antagonists. Their biological activity as antiandrogens in the context of the development of treatments for castration resistant prostate ca...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci400189m

    authors: Caboni L,Egan B,Kelly B,Blanco F,Fayne D,Meegan MJ,Lloyd DG

    update date:2013-08-26 00:00:00

  • Template CoMFA: the 3D-QSAR Grail?

    abstract::Template CoMFA, a novel alignment methodology for training or test set structures in 3D-QSAR, is introduced. Its two most significant advantages are its complete automation and its ability to derive a single combined model from multiple structural series affecting a biological target. Its only two inputs are one or mo...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci400696v

    authors: Cramer RD,Wendt B

    update date:2014-02-24 00:00:00

  • The molecular basis for the selectivity of tadalafil toward phosphodiesterase 5 and 6: a modeling study.

    abstract::Great attention has been paid to the clinical significance of phosphodiesterase 5 (PDE5) inhibitors, such as sildenafil, tadalafil, and vardenafil widely used for erectile dysfunction. However, sildenafil causes side effects on visual functions since it shows similar potencies to inhibit PDE5 and PDE6, whereas tadalaf...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci400458z

    authors: Huang YY,Li Z,Cai YH,Feng LJ,Wu Y,Li X,Luo HB

    update date:2013-11-25 00:00:00

  • Identification of Enzyme Genes Using Chemical Structure Alignments of Substrate-Product Pairs.

    abstract::Although there are several databases that contain data on many metabolites and reactions in biochemical pathways, there is still a big gap in the numbers between experimentally identified enzymes and metabolites. It is supposed that many catalytic enzyme genes are still unknown. Although there are previous studies tha...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00216

    authors: Moriya Y,Yamada T,Okuda S,Nakagawa Z,Kotera M,Tokimatsu T,Kanehisa M,Goto S

    update date:2016-03-28 00:00:00

  • Flux (2): comparison of molecular mutation and crossover operators for ligand-based de novo design.

    abstract::We implemented a fragment-based de novo design algorithm for a population-based optimization of molecular structures. The concept is grounded on an evolution strategy with mutation and crossover operators for structure breeding. Molecular building blocks were obtained from the pseudo-retrosynthesis of a collection of ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci6005307

    authors: Fechner U,Schneider G

    update date:2007-03-01 00:00:00

  • SARANEA: a freely available program to mine structure-activity and structure-selectivity relationship information in compound data sets.

    abstract::We introduce SARANEA, an open-source Java application for interactive exploration of structure-activity relationship (SAR) and structure-selectivity relationship (SSR) information in compound sets of any source. SARANEA integrates various SAR and SSR analysis functions and utilizes a network-like similarity graph data...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci900416a

    authors: Lounkine E,Wawer M,Wassermann AM,Bajorath J

    update date:2010-01-01 00:00:00