Ensemble feature selection: consistent descriptor subsets for multiple QSAR models.

Abstract:

Selecting a small subset of descriptors from a large pool to build a predictive quantitative structure-activity relationship (QSAR) model is an important step in the QSAR modeling process. In general, subset selection is very hard to solve, even approximately, with guaranteed performance bounds. Traditional approaches employ deterministic or stochastic methods to obtain a descriptor subset that leads to an optimal model of a single type (such as linear regression or a neural network). With the development of ensemble modeling approaches, multiple models of differing types are individually developed, resulting in different descriptor subsets for each model type. However, it is advantageous, from the point of view of developing interpretable QSAR models, to have a single set of descriptors that can be used for different model types. In this paper, we describe an approach to the selection of a single, optimal subset of descriptors for multiple model types. We apply this approach to three data sets, covering both regression and classification, and show that the constraint of forcing different model types to use the same set of descriptors does not lead to a significant loss in predictive ability for the individual models considered. In addition, interpretations of the individual models developed using this approach indicate that they encode similar structure-activity trends.

journal_name

J Chem Inf Model

authors

Dutta D,Guha R,Wild D,Chen T

doi

10.1021/ci600563w

subject

Has Abstract

pub_date

2007-05-01 00:00:00

pages

989-97

issue

3

eissn

1549-960X

issn

1549-9596

journal_volume

47

pub_type

Journal Article
  • LiCABEDS II. Modeling of ligand selectivity for G-protein-coupled cannabinoid receptors.

    abstract::The cannabinoid receptor subtype 2 (CB2) is a promising therapeutic target for blood cancer, pain relief, osteoporosis, and immune system disease. The recent withdrawal of Rimonabant, which targets another closely related cannabinoid receptor (CB1), accentuates the importance of selectivity for the development of CB2 ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci3003914

    authors: Ma C,Wang L,Yang P,Myint KZ,Xie XQ

    update_date:2013-01-28 00:00:00

  • Heterogeneous Dielectric Implicit Membrane Model for the Calculation of MMPBSA Binding Free Energies.

    abstract::Membrane-bound protein receptors are a primary biological drug target, but the computational analysis of membrane proteins has been limited. In order to improve molecular mechanics Poisson-Boltzmann surface area (MMPBSA) binding free energy calculations for membrane protein-ligand systems, we have optimized a new hete...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00363

    authors: Greene D,Qi R,Nguyen R,Qiu T,Luo R

    update_date:2019-06-24 00:00:00

  • Evaluating Unexpectedly Short Non-covalent Distances in X-ray Crystal Structures of Proteins with Electronic Structure Analysis.

    abstract::We investigate unexpectedly short non-covalent distances (<85% of the sum of van der Waals radii) in X-ray crystal structures of proteins. We curate over 11 000 high-quality protein crystal structures and an ultra-high-resolution (1.2 Å or better) subset containing >900 structures. Although our non-covalent distance c...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00144

    authors: Qi HW,Kulik HJ

    update_date:2019-05-28 00:00:00

  • H274Y's Effect on Oseltamivir Resistance: What Happens Before the Drug Enters the Binding Site.

    abstract::Increased reports of oseltamivir (OTV)-resistant strains of the influenza virus, such as the H274Y mutation on its neuraminidase (NA), have created some cause for concern. Many studies have been conducted in the attempt to uncover the mechanism of OTV resistance in H274Y NA. However, most of the reported studies on H2...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00331

    authors: Yusuf M,Mohamed N,Mohamad S,Janezic D,Damodaran KV,Wahab HA

    update_date:2016-01-25 00:00:00

  • Protein Solvent Shell Structure Provides Rapid Analysis of Hydration Dynamics.

    abstract::The solvation layer surrounding a protein is clearly an intrinsic part of protein structure-dynamics-function, and our understanding of how the hydration dynamics influences protein function is emerging. We have recently reported simulations indicating a correlation between regional hydration dynamics and the structur...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00009

    authors: Dahanayake JN,Shahryari E,Roberts KM,Heikes ME,Kasireddy C,Mitchell-Koch KR

    update_date:2019-05-28 00:00:00

  • Improving protocols for protein mapping through proper comparison to crystallography data.

    abstract::Computational approaches to fragment-based drug design (FBDD) can complement experiments and facilitate the identification of potential hot spots along the protein surface. However, the evaluation of computational methods for mapping binding sites frequently focuses upon the ability to reproduce crystallographic coord...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci300430v

    authors: Lexa KW,Carlson HA

    update_date:2013-02-25 00:00:00

  • RED: a set of molecular descriptors based on Renyi entropy.

    abstract::New molecular descriptors, RED (Renyi entropy descriptors), based on the generalized entropies introduced by Renyi are presented. Topological descriptors based on molecular features have proven to be useful for describing molecular profiles. Renyi entropy is used as a variability measure to contract a feature-pair dis...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci900275w

    authors: Delgado-Soler L,Toral R,Tomás MS,Rubio-Martinez J

    update_date:2009-11-01 00:00:00

  • FOG: Fragment Optimized Growth algorithm for the de novo generation of molecules occupying druglike chemical space.

    abstract::An essential feature of all practical de novo molecule generating programs is the ability to focus the potential combinatorial explosion of grown molecules on a desired chemical space. It is a daunting task to balance the generation of new molecules with limitations on growth that produce desired features such as stab...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci9000458

    authors: Kutchukian PS,Lou D,Shakhnovich EI

    update_date:2009-07-01 00:00:00

  • Ligand binding determinants for angiotensin II type 1 receptor from computer simulations.

    abstract::The ligand binding determinants for the angiotensin II type 1 receptor (AT1R), a G protein-coupled receptor (GPCR), have been characterized by means of computer simulations. As a first step, a pharmacophore model of various known AT1R ligands exhibiting a wide range of binding affinities was generated. Second, a struc...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci400400m

    authors: Matsoukas MT,Cordomí A,Ríos S,Pardo L,Tselios T

    update_date:2013-11-25 00:00:00

  • Reaction site mapping of xenobiotic biotransformations.

    abstract::Predictive metabolism methods can be used in drug discovery projects to enhance the understanding of structure-metabolism relationships. The present study uses data mining methods to exploit biotransformation data that have been recorded in the MDL Metabolite database. Reacting center fingerprints were derived from a ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci600376q

    authors: Boyer S,Arnby CH,Carlsson L,Smith J,Stein V,Glen RC

    update_date:2007-03-01 00:00:00

  • Discovery of New SIRT2 Inhibitors by Utilizing a Consensus Docking/Scoring Strategy and Structure-Activity Relationship Analysis.

    abstract::SIRT2, which is a NAD+ (nicotinamide adenine dinucleotide) dependent deacetylase, has been demonstrated to play an important role in the occurrence and development of a variety of diseases such as cancer, ischemia-reperfusion, and neurodegenerative diseases. Small molecule inhibitors of SIRT2 are thought to be potenti...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.6b00714

    authors: Huang S,Song C,Wang X,Zhang G,Wang Y,Jiang X,Sun Q,Huang L,Xiang R,Hu Y,Li L,Yang S

    update_date:2017-04-24 00:00:00

  • SERAPhiC: a benchmark for in silico fragment-based drug design.

    abstract::Our main objective was to compile a data set of high-quality protein-fragment complexes and make it publicly available. Once assembled, the data set was challenged using docking procedures to address the following questions: (i) Can molecular docking correctly reproduce the experimentally solved structures? (ii) How t...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci2003363

    authors: Favia AD,Bottegoni G,Nobeli I,Bisignano P,Cavalli A

    update_date:2011-11-28 00:00:00

  • Improved Chemical Structure-Activity Modeling Through Data Augmentation.

    abstract::Extending the original training data with simulated unobserved data points has proven powerful to increase both the generalization ability of predictive models and their robustness against changes in the structure of data (e.g., systematic drifts in the response variable) in diverse areas such as the analysis of spect...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00570

    authors: Cortes-Ciriano I,Bender A

    update_date:2015-12-28 00:00:00

  • Searching for New Leads To Treat Epilepsy: Target-Based Virtual Screening for the Discovery of Anticonvulsant Agents.

    abstract::The purpose of this investigation is to contribute to the development of new anticonvulsant drugs to treat patients with refractory epilepsy. We applied a virtual screening protocol that involved the search into molecular databases of new compounds and known drugs to find small molecules that interact with the open co...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00721

    authors: Palestro PH,Enrique N,Goicoechea S,Villalba ML,Sabatier LL,Martin P,Milesi V,Bruno Blanch LE,Gavernet L

    update_date:2018-07-23 00:00:00

  • RASA: a rapid retrosynthesis-based scoring method for the assessment of synthetic accessibility of drug-like molecules.

    abstract::In this account, a rapid retrosynthesis-based scoring method for the assessment of synthetic accessibility of drug-like molecules, called RASA (Retrosynthesis-based Assessment of Synthetic Accessibility) is devised. RASA first constructs a synthesis tree for the target molecule based on retrosynthetic analysis; in thi...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci100216g

    authors: Huang Q,Li LL,Yang SY

    update_date:2011-10-24 00:00:00

  • Get Your Atoms in Order--An Open-Source Implementation of a Novel and Robust Molecular Canonicalization Algorithm.

    abstract::Finding a canonical ordering of the atoms in a molecule is a prerequisite for generating a unique representation of the molecule. The canonicalization of a molecule is usually accomplished by applying some sort of graph relaxation algorithm, the most common of which is the Morgan algorithm. There are known issues with...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00543

    authors: Schneider N,Sayle RA,Landrum GA

    update_date:2015-10-26 00:00:00

  • Comparison of Implicit and Explicit Solvation Models for Iota-Cyclodextrin Conformation Analysis from Replica Exchange Molecular Dynamics.

    abstract::Large ring cyclodextrins have become increasingly important for drug delivery applications. In this work, we have performed replica-exchange molecular dynamics simulations using both implicit and explicit water solvation models to study the conformational diversity of iota-cyclodextrin containing 14 α-1,4 glycosidic l...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.6b00595

    authors: Khuntawee W,Kunaseth M,Rungnim C,Intagorn S,Wolschann P,Kungwan N,Rungrotmongkol T,Hannongbua S

    update_date:2017-04-24 00:00:00

  • Use of 3D QSAR models for database screening: a feasibility study.

    abstract::The applicability and scope of 3D QSAR methods (CoMFA, CoMSIA) to screen databases are examined. A protocol requiring minimal user intervention has been established to align training and test set molecules using FlexS. As model system isozymes of human carbonic anhydrase (hCA) are used, all results are exemplified stu...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci7002945

    authors: Hillebrecht A,Klebe G

    update_date:2008-02-01 00:00:00

  • Secondary structure characterization based on amino acid composition and availability in proteins.

    abstract::The importance of thorough analyses of the secondary structures in proteins as basic structural units cannot be overemphasized. Although recent computational methods have achieved reasonably high accuracy for predicting secondary structures from amino acid sequences, a simple and fundamental empirical approach to char...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci900452z

    authors: Otaki JM,Tsutsumi M,Gotoh T,Yamamoto H

    update_date:2010-04-26 00:00:00

  • Binding Residence Time through Scaled Molecular Dynamics: A Prospective Application to hDAAO Inhibitors.

    abstract::Traditionally, a drug potency is expressed in terms of thermodynamic quantities, mostly Kd, and empirical IC50 values. Although binding affinity as an estimate of drug activity remains relevant, it is increasingly clear that it is also important to include (un)binding kinetic parameters in the characterization of pote...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00518

    authors: Bernetti M,Rosini E,Mollica L,Masetti M,Pollegioni L,Recanatini M,Cavalli A

    update_date:2018-11-26 00:00:00

  • How to Model Inter- and Intramolecular Hydrogen Bond Strengths with Quantum Chemistry.

    abstract::This article presents the computation of both inter- and intramolecular hydrogen bond strengths from first-principles. Quantum chemical calculations conducted at the dispersion-corrected density functional theory level including free energy and solvation contributions are conducted for (i) one-to-one hydrogen-bonded c...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00132

    authors: Bauer CA

    update_date:2019-09-23 00:00:00

  • Building Graphs To Describe Dynamics, Kinetics, and Energetics in the d-Ala:d-Lac Ligase VanA.

    abstract::The d-Ala:d-Lac ligase, VanA, plays a critical role in the resistance of vancomycin. Indeed, it is involved in the synthesis of a peptidoglycan precursor, to which vancomycin cannot bind. The reaction catalyzed by VanA requires the opening of the so-called "ω-loop", so that the substrates can enter the active site. He...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.6b00211

    authors: Duclert-Savatier N,Bouvier G,Nilges M,Malliavin TE

    update_date:2016-09-26 00:00:00

  • The ensemble performance index: an improved measure for assessing ensemble pose prediction performance.

    abstract::We present a theoretical study on the performance of ensemble docking methodologies considering multiple protein structures. We perform a theoretical analysis of pose prediction experiments which is completely unbiased, as we make no assumptions about specific scoring functions, search paradigms, protein structures, o...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci2002796

    authors: Korb O,McCabe P,Cole J

    update_date:2011-11-28 00:00:00

  • Structure-based CoMFA as a predictive model - CYP2C9 inhibitors as a test case.

    abstract::In this study, we tried to establish a general scheme to create a model that could predict the affinity of small compounds to their target proteins. This scheme consists of a search for ligand-binding sites on a protein, a generation of bound conformations (poses) of ligands in each of the sites by docking, identifica...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci800313h

    authors: Yasuo K,Yamaotsu N,Gouda H,Tsujishita H,Hirono S

    update_date:2009-04-01 00:00:00

  • Retrospect and Prospect of Single Particle Cryo-Electron Microscopy: The Class of Integral Membrane Proteins as an Example.

    abstract::A giant technological leap in the field of cryo-electron microscopy (cryo-EM) has assured the achievement of near-atomic resolution structures of biological macromolecules. As a recognition of this accomplishment, the Nobel Prize in Chemistry was awarded in 2017 to Jacques Dubochet, Joachim Frank, and Richard Henderso...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b01015

    authors: Akbar S,Mozumder S,Sengupta J

    update_date:2020-05-26 00:00:00

  • The valence state combination model: a generic framework for handling tautomers and protonation states.

    abstract::The consistent handling of molecules is probably the most basic and important requirement in the field of cheminformatics. Reliable results can only be obtained if the underlying calculations are independent of the specific way molecules are represented in the input data. However, ensuring consistency is a complex tas...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci400724v

    authors: Urbaczek S,Kolodzik A,Rarey M

    update_date:2014-03-24 00:00:00

  • Prediction of the Favorable Hydration Sites in a Protein Binding Pocket and Its Application to Scoring Function Formulation.

    abstract::The important role of water molecules in protein-ligand binding energetics has attracted wide attention in recent years. A range of computational methods has been developed to predict the favorable locations of water molecules in a protein binding pocket. Most of the current methods are based on extensive molecular dy...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00619

    authors: Li Y,Gao Y,Holloway MK,Wang R

    update_date:2020-09-28 00:00:00

  • Supervised self-organizing maps in drug discovery. 2. Improvements in descriptor selection and model validation.

    abstract::The modeling of nonlinear descriptor-target relationships is a topic of considerable interest in drug discovery. We, herein, continue reporting the use of the self-organizing map-a nonlinear, topology-preserving pattern recognition technique that exhibits considerable promise in modeling and decoding these relationshi...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci0500841

    authors: Xiao YD,Harris R,Bayram E,Ii PS,Schmitt JD

    update_date:2006-01-01 00:00:00

  • Periodic cages.

    abstract::Various cages are constructed by using three types of caps: f-cap (derived from spherical fullerenes by deleting zones of various size), kf-cap (obtainable by cutting off the polar ring, of size k), and t-cap ("tubercule"-cap). Building ways are presented, some of them being possible isomerization routes in the real c...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci049738g

    authors: Diudea MV,Nagy CL,Silaghi-Dumitrescu I,Graovac A,Janezic D,Vikić-Topić D

    update_date:2005-03-01 00:00:00

  • Leave-cluster-out cross-validation is appropriate for scoring functions derived from diverse protein data sets.

    abstract::With the emergence of large collections of protein-ligand complexes complemented by binding data, as found in PDBbind or BindingMOAD, new opportunities for parametrizing and evaluating scoring functions have arisen. With huge data collections available, it becomes feasible to fit scoring functions in a QSAR style, i.e...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci100264e

    authors: Kramer C,Gedeck P

    update_date:2010-11-22 00:00:00