Ensemble feature selection: consistent descriptor subsets for multiple QSAR models.

Abstract:

Selecting a small subset of descriptors from a large pool to build a predictive quantitative structure-activity relationship (QSAR) model is an important step in the QSAR modeling process. In general, subset selection is very hard to solve, even approximately, with guaranteed performance bounds. Traditional approaches employ deterministic or stochastic methods to obtain a descriptor subset that leads to an optimal model of a single type (such as linear regression or a neural network). With the development of ensemble modeling approaches, multiple models of differing types are individually developed, resulting in different descriptor subsets for each model type. However, it is advantageous, from the point of view of developing interpretable QSAR models, to have a single set of descriptors that can be used for different model types. In this paper, we describe an approach to the selection of a single, optimal subset of descriptors for multiple model types. We apply this approach to three data sets, covering both regression and classification, and show that the constraint of forcing different model types to use the same set of descriptors does not lead to a significant loss in predictive ability for the individual models considered. In addition, interpretations of the individual models developed using this approach indicate that they encode similar structure-activity trends.

journal_name

J Chem Inf Model

authors

Dutta D,Guha R,Wild D,Chen T

doi

10.1021/ci600563w

subject

Has Abstract

pub_date

2007-05-01 00:00:00

pages

989-97

issue

3

eissn

1549-960X

issn

1549-9596

journal_volume

47

pub_type

Journal Article
  • Structural and Functional Characterization of Allatostatin Receptor Type-C of Thaumetopoea pityocampa, a Potential Target for Next-Generation Pest Control Agents.

    abstract::Insect neuropeptide receptors, including allatostatin receptor type C (AstR-C), a G protein-coupled receptor, are among the potential targets for designing next-generation pesticides that despite their importance in offering a new mode-of-action have been overlooked. Focusing on AstR-C of Thaumetopoea pityocampa, a co...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c00985

    authors: Shahraki A,Işbilir A,Dogan B,Lohse MJ,Durdagi S,Birgul-Iyison N

    update_date:2021-01-21 00:00:00

  • Prediction and Experimental Confirmation of Novel Peripheral Cannabinoid-1 Receptor Antagonists.

    abstract::Small molecules targeting peripheral CB1 receptors have therapeutic potential in a variety of disorders including obesity-related, hormonal, and metabolic abnormalities, while avoiding the psychoactive effects in the central nervous system. We applied our in-house algorithm, iterative stochastic elimination, to produc...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00577

    authors: El-Atawneh S,Hirsch S,Hadar R,Tam J,Goldblum A

    update_date:2019-09-23 00:00:00

  • Locating sweet spots for screening hits and evaluating pan-assay interference filters from the performance analysis of two lead-like libraries.

    abstract::The efficiency of automated compound screening is heavily influenced by the design and the quality of the screening libraries used. We recently reported on the assembly of one diverse and one target-focused lead-like screening library. Using data from 15 enzyme-based screenings conducted using these libraries, their p...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci300382f

    authors: Mok NY,Maxe S,Brenk R

    update_date:2013-03-25 00:00:00

  • Probing fragment complementation by rigid-body docking: in silico reconstitution of calbindin D9k.

    abstract::Fragment complementation is gaining an increasing impact as a nonperturbing method to probe noncovalent interactions within protein supersecondary structures. In this study, the fast Fourier transform rigid-body docking algorithm ZDOCK has been employed for in silico reconstitution of the calcium binding protein calbi...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci0501995

    authors: Dell'Orco D,Seeber M,De Benedetti PG,Fanelli F

    update_date:2005-09-01 00:00:00

  • Homology model-guided 3D-QSAR studies of HIV-1 integrase inhibitors.

    abstract::In the present study, we report the exploration of binding modes of potent HIV-1 integrase (IN) inhibitors MK-0518 (raltegravir) and GS-9137 (elvitegravir) as well as chalcone and related amide IN inhibitors we recently synthesized and the development of 3D-QSAR models for integrase inhibition. Homology models of DNA-...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci200485a

    authors: Sharma H,Cheng X,Buolamwini JK

    update_date:2012-02-27 00:00:00

  • Binding of Cytotoxic Aβ25-35 Peptide to the Dimyristoylphosphatidylcholine Lipid Bilayer.

    abstract::Aβ25-35 is a short, cytotoxic, and naturally occurring fragment of the Alzheimer's Aβ peptide. To map the molecular mechanism of Aβ25-35 binding to the zwitterionic dimyristoylphosphatidylcholine (DMPC) bilayer, we have performed replica exchange with solute tempering molecular dynamics simulations using all-atom expl...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00045

    authors: Smith AK,Klimov DK

    update_date:2018-05-29 00:00:00

  • Regulation of JAK2 activation by Janus homology 2: evidence from molecular dynamics simulations.

    abstract::Janus kinase 2 (JAK2) is a protein tyrosine kinase implicated in signaling by specific members of the cytokine receptor family. Although it has been established that the JAK2 tyrosine kinase is negatively regulated by the JAK homology 2 (JH2) pseudokinase domain, the underlying mechanism of JH2 mediated regulation rem...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci300308g

    authors: Wan S,Coveney PV

    update_date:2012-11-26 00:00:00

  • Matched Molecular Series Analysis for ADME Property Prediction.

    abstract::Generation and prioritization of new molecules are the most central part of the drug design process. Matched molecular series analysis (MMSA) has recently been proposed as a formal approach that captures both of these key elements of design. In order to better understand the power of MMSA and its specific limitations,...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c00269

    authors: Awale M,Riniker S,Kramer C

    update_date:2020-06-22 00:00:00

  • Pharmer: efficient and exact pharmacophore search.

    abstract::Pharmacophore search is a key component of many drug discovery efforts. Pharmer is a new computational approach to pharmacophore search that scales with the breadth and complexity of the query, not the size of the compound library being screened. Two novel methods for organizing pharmacophore data, the Pharmer KDB-tre...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci200097m

    authors: Koes DR,Camacho CJ

    update_date:2011-06-27 00:00:00

  • Statistical confidence for variable selection in QSAR models via Monte Carlo cross-validation.

    abstract::A new variable selection wrapper method named the Monte Carlo variable selection (MCVS) method was developed utilizing the framework of the Monte Carlo cross-validation (MCCV) approach. The MCVS method reports the variable selection results in the most conventional and common measure of statistical hypothesis testing,...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci700283s

    authors: Konovalov DA,Sim N,Deconinck E,Vander Heyden Y,Coomans D

    update_date:2008-02-01 00:00:00

  • Truncated variants of the GCN4 transcription activator protein bind DNA with dramatically different dynamical motifs.

    abstract::The yeast protein GCN4 is a transcriptional activator in the basic leucine zipper (bZip) family, whose distinguishing feature is the "chopstick-like" homodimer of alpha helices formed at the DNA-binding interface. While experiments have shown that truncated versions of the protein retain biologically relevant DNA-bind...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci500448e

    authors: McHarris DM,Barr DA

    update_date:2014-10-27 00:00:00

  • Time-Domain Analysis of Molecular Dynamics Trajectories Using Deep Neural Networks: Application to Activity Ranking of Tankyrase Inhibitors.

    abstract::Molecular dynamics simulations provide valuable insights into the behavior of molecular systems. Extending the recent trend of using machine learning techniques to predict physicochemical properties from molecular dynamics data, we propose to consider the trajectories as multidimensional time series represented by 2D ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00135

    authors: Berishvili VP,Perkin VO,Voronkov AE,Radchenko EV,Syed R,Venkata Ramana Reddy C,Pillay V,Kumar P,Choonara YE,Kamal A,Palyulin VA

    update_date:2019-08-26 00:00:00

  • Trust, but Verify II: A Practical Guide to Chemogenomics Data Curation.

    abstract::There is a growing public concern about the lack of reproducibility of experimental data published in peer-reviewed scientific literature. Herein, we review the most recent alerts regarding experimental data quality and discuss initiatives taken thus far to address this problem, especially in the area of chemical geno...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article, Review

    doi:10.1021/acs.jcim.6b00129

    authors: Fourches D,Muratov E,Tropsha A

    update_date:2016-07-25 00:00:00

  • Simulation of 2D NMR Spectra of Carbohydrates Using GODESS Software.

    abstract::Glycan Optimized Dual Empirical Spectrum Simulation (GODESS) is a web service, which has been recently shown to be one of the most accurate tools for simulation of (1)H and (13)C 1D NMR spectra of natural carbohydrates and their derivatives. The new version of GODESS supports visualization of the simulated (1)H and (1...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.6b00083

    authors: Kapaev RR,Toukach PV

    update_date:2016-06-27 00:00:00

  • Critical Assessment of the Hildebrand and Hansen Solubility Parameters for Polymers.

    abstract::Solubility parameter models are widely used to select suitable solvents/nonsolvents for polymers in a variety of processing and engineering applications. In this study, we focus on two well-established models, namely, the Hildebrand and Hansen solubility parameter models. Both models are built on the basis of the noti...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00656

    authors: Venkatram S,Kim C,Chandrasekaran A,Ramprasad R

    update_date:2019-10-28 00:00:00

  • Parameterization and conformational sampling effects in pharmacophore multiplet searching.

    abstract::Pharmacophore patterns in ligands can be effectively characterized in terms of their constituent pharmacophore multiplets. Bitsets (fingerprints) encoding which particular multiplets are found in a given ligand have been and continue to be used as molecular descriptors in a range of molecular modeling applications, fr...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci800234q

    authors: Fox PC,Wolohan PR,Abrahamian E,Clark RD

    update_date:2008-12-01 00:00:00

  • Molecular Mechanism, Dynamics, and Energetics of Protein-Mediated Dinucleotide Flipping in a Mismatched DNA: A Computational Study of the RAD4-DNA Complex.

    abstract::DNA damage alters genetic information and adversely affects gene expression pathways leading to various complex genetic disorders and cancers. DNA repair proteins recognize and rectify DNA damage and mismatches with high fidelity. A critical molecular event that occurs during most protein-mediated DNA repair processes...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00636

    authors: Pitta K,Krishnan M

    update_date:2018-03-26 00:00:00

  • Lessons learned in empirical scoring with smina from the CSAR 2011 benchmarking exercise.

    abstract::We describe a general methodology for designing an empirical scoring function and provide smina, a version of AutoDock Vina specially optimized to support high-throughput scoring and user-specified custom scoring functions. Using our general method, the unique capabilities of smina, a set of default interaction terms ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci300604z

    authors: Koes DR,Baumgartner MP,Camacho CJ

    update_date:2013-08-26 00:00:00

  • Molecular Structure Extraction from Documents Using Deep Learning.

    abstract::Chemical structure extraction from documents remains a hard problem because of both false positive identification of structures during segmentation and errors in the predicted structures. Current approaches rely on handcrafted rules and subroutines that perform reasonably well generally but still routinely encounter s...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00669

    authors: Staker J,Marshall K,Abel R,McQuaw CM

    update_date:2019-03-25 00:00:00

  • Physics-based scoring of protein-ligand complexes: enrichment of known inhibitors in large-scale virtual screening.

    abstract::We demonstrate that using an all-atom molecular mechanics force field combined with an implicit solvent model for scoring protein-ligand complexes is a promising approach for improving inhibitor enrichment in the virtual screening of large compound databases. The rescoring method is evaluated by the extent to which kn...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci0502855

    authors: Huang N,Kalyanaraman C,Irwin JJ,Jacobson MP

    update_date:2006-01-01 00:00:00

  • Computational Design of Biologically Active Anticancer Peptides and Their Interactions with Heterogeneous POPC/POPS Lipid Membranes.

    abstract::Over the last few decades, anticancer peptides (ACPs) have turned into potential warheads against cancer. Apart from small molecules and monoclonal antibodies, ACPs have been proven to be effective against cancer cells. ACPs are small cationic peptides that selectively bind to the negatively charged cancer cell membra...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.9b00348

    authors: Singh M,Kumar V,Sikka K,Thakur R,Harioudh MK,Mishra DP,Ghosh JK,Siddiqi MI

    update_date:2020-01-27 00:00:00

  • Periodic cages.

    abstract::Various cages are constructed by using three types of caps: f-cap (derived from spherical fullerenes by deleting zones of various size), kf-cap (obtainable by cutting off the polar ring, of size k), and t-cap ("tubercule"-cap). Building ways are presented, some of them being possible isomerization routes in the real c...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci049738g

    authors: Diudea MV,Nagy CL,Silaghi-Dumitrescu I,Graovac A,Janezic D,Vikić-Topić D

    update_date:2005-03-01 00:00:00

  • Improving classical substructure-based virtual screening to handle extrapolation challenges.

    abstract::Target-oriented substructure-based virtual screening (sSBVS) of molecules is a promising approach in drug discovery. Yet, there are doubts whether sSBVS is suitable also for extrapolation, that is, for detecting molecules that are very different from those used for training. Herein, we evaluate the predictive power of...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci200472s

    authors: Biniashvili T,Schreiber E,Kliger Y

    update_date:2012-03-26 00:00:00

  • 3D-QSAR and docking studies of selective GSK-3beta inhibitors. Comparison with a thieno[2,3-b]pyrrolizinone derivative, a new potential lead for GSK-3beta ligands.

    abstract::The three-dimensional structures of 3-anilino-4-arylmaleimides, selective GSK-3beta inhibitors, were correlated to their biological affinities by 3D-QSAR studies (CoMFA method). The cocrystallographic data of GSK-3beta vs 3-anilino-4-arylmaleimide allowed us to compare 3D-QSAR results to experimental intermolecular in...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci050008y

    authors: Lescot E,Bureau R,Sopkova-de Oliveira Santos J,Rochais C,Lisowski V,Lancelot JC,Rault S

    update_date:2005-05-01 00:00:00

  • Free energy calculations give insight into the stereoselective hydroxylation of α-ionones by engineered cytochrome P450 BM3 mutants.

    abstract::Previously, stereoselective hydroxylation of α-ionone by Cytochrome P450 BM3 mutants M01 A82W and M11 L437N was observed. While both mutants hydroxylate α-ionone in a regioselective manner at the C3 position, M01 A82W catalyzes formation of trans-3-OH-α-ionone products whereas M11 L437N exhibits opposite stereoselecti...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci300243n

    authors: de Beer SB,Venkataraman H,Geerke DP,Oostenbrink C,Vermeulen NP

    update_date:2012-08-27 00:00:00

  • Virtual drug screen schema based on multiview similarity integration and ranking aggregation.

    abstract::The current drug virtual screen (VS) methods mainly include two categories, i.e., ligand/target structure-based virtual screen and that utilizing protein-ligand interaction fingerprint information based on the large number of complex structures. Since the former one focuses on the one-side information while the later...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci200481c

    authors: Kang H,Sheng Z,Zhu R,Huang Q,Liu Q,Cao Z

    update_date:2012-03-26 00:00:00

  • Multidimensional Drift of Sequence Attributes and Functional Profiles in the Superfamily of the Three-Finger Proteins and Their Structural Homologues.

    abstract::Functional diversity of the three-finger-protein domain (TFPD) had been acquired via hypervariability of some sequence positions and extensive insertion/deletion of short AA-segments that caused multidimensional drift of several sequence attributes such as the overall (HI) and local hydrophobicity levels, the isoelect...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00322

    authors: Galat A

    update_date:2015-09-28 00:00:00

  • Kinetic Models of Cyclosporin A in Polar and Apolar Environments Reveal Multiple Congruent Conformational States.

    abstract::The membrane permeability of cyclic peptides and peptidomimetics, which are generally larger and more complex than typical drug molecules, is likely strongly influenced by the conformational behavior of these compounds in polar and apolar environments. The size and complexity of peptides often limit their bioavailabil...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.6b00251

    authors: Witek J,Keller BG,Blatter M,Meissner A,Wagner T,Riniker S

    update_date:2016-08-22 00:00:00

  • Searching for coordinated activity cliffs using particle swarm optimization.

    abstract::Activity cliffs are formed by structurally similar compounds having large potency differences. Coordinated activity cliffs evolve when compounds within groups of structural neighbors form multiple cliffs with different partners, giving rise to local networks of cliffs in a data set. Using particle swarm optimization, ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci3000503

    authors: Namasivayam V,Bajorath J

    update_date:2012-04-23 00:00:00

  • Identification of Enzyme Genes Using Chemical Structure Alignments of Substrate-Product Pairs.

    abstract::Although there are several databases that contain data on many metabolites and reactions in biochemical pathways, there is still a big gap in the numbers between experimentally identified enzymes and metabolites. It is supposed that many catalytic enzyme genes are still unknown. Although there are previous studies tha...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00216

    authors: Moriya Y,Yamada T,Okuda S,Nakagawa Z,Kotera M,Tokimatsu T,Kanehisa M,Goto S

    update_date:2016-03-28 00:00:00