Abstract:
Machine learning has shown powerful capabilities in many areas. However, machine learning models are mostly database dependent: a new model is required whenever the database changes. A universal model that accommodates the widest variety of databases is therefore highly desirable. Such universality may be achieved by ensemble learning, which integrates multiple learners to meet the demands of diverse databases. We therefore propose a general procedure for building learning ensembles, based on noncovalent interaction (NCI) databases. In addition, accurate NCI computation is demanding for first-principles methods, and a competent machine learning model can be an efficient way to obtain high NCI accuracy with minimal computational resources. With both aims in mind, this study explores multiple ensemble learning schemes (the Bagging, Boosting, and Stacking frameworks). The models are built on various low-level density functional theory (DFT) calculations for the benchmark databases S66, S22, and X40. With the established ensemble learning models, all NCIs computed by DFT can be improved to high-level accuracy (root-mean-square error, RMSE = 0.22 kcal/mol relative to the CCSD(T)/CBS benchmark). Compared with single machine learning models, ensemble models show better accuracy (the RMSE of the best model is lowered by a further ∼25%), robustness, and goodness of fit according to the evaluation parameters suggested by the OECD. Among the ensemble learning models, heterogeneous Stacking ensembles show the greatest application potential. The standardized procedure for constructing learning ensembles has been applied successfully to several NCI data sets and may also be applicable to other chemical databases.
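The accuracy figures in the abstract are root-mean-square errors against a reference benchmark. As a minimal sketch of how such a figure and a relative improvement between two models could be computed, the snippet below uses plain Python. The function names and all energy values are hypothetical illustrations, not data or code from the paper.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_reduction(baseline_rmse, improved_rmse):
    """Fractional drop in RMSE when moving from a baseline to an improved model."""
    return (baseline_rmse - improved_rmse) / baseline_rmse


# Hypothetical interaction energies in kcal/mol (NOT taken from the paper).
reference = [-5.0, -3.0, -1.0, -7.0]     # stand-in for benchmark values
single_model = [-4.6, -3.4, -0.6, -7.4]  # stand-in for a single learner
ensemble = [-4.7, -3.3, -0.7, -7.3]      # stand-in for an ensemble

single_rmse = rmse(single_model, reference)
ensemble_rmse = rmse(ensemble, reference)
print(f"single:   {single_rmse:.2f} kcal/mol")
print(f"ensemble: {ensemble_rmse:.2f} kcal/mol")
print(f"reduction: {relative_reduction(single_rmse, ensemble_rmse):.0%}")
```

The toy values were chosen so that every error is 0.4 kcal/mol for the single model and 0.3 kcal/mol for the ensemble, which gives a 25% reduction. This only mirrors the scale of the improvement quoted in the abstract and does not reproduce its results.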
journal_name:J Chem Inf Model
journal_title:Journal of chemical information and modeling
authors:Li W, Miao W, Cui J, Fang C, Su S, Li H, Hu L, Lu Y, Chen G
doi:10.1021/acs.jcim.8b00878
subject:Has Abstract
pub_date:2019-05-28 00:00:00
pages:1849-1857
issue:5
eissn:1549-9596
issn:1549-960X
journal_volume:59
pub_type: journal article
abstract::Targeted covalent inhibitors (TCIs) have been successfully developed as high-affinity and selective inhibitors of enzymes of the protein kinase family. These drugs typically act by undergoing an electrophilic addition with an active-site cysteine residue, so design of a TCI begins with the identification of a "druggab...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/acs.jcim.8b00454
update_date:2018-09-24 00:00:00
abstract::The importance of thorough analyses of the secondary structures in proteins as basic structural units cannot be overemphasized. Although recent computational methods have achieved reasonably high accuracy for predicting secondary structures from amino acid sequences, a simple and fundamental empirical approach to char...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/ci900452z
update_date:2010-04-26 00:00:00
abstract::An index of the activation of Class A G-protein-coupled receptors (GPCRs) has been trained using interhelix distances from a series of microsecond molecular-dynamics simulations and tested for 268 published X-ray structures. In a three-class model that includes intermediate structures, 63% of the active structures are...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/acs.jcim.9b00604
update_date:2019-09-23 00:00:00
abstract::G protein-coupled receptors (GPCRs) are the largest family of cell surface receptors and arguably the most important family of drug targets. With technological breakthroughs in X-ray crystallography and cryo-electron microscopy, more than 300 GPCR-ligand complex structures have been publicly reported since 2007,...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/acs.jcim.9b00699
update_date:2020-09-28 00:00:00
abstract::Telomere maintenance is a universal cancer hallmark, and small molecules that disrupt telomere maintenance generally have anticancer properties. Since the vast majority of cancer cells utilize telomerase activity for telomere maintenance, the enzyme has been considered as an anticancer drug target. Recently, rational ...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/acs.jcim.5b00336
update_date:2015-12-28 00:00:00
abstract::We propose a hypothesis that "a model of active compound can be provided by integrating information of compounds high-ranked by docking simulation of a random compound library". In our hypothesis, the inclusion of true active compounds in the high-ranked compound is not necessary. We regard the high-ranked compounds a...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/ci7003384
update_date:2008-03-01 00:00:00
abstract::Databases of small, potentially bioactive molecules are ubiquitous across industry and academia. Although they are designed so that each unique compound appears only once, the multiplicity of ways in which many compounds can be represented means that these databases require methods for standardizing the representation of ch...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/acs.jcim.0c00232
update_date:2020-08-24 00:00:00
abstract::There are only four derivatives of pseudouridine (Ψ) that are known to occur naturally in RNA as post-transcriptional modifications. We have studied the conformational consequences of pseudouridylation and further modifications using replica exchange molecular dynamics simulations at the nucleoside level, and the simu...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/acs.jcim.0c00369
update_date:2020-10-26 00:00:00
abstract::Little attention has been given to the selection of trial descriptor sets when designing a QSAR analysis even though a great number of descriptor classes, and often a greater number of descriptors within a given class, are now available. This paper reports an effort to explore interrelationships between QSAR models an...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/ci3005308
update_date:2013-01-28 00:00:00
abstract::Different forms of synaptic plasticity in the cerebellum expressed at the synapses onto Purkinje cells (PCs) are mediated by membrane metabotropic glutamate receptors (mGluRs). There are three main mGluR groups with a total of 8 subtypes. Although mGluRs are also found at the climbing fiber (CF) to PC synapses, the di...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/ci050161s
update_date:2005-11-01 00:00:00
abstract::How well do different classification methods perform in selecting the ligands of a protein target out of large compound collections not used to train the model? Support vector machines, random forest, artificial neural networks, k-nearest-neighbor classification with genetic-algorithm-optimized feature selection, tren...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/ci050519k
update_date:2006-05-01 00:00:00
abstract::Protein-ligand binding is essential to almost all life processes. The understanding of protein-ligand interactions is fundamentally important to rational drug and protein design. Based on large scale data sets, we show that protein rigidity strengthening or flexibility reduction is a mechanism in protein-ligand bindin...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/acs.jcim.7b00226
update_date:2017-07-24 00:00:00
abstract::The dynamical properties of proteins play an essential role in their function. The elastic network model (ENM) is an effective and efficient tool for characterizing the intrinsic dynamical properties encoded in biomacromolecule structures. The Gaussian network model (GNM) and anisotropic network model (ANM) are th...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/acs.jcim.0c01178
update_date:2021-01-26 00:00:00
abstract::The emergence of multidrug-resistant Staphylococcus aureus (S. aureus) makes the treatment of infectious diseases in hospitals more difficult and increases the mortality of the patients. In this study, we attempted to identify novel potent antibiotic candidate compounds against S. aureus dihydrofolate reductase (saDHF...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/ci400686d
update_date:2014-04-28 00:00:00
abstract::A fully folded functional protein is stabilized by several noncovalent interactions. When a protein undergoes conformational motions, the existing noncovalent interactions may be maintained. They may also break or new interactions may be formed. Knowledge of the dynamical nature of the different types of noncovalent i...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/ci200302q
update_date:2011-12-27 00:00:00
abstract::Human telomeric DNA G-quadruplex has been identified as a good therapeutic target in cancer treatment. G-quadruplex-specific ligands that stabilize the G-quadruplex have great potential to be developed as anticancer agents. Two crystal structures (an apo form of parallel stranded human telomeric G-quadruplex and its h...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/acs.jcim.7b00287
update_date:2017-11-27 00:00:00
abstract::The protection exerted by 3,5-dihydroxy-4-methoxybenzyl alcohol (DHMBA), a phenolic compound recently isolated from the Pacific oyster, against oxidative stress (OS) is investigated using the density functional theory. Our results indicate that DHMBA is an outstanding peroxyl radical scavenger, being about 15 times an...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/acs.jcim.5b00513
update_date:2015-12-28 00:00:00
abstract::Congeners are molecules that share the same carbon skeleton but differ in the number of substituents and/or the substitution pattern. Examples are 1-chloronaphthalene, 1,4-dichloronaphthalene, and 1,3,8-trichloronaphthalene. Various persistent organic pollutants (POPs) exist in the environment as families of congen...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/ci300289b
update_date:2012-11-26 00:00:00
abstract::Glycan Optimized Dual Empirical Spectrum Simulation (GODESS) is a web service that has recently been shown to be one of the most accurate tools for simulating ¹H and ¹³C 1D NMR spectra of natural carbohydrates and their derivatives. The new version of GODESS supports visualization of the simulated ¹H and ¹...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/acs.jcim.6b00083
update_date:2016-06-27 00:00:00
abstract::A model for prediction of percent intestinal absorption (%Abs) of neutral molecules was developed based upon surface charges of the molecule calculated by density functional theory (DFT). The surface charges are decomposed into sigma moments which are correlated to a partition coefficient representing transfer of the ...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/ci049653f
update_date:2005-09-01 00:00:00
abstract::Aminoglycosides are antibiotics targeting the 16S RNA A site of the bacterial ribosome. Many efforts have been directed toward the design of their synthetic derivatives, but with few successes. As RNA binders, aminoglycosides are also a difficult target for computational drug design, since most of the exist...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/ci800361a
update_date:2009-02-01 00:00:00
abstract::In the present contribution, we have developed a database called the FAR-database, where the acronym FAR stands for Fused Aromatic Rings. It presents the results of nucleus-independent chemical shift calculations, NICS(0), NICS(1), NICS(0)ZZ, and NICS(1)ZZ, for 660 neutral benzenoid PAHs and cyclopenta-fused PAHs....
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/acs.jcim.9b00909
update_date:2020-02-24 00:00:00
abstract::The partitioning of solute molecules between immiscible solvents with significantly different polarities is of great importance. The polarization between the solute and solvent molecules plays an essential role in determining the solubility of the solute, which makes computational studies utilizing molecular mechanics...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/acs.jcim.7b00001
update_date:2017-10-23 00:00:00
abstract::With the hope of discovering effective analgesics with fewer side effects, attention has recently shifted to allosteric modulators of the opioid receptors. In the past two years, the first chemotypes of positive or silent allosteric modulators (PAMs or SAMs, respectively) of μ- and δ-opioid receptor types have been re...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/acs.jcim.5b00388
update_date:2015-09-28 00:00:00
abstract::Binding hot spots are regions of proteins that, due to their potentially high contribution to the binding free energy, have high propensity to bind small molecules. We present benchmark sets for testing computational methods for the identification of binding hot spots with emphasis on fragment-based ligand discovery. ...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/acs.jcim.0c00877
update_date:2020-12-28 00:00:00
abstract::Our recent studies show that the single Tyr residue in the sequence of amyloid-β42 (Aβ42) is reactive toward various ligands, including metals and adenosine triphosphate (see: Coskuner, O. J. Biol. Inorg. Chem. 2016, 21, 957-973 and Coskuner, O.; Murray, I. V. J. J. Alzheimer's Dis. 2014, 41, 561-574). Ho...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/acs.jcim.6b00761
update_date:2017-06-26 00:00:00
abstract::Small molecule flexible alignment is a critical component of both ligand- and structure-based methods in computer-aided drug discovery. Despite its importance, the availability of high-quality flexible alignment software packages is limited. Here, we present BCL::MolAlign, a freely available property-based molecular a...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/acs.jcim.9b00020
update_date:2019-02-25 00:00:00
abstract::Scoring the activity of compounds in phenotypic high-throughput assays presents a unique challenge because of the limited resolution and inherent measurement error of these assays. Techniques that leverage the structural similarity of compounds within an assay can be used to improve the hit-recovery rate from screenin...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/ci050087d
update_date:2005-11-01 00:00:00
abstract::A database has been derived from recently reported [60]fullerene derivatives, and their binding scores with HIV-1 PR have been computed using docking techniques. Computational methods have been used to predict which derivatives may have high binding affinities, and for these compounds biological tests have been perfor...
journal_title:Journal of chemical information and modeling
pub_type: letter
doi:10.1021/ci900047s
update_date:2009-05-01 00:00:00
abstract::Sterol 14α-demethylase (CYP51) is the main drug target for the treatment of fungal infections. The discovery of new efficient fungal CYP51 inhibitors requires an understanding of the structural requirements for selectivity for the fungal over the human ortholog. In this study, a binding mode of the pyridylethanol(phen...
journal_title:Journal of chemical information and modeling
pub_type: journal article
doi:10.1021/ci500556k
update_date:2014-12-22 00:00:00