Abstract:
An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function's ability to identify the most stable structure might be improved by emphasizing the most relevant atom pairwise interactions. However, it is hard to evaluate the relative importance of each atom pair using traditional means. With the introduction of machine learning (ML) methods, it has become possible to determine the relative importance of each atom pair present in a scoring function. In this work, we use the Random Forest (RF) method to refine a pair potential developed by our laboratory (GARF; Zhang, Z. J. Chem. Theory Comput. 2018, 14, 5045) by identifying relevant atom pairs that optimize the performance of the potential on our given task. Our goal is to construct an ML model that can accurately differentiate the native ligand binding pose from candidate poses using a potential refined by RF optimization. We successfully constructed RF models on an unbalanced data set with the "comparison" concept, and the resultant RF models were tested on CASF-2013 (Li, Y. J. Chem. Inf. Model. 2014, 54, 1700). In a comparison of the performance of our RF models against 29 scoring functions, we found that our models outperformed the other scoring functions in predicting the native pose. In addition, we created two artificially designed potential function sets to address the importance of the GARF potential in the RF models: (1) a scrambled probability function set, which was obtained by mixing up atom pairs and probability functions in GARF, and (2) a uniform probability function set, which shares the same peak positions with GARF but has fixed peak heights. A comparison of the accuracy of RF models based on the scrambled, uniform, and original GARF potentials clearly showed that the peak positions in the GARF potential are important while the well depths are not.
All code and data used in this work are available at https://github.com/JunPei000/random_forest_protein_ligand_decoy_detection.
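The abstract does not spell out how the "comparison" concept or the two control potentials are built, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each pose is a list of per-atom-pair scores, that a "comparison" sample is the feature difference between a native pose and a decoy (emitted in both orders so the two classes are balanced), and that each probability function can be summarized as a hypothetical `(peak_position, peak_height)` tuple. All function names and the toy data are illustrative.

```python
import random


def comparison_samples(native, decoys):
    """Turn one native pose and its decoys into balanced pairwise samples.

    Assumption: each pose is a list of per-atom-pair scores. For every decoy
    we emit (native - decoy) with label 1 and (decoy - native) with label 0,
    so both classes occur equally often even though decoys outnumber natives.
    """
    samples = []
    for decoy in decoys:
        diff = [n - d for n, d in zip(native, decoy)]
        samples.append((diff, 1))
        samples.append(([-x for x in diff], 0))
    return samples


def scramble_potential(potential, seed=0):
    """Control 1: randomly reassign probability functions to atom pairs."""
    rng = random.Random(seed)
    pairs = sorted(potential)
    functions = [potential[p] for p in pairs]
    rng.shuffle(functions)
    return dict(zip(pairs, functions))


def uniform_potential(potential, height=1.0):
    """Control 2: keep every peak position but fix every peak height."""
    return {pair: (position, height) for pair, (position, _) in potential.items()}


# Toy data (hypothetical): atom pair -> (peak position in angstroms, peak height).
garf_like = {
    ("C", "C"): (3.8, 0.6),
    ("N", "O"): (2.9, 1.4),
    ("O", "O"): (2.7, 1.1),
}
native_pose = [1.2, 0.8, 0.5]
decoy_poses = [[0.9, 0.1, 0.7], [0.2, 0.4, 0.3], [1.0, 0.9, 0.1]]

samples = comparison_samples(native_pose, decoy_poses)
labels = [label for _, label in samples]
scrambled = scramble_potential(garf_like)
uniform = uniform_potential(garf_like)
```

Under these assumptions, the `samples` list could be passed to any random forest classifier, and retraining on features derived from `scrambled` or `uniform` instead of the original potential would mirror the kind of control comparison the abstract describes.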
journal_name: J Chem Inf Model
journal_title: Journal of chemical information and modeling
authors: Pei J, Zheng Z, Kim H, Song LF, Walworth S, Merz MR, Merz KM Jr
doi: 10.1021/acs.jcim.9b00356
subject: Has Abstract
pub_date: 2019-07-22 00:00:00
pages: 3305-3315
issue: 7
eissn: 1549-9596
issn: 1549-960X
journal_volume: 59
pub_type: Journal article
abstract::When both the difference between two quantities and their individual values can be measured or computationally predicted, multiple quantities can be determined from the measurements or predictions of select individual quantities and select pairwise differences. These measurements and predictions form a network connect...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.9b00528
update_date:2019-11-25 00:00:00
abstract::The momentum gained by research on biologics has not been met yet with equal thrust on the informatics side. There is a noticeable lack of software for data management that empowers the bench scientists working on the development of biologic therapeutics. SARvision|Biologics is a tool to analyze data associated with b...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci400333x
update_date:2013-10-28 00:00:00
abstract::Binding hot spots are regions of proteins that, due to their potentially high contribution to the binding free energy, have high propensity to bind small molecules. We present benchmark sets for testing computational methods for the identification of binding hot spots with emphasis on fragment-based ligand discovery. ...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.0c00877
update_date:2020-12-28 00:00:00
abstract::We introduce the statistics behind a novel type of SAR analysis named "nonadditivity analysis". On the basis of all pairs of matched pairs within a given data set, the approach analyzes whether the same transformations between related molecules have the same effect, i.e., whether they are additive. Assuming that the e...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.9b00631
update_date:2019-09-23 00:00:00
abstract::We describe a novel deep learning neural network method and its application to impute assay pIC50 values. Unlike conventional machine learning approaches, this method is trained on sparse bioactivity data as input, typical of that found in public and commercial databases, enabling it to learn directly from correlation...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.8b00768
update_date:2019-03-25 00:00:00
abstract::The viral NS5B RNA-dependent RNA-polymerase (RdRp) is one of the best-studied and promising targets for the development of novel therapeutics against hepatitis C virus (HCV). Allosteric inhibition of this enzyme has emerged as a viable strategy toward blocking replication of viral RNA in cell based systems. Herein, we...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci9004749
update_date:2010-04-26 00:00:00
abstract::Human telomeric DNA G-quadruplex has been identified as a good therapeutic target in cancer treatment. G-quadruplex-specific ligands that stabilize the G-quadruplex have great potential to be developed as anticancer agents. Two crystal structures (an apo form of parallel stranded human telomeric G-quadruplex and its h...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.7b00287
update_date:2017-11-27 00:00:00
abstract::On the order of hundreds of absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) models have been described in the literature in the past decade which are more often than not inaccessible to anyone but their authors. Public accessibility is also an issue with computational models for bioactivity, a...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.5b00143
update_date:2015-06-22 00:00:00
abstract::Group 1 metabotropic glutamate receptors (mGluR) are G-protein coupled receptors with a large bilobate extracellular ligand binding region (LBR) that resembles a Venus fly trap. Closing of this LBR in the presence of a ligand is associated with the activation of the receptor. From conformational sampling of the LBR-li...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci400160x
update_date:2013-06-24 00:00:00
abstract::In the present study, we report the exploration of binding modes of potent HIV-1 integrase (IN) inhibitors MK-0518 (raltegravir) and GS-9137 (elvitegravir) as well as chalcone and related amide IN inhibitors we recently synthesized and the development of 3D-QSAR models for integrase inhibition. Homology models of DNA-...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci200485a
update_date:2012-02-27 00:00:00
abstract::The roles of chemical compounds in biological systems are now systematically analyzed by high-throughput experimental technologies. To automate the processing and interpretation of large-scale data it is necessary to develop bioinformatics methods to extract information from the chemical structures of these small mole...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci700006f
update_date:2007-07-01 00:00:00
abstract::Umami or the taste of monosodium glutamate represents one of the major attractive taste modalities in humans. Therefore, knowledge about biophysical and biochemical properties of the umami taste is important for both scientific research and the food industry. Experimental approaches for predicting umami peptides are l...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.0c00707
update_date:2020-12-28 00:00:00
abstract::Since many projects at pharmaceutical organizations get their start from a high-throughput screening (HTS) campaign, improving the quality of the HTS deck can improve the likelihood of discovering a high-quality lead molecule that can be progressed to a drug candidate. Over the past decade, Janssen has implemented sev...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.8b00258
update_date:2018-10-22 00:00:00
abstract::Fragment complementation is gaining an increasing impact as a nonperturbing method to probe noncovalent interactions within protein supersecondary structures. In this study, the fast Fourier transform rigid-body docking algorithm ZDOCK has been employed for in silico reconstitution of the calcium binding protein calbi...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci0501995
update_date:2005-09-01 00:00:00
abstract::Alanine scanning is a tool in molecular biology that is commonly used to evaluate the contribution of a specific amino acid residue to the stability and function of a protein. Additionally, this tool is also used to understand whether the side chain of a specific amino acid residue plays a role in the protein's bioact...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.8b00926
update_date:2019-02-25 00:00:00
abstract::The Common Instrument Middleware Architecture (CIMA) aims at Grid-enabling a wide range of scientific instruments and sensors to enable easy access to and sharing and storage of data produced by these instruments and sensors. This paper describes the implementation of CIMA applied to the field of single-crystal X-ray ...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci050368l
update_date:2006-05-01 00:00:00
abstract::There are only four derivatives of pseudouridine (Ψ) that are known to occur naturally in RNA as post-transcriptional modifications. We have studied the conformational consequences of pseudouridylation and further modifications using replica exchange molecular dynamics simulations at the nucleoside level, and the simu...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.0c00369
update_date:2020-10-26 00:00:00
abstract::Inhibition of protein-protein interactions (PPIs) is emerging as a promising therapeutic strategy despite the difficulty in targeting such interfaces with drug-like small molecules. PPIs generally feature large and flat binding surfaces as compared to typical drug targets. These features pose a challenge for structura...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.5b00103
update_date:2015-08-24 00:00:00
abstract::Factor Xa inhibitors are innovative anticoagulant agents that provide a better safety/efficacy profile compared to other anticoagulative drugs. A chemical feature-based modeling approach was applied to identify crucial pharmacophore patterns from 3D crystal structures of inhibitors bound to human factor Xa (Pdb entrie...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci049778k
update_date:2005-01-01 00:00:00
abstract::A novel pharmacophore descriptor Flexophore is presented, which considers molecular flexibility when comparing descriptor similarities. The descriptor is a complete reduced graph of the underlying molecule. Its nodes are represented by enhanced MM2 atom types, while the edge descriptions encode the molecular flexibili...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci700359j
update_date:2008-04-01 00:00:00
abstract::G-protein coupled receptors (GPCRs) are highly relevant drug targets. Four GPCRs with known crystal structure were analyzed with docking (AutoDock4) and postdocking (MM-PBSA) in order to evaluate the ability to recognize known antagonists from a larger database of molecular decoys and to predict correct binding modes....
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci4000745
update_date:2013-04-22 00:00:00
abstract::A data set of 130 diverse compounds containing both central nervous system (CNS) and non-CNS drugs was used to generate a renal clearance model using a classical Volsurf approach. Percentage renal clearance data was used as a biological input. The score plots obtained from principal component analysis and partial leas...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci0503309
update_date:2006-05-01 00:00:00
abstract::The balance between structural stability and functional plasticity in proteins that share common three-dimensional folds is the key factor that drives protein evolvability. The ability to distinguish the parts of homologous proteins that underlie common structural organization patterns from the parts acting as regulat...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.6b00504
update_date:2017-04-24 00:00:00
abstract::Janus kinase 2 (JAK2) is a protein tyrosine kinase implicated in signaling by specific members of the cytokine receptor family. Although it has been established that the JAK2 tyrosine kinase is negatively regulated by the JAK homology 2 (JH2) pseudokinase domain, the underlying mechanism of JH2 mediated regulation rem...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci300308g
update_date:2012-11-26 00:00:00
abstract::Interfacial hydration strongly influences interactions between biomolecules. For example, drug-target complexes are often stabilized by hydration networks formed between hydrophilic residues and water molecules at the interface. Exhaustive exploration of hydration networks is challenging for experimental as well as th...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.5b00638
update_date:2016-01-25 00:00:00
abstract::The appropriate selection of a chemical space represented by the data set, the selection of its chemical data representation, the development of a correct modeling process using a robust and reproducible algorithm, and the performance of an exhaustive training and external validation determine the usability and reprod...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.7b00492
update_date:2017-11-27 00:00:00
abstract::Reduction of the affinity of the fragment crystallizable (Fc) region with immune receptors by substitution of one or a few amino acids, known as Fc-silencing, is an established approach to reduce the immune effector functions of monoclonal antibody therapeutics. This approach to Fc-silencing, however, is problematic a...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.9b01198
update_date:2020-11-23 00:00:00
abstract::Reversible covalent inhibitors have drawn increasing attention in drug design, as they are likely more potent than noncovalent inhibitors and less toxic than covalent inhibitors. Despite those advantages, the computational prediction of reversible covalent binding presents a formidable challenge because the binding pr...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.8b00959
update_date:2019-05-28 00:00:00
abstract::Ab initio quantum-chemistry programs produce and use large amounts of data, which are usually stored on disk in the form of binary files. A FORTRAN library, named Q5Cost, has been designed and implemented in order to allow the storage of these data sets in a special data format built with the HDF5 technology. This dat...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci7000567
update_date:2007-05-01 00:00:00
abstract::How well do different classification methods perform in selecting the ligands of a protein target out of large compound collections not used to train the model? Support vector machines, random forest, artificial neural networks, k-nearest-neighbor classification with genetic-algorithm-optimized feature selection, tren...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci050519k
update_date:2006-05-01 00:00:00