Abstract:
Visualizing high-dimensional data by projecting them into a two- or three-dimensional space is a popular approach in many scientific fields, including computer-aided drug design and cheminformatics. In contrast, dimensionality reduction techniques have been far less explored for materials informatics. Nevertheless, similar to their usefulness in analyzing the space of, e.g., drug-like molecules, such techniques could provide useful insights on materials space, including an intuitive grasp of the overall distribution of samples, the identification of interesting trends, including the formation of materials clusters and the presence of activity cliffs and outliers, and rational navigation through this space in the search for new materials. Here we present the first application of four dimensionality reduction techniques, namely, principal component analysis (PCA), kernel PCA, Isomap, and diffusion map, to visualize and analyze a part of the materials space populated by solar cells made of metal oxides. Solar cells in general and metal-oxide-based solar cells in particular hold the promise of contributing to the world's search for clean and affordable energy resources. With the exception of PCA, these methods have seldom been used to visualize chemistry space and almost never been used to visualize materials space. For this purpose, we integrated five metal-oxide-based solar cell libraries into a uniform database and subjected it to dimensionality reduction by all four methods, comparing their performances using various criteria such as maintaining the local environment of samples and the clustering structure in the low-dimensional space. We also looked at the number of outliers produced by each method and analyzed common outliers.
We found that PCA performs best in terms of the ability to correctly maintain the local environment of samples, whereas Isomap does the best job of assigning class membership on the basis of the identities of nearest neighbors (i.e., it is the best classifier). We also found that many of the outliers identified by all of the methods could be rationalized. We suggest that the methods used in this work could be extended to study other types of solar cells, thereby setting the ground for further analysis of the photovoltaic (PV) space as well as other regions of materials space.
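The two comparison criteria named in the abstract (how well an embedding maintains each sample's local environment, and how well nearest neighbors predict class membership) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the overlap-based preservation score, the leave-one-out majority vote, and the toy data are all assumptions made for illustration, and the low-dimensional coordinates would in practice come from PCA, kernel PCA, Isomap, or a diffusion map.

```python
from collections import Counter
from math import dist


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: dist(points[i], points[j]))
    return others[:k]


def neighborhood_preservation(high, low, k):
    """Mean fraction of each sample's k high-dimensional neighbors kept in the embedding."""
    total = 0.0
    for i in range(len(high)):
        kept = set(k_nearest(high, i, k)) & set(k_nearest(low, i, k))
        total += len(kept) / k
    return total / len(high)


def loo_knn_accuracy(low, labels, k):
    """Leave-one-out accuracy of a majority-vote k-NN classifier in the embedding."""
    hits = 0
    for i in range(len(low)):
        votes = Counter(labels[j] for j in k_nearest(low, i, k))
        hits += votes.most_common(1)[0][0] == labels[i]
    return hits / len(low)


# Hypothetical toy data: two well-separated clusters in three dimensions.
high = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 10, 0), (11, 10, 0), (10, 11, 0)]
labels = ["a", "a", "a", "b", "b", "b"]

# An embedding that keeps the informative axes, and one that interleaves the clusters.
low_good = [(x, y) for x, y, _ in high]
low_bad = [(0,), (10,), (1,), (11,), (2,), (12,)]

print(neighborhood_preservation(high, low_good, k=2))
print(neighborhood_preservation(high, low_bad, k=2))
print(loo_knn_accuracy(low_good, labels, k=2))
```

A score near 1 from `neighborhood_preservation` means the embedding keeps most local neighbors, while `loo_knn_accuracy` measures how cleanly the classes separate in the projected space. The two criteria need not favor the same method, which is consistent with the abstract reporting PCA as best on the first and Isomap as best on the second.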
journal_name: J Chem Inf Model
journal_title: Journal of chemical information and modeling
authors: Kaspi O, Yosipof A, Senderowitz H
doi: 10.1021/acs.jcim.8b00552
subject: Has Abstract
pub_date: 2018-12-24 00:00:00
pages: 2428-2439
issue: 12
eissn: 1549-9596
issn: 1549-960X
journal_volume: 58
pub_type: Journal article
abstract::With the hope of discovering effective analgesics with fewer side effects, attention has recently shifted to allosteric modulators of the opioid receptors. In the past two years, the first chemotypes of positive or silent allosteric modulators (PAMs or SAMs, respectively) of μ- and δ-opioid receptor types have been re...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.5b00388
update_date:2015-09-28 00:00:00
abstract::In this work, the perception of similarity of reactions catalyzed by hydrolases and oxidoreductases on the basis of the overall breaking and making of bonds of reactions is investigated. Six physicochemical properties for the reacting bond in the substrate of each enzymatic reaction were calculated to describe the cha...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci9004833
update_date:2010-06-28 00:00:00
abstract::Over the last few decades, anticancer peptides (ACPs) have turned into potential warheads against cancer. Apart from small molecules and monoclonal antibodies, ACPs have been proven to be effective against cancer cells. ACPs are small cationic peptides that selectively bind to the negatively charged cancer cell membra...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.9b00348
update_date:2020-01-27 00:00:00
abstract::New molecular descriptors, RED (Renyi entropy descriptors), based on the generalized entropies introduced by Renyi are presented. Topological descriptors based on molecular features have proven to be useful for describing molecular profiles. Renyi entropy is used as a variability measure to contract a feature-pair dis...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci900275w
update_date:2009-11-01 00:00:00
abstract::Large ring cyclodextrins have become increasingly important for drug delivery applications. In this work, we have performed replica-exchange molecular dynamics simulations using both implicit and explicit water solvation models to study the conformational diversity of iota-cyclodextrin containing 14 α-1,4 glycosidic l...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.6b00595
update_date:2017-04-24 00:00:00
abstract::Novel statistical potentials derived from known protein structures are presented. They are designed to describe cation-pi and amino-pi interactions between a positively charged amino acid or an amino acid carrying a partially charged amino group and an aromatic moiety. These potentials are based on the propensity of r...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci050395b
update_date:2006-03-01 00:00:00
abstract::Angiotensin II type 1 receptor (AT1R) is the principal regulator of blood pressure in humans. The overactivation of AT1R by the stimulation of angiotensin II would result in high blood pressure. To prevent hypertension, nonpeptide "sartan" drugs, such as valsartan (VST), have been developed to competitively block the ...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.8b00364
update_date:2018-10-22 00:00:00
abstract::Congeners are molecules based on the same carbon skeleton but are different by the number of substituents and/or a substitution pattern. Examples are 1-chloronaphthalene, 1,4-dichloronaphthalene, and 1,3,8-trichloronaphthalene. Various persistent organic pollutants (POPs) exist in the environment as families of congen...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci300289b
update_date:2012-11-26 00:00:00
abstract::Increasing protein kinase C (PKC) activity is of potential therapeutic value. Its activation involves an interaction between the C1 domain and diacylglycerol (DAG) at intracellular membrane surfaces; DAG mimetics hold promise as new drugs. We previously developed the isophthalate derivative HMI-1a3, an effective but h...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.0c00624
update_date:2020-11-23 00:00:00
abstract::A homology model of the Arabidopsis thaliana UV resistance locus 8 (UVR8) protein is presented herein, showing a seven-bladed β-propeller conformation similar to the globular structure of RCC1. The UVR8 amino acid sequence contains a very high amount of conserved tryptophans, and the homology model shows that seven of...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci200017f
update_date:2011-06-27 00:00:00
abstract::Growing data sets with increased time for analysis is hampering predictive modeling in drug discovery. Model building can be carried out on high-performance computer clusters, but these can be expensive to purchase and maintain. We have evaluated ligand-based modeling on cloud computing resources where computations ar...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci500580y
update_date:2015-01-26 00:00:00
abstract::Inhibitors of histone deacetylases (HDACIs) have emerged as a new class of drugs for the treatment of human cancers and other diseases because of their effects on cell growth, differentiation, and apoptosis. In this study we have developed several quantitative structure-activity relationship (QSAR) models for 59 chemi...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci800366f
update_date:2009-02-01 00:00:00
abstract::Previously, stereoselective hydroxylation of α-ionone by Cytochrome P450 BM3 mutants M01 A82W and M11 L437N was observed. While both mutants hydroxylate α-ionone in a regioselective manner at the C3 position, M01 A82W catalyzes formation of trans-3-OH-α-ionone products whereas M11 L437N exhibits opposite stereoselecti...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci300243n
update_date:2012-08-27 00:00:00
abstract::Random Forest regression (RF), Partial-Least-Squares (PLS) regression, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) were used to develop QSPR models for the prediction of aqueous solubility, based on experimental data for 988 organic molecules. The Random Forest regression model predicted aqueou...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci060164k
update_date:2007-01-01 00:00:00
abstract::In this account, a rapid retrosynthesis-based scoring method for the assessment of synthetic accessibility of drug-like molecules, called RASA (Retrosynthesis-based Assessment of Synthetic Accessibility) is devised. RASA first constructs a synthesis tree for the target molecule based on retrosynthetic analysis; in thi...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci100216g
update_date:2011-10-24 00:00:00
abstract::Partial covalent interactions (PCIs) in proteins, which include hydrogen bonds, salt bridges, cation-π, and π-π interactions, contribute to thermodynamic stability and facilitate interactions with other biomolecules. Several score functions have been developed within the Rosetta protein modeling framework that identif...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.7b00398
update_date:2018-05-29 00:00:00
abstract::DiSCuS, a "Database System for Compound Selection", has been developed. The primary goal of DiSCuS is to aid researchers in the steps subsequent to generating high-throughput virtual screening (HTVS) results, such as selection of compounds for further study, purchase, or synthesis. To do so, DiSCuS provides (1) a stor...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci400587f
update_date:2014-01-27 00:00:00
abstract::Saturated acyclic alkanes show steric strain if they are highly branched and, in extreme cases, fall apart rapidly at room temperature. Consequently, attempts to count the number of isomeric forms for a given molecular formula that neglect this physical consideration will inevitably overestimate the size of the availa...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci700246b
update_date:2007-11-01 00:00:00
abstract::The efficiency of automated compound screening is heavily influenced by the design and the quality of the screening libraries used. We recently reported on the assembly of one diverse and one target-focused lead-like screening library. Using data from 15 enzyme-based screenings conducted using these libraries, their p...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci300382f
update_date:2013-03-25 00:00:00
abstract::A theoretical study of a variety of cyclohexane-based anion transporters interacting with the chloride anion has been conducted using density functional theory. The calculations have been performed in the gas phase but also, in order to describe the solvation effects on the interaction, two different solvents-chlorofo...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.9b00154
update_date:2019-05-28 00:00:00
abstract::Homology modeling is a reliable method of predicting the three-dimensional structures of proteins that lack NMR or X-ray crystallographic data. It employs the assumption that a structural resemblance exists between closely related proteins. Despite the availability of many crystal structures of possible templates, onl...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci500001f
update_date:2014-06-23 00:00:00
abstract::Glycan Optimized Dual Empirical Spectrum Simulation (GODESS) is a web service, which has been recently shown to be one of the most accurate tools for simulation of (1)H and (13)C 1D NMR spectra of natural carbohydrates and their derivatives. The new version of GODESS supports visualization of the simulated (1)H and (1...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.6b00083
update_date:2016-06-27 00:00:00
abstract::A machine learning enhanced spectrum recognition system called spectrum recognition based on computer vision (SRCV) for data extraction from previously analyzed 13C and 1H NMR spectra has been developed. The intelligent system was designed with four function modules to extract data from three areas of NMR images, incl...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.0c01046
update_date:2021-01-25 00:00:00
abstract::The emergence of a large amount of pharmacological, genomic, and network knowledge data provides new challenges and opportunities for drug discovery and development. Identification of real small-molecule drug (SM)-miRNA associations is not only important in the development of effective drug repositioning but also cruc...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.0c00244
update_date:2020-08-24 00:00:00
abstract::We describe a novel deep learning neural network method and its application to impute assay pIC50 values. Unlike conventional machine learning approaches, this method is trained on sparse bioactivity data as input, typical of that found in public and commercial databases, enabling it to learn directly from correlation...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.8b00768
update_date:2019-03-25 00:00:00
abstract::The dissolution of a chemical into water is a process fundamental to both chemistry and biology. The persistence of a chemical within the environment and the effects of a chemical within the body are dependent primarily upon aqueous solubility. With the well-documented limitations hindering the accurate experimental d...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci900286s
update_date:2009-11-01 00:00:00
abstract::In this work, we report an ab initio investigation based on density functional theory calculations within van der Waals D3 corrections to investigate the adsorption properties and activation of CO2 on transition-metal (TM) 13-atom clusters (TM = Ru, Rh, Pd, Ag), which is a key step for the development of subnano catal...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.9b00792
update_date:2020-02-24 00:00:00
abstract::The ligand binding determinants for the angiotensin II type 1 receptor (AT1R), a G protein-coupled receptor (GPCR), have been characterized by means of computer simulations. As a first step, a pharmacophore model of various known AT1R ligands exhibiting a wide range of binding affinities was generated. Second, a struc...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/ci400400m
update_date:2013-11-25 00:00:00
abstract::Binding affinity prediction with implicit solvent models remains a challenge in virtual screening for drug discovery. In order to assess the predictive power of implicit solvent models in docking techniques with Amber scoring, three generalized Born models (GBHCT, GBOBCI, and GBOBCII) available in Dock 6.7 were utiliz...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.6b00418
update_date:2016-10-24 00:00:00
abstract::Reversible and irreversible covalent ligands are advanced cysteine protease inhibitors in the drug development pipeline. K777 is an irreversible inhibitor of cruzain, a necessary enzyme for the survival of the Trypanosoma cruzi (T. cruzi) parasite, the causative agent of Chagas disease. Despite their importance, irrev...
journal_title:Journal of chemical information and modeling
pub_type: Journal article
doi:10.1021/acs.jcim.9b01138
update_date:2020-03-23 00:00:00