An Efficient Lossless Compression Algorithm for Trajectories of Atom Positions and Volumetric Data.

Abstract:

We present a newly developed, highly efficient lossless compression algorithm for trajectories of atom positions and volumetric data. The algorithm works in two steps. In the first step, polynomial extrapolation schemes reduce the information entropy of the data by exploiting both spatial and temporal continuity. The second step applies a series of transformations (Burrows-Wheeler, move-to-front, run-length encoding) and finally compresses the stream with multitable canonical Huffman coding. The approach reaches a compression ratio of around 15:1 for typical position trajectories in the XYZ format. For volumetric data trajectories in Gaussian Cube format (such as electron density), the compression ratio reaches around 35:1, yielding by far the smallest size of all formats compared here. At the same time, compression and decompression remain reasonably fast for everyday use. The precision of the data can be selected by the user. For storage of the compressed data, we introduce the BQB file format, which is very robust, flexible, and efficient. In contrast to most archiving formats, it allows fast random access to individual trajectory frames. Our method is implemented in C++ and provided as free software under the GNU LGPL license. It has been included in the TRAVIS program package but is also available as a stand-alone tool and as a library ("libbqb") for use in other projects.
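
The first step described in the abstract (extrapolation to lower the entropy before entropy coding) can be illustrated with a small Python sketch. This is not the BQB or libbqb implementation: the abstract does not specify the actual extrapolation schemes, so the example assumes plain temporal polynomial extrapolation on equidistant frames, and all function names (`quantize`, `extrapolation_residuals`, `reconstruct`, `entropy_bits`) are hypothetical. The second step (Burrows-Wheeler, move-to-front, run-length encoding, Huffman coding) is not shown.

```python
import math
from collections import Counter


def quantize(values, digits=3):
    """Map floats to integers at a chosen precision (assumed: 3 decimals)."""
    scale = 10 ** digits
    return [round(v * scale) for v in values]


def _weights(order):
    # Polynomial extrapolation on equidistant points reduces to binomial
    # weights with alternating sign, e.g. order 2:
    #   x[t] ~ 3*x[t-1] - 3*x[t-2] + x[t-3]
    n = order + 1
    return [(-1) ** k * math.comb(n, k + 1) for k in range(n)]


def extrapolation_residuals(series, order=2):
    """Replace each sample by its prediction error (residual)."""
    w = _weights(order)
    out = []
    for t, x in enumerate(series):
        if t < len(w):
            out.append(x)  # not enough history yet: keep the raw value
        else:
            pred = sum(wk * series[t - 1 - k] for k, wk in enumerate(w))
            out.append(x - pred)
    return out


def reconstruct(residuals, order=2):
    """Invert extrapolation_residuals exactly (integer arithmetic only)."""
    w = _weights(order)
    series = []
    for t, r in enumerate(residuals):
        if t < len(w):
            series.append(r)
        else:
            pred = sum(wk * series[t - 1 - k] for k, wk in enumerate(w))
            series.append(r + pred)
    return series


def entropy_bits(symbols):
    """Empirical Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # A smooth synthetic coordinate, standing in for one atom's x position.
    coords = quantize([math.sin(0.05 * t) for t in range(200)])
    res = extrapolation_residuals(coords)
    print("raw entropy      :", entropy_bits(coords))
    print("residual entropy :", entropy_bits(res))
    print("lossless         :", reconstruct(res) == coords)
```

Because the prediction uses only integers, the inverse reproduces the quantized input exactly. For a smooth series the residuals cluster around zero, which is what makes the subsequent entropy coding effective.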

journal_name

J Chem Inf Model

authors

Brehm M,Thomas M

doi

10.1021/acs.jcim.8b00501

subject

Has Abstract

pub_date

2018-10-22 00:00:00

pages

2092-2107

issue

10

eissn

1549-960X

issn

1549-9596

journal_volume

58

pub_type

Journal Article
  • Comparison of several molecular docking programs: pose prediction and virtual screening accuracy.

    abstract::Molecular docking programs are widely used modeling tools for predicting ligand binding modes and structure based virtual screening. In this study, six molecular docking programs (DOCK, FlexX, GLIDE, ICM, PhDOCK, and Surflex) were evaluated using metrics intended to assess docking pose and virtual screening accuracy. ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci900056c

    authors: Cross JB,Thompson DC,Rai BK,Baber JC,Fan KY,Hu Y,Humblet C

    update_date:2009-06-01 00:00:00

  • TAMkin: a versatile package for vibrational analysis and chemical kinetics.

    abstract::TAMkin is a program for the calculation and analysis of normal modes, thermochemical properties and chemical reaction rates. At present, the output from the frequently applied software programs ADF, CHARMM, CPMD, CP2K, Gaussian, Q-Chem, and VASP can be analyzed. The normal-mode analysis can be performed using a broad ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci100099g

    authors: Ghysels A,Verstraelen T,Hemelsoet K,Waroquier M,Van Speybroeck V

    update_date:2010-09-27 00:00:00

  • Molecular Dynamics Simulations of Membrane-Bound STIM1 to Investigate Conformational Changes during STIM1 Activation upon Calcium Release.

    abstract::Calcium is involved in important intracellular processes, such as intracellular signaling from cell membrane receptors to the nucleus. Typically, calcium levels are kept at less than 100 nM in the nucleus and cytosol, but some calcium is stored in the endoplasmic reticulum (ER) lumen for rapid release to activate intr...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.6b00475

    authors: Mukherjee S,Karolak A,Debant M,Buscaglia P,Renaudineau Y,Mignen O,Guida WC,Brooks WH

    update_date:2017-02-27 00:00:00

  • Long-range effects of a peripheral mutation on the enzymatic activity of cytochrome P450 1A2.

    abstract::The human cytochrome P450 1A2 is an important drug metabolizing and procarcinogen activating enzyme. An experimental study found that a peripheral mutation, F186L, at ∼26 Å away from the enzyme's active site, caused a significant reduction in the enzymatic activity of 1A2 deethylation reactions. In this paper, we expl...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci200112b

    authors: Zhang T,Liu LA,Lewis DF,Wei DQ

    update_date:2011-06-27 00:00:00

  • Efficient Strategy for the Calculation of Solvation Free Energies in Water and Chloroform at the Quantum Mechanical/Molecular Mechanical Level.

    abstract::The partitioning of solute molecules between immiscible solvents with significantly different polarities is of great importance. The polarization between the solute and solvent molecules plays an essential role in determining the solubility of the solute, which makes computational studies utilizing molecular mechanics...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00001

    authors: Wang M,Li P,Jia X,Liu W,Shao Y,Hu W,Zheng J,Brooks BR,Mei Y

    update_date:2017-10-23 00:00:00

  • Influence of protonation, tautomeric, and stereoisomeric states on protein-ligand docking results.

    abstract::In this work, we present a systematical investigation of the influence of ligand protonation states, stereoisomers, and tautomers on results obtained with the two protein-ligand docking programs GOLD and PLANTS. These different states were generated with a fully automated tool, called SPORES (Structure PrOtonation and...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci800420z

    authors: ten Brink T,Exner TE

    update_date:2009-06-01 00:00:00

  • Chemoisosterism in the proteome.

    abstract::The concept of chemoisosterism of protein environments is introduced as the complementary property to bioisosterism of chemical fragments. In the same way that two chemical fragments are considered bioisosteric if they can bind to the same protein environment, two protein environments will be considered chemoisosteric...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci3002974

    authors: Jalencas X,Mestres J

    update_date:2013-02-25 00:00:00

  • Modeling Binding with Large Conformational Changes: Key Points in Ensemble-Docking Approaches.

    abstract::Protein dynamics play a critical role in ligand binding, and different models have been proposed to explain the relationships between protein motion and molecular recognition. Here, we present a study of ligand-binding processes associated with large conformational changes of a protein to elucidate the critical choice...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00125

    authors: Motta S,Bonati L

    update_date:2017-07-24 00:00:00

  • GalaxyGPCRloop: Template-Based and Ab Initio Structure Sampling of the Extracellular Loops of G-Protein-Coupled Receptors.

    abstract::The second extracellular loops (ECL2s) of G-protein-coupled receptors (GPCRs) are often involved in GPCR functions, and their structures have important implications in drug discovery. However, structure prediction of ECL2 is difficult because of its long length and the structural diversity among different GPCRs. In th...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.8b00148

    authors: Won J,Lee GR,Park H,Seok C

    update_date:2018-06-25 00:00:00

  • Truncated variants of the GCN4 transcription activator protein bind DNA with dramatically different dynamical motifs.

    abstract::The yeast protein GCN4 is a transcriptional activator in the basic leucine zipper (bZip) family, whose distinguishing feature is the "chopstick-like" homodimer of alpha helices formed at the DNA-binding interface. While experiments have shown that truncated versions of the protein retain biologically relevant DNA-bind...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci500448e

    authors: McHarris DM,Barr DA

    update_date:2014-10-27 00:00:00

  • Estimation of carcinogenicity using molecular fragments tree.

    abstract::Carcinogenicity is an important toxicological endpoint that poses high concern to drug discovery. In this study, we developed a method to extract structural alerts (SAs) and modulating factors of carcinogens on the basis of statistical analyses. First, the Gaston algorithm, a frequent subgraph mining method, was used ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci300266p

    authors: Wang Y,Lu J,Wang F,Shen Q,Zheng M,Luo X,Zhu W,Jiang H,Chen K

    update_date:2012-08-27 00:00:00

  • iUmami-SCM: A Novel Sequence-Based Predictor for Prediction and Analysis of Umami Peptides Using a Scoring Card Method with Propensity Scores of Dipeptides.

    abstract::Umami or the taste of monosodium glutamate represents one of the major attractive taste modalities in humans. Therefore, knowledge about biophysical and biochemical properties of the umami taste is important for both scientific research and the food industry. Experimental approaches for predicting umami peptides are l...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.0c00707

    authors: Charoenkwan P,Yana J,Nantasenamat C,Hasan MM,Shoombuatong W

    update_date:2020-12-28 00:00:00

  • A critical assessment of combined ligand- and structure-based approaches to HERG channel blocker modeling.

    abstract::Blockade of human ether-à-go-go related gene (hERG) channel prolongs the duration of the cardiac action potential and is a common reason for drug failure in preclinical safety trials. Therefore, it is of great importance to develop robust in silico tools to predict potential hERG blockers in the early stages of drug d...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci200271d

    authors: Du-Cuny L,Chen L,Zhang S

    update_date:2011-11-28 00:00:00

  • Improving classical substructure-based virtual screening to handle extrapolation challenges.

    abstract::Target-oriented substructure-based virtual screening (sSBVS) of molecules is a promising approach in drug discovery. Yet, there are doubts whether sSBVS is suitable also for extrapolation, that is, for detecting molecules that are very different from those used for training. Herein, we evaluate the predictive power of...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci200472s

    authors: Biniashvili T,Schreiber E,Kliger Y

    update_date:2012-03-26 00:00:00

  • Structural and Sequence Similarity Makes a Significant Impact on Machine-Learning-Based Scoring Functions for Protein-Ligand Interactions.

    abstract::The prediction of protein-ligand binding affinity has recently been improved remarkably by machine-learning-based scoring functions. For example, using a set of simple descriptors representing the atomic distance counts, the RF-Score improves the Pearson correlation coefficient to about 0.8 on the core set of the PDBb...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.7b00049

    authors: Li Y,Yang J

    update_date:2017-04-24 00:00:00

  • In silico drug screening approach for the design of magic bullets: a successful example with anti-HIV fullerene derivatized amino acids.

    abstract::A database has been derived from recently reported [60]fullerene derivatives, and their binding scores with HIV-1 PR have been computed using docking techniques. Computational methods have been used to predict which derivatives may have high binding affinities, and for these compounds biological tests have been perfor...

    journal_title:Journal of chemical information and modeling

    pub_type: Letter

    doi:10.1021/ci900047s

    authors: Durdagi S,Supuran CT,Strom TA,Doostdar N,Kumar MK,Barron AR,Mavromoustakos T,Papadopoulos MG

    update_date:2009-05-01 00:00:00

  • Factors affecting d-block metal-ligand bond lengths: toward an automated library of molecular geometry for metal complexes.

    abstract::Metal-ligand (M-L) bond lengths for a range of ligands (carboxylates, chlorides, pyridines, water, tertiary phosphines, and alkenes) and a variety of metals have been retrieved from the Cambridge Structural Database, CSD. Analysis of the factors which affect M-L bond lengths (for example, ligand coordination mode, oxi...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci0500785

    authors: Harris SE,Orpen AG,Bruno IJ,Taylor R

    update_date:2005-11-01 00:00:00

  • Develop and test a solvent accessible surface area-based model in conformational entropy calculations.

    abstract::It is of great interest in modern drug design to accurately calculate the free energies of protein-ligand or nucleic acid-ligand binding. MM-PBSA (molecular mechanics Poisson-Boltzmann surface area) and MM-GBSA (molecular mechanics generalized Born surface area) have gained popularity in this field. For both methods, ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci300064d

    authors: Wang J,Hou T

    update_date:2012-05-25 00:00:00

  • Chemoinformatics-based classification of prohibited substances employed for doping in sport.

    abstract::Representative molecules from 10 classes of prohibited substances were taken from the World Anti-Doping Agency (WADA) list, augmented by molecules from corresponding activity classes found in the MDDR database. Together with some explicitly allowed compounds, these formed a set of 5245 molecules. Five types of fingerp...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci0601160

    authors: Cannon EO,Bender A,Palmer DS,Mitchell JB

    update_date:2006-11-01 00:00:00

  • T-Cell Receptor Binding Affects the Dynamics of the Peptide/MHC-I Complex.

    abstract::The recognition of peptide/MHC by T-cell receptors is one of the most important interactions in the adaptive immune system. A large number of computational studies have investigated the structural dynamics of this interaction. However, to date only limited attention has been paid to differences between the dynamics of...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.5b00511

    authors: Knapp B,Deane CM

    update_date:2016-01-25 00:00:00

  • New serotonin 5-HT(6) ligands from common feature pharmacophore hypotheses.

    abstract::Serotonin 5-HT6 receptor antagonists are thought to play an important role in the treatment of psychiatry, Alzheimer's disease, and probably obesity. To find novel and potent 5-HT6 antagonists and to provide a new idea for drug design, we used a ligand-based pharmacophore to perform the virtual screening of a commerci...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci700160t

    authors: Kim HJ,Doddareddy MR,Choo H,Cho YS,No KT,Park WK,Pae AN

    update_date:2008-01-01 00:00:00

  • Statistical confidence for variable selection in QSAR models via Monte Carlo cross-validation.

    abstract::A new variable selection wrapper method named the Monte Carlo variable selection (MCVS) method was developed utilizing the framework of the Monte Carlo cross-validation (MCCV) approach. The MCVS method reports the variable selection results in the most conventional and common measure of statistical hypothesis testing,...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci700283s

    authors: Konovalov DA,Sim N,Deconinck E,Vander Heyden Y,Coomans D

    update_date:2008-02-01 00:00:00

  • Target-independent prediction of drug synergies using only drug lipophilicity.

    abstract::Physicochemical properties of compounds have been instrumental in selecting lead compounds with increased drug-likeness. However, the relationship between physicochemical properties of constituent drugs and the tendency to exhibit drug interaction has not been systematically studied. We assembled physicochemical descr...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci500276x

    authors: Yilancioglu K,Weinstein ZB,Meydan C,Akhmetov A,Toprak I,Durmaz A,Iossifov I,Kazan H,Roth FP,Cokol M

    update_date:2014-08-25 00:00:00

  • Hidden active information in a random compound library: extraction using a pseudo-structure-activity relationship model.

    abstract::We propose a hypothesis that "a model of active compound can be provided by integrating information of compounds high-ranked by docking simulation of a random compound library". In our hypothesis, the inclusion of true active compounds in the high-ranked compound is not necessary. We regard the high-ranked compounds a...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci7003384

    authors: Fukunishi H,Teramoto R,Shimada J

    update_date:2008-03-01 00:00:00

  • The assembly-inducing laulimalide/peloruside A binding site on tubulin: molecular modeling and biochemical studies with [³H]peloruside A.

    abstract::We used synthetic peloruside A for the commercial preparation of [³H]peloruside A. The radiolabeled compound bound to preformed tubulin polymer in amounts stoichiometric with the polymer's tubulin content, with an apparent K(d) value of 0.35 μM. A less active peloruside A analogue, (11-R)-peloruside A and laulimalide ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci1002894

    authors: Nguyen TL,Xu X,Gussio R,Ghosh AK,Hamel E

    update_date:2010-11-22 00:00:00

  • Scaffold topologies. 1. Exhaustive enumeration up to eight rings.

    abstract::Mapping the chemical space of small organic molecules is approached from a theoretical graph theory viewpoint, in an effort to begin the systematic exploration of molecular topologies. We present an algorithm for exhaustive generation of scaffold topologies with up to eight rings and an efficient comparison method for...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci7003412

    authors: Pollock SN,Coutsias EA,Wester MJ,Oprea TI

    update_date:2008-07-01 00:00:00

  • The valence state combination model: a generic framework for handling tautomers and protonation states.

    abstract::The consistent handling of molecules is probably the most basic and important requirement in the field of cheminformatics. Reliable results can only be obtained if the underlying calculations are independent of the specific way molecules are represented in the input data. However, ensuring consistency is a complex tas...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/ci400724v

    authors: Urbaczek S,Kolodzik A,Rarey M

    update_date:2014-03-24 00:00:00

  • Unraveling Energy and Dynamics Determinants to Interpret Protein Functional Plasticity: The Limonene-1,2-epoxide-hydrolase Case Study.

    abstract::The balance between structural stability and functional plasticity in proteins that share common three-dimensional folds is the key factor that drives protein evolvability. The ability to distinguish the parts of homologous proteins that underlie common structural organization patterns from the parts acting as regulat...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.6b00504

    authors: Rinaldi S,Gori A,Annovazzi C,Ferrandi EE,Monti D,Colombo G

    update_date:2017-04-24 00:00:00

  • Molecular Dynamics Simulations of Supramolecular Anticancer Nanotubes.

    abstract::We report here on long-time all-atomistic molecular dynamics simulations of functional supramolecular nanotubes composed by the self-assembly of peptide-drug amphiphiles (DAs). These DAs have been shown to possess an inherently high drug loading of the hydrophobic anticancer drug camptothecin. We probe the self-assemb...

    journal_title:Journal of chemical information and modeling

    pub_type: Letter

    doi:10.1021/acs.jcim.8b00193

    authors: Kang M,Chakraborty K,Loverde SM

    update_date:2018-06-25 00:00:00

  • Cheminformatics Modeling of Adverse Drug Responses by Clinically Relevant Mutants of Human Androgen Receptor.

    abstract::The human androgen receptor (AR) is a ligand-activated transcription factor that plays a pivotal role in the development and progression of prostate cancer (PCa). Many forms of castration-resistant prostate cancer (CRPC) still rely on the AR for survival. Currently used antiandrogens face clinical limitations as drug ...

    journal_title:Journal of chemical information and modeling

    pub_type: Journal Article

    doi:10.1021/acs.jcim.6b00400

    authors: Paul N,Carabet LA,Lallous N,Yamazaki T,Gleave ME,Rennie PS,Cherkasov A

    update_date:2016-12-27 00:00:00