Entropy-scaling search of massive biological data.

Abstract:

Many data sets exhibit well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here we introduce a framework for similarity search based on characterizing a data set's entropy and fractal dimension. We prove that search time scales with metric entropy (the number of covering hyperspheres) if the fractal dimension of the data set is low, and that space scales with the sum of metric entropy and information-theoretic entropy (the randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains: high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND (3700x BLASTX)), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve 'compressive omics,' and the general theory can be readily applied to data science problems outside of biology. Source code: http://gems.csail.mit.edu.
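The idea of search time scaling with the number of covering hyperspheres can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`build_cover`, `search`, `hamming`), the greedy covering strategy, and the toy sequences are all assumptions chosen for illustration. The sketch covers the data with spheres of radius `cluster_radius`, compares a query against the sphere centers only (coarse step), and then scans the members of just those spheres whose center lies within `query_radius + cluster_radius` (fine step). For a true metric, the triangle inequality guarantees that this filter discards no hits.

```python
from typing import Callable, Dict, List, Sequence

Metric = Callable[[str, str], float]


def hamming(a: str, b: str) -> int:
    """Hamming distance between equal-length strings (a true metric)."""
    return sum(x != y for x, y in zip(a, b))


def build_cover(points: Sequence[str], dist: Metric,
                cluster_radius: float) -> Dict[int, List[int]]:
    """Greedy cover: map each center index to the indices of its members."""
    clusters: Dict[int, List[int]] = {}
    for i, p in enumerate(points):
        for c in clusters:
            if dist(points[c], p) <= cluster_radius:
                clusters[c].append(i)
                break
        else:
            clusters[i] = [i]  # no center is close enough, so p becomes one
    return clusters


def search(query: str, points: Sequence[str], clusters: Dict[int, List[int]],
           dist: Metric, cluster_radius: float,
           query_radius: float) -> List[int]:
    """Return indices of all points within query_radius of the query."""
    hits: List[int] = []
    for c, members in clusters.items():
        # Coarse step: a sphere can hold a hit only if its center is
        # within query_radius + cluster_radius (triangle inequality).
        if dist(query, points[c]) <= query_radius + cluster_radius:
            # Fine step: scan only the members of surviving spheres.
            hits.extend(i for i in members
                        if dist(query, points[i]) <= query_radius)
    return sorted(hits)


# Hypothetical toy database of short sequences.
db = ["AAAA", "AAAT", "AATT", "GGGG", "GGGC", "CCCC"]
cover = build_cover(db, hamming, cluster_radius=1)
result = search("AAAT", db, cover, hamming, cluster_radius=1, query_radius=1)
```

The coarse step costs one distance computation per sphere, which is why the number of covering spheres, rather than the raw size of the database, dominates the running time when only a few spheres survive the filter.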

journal_name

Cell Syst

journal_title

Cell systems

authors

Yu YW,Daniels NM,Danko DC,Berger B

doi

10.1016/j.cels.2015.08.004

subject

Has Abstract

pub_date

2015-08-26 00:00:00

pages

130-140

issue

2

eissn

2405-4720

issn

2405-4712

pii

S2405-4712(15)00058-7

journal_volume

1

pub_type

Journal Article
  • Orchestration of DNA Damage Checkpoint Dynamics across the Human Cell Cycle.

    abstract::Although molecular mechanisms that prompt cell-cycle arrest in response to DNA damage have been elucidated, the systems-level properties of DNA damage checkpoints are not understood. Here, using time-lapse microscopy and simulations that model the cell cycle as a series of Poisson processes, we characterize DNA damage...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2017.09.015

    authors: Chao HX,Poovey CE,Privette AA,Grant GD,Chao HY,Cook JG,Purvis JE

    update_date:2017-11-22 00:00:00

  • Systems Analyses Reveal Shared and Diverse Attributes of Oct4 Regulation in Pluripotent Cells.

    abstract::We combine a genome-scale RNAi screen in mouse epiblast stem cells (EpiSCs) with genetic interaction, protein localization, and "protein-level dependency" studies (a systematic technique that uncovers post-transcriptional regulation) to delineate the network of factors that control the expression of Oct4, a key regulato...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2015.08.002

    authors: Ding L,Paszkowski-Rogacz M,Winzi M,Chakraborty D,Theis M,Singh S,Ciotta G,Poser I,Roguev A,Chu WK,Choudhary C,Mann M,Stewart AF,Krogan N,Buchholz F

    update_date:2015-08-26 00:00:00

  • Characterizing Strain Variation in Engineered E. coli Using a Multi-Omics-Based Workflow.

    abstract::Understanding the complex interactions that occur between heterologous and native biochemical pathways represents a major challenge in metabolic engineering and synthetic biology. We present a workflow that integrates metabolomics, proteomics, and genome-scale models of Escherichia coli metabolism to study the effects...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2016.04.004

    authors: Brunk E,George KW,Alonso-Gutierrez J,Thompson M,Baidoo E,Wang G,Petzold CJ,McCloskey D,Monk J,Yang L,O'Brien EJ,Batth TS,Martin HG,Feist A,Adams PD,Keasling JD,Palsson BO,Lee TS

    update_date:2016-05-25 00:00:00

  • The Effects of Stochasticity at the Single-Cell Level and Cell Size Control on the Population Growth.

    abstract::Establishing a quantitative connection between the population growth rate and the generation times of single cells is a prerequisite for understanding evolutionary dynamics of microbes. However, existing theories fail to account for the experimentally observed correlations between mother-daughter generation times that...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2017.08.015

    authors: Lin J,Amir A

    update_date:2017-10-25 00:00:00

  • Systematic Analysis of the Determinants of Gene Expression Noise in Embryonic Stem Cells.

    abstract::Isogenic cells in a common environment show substantial cell-to-cell variation in gene expression, often referred to as "expression noise." Here, we use multiple single-cell RNA-sequencing datasets to identify features associated with high or low expression noise in mouse embryonic stem cells. These include the core p...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2017.10.003

    authors: Faure AJ,Schmiedel JM,Lehner B

    update_date:2017-11-22 00:00:00

  • Optogenetic Control of Calcium Oscillation Waveform Defines NFAT as an Integrator of Calcium Load.

    abstract::It is known that the calcium-dependent transcription factor NFAT initiates transcription in response to pulsatile loads of calcium signal. However, the relative contributions of calcium oscillation frequency, amplitude, and duty cycle to transcriptional activity remain unclear. Here, we engineer HeLa cells to permit o...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2016.03.010

    authors: Hannanta-Anan P,Chow BY

    update_date:2016-04-27 00:00:00

  • Reconstruction of Cell-type-Specific Interactomes at Single-Cell Resolution.

    abstract::The human interactome is instrumental in the systems-level study of the cell and the contextualization of disease-associated gene perturbations. However, reference organismal interactomes do not capture the cell-type-specific context in which proteins and modules preferentially act. Here, we introduce SCINET, a comput...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2019.10.007

    authors: Mohammadi S,Davila-Velderrain J,Kellis M

    update_date:2019-12-18 00:00:00

  • How Do Chaperones Protect a Cell's Proteins from Oxidative Damage?

    abstract::The accumulation of protein damage in aging organisms is thought to contribute to many aging-related diseases. Yet the properties determining which proteins are most susceptible remain poorly understood. Are certain conformations more vulnerable? Which chaperones are the main guardians? We address these questions with...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2018.05.001

    authors: Santra M,Dill KA,de Graff AMR

    update_date:2018-06-27 00:00:00

  • Biophysics of Temporal Interference Stimulation.

    abstract::Temporal interference (TI) is a non-invasive neurostimulation technique that utilizes high-frequency external electric fields to stimulate deep neuronal structures without affecting superficial, off-target structures. TI represents a potential breakthrough for treating conditions, such as Parkinson's disease and chron...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2020.10.004

    authors: Mirzakhalili E,Barra B,Capogrosso M,Lempka SF

    update_date:2020-12-16 00:00:00

  • RBM-MHC: A Semi-Supervised Machine-Learning Method for Sample-Specific Prediction of Antigen Presentation by HLA-I Alleles.

    abstract::The recent increase of immunopeptidomics data, obtained by mass spectrometry or binding assays, opens up possibilities for investigating endogenous antigen presentation by the highly polymorphic human leukocyte antigen class I (HLA-I) protein. State-of-the-art methods predict with high accuracy presentation by HLA all...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2020.11.005

    authors: Bravi B,Tubiana J,Cocco S,Monasson R,Mora T,Walczak AM

    update_date:2020-12-11 00:00:00

  • Measuring Signaling and RNA-Seq in the Same Cell Links Gene Expression to Dynamic Patterns of NF-κB Activation.

    abstract::Signaling proteins display remarkable cell-to-cell heterogeneity in their dynamic responses to stimuli, but the consequences of this heterogeneity remain largely unknown. For instance, the contribution of the dynamics of the innate immune transcription factor nuclear factor κB (NF-κB) to gene expression output is disp...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2017.03.010

    authors: Lane K,Van Valen D,DeFelice MM,Macklin DN,Kudo T,Jaimovich A,Carr A,Meyer T,Pe'er D,Boutet SC,Covert MW

    update_date:2017-04-26 00:00:00

  • Conservation and Divergence of p53 Oscillation Dynamics across Species.

    abstract::The tumor-suppressing transcription factor p53 is highly conserved at the protein level and plays a key role in the DNA damage response. One important aspect of p53 regulation is its dynamics in response to DNA damage, which include oscillations. Here, we observe that, while the qualitative oscillatory nature of p53 d...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2017.09.012

    authors: Stewart-Ornstein J,Cheng HWJ,Lahav G

    update_date:2017-10-25 00:00:00

  • Evaluation of Schink et al.: Having the Gem Shine through a Fog.

    abstract::One snapshot of the peer review process for "Death Rate of E. coli during Starvation Is Set by Maintenance Cost and Biomass Recycling" (Schink et al., 2019). ...

    journal_title:Cell systems

    pub_type: Comment, Journal Article

    doi:10.1016/j.cels.2019.07.004

    authors: Laman Trip DS,Maire T,Youk H

    update_date:2019-07-24 00:00:00

  • Omics Meets Metabolic Pathway Engineering.

    abstract::A principled approach to integrating metabolomics, proteomics, and genome-scale metabolic modeling facilitates rational pathway engineering of E. coli. ...

    journal_title:Cell systems

    pub_type: Comment, Journal Article

    doi:10.1016/j.cels.2016.05.005

    authors: Chen GQ

    update_date:2016-06-22 00:00:00

  • Mapping Cellular Reprogramming via Pooled Overexpression Screens with Paired Fitness and Single-Cell RNA-Sequencing Readout.

    abstract::Understanding the effects of genetic perturbations on the cellular state has been challenging using traditional pooled screens, which typically rely on the delivery of a single perturbation per cell and unidimensional phenotypic readouts. Here, we use barcoded open reading frame overexpression libraries coupled with s...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2018.10.008

    authors: Parekh U,Wu Y,Zhao D,Worlikar A,Shah N,Zhang K,Mali P

    update_date:2018-11-28 00:00:00

  • Principles of Systems Biology, No. 31.

    abstract::This month: selected work from the 2018 RECOMB meeting, organized by Ecole Polytechnique and held last April in Paris. ...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2018.08.005

    authors: Cho H,Berger B,Peng J,Galitzine C,Vitek O,Beltran PMJ,Cristea IM,Görtler F,Solbrig S,Wettig T,Oefner PJ,Spang R,Altenbuchinger M,Basso RS,Hochbaum D,Vandin F,Silverbush D,Cristea S,Yanovich G,Geiger T,Beerenwinkel

    update_date:2018-08-22 00:00:00

  • Environment Tunes Propagation of Cell-to-Cell Variation in the Human Macrophage Gene Network.

    abstract::Cell-to-cell variation in gene expression and the propagation of such variation (PoV or "noise propagation") from one gene to another in the gene network, as reflected by gene-gene correlation across single cells, are commonly observed in single-cell transcriptomic studies and can shape the phenotypic diversity of cel...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2017.03.002

    authors: Martins AJ,Narayanan M,Prüstel T,Fixsen B,Park K,Gottschalk RA,Lu Y,Andrews-Pfannkoch C,Lau WW,Wendelsdorf KV,Tsang JS

    update_date:2017-04-26 00:00:00

  • Metabolism-Centric Trans-Omics.

    abstract::Two recent studies in Cell and Science demonstrate the reconstruction of global mechanistic networks and identification of regulatory principles from multi-omics data. ...

    journal_title:Cell systems

    pub_type: Comment, Journal Article

    doi:10.1016/j.cels.2017.01.007

    authors: Yugi K,Kuroda S

    update_date:2017-01-25 00:00:00

  • Breaking New Ground in the Landscape of Single-Cell Analysis.

    abstract::Here, we outline p-Creode, a new algorithm to construct multi-branching cell lineage trajectories from single-cell data. Application of this platform to diverse sources of single-cell data demonstrates its robustness and scalability, while the discovery of a new origin for rare gut tuft cells showcases the utility of ...

    journal_title:Cell systems

    pub_type: Comment, Journal Article

    doi:10.1016/j.cels.2017.12.015

    authors: Kamimoto K,Morris SA

    update_date:2018-01-24 00:00:00

  • Studying Autism in Context.

    abstract::Studying autism genes in the context of the protein complexes to which they belong illustrates the potential of network-centric approaches for understanding complex genetic disease. ...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2015.11.004

    authors: Das J,Meyer MJ,Yu H

    update_date:2015-11-25 00:00:00

  • A Pandemic on a Pandemic: Racism and COVID-19 in Blacks.

    abstract::Racism and COVID-19 represent a pandemic on a pandemic for Blacks. The pandemics find themselves synergized to the detriment of Blacks and their health. The complexity of the combination of these pandemics is evident when examining the interplay between racist policing practices and health. ...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2020.07.002

    authors: Laurencin CT,Walker JM

    update_date:2020-07-22 00:00:00

  • Synthetic Biology and Engineered Live Biotherapeutics: Toward Increasing System Complexity.

    abstract::Recent advances in synthetic biology and biological system engineering have allowed the design and construction of engineered live biotherapeutics targeting a range of human clinical applications. In this review, we outline how systems approaches have been used to move from simple constitutive systems, where a single ...

    journal_title:Cell systems

    pub_type: Journal Article, Review

    doi:10.1016/j.cels.2018.06.008

    authors: Ozdemir T,Fedorec AJH,Danino T,Barnes CP

    update_date:2018-07-25 00:00:00

  • Entraining Oscillations in the NF-κB Signaling System: With a Little Help from Noise.

    abstract::Two studies show that noise is a key ingredient of new mechanisms for entraining the NF-κB system. ...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2016.12.008

    authors: Lefranc M

    update_date:2016-12-21 00:00:00

  • The Hidden Memory of Differentiating Cells.

    abstract::An integrated approach unifies experimental observations and mathematical modeling to represent differentiation dynamics as discrete transition events underpinned by stochastic transitions between hidden states. ...

    journal_title:Cell systems

    pub_type: Comment, Journal Article

    doi:10.1016/j.cels.2017.09.009

    authors: Moris N,Arias AM

    update_date:2017-09-27 00:00:00

  • Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom.

    abstract::Hi-C experiments study how genomes fold in 3D, generating contact maps containing features as small as 20 bp and as large as 200 Mb. Here we introduce Juicebox, a tool for exploring Hi-C and other contact map data. Juicebox allows users to zoom in and out of Hi-C maps interactively, just as a user of Google Earth migh...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2015.07.012

    authors: Durand NC,Robinson JT,Shamim MS,Machol I,Mesirov JP,Lander ES,Aiden EL

    update_date:2016-07-01 00:00:00

  • Synthetic 5' UTRs Can Either Up- or Downregulate Expression upon RNA-Binding Protein Binding.

    abstract::The construction of complex gene-regulatory networks requires both inhibitory and upregulatory modules. However, the vast majority of RNA-based regulatory "parts" are inhibitory. Using a synthetic biology approach combined with SHAPE-seq, we explored the regulatory effect of RNA-binding protein (RBP)-RNA interactions ...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2019.04.007

    authors: Katz N,Cohen R,Solomon O,Kaufmann B,Atar O,Yakhini Z,Goldberg S,Amit R

    update_date:2019-07-24 00:00:00

  • Escaping Circadian Regulation: An Emerging Hallmark of Cancer?

    abstract::Alterations of circadian clock genes are associated with patient survival, tumor stage, and clinical subtype across various cancer types, highlighting the importance of timing in cancer treatment. ...

    journal_title:Cell systems

    pub_type: Comment

    doi:10.1016/j.cels.2018.03.006

    authors: El-Athman R,Relógio A

    update_date:2018-03-28 00:00:00

  • The Key Parameters that Govern Translation Efficiency.

    abstract::Translation of mRNA into protein is a fundamental yet complex biological process with multiple factors that can potentially affect its efficiency. Here, we study a stochastic model describing the traffic flow of ribosomes along the mRNA and identify the key parameters that govern the overall rate of protein synthesis,...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2019.12.003

    authors: Erdmann-Pham DD,Dao Duc K,Song YS

    update_date:2020-02-26 00:00:00

  • Emergent Gene Expression Responses to Drug Combinations Predict Higher-Order Drug Interactions.

    abstract::Effective design of combination therapies requires understanding the changes in cell physiology that result from drug interactions. Here, we show that the genome-wide transcriptional response to combinations of two drugs, measured at a rigorously controlled growth rate, can predict higher-order antagonism with a third...

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2019.10.004

    authors: Lukačišin M,Bollenbach T

    update_date:2019-11-27 00:00:00

  • tRNA Methylation Is a Global Determinant of Bacterial Multi-drug Resistance.

    abstract::Gram-negative bacteria are intrinsically resistant to drugs because of their double-membrane envelope structure that acts as a permeability barrier and as an anchor for efflux pumps. Antibiotics are blocked and expelled from cells and cannot reach high-enough intracellular concentrations to exert a therapeutic effect....

    journal_title:Cell systems

    pub_type: Journal Article

    doi:10.1016/j.cels.2019.03.008

    authors: Masuda I,Matsubara R,Christian T,Rojas ER,Yadavalli SS,Zhang L,Goulian M,Foster LJ,Huang KC,Hou YM

    update_date:2019-04-24 00:00:00