Tesco Grocery 1.0, a large-scale dataset of grocery purchases in London.


We present the Tesco Grocery 1.0 dataset: a record of 420 M food items purchased by 1.6 M fidelity card owners who shopped at the 411 Tesco stores in Greater London over the course of the entire year of 2015, aggregated at the level of census areas to preserve anonymity. For each area, we report the number of transactions and nutritional properties of the typical food item bought including the average caloric intake and the composition of nutrients. The set of global trade international numbers (barcodes) for each food type is also included. To establish data validity we: i) compare food purchase volumes to population from census to assess representativeness, and ii) match nutrient and energy intake to official statistics of food-related illnesses to appraise the extent to which the dataset is ecologically valid. Given its unprecedented scale and geographic granularity, the data can be used to link food purchases to a number of geographically-salient indicators, which enables studies on health outcomes, cultural aspects, and economic factors.


Sci Data


Scientific data


Aiello LM,Quercia D,Schifanella R,Del Prete L






2020-02-18 00:00:00












  • A dataset describing a suite of novel antibody reagents for the RAS signaling network.

    abstract::RAS genes are frequently mutated in cancer and have for decades eluded effective therapeutic attack. The National Cancer Institute's RAS Initiative has a focus on understanding pathways and discovering therapies for RAS-driven cancers. Part of these efforts is the generation of novel reagents to enable the quantificat...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Schoenherr RM,Huang D,Voytovich UJ,Ivey RG,Kennedy JJ,Saul RG,Colantonio S,Roberts RR,Knotts JG,Kaczmarczyk JA,Perry C,Hewitt SM,Bocik W,Whiteley GR,Hiltke T,Boja ES,Rodriguez H,Whiteaker JR,Paulovich AG

    updated: 2019-08-29 00:00:00

  • Genotoype-by-sequencing of three geographically distinct populations of Olympia oysters, Ostrea lurida.

    abstract::Olympia oysters are found along the west coast of North America and as the only native oyster species in the region, receive considerable attention with regard to restoration and conservation. Knowledge of genetic structure of this species is essential for resource managers. Here we provide genetic data for three dist...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: White SJ,Vadopalas B,Silliman K,Roberts SB

    updated: 2017-09-12 00:00:00

  • Linking in silico MS/MS spectra with chemistry data to improve identification of unknowns.

    abstract::Confident identification of unknown chemicals in high resolution mass spectrometry (HRMS) screening studies requires cohesive workflows and complementary data, tools, and software. Chemistry databases, screening libraries, and chemical metadata have become fixtures in identification workflows. To increase confidence i...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: McEachran AD,Balabin I,Cathey T,Transue TR,Al-Ghoul H,Grulke C,Sobus JR,Williams AJ

    updated: 2019-08-02 00:00:00

  • Data for training and testing radiation detection algorithms in an urban environment.

    abstract::The detection, identification, and localization of illicit nuclear materials in urban environments is of utmost importance for national security. Most often, the process of performing these operations consists of a team of trained individuals equipped with radiation detection devices that have built-in algorithms to a...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Ghawaly JM Jr,Nicholson AD,Peplow DE,Anderson-Cook CM,Myers KL,Archer DE,Willis MJ,Quiter BJ

    updated: 2020-10-05 00:00:00

  • Publisher Correction: Tracking vegetation phenology across diverse biomes using Version 2.0 of the PhenoCam Dataset.

    abstract::An amendment to this paper has been published and can be accessed via a link at the top of the paper. ...

    journal_title:Scientific data

    pub_type: Journal Article, Published Erratum


    authors: Seyednasrollah B,Young AM,Hufkens K,Milliman T,Friedl MA,Frolking S,Richardson AD

    updated: 2019-11-01 00:00:00

  • Enabling precision medicine in neonatology, an integrated repository for preterm birth research.

    abstract::Preterm birth, or the delivery of an infant prior to 37 weeks of gestation, is a significant cause of infant morbidity and mortality. In the last decade, the advent and continued development of molecular profiling technologies has enabled researchers to generate vast amount of 'omics' data, which together with integra...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Sirota M,Thomas CG,Liu R,Zuhl M,Banerjee P,Wong RJ,Quaintance CC,Leite R,Chubiz J,Anderson R,Chappell J,Kim M,Grobman W,Zhang G,Rokas A,England SK,Parry S,Shaw GM,Simpson JL,Thomson E,Butte AJ,March of Dimes Pre

    updated: 2018-11-06 00:00:00

  • If these data could talk.

    abstract::In the last few decades, data-driven methods have come to dominate many fields of scientific inquiry. Open data and open-source software have enabled the rapid implementation of novel methods to manage and analyze the growing flood of data. However, it has become apparent that many scientific fields exhibit distressin...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Pasquier T,Lau MK,Trisovic A,Boose ER,Couturier B,Crosas M,Ellison AM,Gibson V,Jones CR,Seltzer M

    updated: 2017-09-05 00:00:00

  • A data citation roadmap for scholarly data repositories.

    abstract::This article presents a practical roadmap for scholarly data repositories to implement data citation in accordance with the Joint Declaration of Data Citation Principles, a synopsis and harmonization of the recommendations of major science policy bodies. The roadmap was developed by the Repositories Expert Group, as p...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Fenner M,Crosas M,Grethe JS,Kennedy D,Hermjakob H,Rocca-Serra P,Durand G,Berjon R,Karcher S,Martone M,Clark T

    updated: 2019-04-10 00:00:00

  • Spatiotemporal dataset on Chinese population distribution and its driving factors from 1949 to 2013.

    abstract::Spatio-temporal data on human population and its driving factors is critical to understanding and responding to population problems. Unfortunately, such spatio-temporal data on a large scale and over the long term are often difficult to obtain. Here, we present a dataset on Chinese population distribution and its driv...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Wang L,Chen L

    updated: 2016-07-05 00:00:00

  • A database of geopositioned Middle East Respiratory Syndrome Coronavirus occurrences.

    abstract::As a World Health Organization Research and Development Blueprint priority pathogen, there is a need to better understand the geographic distribution of Middle East Respiratory Syndrome Coronavirus (MERS-CoV) and its potential to infect mammals and humans. This database documents cases of MERS-CoV globally, with speci...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Ramshaw RE,Letourneau ID,Hong AY,Hon J,Morgan JD,Osborne JCP,Shirude S,Van Kerkhove MD,Hay SI,Pigott DM

    updated: 2019-12-13 00:00:00

  • Experimental flows through an array of emerged or slightly submerged square cylinders over a rough bed.

    abstract::The experimental dataset presented was collected in an 18 m long and 1 m wide laboratory flume. Low to high flood flows through an urbanized floodplain were modelled. The floodplain bed is rough, modelled with dense artificial grass. A square cylinder array, representing house models, was set on the rough bed. The cyl...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Oukacine M,Proust S,Larrarte F,Goutal N

    updated: 2021-01-11 00:00:00

  • Multivariate time series dataset for space weather data analytics.

    abstract::We introduce and make openly accessible a comprehensive, multivariate time series (MVTS) dataset extracted from solar photospheric vector magnetograms in Spaceweather HMI Active Region Patch (SHARP) series. Our dataset also includes a cross-checked NOAA solar flare catalog that immediately facilitates solar flare pred...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Angryk RA,Martens PC,Aydin B,Kempton D,Mahajan SS,Basodi S,Ahmadzadeh A,Cai X,Filali Boubrahimi S,Hamdi SM,Schuh MA,Georgoulis MK

    updated: 2020-07-10 00:00:00

  • Two-colour serial femtosecond crystallography dataset from gadoteridol-derivatized lysozyme for MAD phasing.

    abstract::We provide a detailed description of a gadoteridol-derivatized lysozyme (gadolinium lysozyme) two-colour serial femtosecond crystallography (SFX) dataset for multiple wavelength anomalous dispersion (MAD) structure determination. The data was collected at the Spring-8 Angstrom Compact free-electron LAser (SACLA) facil...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Gorel A,Motomura K,Fukuzawa H,Doak RB,Grünbein ML,Hilpert M,Inoue I,Kloos M,Nass Kovács G,Nango E,Nass K,Roome CM,Shoeman RL,Tanaka R,Tono K,Foucar L,Joti Y,Yabashi M,Iwata S,Ueda K,Barends TRM,Schlichting I

    updated: 2017-12-12 00:00:00

  • Serial scanning electron microscopy of anti-PKHD1L1 immuno-gold labeled mouse hair cell stereocilia bundles.

    abstract::Serial electron microscopy techniques have proven to be a powerful tool in biology. Unfortunately, the data sets they generate lack robust and accurate automated segmentation algorithms. In this data descriptor publication, we introduce a serial focused ion beam scanning electron microscopy (FIB-SEM) dataset consistin...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Ivanchenko MV,Cicconet M,Jandal HA,Wu X,Corey DP,Indzhykulian AA

    updated: 2020-06-17 00:00:00

  • I-BLEND, a campus-scale commercial and residential buildings electrical energy dataset.

    abstract::Efficient energy consumption at the building level is vital for sustainability. Providing energy efficient systems and solutions requires an understanding of how energy gets consumed. However, there is a general lack of large-scale open datasets about the energy consumption of buildings, which hinders the research. Th...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Rashid H,Singh P,Singh A

    updated: 2019-02-19 00:00:00

  • A multi-omics digital research object for the genetics of sleep regulation.

    abstract::With the aim to uncover the molecular pathways underlying the regulation of sleep, we recently assembled an extensive and comprehensive systems genetics dataset interrogating a genetic reference population of mice at the levels of the genome, the brain and liver transcriptomes, the plasma metabolome, and the sleep-wak...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Jan M,Gobet N,Diessler S,Franken P,Xenarios I

    updated: 2019-10-31 00:00:00

  • An agricultural survey for more than 9,500 African households.

    abstract::Surveys for more than 9,500 households were conducted in the growing seasons 2002/2003 or 2003/2004 in eleven African countries: Burkina Faso, Cameroon, Ghana, Niger and Senegal in western Africa; Egypt in northern Africa; Ethiopia and Kenya in eastern Africa; South Africa, Zambia and Zimbabwe in southern Africa. Hous...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Waha K,Zipf B,Kurukulasuriya P,Hassan RM

    updated: 2016-05-24 00:00:00

  • Ground reference data for sugarcane biomass estimation in São Paulo state, Brazil.

    abstract::In order to make effective decisions on sustainable development, it is essential for sugarcane-producing countries to take into account sugarcane acreage and sugarcane production dynamics. The availability of sugarcane biophysical data along the growth season is key to an effective mapping of such dynamics, especially...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Molijn RA,Iannini L,Rocha JV,Hanssen RF

    updated: 2018-08-07 00:00:00

  • Transcriptome data of temporal and cingulate cortex in the Rett syndrome brain.

    abstract::Rett syndrome is an X-linked neurodevelopmental disorder caused by mutation in the methyl-CpG-binding protein 2 gene (MECP2) in the majority of cases. We describe an RNA sequencing dataset of postmortem brain tissue samples from four females clinically diagnosed with Rett syndrome and four age-matched female donors. T...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Aldinger KA,Timms AE,MacDonald JW,McNamara HK,Herstein JS,Bammler TK,Evgrafov OV,Knowles JA,Levitt P

    updated: 2020-06-19 00:00:00

  • An analecta of visualizations for foodborne illness trends and seasonality.

    abstract::Disease surveillance systems worldwide face increasing pressure to maintain and distribute data in usable formats supplemented with effective visualizations to enable actionable policy and programming responses. Annual reports and interactive portals provide access to surveillance data and visualizations depicting tem...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Simpson RB,Zhou B,Alarcon Falconi TM,Naumova EN

    updated: 2020-10-13 00:00:00

  • Paired rRNA-depleted and polyA-selected RNA sequencing data and supporting multi-omics data from human T cells.

    abstract::Both poly(A) enrichment and ribosomal RNA depletion are commonly used for RNA sequencing. Either has its advantages and disadvantages that may lead to biases in the downstream analyses. To better access these effects, we carried out both ribosomal RNA-depleted and poly(A)-selected RNA-seq for CD4+ T naive cells isolat...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Chen L,Yang R,Kwan T,Tang C,Watt S,Zhang Y,Bourque G,Ge B,Downes K,Frontini M,Ouwehand WH,Lin JW,Soranzo N,Pastinen T,Chen L

    updated: 2020-11-09 00:00:00

  • Corrigendum: High-throughput RNAi screen for essential genes and drug synergistic combinations in colorectal cancer.

    abstract::This corrects the article DOI: 10.1038/sdata.2017.139. ...

    journal_title:Scientific data

    pub_type: Journal Article, Published Erratum


    authors: Williams SP,Barthorpe AS,Lightfoot H,Garnett MJ,McDermott U

    updated: 2018-10-09 00:00:00

  • A statistical atlas of cerebral arteries generated using multi-center MRA datasets from healthy subjects.

    abstract::Magnetic resonance angiography (MRA) can capture the variation of cerebral arteries with high spatial resolution. These measurements include valuable information about the morphology, geometry, and density of brain arteries, which may be useful to identify risk factors for cerebrovascular and neurological diseases at ...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Mouches P,Forkert ND

    updated: 2019-04-11 00:00:00

  • Creating a surrogate commuter network from Australian Bureau of Statistics census data.

    abstract::Between the 2011 and 2016 national censuses, the Australian Bureau of Statistics changed its anonymity policy compliance system for the distribution of census data. The new method has resulted in dramatic inconsistencies when comparing low-resolution data to aggregated high-resolution data. Hence, aggregated totals do...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Fair KM,Zachreson C,Prokopenko M

    updated: 2019-08-16 00:00:00

  • Tracking vegetation phenology across diverse biomes using Version 2.0 of the PhenoCam Dataset.

    abstract::Monitoring vegetation phenology is critical for quantifying climate change impacts on ecosystems. We present an extensive dataset of 1783 site-years of phenological data derived from PhenoCam network imagery from 393 digital cameras, situated from tropics to tundra across a wide range of plant functional types, biomes...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Seyednasrollah B,Young AM,Hufkens K,Milliman T,Friedl MA,Frolking S,Richardson AD

    updated: 2019-10-22 00:00:00

  • The effects of sequencing platforms on phylogenetic resolution in 16 S rRNA gene profiling of human feces.

    abstract::High-quality and high-throughput sequencing technologies are required for therapeutic and diagnostic analyses of human gut microbiota. Here, we evaluated the advantages and disadvantages of the various commercial sequencing platforms for studying human gut microbiota. We generated fecal bacterial sequences from 170 Ko...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Whon TW,Chung WH,Lim MY,Song EJ,Kim PS,Hyun DW,Shin NR,Bae JW,Nam YD

    updated: 2018-04-24 00:00:00

  • flEECe, an energy use and occupant behavior dataset for net-zero energy affordable senior residential buildings.

    abstract::The behaviors of building occupants have continued to perplex scholars for years in our attempts to develop models for energy efficient housing. Building simulations, project delivery approaches, policies, and more have fell short of their optimistic goals due to the complexity of human behavior. As a part of a multip...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Paige F,Agee P,Jazizadeh F

    updated: 2019-11-26 00:00:00

  • Multiple-data-based monthly geopotential model set LDCmgm90.

    abstract::While the GRACE (Gravity Recovery and Climate Experiment) satellite mission is of great significance in understanding various branches of Earth sciences, the quality of GRACE monthly products can be unsatisfactory due to strong longitudinal stripe-pattern errors and other flaws. Based on corrected GRACE Mascon (mass c...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Chen W,Luo J,Ray J,Yu N,Li JC

    updated: 2019-10-23 00:00:00

  • Comprehensive analysis of the venom gland transcriptome of the spider Dolomedes fimbriatus.

    abstract::A comprehensive transcriptome analysis of an expressed sequence tag (EST) database of the spider Dolomedes fimbriatus venom glands using single-residue distribution analysis (SRDA) identified 7,169 unique sequences. Mature chains of 163 different toxin-like polypeptides were predicted on the basis of well-established ...

    journal_title:Scientific data

    pub_type: Journal Article


    authors: Kozlov SA,Lazarev VN,Kostryukova ES,Selezneva OV,Ospanova EA,Alexeev DG,Govorun VM,Grishin EV

    updated: 2014-08-05 00:00:00

  • Outlier analyses of the Protein Data Bank archive using a probability-density-ranking approach.

    abstract::Outlier analyses are central to scientific data assessments. Conventional outlier identification methods do not work effectively for Protein Data Bank (PDB) data, which are characterized by heavy skewness and the presence of bounds and/or long tails. We have developed a data-driven nonparametric method to identify out...

    journal_title:Scientific data



    authors: Shao C,Liu Z,Yang H,Wang S,Burley SK

    updated: 2018-12-11 00:00:00