Linking in silico MS/MS spectra with chemistry data to improve identification of unknowns.

Abstract:

Confident identification of unknown chemicals in high resolution mass spectrometry (HRMS) screening studies requires cohesive workflows and complementary data, tools, and software. Chemistry databases, screening libraries, and chemical metadata have become fixtures in identification workflows. To increase confidence in compound identifications, the use of structural fragmentation data collected via tandem mass spectrometry (MS/MS or MS2) is vital. However, the availability of empirically collected MS/MS data for identification of unknowns is limited. Researchers have therefore turned to in silico generation of MS/MS data for use in HRMS-based screening studies. This paper describes the generation en masse of predicted MS/MS spectra for the entirety of the US EPA's DSSTox database using competitive fragmentation modelling and a freely available open source tool, CFM-ID. The generated dataset comprises predicted MS/MS spectra for ~700,000 structures, and mappings between predicted spectra, structures, associated substances, and chemical metadata. Together, these resources facilitate improved compound identifications in HRMS screening studies. These data are accessible via an SQL database, a comma-separated export file (.csv), and EPA's CompTox Chemicals Dashboard.

journal_name

Sci Data

journal_title

Scientific data

authors

McEachran AD,Balabin I,Cathey T,Transue TR,Al-Ghoul H,Grulke C,Sobus JR,Williams AJ

doi

10.1038/s41597-019-0145-z

subject

Has Abstract

pub_date

2019-08-02 00:00:00

pages

141

issue

1

issn

2052-4463

pii

10.1038/s41597-019-0145-z

journal_volume

6

pub_type

Journal Article
  • A catalogue of 863 Rett-syndrome-causing MECP2 mutations and lessons learned from data integration.

    abstract::Rett syndrome (RTT) is a rare neurological disorder mostly caused by a genetic variation in MECP2. Making new MECP2 variants and the related phenotypes available provides data for better understanding of disease mechanisms and faster identification of variants for diagnosis. This is, however, currently hampered by the...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/s41597-020-00794-7

    authors: Ehrhart F,Jacobsen A,Rigau M,Bosio M,Kaliyaperumal R,Laros JFJ,Willighagen EL,Valencia A,Roos M,Capella-Gutierrez S,Curfs LMG,Evelo CT

    update_date:2021-01-15 00:00:00

  • Serial scanning electron microscopy of anti-PKHD1L1 immuno-gold labeled mouse hair cell stereocilia bundles.

    abstract::Serial electron microscopy techniques have proven to be a powerful tool in biology. Unfortunately, the data sets they generate lack robust and accurate automated segmentation algorithms. In this data descriptor publication, we introduce a serial focused ion beam scanning electron microscopy (FIB-SEM) dataset consistin...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/s41597-020-0509-4

    authors: Ivanchenko MV,Cicconet M,Jandal HA,Wu X,Corey DP,Indzhykulian AA

    update_date:2020-06-17 00:00:00

  • Comprehensive high-resolution multiple-reaction monitoring mass spectrometry for targeted eicosanoid assays.

    abstract::Eicosanoids comprise a class of bioactive lipids derived from a unique group of essential fatty acids that mediate a variety of important physiological functions. Owing to the structural diversity of these lipids, their analysis in biological samples is often a major challenge. Advancements in mass spectrometric have ...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/sdata.2018.167

    authors: Sorgi CA,Peti APF,Petta T,Meirelles AFG,Fontanari C,Moraes LAB,Faccioli LH

    update_date:2018-08-21 00:00:00

  • Characterization of deep neural network features by decodability from human brain activity.

    abstract::Achievements of near human-level performance in object recognition by deep neural networks (DNNs) have triggered a flood of comparative studies between the brain and DNNs. Using a DNN as a proxy for hierarchical visual representations, our recent study found that human brain activity patterns measured by functional ma...

    journal_title:Scientific data

    pub_type:

    doi:10.1038/sdata.2019.12

    authors: Horikawa T,Aoki SC,Tsukamoto M,Kamitani Y

    update_date:2019-02-12 00:00:00

  • Facial model collection for medical augmented reality in oncologic cranio-maxillofacial surgery.

    abstract::Medical augmented reality (AR) is an increasingly important topic in many medical fields. AR enables x-ray vision to see through real world objects. In medicine, this offers pre-, intra- or post-interventional visualization of "hidden" structures. In contrast to a classical monitor view, AR applications provide visual...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/s41597-019-0327-8

    authors: Gsaxner C,Wallner J,Chen X,Zemann W,Egger J

    update_date:2019-12-09 00:00:00

  • Impacts of elevated atmospheric CO₂ on nutrient content of important food crops.

    abstract::One of the many ways that climate change may affect human health is by altering the nutrient content of food crops. However, previous attempts to study the effects of increased atmospheric CO2 on crop nutrition have been limited by small sample sizes and/or artificial growing conditions. Here we present data from a me...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/sdata.2015.36

    authors: Dietterich LH,Zanobetti A,Kloog I,Huybers P,Leakey AD,Bloom AJ,Carlisle E,Fernando N,Fitzgerald G,Hasegawa T,Holbrook NM,Nelson RL,Norton R,Ottman MJ,Raboy V,Sakai H,Sartor KA,Schwartz J,Seneweera S,Usui Y,Yoshina

    update_date:2015-07-21 00:00:00

  • Direct infusion mass spectrometry metabolomics dataset: a benchmark for data processing and quality control.

    abstract::Direct-infusion mass spectrometry (DIMS) metabolomics is an important approach for characterising molecular responses of organisms to disease, drugs and the environment. Increasingly large-scale metabolomics studies are being conducted, necessitating improvements in both bioanalytical and computational workflows to ma...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/sdata.2014.12

    authors: Kirwan JA,Weber RJ,Broadhurst DI,Viant MR

    update_date:2014-06-10 00:00:00

  • Spatial and temporal dynamics of multidimensional well-being, livelihoods and ecosystem services in coastal Bangladesh.

    abstract::Populations in resource dependent economies gain well-being from the natural environment, in highly spatially and temporally variable patterns. To collect information on this, we designed and implemented a 1586-household quantitative survey in the southwest coastal zone of Bangladesh. Data were collected on material, ...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/sdata.2016.94

    authors: Adams H,Adger WN,Ahmad S,Ahmed A,Begum D,Lázár AN,Matthews Z,Rahman MM,Streatfield PK

    update_date:2016-11-08 00:00:00

  • A three-dimensional thalamocortical dataset for characterizing brain heterogeneity.

    abstract::Neural microarchitecture is heterogeneous, varying both across and within brain regions. The consistent identification of regions of interest is one of the most critical aspects in examining neurocircuitry, as these structures serve as the vital landmarks with which to map brain pathways. Access to continuous, three-d...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/s41597-020-00692-y

    authors: Prasad JA,Balwani AH,Johnson EC,Miano JD,Sampathkumar V,De Andrade V,Fezzaa K,Du M,Vescovi R,Jacobsen C,Kording KP,Gürsoy D,Gray Roncal W,Kasthuri N,Dyer EL

    update_date:2020-10-20 00:00:00

  • Enabling precision medicine in neonatology, an integrated repository for preterm birth research.

    abstract::Preterm birth, or the delivery of an infant prior to 37 weeks of gestation, is a significant cause of infant morbidity and mortality. In the last decade, the advent and continued development of molecular profiling technologies has enabled researchers to generate vast amount of 'omics' data, which together with integra...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/sdata.2018.219

    authors: Sirota M,Thomas CG,Liu R,Zuhl M,Banerjee P,Wong RJ,Quaintance CC,Leite R,Chubiz J,Anderson R,Chappell J,Kim M,Grobman W,Zhang G,Rokas A,England SK,Parry S,Shaw GM,Simpson JL,Thomson E,Butte AJ,March of Dimes Pre

    update_date:2018-11-06 00:00:00

  • Sample descriptors linked to metagenomic sequencing data from human and animal enteric samples from Vietnam.

    abstract::There is still limited information on the diversity of viruses co-circulating in humans and animals. Here, we report data obtained from a large field collection of enteric samples taken from humans, pigs, rodents and other mammal hosts in Vietnam between 2012 and 2016. Each of 2100 stool or rectal swab samples was sub...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/s41597-019-0215-2

    authors: Woolhouse M,Ashworth J,Bogaardt C,Tue NT,Baker S,Thwaites G,Phuc TM

    update_date:2019-10-15 00:00:00

  • Experimental flows through an array of emerged or slightly submerged square cylinders over a rough bed.

    abstract::The experimental dataset presented was collected in an 18 m long and 1 m wide laboratory flume. Low to high flood flows through an urbanized floodplain were modelled. The floodplain bed is rough, modelled with dense artificial grass. A square cylinder array, representing house models, was set on the rough bed. The cyl...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/s41597-020-00791-w

    authors: Oukacine M,Proust S,Larrarte F,Goutal N

    update_date:2021-01-11 00:00:00

  • Viruses of the Nahant Collection, characterization of 251 marine Vibrionaceae viruses.

    abstract::Viruses are highly discriminating in their interactions with host cells and are thought to play a major role in maintaining diversity of environmental microbes. However, large-scale ecological and genomic studies of co-occurring virus-host pairs, required to characterize the mechanistic and genomic foundations of viru...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/sdata.2018.114

    authors: Kauffman KM,Brown JM,Sharma RS,VanInsberghe D,Elsherbini J,Polz M,Kelly L

    update_date:2018-07-03 00:00:00

  • De novo transcriptomes of 14 gammarid individuals for proteogenomic analysis of seven taxonomic groups.

    abstract::Gammarids are amphipods found worldwide distributed in fresh and marine waters. They play an important role in aquatic ecosystems and are well established sentinel species in ecotoxicology. In this study, we sequenced the transcriptomes of a male individual and a female individual for seven different taxonomic groups ...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/s41597-019-0192-5

    authors: Cogne Y,Degli-Esposti D,Pible O,Gouveia D,François A,Bouchez O,Eché C,Ford A,Geffard O,Armengaud J,Chaumot A,Almunia C

    update_date:2019-09-27 00:00:00

  • Harmonised LUCAS in-situ land cover and use database for field surveys from 2006 to 2018 in the European Union.

    abstract::Accurately characterizing land surface changes with Earth Observation requires geo-located ground truth. In the European Union (EU), a tri-annual surveyed sample of land cover and land use has been collected since 2006 under the Land Use/Cover Area frame Survey (LUCAS). A total of 1351293 observations at 651780 unique...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/s41597-020-00675-z

    authors: d'Andrimont R,Yordanov M,Martinez-Sanchez L,Eiselt B,Palmieri A,Dominici P,Gallego J,Reuter HI,Joebges C,Lemoine G,van der Velde M

    update_date:2020-10-16 00:00:00

  • Transcriptome profiling of interaction effects of soybean cyst nematodes and soybean aphids on soybean.

    abstract::Soybean aphid (Aphis glycines; SBA) and soybean cyst nematode (Heterodera glycines; SCN) are two major pests of soybean (Glycine max) in the United States of America. This study aims to characterize three-way interactions among soybean, SBA, and SCN using both demographic and genetic datasets. SCN-resistant and SCN-su...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/s41597-019-0140-4

    authors: Neupane S,Mathew FM,Varenhorst AJ,Nepal MP

    update_date:2019-07-24 00:00:00

  • Draft genome of the big-headed turtle Platysternon megacephalum.

    abstract::The big-headed turtle, Platysternon megacephalum, as the sole member of the monotypic family Platysternidae, has a number of distinct characteristics including an extra-large head, long tail, flat carapace, and a preference for low water temperature environments. We performed whole genome sequencing, assembly, and gen...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/s41597-019-0067-9

    authors: Cao D,Wang M,Ge Y,Gong S

    update_date:2019-05-16 00:00:00

  • A dataset of distribution and diversity of ticks in China.

    abstract::While tick-borne zoonoses, such as Lyme disease and tick-borne encephalitis, present an increasing global concern, knowledge of their vectors' distribution remains limited, especially for China. In this paper, we present the first comprehensive dataset of known tick species and their distributions in China, derived fr...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/s41597-019-0115-5

    authors: Zhang G,Zheng D,Tian Y,Li S

    update_date:2019-07-01 00:00:00

  • Obstacles to the reuse of study metadata in ClinicalTrials.gov.

    abstract::Metadata that are structured using principled schemas and that use terms from ontologies are essential to making biomedical data findable and reusable for downstream analyses. The largest source of metadata that describes the experimental protocol, funding, and scientific leadership of clinical studies is ClinicalTria...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/s41597-020-00780-z

    authors: Miron L,Gonçalves RS,Musen MA

    update_date:2020-12-18 00:00:00

  • Very high resolution, altitude-corrected, TMPA-based monthly satellite precipitation product over the CONUS.

    abstract::The Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA) product provided over 17 years of gridded precipitation datasets. However, the accuracy and spatial resolution of TMPA limits the applicability in hydrometeorological applications. We present a dataset that enhances the accurac...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/s41597-020-0411-0

    authors: Hashemi H,Fayne J,Lakshmi V,Huffman GJ

    update_date:2020-03-03 00:00:00

  • The odonate phenotypic database, a new open data resource for comparative studies of an old insect order.

    abstract::We present The Odonate Phenotypic Database (OPD): an online data resource of dragonfly and damselfly phenotypes (Insecta: Odonata). Odonata is a relatively small insect order that currently consists of about 6400 species belonging to 32 families. The database consists of multiple morphological, life-history and behavi...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/s41597-019-0318-9

    authors: Waller JT,Willink B,Tschol M,Svensson EI

    update_date:2019-12-12 00:00:00

  • A global compendium of human Crimean-Congo haemorrhagic fever virus occurrence.

    abstract::In order to map global disease risk, a geographic database of human Crimean-Congo haemorrhagic fever virus (CCHFV) occurrence was produced by surveying peer-reviewed literature and case reports, as well as informal online sources. Here we present this database, comprising occurrence data linked to geographic point or ...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/sdata.2015.16

    authors: Messina JP,Pigott DM,Duda KA,Brownstein JS,Myers MF,George DB,Hay SI

    update_date:2015-04-14 00:00:00

  • Long term survey of the fish community and associated benthic fauna of the Seine estuary nursery grounds.

    abstract::Estuaries are crucial ecosystems where human activities deeply affect numerous ecological functions. Here we present a survey dataset based on the monitoring of fish nursery grounds of the Seine estuary and eastern bay of Seine collected once a year using a beam trawl during three distinct periods (1995-2002, 2008-201...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/s41597-020-0572-x

    authors: Cariou T,Dubroca L,Vogel C

    update_date:2020-07-13 00:00:00

  • Map of physical interactions between extracellular domains of Arabidopsis leucine-rich repeat receptor kinases.

    abstract::Plants use surface receptors to perceive information about many aspects of their local environment. These receptors physically interact to form both steady state and signalling competent complexes. The signalling events downstream of receptor activation impact both plant developmental and immune responses. Here, we pr...

    journal_title:Scientific data

    pub_type:

    doi:10.1038/sdata.2019.25

    authors: Mott GA,Smakowska-Luzan E,Pasha A,Parys K,Howton TC,Neuhold J,Lehner A,Grünwald K,Stolt-Bergner P,Provart NJ,Mukhtar MS,Desveaux D,Guttman DS,Belkhadir Y

    update_date:2019-02-26 00:00:00

  • The systematic identification of cytoskeletal genes required for Drosophila melanogaster muscle maintenance.

    abstract::Animal muscles must maintain their function and structure while bearing substantial mechanical loads. How muscles withstand persistent mechanical strain is presently not well understood. Understanding the mechanisms by which tissues maintain their complex architecture is a key goal of cell biology. This dataset repres...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/sdata.2014.2

    authors: Perkins AD,Lee MJ,Tanentzapf G

    update_date:2014-03-11 00:00:00

  • De novo transcriptome assembly databases for the butterfly orchid Phalaenopsis equestris.

    abstract::Orchids are renowned for their spectacular flowers and ecological adaptations. After the sequencing of the genome of the tropical epiphytic orchid Phalaenopsis equestris, we combined Illumina HiSeq2000 for RNA-Seq and Trinity for de novo assembly to characterize the transcriptomes for 11 diverse P. equestris tissues r...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/sdata.2016.83

    authors: Niu SC,Xu Q,Zhang GQ,Zhang YQ,Tsai WC,Hsu JL,Liang CK,Luo YB,Liu ZJ

    update_date:2016-09-27 00:00:00

  • Computational workflow to study the seasonal variation of secondary metabolites in nine different bryophytes.

    abstract::In Eco-Metabolomics interactions are studied of non-model organisms in their natural environment and relations are made between biochemistry and ecological function. Current challenges when processing such metabolomics data involve complex experiment designs which are often carried out in large field campaigns involvi...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/sdata.2018.179

    authors: Peters K,Gorzolka K,Bruelheide H,Neumann S

    update_date:2018-08-28 00:00:00

  • Quantitative mapping of RNA-mediated nuclear estrogen receptor β interactome in human breast cancer cells.

    abstract::The nuclear receptor estrogen receptor 2 (ESR2, ERβ) modulates cancer cell proliferation and tumor growth, exerting an oncosuppressive role in breast cancer (BC). Interaction proteomics by tandem affinity purification coupled to mass spectrometry was previously applied in BC cells to identify proteins acting in concer...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/sdata.2018.31

    authors: Giurato G,Nassa G,Salvati A,Alexandrova E,Rizzo F,Nyman TA,Weisz A,Tarallo R

    update_date:2018-03-06 00:00:00

  • A multi-species repository of social networks.

    abstract::Social network analysis is an invaluable tool to understand the patterns, evolution, and consequences of sociality. Comparative studies over a range of social systems across multiple taxonomic groups are particularly valuable. Such studies however require quantitative social association or interaction data across mult...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/s41597-019-0056-z

    authors: Sah P,Méndez JD,Bansal S

    update_date:2019-04-29 00:00:00

  • High resolution multi-facies realizations of sedimentary reservoir and aquifer analogs.

    abstract::Geological structures are by nature inaccessible to direct observation. This can cause difficulties in applications where a spatially explicit representation of such structures is required, in particular when modelling fluid migration in geological formations. An increasing trend in recent years has been to use analog...

    journal_title:Scientific data

    pub_type: Journal Article

    doi:10.1038/sdata.2015.33

    authors: Bayer P,Comunian A,Höyng D,Mariethoz G

    update_date:2015-07-07 00:00:00