Abstract:
Outlier analyses are central to scientific data assessments. Conventional outlier identification methods do not work effectively for Protein Data Bank (PDB) data, which are characterized by heavy skewness and the presence of bounds and/or long tails. We have developed a data-driven nonparametric method to identify outliers in PDB data based on kernel probability density estimation. Unlike conventional outlier analyses based on location and scale, Probability Density Ranking can be used for robust assessments of distance from other observations. Analyzing PDB data from the vantage points of probability and frequency enables proper outlier identification, which is important for quality control during deposition-validation-biocuration of new three-dimensional structure data. Ranking of Probability Density also permits use of Most Probable Range as a robust measure of data dispersion that is more compact than Interquartile Range. The Probability-Density-Ranking approach can be employed to analyze outliers and data-spread on any large data set with continuous distribution.
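The ranking idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the Silverman rule-of-thumb bandwidth, the 5% outlier cutoff, the reading of Most Probable Range as the span of the most probable half of the observations, and the function names `kde_at_observations` and `density_ranking` are all choices made here for illustration. The sample is synthetic.

```python
import math
import random
import statistics


def kde_at_observations(data):
    """Gaussian kernel density estimate evaluated at every observation.

    The bandwidth follows Silverman's rule of thumb, which is an assumption
    of this sketch and not necessarily the choice made in the paper.
    """
    n = len(data)
    sd = statistics.stdev(data)
    if sd == 0:
        raise ValueError("data must not be constant")
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    h = 0.9 * spread * n ** -0.2
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - y) / h) ** 2) for y in data)
        for x in data
    ]


def density_ranking(data, outlier_fraction=0.05, core_fraction=0.5):
    """Return (most_probable_range, outliers) from a density ranking.

    Both fractions are illustrative defaults, not values from the source.
    """
    density = kde_at_observations(data)
    # Indices sorted from the most probable to the least probable observation.
    order = sorted(range(len(data)), key=lambda i: density[i], reverse=True)
    n_core = max(1, round(core_fraction * len(data)))
    n_out = int(outlier_fraction * len(data))
    # Span of the highest-density observations (assumed reading of MPR).
    core = [data[i] for i in order[:n_core]]
    # Lowest-ranked observations are flagged as outliers.
    outliers = [data[i] for i in order[len(data) - n_out:]]
    return (min(core), max(core)), outliers


random.seed(0)
# Synthetic right-skewed, lower-bounded sample standing in for a PDB metric.
sample = [random.lognormvariate(0.0, 0.75) for _ in range(500)]
mpr, outliers = density_ranking(sample)
q1, _, q3 = statistics.quantiles(sample, n=4)
print("Most Probable Range:", mpr, "width", mpr[1] - mpr[0])
print("Interquartile Range:", (q1, q3), "width", q3 - q1)
print("flagged outliers:", len(outliers))
```

Because observations are ranked by estimated density and not by distance from a central value, a point is flagged only when few other observations lie near it, which suits skewed or bounded data. Printing both widths lets the two dispersion measures be compared on the same sample.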
journal_name:Sci Data
journal_title:Scientific data
authors:Shao C, Liu Z, Yang H, Wang S, Burley SK
doi:10.1038/sdata.2018.293
subject:Has Abstract
pub_date:2018-12-11 00:00:00
pages:180293
issn:2052-4463
pii:sdata2018293
journal_volume:5
pub_type:
Related literature
Scientific Data literature collection
abstract::Lysosomes are the main degradative organelles of cells and involved in a variety of processes including the recycling of macromolecules, storage of compounds, and metabolic signaling. Despite an increasing interest in the proteomic analysis of lysosomes, no systematic study of sample preparation protocols for lysosome...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/s41597-020-0399-5
update_date:2020-02-26 00:00:00
abstract::We present The Odonate Phenotypic Database (OPD): an online data resource of dragonfly and damselfly phenotypes (Insecta: Odonata). Odonata is a relatively small insect order that currently consists of about 6400 species belonging to 32 families. The database consists of multiple morphological, life-history and behavi...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/s41597-019-0318-9
update_date:2019-12-12 00:00:00
abstract::Ticks carry pathogens that can cause disease in both animals and humans, and there is a need to monitor the distribution and abundance of ticks and the pathogens they carry to pinpoint potential high risk areas for tick-borne disease transmission. In a joint Scandinavian study, we measured Ixodes ricinus instar abunda...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/s41597-020-00579-y
update_date:2020-07-16 00:00:00
abstract::Fully-automated nuclear image segmentation is the prerequisite to ensure statistically significant, quantitative analyses of tissue preparations,applied in digital pathology or quantitative microscopy. The design of segmentation methods that work independently of the tissue type or preparation is complex, due to varia...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/s41597-020-00608-w
update_date:2020-08-11 00:00:00
abstract::Animal muscles must maintain their function and structure while bearing substantial mechanical loads. How muscles withstand persistent mechanical strain is presently not well understood. Understanding the mechanisms by which tissues maintain their complex architecture is a key goal of cell biology. This dataset repres...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/sdata.2014.2
update_date:2014-03-11 00:00:00
abstract::Direct-infusion mass spectrometry (DIMS) metabolomics is an important approach for characterising molecular responses of organisms to disease, drugs and the environment. Increasingly large-scale metabolomics studies are being conducted, necessitating improvements in both bioanalytical and computational workflows to ma...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/sdata.2014.12
update_date:2014-06-10 00:00:00
abstract::Long-term datasets of number and size of lakes over the Tibetan Plateau (TP) are among the most critical components for better understanding the interactions among the cryosphere, hydrosphere, and atmosphere at regional and global scales. Due to the harsh environment and the scarcity of data over the TP, data accumula...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/sdata.2016.39
update_date:2016-06-21 00:00:00
abstract::Connective tissues such as tendon, ligament and skin are biological fibre composites comprising collagen fibrils reinforcing the weak proteoglycan-rich ground substance in extracellular matrix (ECM). One of the hallmarks of ageing of connective tissues is the progressive and irreversible change in the tissue mechanica...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/sdata.2018.140
update_date:2018-07-24 00:00:00
abstract::The detection, identification, and localization of illicit nuclear materials in urban environments is of utmost importance for national security. Most often, the process of performing these operations consists of a team of trained individuals equipped with radiation detection devices that have built-in algorithms to a...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/s41597-020-00672-2
update_date:2020-10-05 00:00:00
abstract::Bumblebees (Hymenoptera: Apidae) are important pollinating insects that play pivotal roles in crop production and natural ecosystem services. Although protein-coding genes in bumblebees have been extensively annotated, regulatory sequences of the genome, such as promoters and enhancers, have been poorly annotated. To ...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/s41597-020-00713-w
update_date:2020-10-26 00:00:00
abstract::This article presents a practical roadmap for scholarly data repositories to implement data citation in accordance with the Joint Declaration of Data Citation Principles, a synopsis and harmonization of the recommendations of major science policy bodies. The roadmap was developed by the Repositories Expert Group, as p...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/s41597-019-0031-8
update_date:2019-04-10 00:00:00
abstract::Orchids are renowned for their spectacular flowers and ecological adaptations. After the sequencing of the genome of the tropical epiphytic orchid Phalaenopsis equestris, we combined Illumina HiSeq2000 for RNA-Seq and Trinity for de novo assembly to characterize the transcriptomes for 11 diverse P. equestris tissues r...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/sdata.2016.83
update_date:2016-09-27 00:00:00
abstract::Long-term exposure to air pollution is considered a major public health concern and has been related to overall mortality and various diseases such as respiratory and cardiovascular disease. Due to the spatial variability of air pollution concentrations, assessment of individual exposure to air pollution requires spat...
journal_title:Scientific data
pub_type:
doi:10.1038/sdata.2019.35
update_date:2019-03-12 00:00:00
abstract::Social network analysis is an invaluable tool to understand the patterns, evolution, and consequences of sociality. Comparative studies over a range of social systems across multiple taxonomic groups are particularly valuable. Such studies however require quantitative social association or interaction data across mult...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/s41597-019-0056-z
update_date:2019-04-29 00:00:00
abstract::Olympia oysters are found along the west coast of North America and as the only native oyster species in the region, receive considerable attention with regard to restoration and conservation. Knowledge of genetic structure of this species is essential for resource managers. Here we provide genetic data for three dist...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/sdata.2017.130
update_date:2017-09-12 00:00:00
abstract::Patient-specific craniofacial implants are used to repair skull bone defects after trauma or surgery. Currently, cranial implants are designed and produced by third-party suppliers, which is usually time-consuming and expensive. Recent advances in additive manufacturing made the in-hospital or in-operation-room fabric...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/s41597-021-00806-0
update_date:2021-01-29 00:00:00
abstract::Human motion capture is commonly used in various fields, including sport, to analyze, understand, and synthesize kinematic and kinetic data. Specialized computer vision and marker-based optical motion capture techniques constitute the gold-standard for accurate and robust human motion capture. The dataset presented co...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/s41597-021-00801-5
update_date:2021-01-18 00:00:00
abstract::Surveys for more than 9,500 households were conducted in the growing seasons 2002/2003 or 2003/2004 in eleven African countries: Burkina Faso, Cameroon, Ghana, Niger and Senegal in western Africa; Egypt in northern Africa; Ethiopia and Kenya in eastern Africa; South Africa, Zambia and Zimbabwe in southern Africa. Hous...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/sdata.2016.20
update_date:2016-05-24 00:00:00
abstract::A comprehensive transcriptome analysis of an expressed sequence tag (EST) database of the spider Dolomedes fimbriatus venom glands using single-residue distribution analysis (SRDA) identified 7,169 unique sequences. Mature chains of 163 different toxin-like polypeptides were predicted on the basis of well-established ...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/sdata.2014.23
update_date:2014-08-05 00:00:00
abstract::The Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA) product provided over 17 years of gridded precipitation datasets. However, the accuracy and spatial resolution of TMPA limits the applicability in hydrometeorological applications. We present a dataset that enhances the accurac...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/s41597-020-0411-0
update_date:2020-03-03 00:00:00
abstract::Magnetic resonance angiography (MRA) can capture the variation of cerebral arteries with high spatial resolution. These measurements include valuable information about the morphology, geometry, and density of brain arteries, which may be useful to identify risk factors for cerebrovascular and neurological diseases at ...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/s41597-019-0034-5
update_date:2019-04-11 00:00:00
abstract::Advances in high-throughput sequencing are reshaping how we perceive microbial communities inhabiting the human body, with implications for therapeutic interventions. Several large-scale datasets derived from hundreds of human microbiome samples sourced from multiple studies are now publicly available. However, idiosy...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/sdata.2017.35
update_date:2017-04-11 00:00:00
abstract::Populations in resource dependent economies gain well-being from the natural environment, in highly spatially and temporally variable patterns. To collect information on this, we designed and implemented a 1586-household quantitative survey in the southwest coastal zone of Bangladesh. Data were collected on material, ...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/sdata.2016.94
update_date:2016-11-08 00:00:00
abstract::While the GRACE (Gravity Recovery and Climate Experiment) satellite mission is of great significance in understanding various branches of Earth sciences, the quality of GRACE monthly products can be unsatisfactory due to strong longitudinal stripe-pattern errors and other flaws. Based on corrected GRACE Mascon (mass c...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/s41597-019-0239-7
update_date:2019-10-23 00:00:00
abstract::Efficient energy consumption at the building level is vital for sustainability. Providing energy efficient systems and solutions requires an understanding of how energy gets consumed. However, there is a general lack of large-scale open datasets about the energy consumption of buildings, which hinders the research. Th...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/sdata.2019.15
update_date:2019-02-19 00:00:00
abstract::Transparent evaluations of FAIRness are increasingly required by a wide range of stakeholders, from scientists to publishers, funding agencies and policy makers. We propose a scalable, automatable framework to evaluate digital resources that encompasses measurable indicators, open source tools, and participation guide...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/s41597-019-0184-5
update_date:2019-09-20 00:00:00
abstract::Induced pluripotent stem cells (iPSCs) and human embryonic stem cells (hESCs) differentiated into hepatocyte-like cells (HLCs) provide a defined and renewable source of cells for drug screening, toxicology and regenerative medicine. We previously reprogrammed human fetal foreskin fibroblast cells (HFF1) into iPSCs emp...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/sdata.2018.35
update_date:2018-03-13 00:00:00
abstract::We provide a detailed description of a gadoteridol-derivatized lysozyme (gadolinium lysozyme) two-colour serial femtosecond crystallography (SFX) dataset for multiple wavelength anomalous dispersion (MAD) structure determination. The data was collected at the Spring-8 Angstrom Compact free-electron LAser (SACLA) facil...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/sdata.2017.188
update_date:2017-12-12 00:00:00
abstract::Magnetic resonance imaging (MRI) captures the dynamics of brain development with multiple modalities that quantify both structure and function. These measurements may yield valuable insights into the neural patterns that mark healthy maturation or that identify early risk for psychiatric disorder. The Pediatric Templa...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/sdata.2015.3
update_date:2015-02-03 00:00:00
abstract::Good access to resources and opportunities is essential for sustainable development. Improving access, especially in rural areas, requires useful measures of current access to the locations where these resources and opportunities are found. Recent work has developed a global map of travel times to cities with more tha...
journal_title:Scientific data
pub_type: Journal Article
doi:10.1038/s41597-019-0265-5
update_date:2019-11-07 00:00:00