Towards mainstreaming of biodiversity data publishing: recommendations of the GBIF Data Publishing Framework Task Group.

Abstract:

BACKGROUND: Data are the evidentiary basis for scientific hypotheses, analyses and publication, for policy formation and for decision-making. They are essential to the evaluation and testing of results by peer scientists both present and future. There is broad consensus in the scientific and conservation communities that data should be freely, openly available in a sustained, persistent and secure way, and thus standards for 'free' and 'open' access to data have become well developed in recent years. The question of effective access to data remains highly problematic.
DISCUSSION: Specifically with respect to scientific publishing, the ability to critically evaluate a published scientific hypothesis or scientific report is contingent on the examination, analysis and evaluation, and if feasible on the re-generation, of the data on which conclusions are based. It is not coincidental that in the recent 'climategate' controversies, the quality and integrity of data and their analytical treatment were central to the debate. There is recent evidence that even when scientific data are requested for evaluation they may not be available. The history of dissemination of scientific results has been marked by paradigm shifts driven by the emergence of new technologies. In recent decades, the advance of computer-based technology linked to global communications networks has created the potential for broader and more consistent dissemination of scientific information and data. Yet, in this digital era, scientists and conservationists, organizations and institutions have often been slow to make data available. Community studies suggest that the withholding of data can be attributed to a lack of awareness, to a lack of technical capacity, to concerns that data should be withheld for reasons of perceived personal or organizational self-interest, or to a lack of adequate mechanisms for attribution.
CONCLUSIONS: There is a clear need for institutionalization of a 'data publishing framework' that can address sociocultural, technical-infrastructural, policy, political and legal constraints, as well as addressing issues of sustainability and financial support. To address these aspects of a data publishing framework - a systematic, standard approach to the formal definition and public disclosure of data - in the context of biodiversity data, the Global Biodiversity Information Facility (GBIF, the single inter-governmental body most clearly mandated to undertake such an effort) convened a Data Publishing Framework Task Group. We conceive this data publishing framework as an environment conducive to ensuring free and open access to the world's biodiversity data. Here, we present the recommendations of that Task Group, which are intended to encourage free and open access to the world's biodiversity data.

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Moritz T,Krishnan S,Roberts D,Ingwersen P,Agosti D,Penev L,Cockerill M,Chavan V,Data Publishing Framework Task Group.

doi

10.1186/1471-2105-12-S15-S1

subject

Has Abstract

pub_date

2011-01-01 00:00:00

pages

S1

issn

1471-2105

pii

1471-2105-12-S15-S1

journal_volume

12 Suppl 15

pub_type

Guideline, Journal Article
  • DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark.

    abstract:BACKGROUND:XHMM is a widely used tool for copy-number variant (CNV) discovery from whole exome sequencing data but can require hours to days to run for large cohorts. A more scalable implementation would reduce the need for specialized computational resources and enable increased exploration of the configuration parame...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3108-7

    authors: Linderman MD,Chia D,Wallace F,Nothaft FA

    updated:2019-10-11 00:00:00

  • On the detection of functionally coherent groups of protein domains with an extension to protein annotation.

    abstract:BACKGROUND:Protein domains coordinate to perform multifaceted cellular functions, and domain combinations serve as the functional building blocks of the cell. The available methods to identify functional domain combinations are limited in their scope, e.g. to the identification of combinations falling within individual...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-390

    authors: McLaughlin WA,Chen K,Hou T,Wang W

    updated:2007-10-16 00:00:00

  • INBIA: a boosting methodology for proteomic network inference.

    abstract:BACKGROUND:The analysis of tissue-specific protein interaction networks and their functional enrichment in pathological and normal tissues provides insights on the etiology of diseases. The Pan-cancer proteomic project, in The Cancer Genome Atlas, collects protein expressions in human cancers and it is a reference reso...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2183-5

    authors: Sardina DS,Micale G,Ferro A,Pulvirenti A,Giugno R

    updated:2018-07-09 00:00:00

  • PVT: an efficient computational procedure to speed up next-generation sequence analysis.

    abstract:BACKGROUND:High-throughput Next-Generation Sequencing (NGS) techniques are advancing genomics and molecular biology research. This technology generates substantially large data which puts up a major challenge to the scientists for an efficient, cost and time effective solution to analyse such data. Further, for the dif...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-167

    authors: Maji RK,Sarkar A,Khatua S,Dasgupta S,Ghosh Z

    updated:2014-06-04 00:00:00

  • Ranking analysis of F-statistics for microarray data.

    abstract:BACKGROUND:Microarray technology provides an efficient means for globally exploring physiological processes governed by the coordinated expression of multiple genes. However, identification of genes differentially expressed in microarray experiments is challenging because of their potentially high type I error rate. Me...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-142

    authors: Tan YD,Fornage M,Xu H

    updated:2008-03-06 00:00:00

  • A computational evaluation of over-representation of regulatory motifs in the promoter regions of differentially expressed genes.

    abstract:BACKGROUND:Observed co-expression of a group of genes is frequently attributed to co-regulation by shared transcription factors. This assumption has led to the hypothesis that promoters of co-expressed genes should share common regulatory motifs, which forms the basis for numerous computational tools that search for th...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-267

    authors: Meng G,Mosig A,Vingron M

    updated:2010-05-20 00:00:00

  • Reduction strategies for hierarchical multi-label classification in protein function prediction.

    abstract:BACKGROUND:Hierarchical Multi-Label Classification is a classification task where the classes to be predicted are hierarchically organized. Each instance can be assigned to classes belonging to more than one path in the hierarchy. This scenario is typically found in protein function prediction, considering that each pr...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1232-1

    authors: Cerri R,Barros RC,P L F de Carvalho AC,Jin Y

    updated:2016-09-15 00:00:00

  • An algorithm for automated closure during assembly.

    abstract:BACKGROUND:Finishing is the process of improving the quality and utility of draft genome sequences generated by shotgun sequencing and computational assembly. Finishing can involve targeted sequencing. Finishing reads may be incorporated by manual or automated means. One automated method uses targeted addition by local...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-457

    authors: Koren S,Miller JR,Walenz BP,Sutton G

    updated:2010-09-10 00:00:00

  • m6Acomet: large-scale functional prediction of individual m6A RNA methylation sites from an RNA co-methylation network.

    abstract:BACKGROUND:Over one hundred different types of post-transcriptional RNA modifications have been identified in human. Researchers discovered that RNA modifications can regulate various biological processes, and RNA methylation, especially N6-methyladenosine, has become one of the most researched topics in epigenetics. ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2840-3

    authors: Wu X,Wei Z,Chen K,Zhang Q,Su J,Liu H,Zhang L,Meng J

    updated:2019-05-02 00:00:00

  • iSeg: an efficient algorithm for segmentation of genomic and epigenomic data.

    abstract:BACKGROUND:Identification of functional elements of a genome often requires dividing a sequence of measurements along a genome into segments where adjacent segments have different properties, such as different mean values. Despite dozens of algorithms developed to address this problem in genomics research, methods with...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2140-3

    authors: Girimurugan SB,Liu Y,Lung PY,Vera DL,Dennis JH,Bass HW,Zhang J

    updated:2018-04-11 00:00:00

  • A Bayesian data fusion based approach for learning genome-wide transcriptional regulatory networks.

    abstract:BACKGROUND:Reverse engineering of transcriptional regulatory networks (TRN) from genomics data has always represented a computational challenge in System Biology. The major issue is modeling the complex crosstalk among transcription factors (TFs) and their target genes, with a method able to handle both the high number...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3510-1

    authors: Sauta E,Demartini A,Vitali F,Riva A,Bellazzi R

    updated:2020-05-29 00:00:00

  • MGOGP: a gene module-based heuristic algorithm for cancer-related gene prioritization.

    abstract:BACKGROUND:Prioritizing genes according to their associations with a cancer allows researchers to explore genes in more informed ways. By far, Gene-centric or network-centric gene prioritization methods are predominated. Genes and their protein products carry out cellular processes in the context of functional modules....

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2216-0

    authors: Su L,Liu G,Bai T,Meng X,Ma Q

    updated:2018-06-05 00:00:00

  • Graph-representation of oxidative folding pathways.

    abstract:BACKGROUND:The process of oxidative folding combines the formation of native disulfide bond with conformational folding resulting in the native three-dimensional fold. Oxidative folding pathways can be described in terms of disulfide intermediate species (DIS) which can also be isolated and characterized. Each DIS corr...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-19

    authors: Agoston V,Cemazar M,Kaján L,Pongor S

    updated:2005-01-27 00:00:00

  • BiPOm: a rule-based ontology to represent and infer molecule knowledge from a biological process-centered viewpoint.

    abstract:BACKGROUND:Managing and organizing biological knowledge remains a major challenge, due to the complexity of living systems. Recently, systemic representations have been promising in tackling such a challenge at the whole-cell scale. In such representations, the cell is considered as a system composed of interlocked sub...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03637-9

    authors: Henry V,Saïs F,Inizan O,Marchadier E,Dibie J,Goelzer A,Fromion V

    updated:2020-07-23 00:00:00

  • Directed acyclic graph kernels for structural RNA analysis.

    abstract:BACKGROUND:Recent discoveries of a large variety of important roles for non-coding RNAs (ncRNAs) have been reported by numerous researchers. In order to analyze ncRNAs by kernel methods including support vector machines, we propose stem kernels as an extension of string kernels for measuring the similarities between tw...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-318

    authors: Sato K,Mituyama T,Asai K,Sakakibara Y

    updated:2008-07-22 00:00:00

  • Pre-processing Agilent microarray data.

    abstract:BACKGROUND:Pre-processing methods for two-sample long oligonucleotide arrays, specifically the Agilent technology, have not been extensively studied. The goal of this study is to quantify some of the sources of error that affect measurement of expression using Agilent arrays and to compare Agilent's Feature Extraction ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-142

    authors: Zahurak M,Parmigiani G,Yu W,Scharpf RB,Berman D,Schaeffer E,Shabbeer S,Cope L

    updated:2007-05-01 00:00:00

  • Alignment-free clustering of large data sets of unannotated protein conserved regions using minhashing.

    abstract:BACKGROUND:Clustering of protein sequences is of key importance in predicting the structure and function of newly sequenced proteins and is also of use for their annotation. With the advent of multiple high-throughput sequencing technologies, new protein sequences are becoming available at an extraordinary rate. The ra...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2080-y

    authors: Abnousi A,Broschat SL,Kalyanaraman A

    updated:2018-03-05 00:00:00

  • Towards barcode markers in Fungi: an intron map of Ascomycota mitochondria.

    abstract:BACKGROUND:A standardized and cost-effective molecular identification system is now an urgent need for Fungi owing to their wide involvement in human life quality. In particular the potential use of mitochondrial DNA species markers has been taken in account. Unfortunately, a serious difficulty in the PCR and bioinform...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S6-S15

    authors: Santamaria M,Vicario S,Pappadà G,Scioscia G,Scazzocchio C,Saccone C

    updated:2009-06-16 00:00:00

  • EGNAS: an exhaustive DNA sequence design algorithm.

    abstract:BACKGROUND:The molecular recognition based on the complementary base pairing of deoxyribonucleic acid (DNA) is the fundamental principle in the fields of genetics, DNA nanotechnology and DNA computing. We present an exhaustive DNA sequence design algorithm that allows to generate sets containing a maximum number of seq...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-138

    authors: Kick A,Bönsch M,Mertig M

    updated:2012-06-20 00:00:00

  • Improving ontologies by automatic reasoning and evaluation of logical definitions.

    abstract:BACKGROUND:Ontologies are widely used to represent knowledge in biomedicine. Systematic approaches for detecting errors and disagreements are needed for large ontologies with hundreds or thousands of terms and semantic relationships. A recent approach of defining terms using logical definitions is now increasingly bein...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-418

    authors: Köhler S,Bauer S,Mungall CJ,Carletti G,Smith CL,Schofield P,Gkoutos GV,Robinson PN

    updated:2011-10-27 00:00:00

  • Algal Functional Annotation Tool: a web-based analysis suite to functionally interpret large gene lists using integrated annotation and expression data.

    abstract:BACKGROUND:Progress in genome sequencing is proceeding at an exponential pace, and several new algal genomes are becoming available every year. One of the challenges facing the community is the association of protein sequences encoded in the genomes with biological function. While most genome assembly projects generate...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-282

    authors: Lopez D,Casero D,Cokus SJ,Merchant SS,Pellegrini M

    updated:2011-07-12 00:00:00

  • CorrelaGenes: a new tool for the interpretation of the human transcriptome.

    abstract:BACKGROUND:The amount of gene expression data available in public repositories has grown exponentially in the last years, now requiring new data mining tools to transform them in information easily accessible to biologists. RESULTS:By exploiting expression data publicly available in the Gene Expression Omnibus (GEO) d...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-S1-S6

    authors: Cremaschi P,Rovida S,Sacchi L,Lisa A,Calvi F,Montecucco A,Biamonti G,Bione S,Sacchi G

    updated:2014-01-01 00:00:00

  • ICoVax 2013: the 3rd ISV Pre-conference Computational Vaccinology Workshop.

    abstract:Following last year's computational vaccinology workshop in Shanghai, China, the third ISV Pre-conference Computational Vaccinology Workshop (ICoVax 2013) was held in Barcelona, Spain. ICoVax 2013 provided an international platform for the attendees to showcase their research and discuss problems and solutions in the ...

    journal_title:BMC bioinformatics

    pub_type:

    doi:10.1186/1471-2105-15-S4-I1

    authors: De Groot AS,De Groot P,He Y

    updated:2014-01-01 00:00:00

  • Critique of the pairwise method for estimating qPCR amplification efficiency: beware of correlated data!

    abstract:BACKGROUND:A recently proposed method for estimating qPCR amplification efficiency E analyzes fluorescence intensity ratios from pairs of points deemed to lie in the exponential growth region on the amplification curves for all reactions in a dilution series. This method suffers from a serious problem: The resulting ra...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03604-4

    authors: Tellinghuisen J

    updated:2020-07-08 00:00:00

  • Qxpak.5: old mixed model solutions for new genomics problems.

    abstract:BACKGROUND:Mixed models have a long and fruitful history in statistics. They are pertinent to genomics problems because they are highly versatile, accommodating a wide variety of situations within the same theoretical and algorithmic framework. RESULTS:Qxpak is a package for versatile statistical genomics, specificall...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-202

    authors: Pérez-Enciso M,Misztal I

    updated:2011-05-25 00:00:00

  • Species-specific analysis of protein sequence motifs using mutual information.

    abstract:BACKGROUND:Protein sequence motifs are by definition short fragments of conserved amino acids, often associated with a specific function. Accordingly protein sequence profiles derived from multiple sequence alignments provide an alternative description of functional motifs characterizing families of related sequences. ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-164

    authors: Hummel J,Keshvari N,Weckwerth W,Selbig J

    updated:2005-06-29 00:00:00

  • Moiety modeling framework for deriving moiety abundances from mass spectrometry measured isotopologues.

    abstract:BACKGROUND:Stable isotope tracing can follow individual atoms through metabolic transformations through the detection of the incorporation of stable isotope within metabolites. This resulting data can be interpreted in terms related to metabolic flux. However, detection of a stable isotope in metabolites by mass spectr...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3096-7

    authors: Jin H,Moseley HNB

    updated:2019-10-28 00:00:00

  • Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient.

    abstract:BACKGROUND:Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great "Tree of Life" (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-158

    authors: Stoltzfus A,Lapp H,Matasci N,Deus H,Sidlauskas B,Zmasek CM,Vaidya G,Pontelli E,Cranston K,Vos R,Webb CO,Harmon LJ,Pirrung M,O'Meara B,Pennell MW,Mirarab S,Rosenberg MS,Balhoff JP,Bik HM,Heath TA,Midford PE,Brown

    updated:2013-05-13 00:00:00

  • A CoD-based stationary control policy for intervening in large gene regulatory networks.

    abstract:BACKGROUND:One of the most important goals of the mathematical modeling of gene regulatory networks is to alter their behavior toward desirable phenotypes. Therapeutic techniques are derived for intervention in terms of stationary control policies. In large networks, it becomes computationally burdensome to derive an o...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-S10-S10

    authors: Ghaffari N,Ivanov I,Qian X,Dougherty ER

    updated:2011-10-18 00:00:00

  • Predicting protein functions by relaxation labelling protein interaction network.

    abstract:BACKGROUND:One of key issues in the post-genomic era is to assign functions to uncharacterized proteins. Since proteins seldom act alone; rather, they must interact with other biomolecular units to execute their functions. Thus, the functions of unknown proteins may be discovered through studying their interactions wit...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-S1-S64

    authors: Hu P,Jiang H,Emili A

    updated:2010-01-18 00:00:00