Conceptual-level workflow modeling of scientific experiments using NMR as a case study.

Abstract:

BACKGROUND: Scientific workflows improve the process of scientific experiments by making computations explicit, underscoring data flow, and emphasizing the participation of humans in the process when intuition and human reasoning are required. Workflows for experiments also highlight transitions among experimental phases, allowing intermediate results to be verified and supporting the proper handling of semantic mismatches and different file formats among the various tools used in the scientific process. Thus, scientific workflows are important for the modeling and subsequent capture of bioinformatics-related data. While much research has been conducted on the implementation of scientific workflows, the initial process of designing and generating the workflow at the conceptual level has received little consideration.

RESULTS: We propose a structured process to capture scientific workflows at the conceptual level that allows workflows to be documented efficiently, results in concise models of the workflow and more-correct workflow implementations, and provides insight into the scientific process itself. The approach uses three modeling techniques to model the structural, data flow, and control flow aspects of the workflow. We demonstrate the process in the domain of biomolecular structure determination using Nuclear Magnetic Resonance (NMR) spectroscopy, applying the approach to capture the workflow for conducting biomolecular analysis using NMR spectroscopy.

CONCLUSION: Using the approach, we were able to accurately document, in a short amount of time, numerous steps in the process of conducting an experiment using NMR spectroscopy. The resulting models are correct and precise, as outside validation identified only minor omissions in the models. In addition, the models provide an accurate visual description of the control flow for conducting biomolecular analysis using NMR spectroscopy.

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Verdi KK,Ellis HJ,Gryk MR

doi

10.1186/1471-2105-8-31

subject

Has Abstract

pub_date

2007-01-30 00:00:00

pages

31

issn

1471-2105

pii

1471-2105-8-31

journal_volume

8

pub_type

Journal Article
  • Hierarchical modularity of nested bow-ties in metabolic networks.

    abstract:BACKGROUND:The exploration of the structural topology and the organizing principles of genome-based large-scale metabolic networks is essential for studying possible relations between structure and functionality of metabolic networks. Topological analysis of graph models has often been applied to study the structural c...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-386

    authors: Zhao J,Yu H,Luo JH,Cao ZW,Li YX

    updated:2006-08-18 00:00:00

  • imputeqc: an R package for assessing imputation quality of genotypes and optimizing imputation parameters.

    abstract:BACKGROUND:The imputation of genotypes increases the power of genome-wide association studies. However, the imputation quality should be assessed in each particular case. Nevertheless, not all imputation software controls the output error; e.g., the last release of the fastPHASE program (1.4.8) lacks such an option. In ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03589-0

    authors: Khvorykh GV,Khrunin AV

    updated:2020-07-24 00:00:00

  • A weighted string kernel for protein fold recognition.

    abstract:BACKGROUND:Alignment-free methods for comparing protein sequences have proved to be viable alternatives to approaches that first rely on an alignment of the sequences to be compared. Much work, however, needs to be done before those methods provide reliable fold recognition for proteins whose sequences share little simila...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1795-5

    authors: Nojoomi S,Koehl P

    updated:2017-08-25 00:00:00

  • Estimation of evolutionary parameters using short, random and partial sequences from mixed samples of anonymous individuals.

    abstract:BACKGROUND:Over the last decade, next generation sequencing (NGS) has become widely available, and is now the sequencing technology of choice for most researchers. Nonetheless, NGS presents a challenge for the evolutionary biologists who wish to estimate evolutionary genetic parameters from a mixed sample of unlabelled...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0810-y

    authors: Wu SH,Rodrigo AG

    updated:2015-11-04 00:00:00

  • Web-TCGA: an online platform for integrated analysis of molecular cancer data sets.

    abstract:BACKGROUND:The Cancer Genome Atlas (TCGA) is a pool of molecular data sets publicly accessible and freely available to cancer researchers anywhere around the world. However, wide spread use is limited since an advanced knowledge of statistics and statistical software is required. RESULTS:In order to improve accessibil...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-0917-9

    authors: Deng M,Brägelmann J,Schultze JL,Perner S

    updated:2016-02-06 00:00:00

  • IILLS: predicting virus-receptor interactions based on similarity and semi-supervised learning.

    abstract:BACKGROUND:Viral infectious diseases are a serious threat to human health. Receptor binding is the first step in the viral infection of hosts. To more effectively treat human viral infectious diseases, the hidden virus-receptor interactions must be discovered. However, current computational methods for predicti...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3278-3

    authors: Yan C,Duan G,Wu FX,Wang J

    updated:2019-12-27 00:00:00

  • A Bayesian data fusion based approach for learning genome-wide transcriptional regulatory networks.

    abstract:BACKGROUND:Reverse engineering of transcriptional regulatory networks (TRN) from genomics data has always represented a computational challenge in System Biology. The major issue is modeling the complex crosstalk among transcription factors (TFs) and their target genes, with a method able to handle both the high number...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3510-1

    authors: Sauta E,Demartini A,Vitali F,Riva A,Bellazzi R

    updated:2020-05-29 00:00:00

  • SPECS: a non-parametric method to identify tissue-specific molecular features for unbalanced sample groups.

    abstract:BACKGROUND:To understand biology and differences among various tissues or cell types, one typically searches for molecular features that display characteristic abundance patterns. Several specificity metrics have been introduced to identify tissue-specific molecular features, but these either require an equal number of...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3407-z

    authors: Everaert C,Volders PJ,Morlion A,Thas O,Mestdagh P

    updated:2020-02-17 00:00:00

  • Robust detection of periodic time series measured from biological systems.

    abstract:BACKGROUND:Periodic phenomena are widespread in biology. The problem of finding periodicity in biological time series can be viewed as a multiple hypothesis testing of the spectral content of a given time series. The exact noise characteristics are unknown in many bioinformatics applications. Furthermore, the observed ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-117

    authors: Ahdesmäki M,Lähdesmäki H,Pearson R,Huttunen H,Yli-Harja O

    updated:2005-05-13 00:00:00

  • KinMap: a web-based tool for interactive navigation through human kinome data.

    abstract:BACKGROUND:Annotation of the phylogenetic tree of the human kinome is an intuitive way to visualize compound profiling data, structural features of kinases or functional relationships within this important class of proteins. The increasing volume and complexity of kinase-related data underlines the need for a tool tha...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1433-7

    authors: Eid S,Turk S,Volkamer A,Rippmann F,Fulle S

    updated:2017-01-05 00:00:00

  • Predicting MoRFs in protein sequences using HMM profiles.

    abstract:BACKGROUND:Intrinsically Disordered Proteins (IDPs) lack an ordered three-dimensional structure and are enriched in various biological processes. The Molecular Recognition Features (MoRFs) are functional regions within IDPs that undergo a disorder-to-order transition on binding to a partner protein. Identifying MoRFs i...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1375-0

    authors: Sharma R,Kumar S,Tsunoda T,Patil A,Sharma A

    updated:2016-12-22 00:00:00

  • ProLego: tool for extracting and visualizing topological modules in protein structures.

    abstract:BACKGROUND:In protein design, correct use of topology is among the initial and most critical features. Meticulous selection of backbone topology aids in drastically reducing the structure search space. With ProLego, we present a server application to explore the component aspect of protein structures and provide an intu...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2171-9

    authors: Khan T,Panday SK,Ghosh I

    updated:2018-05-04 00:00:00

  • Metabolic network alignment in large scale by network compression.

    abstract:Metabolic network alignment is a system scale comparative analysis that discovers important similarities and differences across different metabolisms and organisms. Although the problem of aligning metabolic networks has been considered in the past, the computational complexity of the existing solutions has so far lim...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-S3-S2

    authors: Ay F,Dang M,Kahveci T

    updated:2012-03-21 00:00:00

  • NEAT: an efficient network enrichment analysis test.

    abstract:BACKGROUND:Network enrichment analysis is a powerful method that integrates gene enrichment analysis with the information on relationships between genes that is provided by gene networks. Existing tests for network enrichment analysis deal only with undirected networks, they can be computationally slow and a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1203-6

    authors: Signorelli M,Vinciotti V,Wit EC

    updated:2016-09-05 00:00:00

  • Widespread evidence of viral miRNAs targeting host pathways.

    abstract:BACKGROUND:MicroRNAs (miRNA) are regulatory genes that target and repress other RNA molecules via sequence-specific binding. Several biological processes are regulated across many organisms by evolutionarily conserved miRNAs. Plants and invertebrates employ their miRNA in defense against viruses by targeting and degrad...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-S2-S3

    authors: Carl JW Jr,Trgovcich J,Hannenhalli S

    updated:2013-01-01 00:00:00

  • ProMEX: a mass spectral reference database for proteins and protein phosphorylation sites.

    abstract:BACKGROUND:In the last decade, techniques were established for the large scale genome-wide analysis of proteins, RNA, and metabolites, and database solutions have been developed to manage the generated data sets. The Golm Metabolome Database for metabolite data (GMD) represents one such effort to make these data broadl...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-216

    authors: Hummel J,Niemann M,Wienkoop S,Schulze W,Steinhauser D,Selbig J,Walther D,Weckwerth W

    updated:2007-06-23 00:00:00

  • Efficient and automated large-scale detection of structural relationships in proteins with a flexible aligner.

    abstract:BACKGROUND:The total number of known three-dimensional protein structures is rapidly increasing. Consequently, the need for fast structural search against complete databases without a significant loss of accuracy is increasingly demanding. Recently, TopSearch, an ultra-fast method for finding rigid structural relations...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0866-8

    authors: Gutiérrez FI,Rodriguez-Valenzuela F,Ibarra IL,Devos DP,Melo F

    updated:2016-01-05 00:00:00

  • Calibration and assessment of channel-specific biases in microarray data with extended dynamical range.

    abstract:BACKGROUND:Non-linearities in observed log-ratios of gene expressions, also known as intensity dependent log-ratios, can often be accounted for by global biases in the two channels being compared. Any step in a microarray process may introduce such offsets and in this article we study the biases introduced by the micro...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-5-177

    authors: Bengtsson H,Jönsson G,Vallon-Christersson J

    updated:2004-11-12 00:00:00

  • Studying the correlation between different word sense disambiguation methods and summarization effectiveness in biomedical texts.

    abstract:BACKGROUND:Word sense disambiguation (WSD) attempts to solve lexical ambiguities by identifying the correct meaning of a word based on its context. WSD has been demonstrated to be an important step in knowledge-based approaches to automatic summarization. However, the correlation between the accuracy of the WSD methods...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-355

    authors: Plaza L,Jimeno-Yepes AJ,Díaz A,Aronson AR

    updated:2011-08-26 00:00:00

  • Prediction of novel long non-coding RNAs based on RNA-Seq data of mouse Klf1 knockout study.

    abstract:BACKGROUND:Study on long non-coding RNAs (lncRNAs) has been promoted by high-throughput RNA sequencing (RNA-Seq). However, it is still not trivial to identify lncRNAs from the RNA-Seq data and it remains a challenge to uncover their functions. RESULTS:We present a computational pipeline for detecting novel lncRNAs fro...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-331

    authors: Sun L,Zhang Z,Bailey TL,Perkins AC,Tallack MR,Xu Z,Liu H

    updated:2012-12-13 00:00:00

  • Verification and validation of bioinformatics software without a gold standard: a case study of BWA and Bowtie.

    abstract:BACKGROUND:Bioinformatics software quality assurance is essential in genomic medicine. Systematic verification and validation of bioinformatics software is difficult because it is often not possible to obtain a realistic "gold standard" for systematic evaluation. Here we apply a technique that originates from the softw...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-S16-S15

    authors: Giannoulatou E,Park SH,Humphreys DT,Ho JW

    updated:2014-01-01 00:00:00

  • SpectralNET--an application for spectral graph analysis and visualization.

    abstract:BACKGROUND:Graph theory provides a computational framework for modeling a variety of datasets including those emerging from genomics, proteomics, and chemical genetics. Networks of genes, proteins, small molecules, or other objects of study can be represented as graphs of nodes (vertices) and interactions (edges) that ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-260

    authors: Forman JJ,Clemons PA,Schreiber SL,Haggarty SJ

    updated:2005-10-19 00:00:00

  • Statistical modeling of biomedical corpora: mining the Caenorhabditis Genetic Center Bibliography for genes related to life span.

    abstract:BACKGROUND:The statistical modeling of biomedical corpora could yield integrated, coarse-to-fine views of biological phenomena that complement discoveries made from analysis of molecular sequence and profiling data. Here, the potential of such modeling is demonstrated by examining the 5,225 free-text items in the Caeno...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-250

    authors: Blei DM,Franks K,Jordan MI,Mian IS

    updated:2006-05-08 00:00:00

  • Prediction of TF target sites based on atomistic models of protein-DNA complexes.

    abstract:BACKGROUND:The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for model...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-436

    authors: Angarica VE,Pérez AG,Vasconcelos AT,Collado-Vides J,Contreras-Moreira B

    updated:2008-10-16 00:00:00

  • Identification of novel alternative splicing biomarkers for breast cancer with LC/MS/MS and RNA-Seq.

    abstract:BACKGROUND:Alternative splicing isoforms have been reported as a new and robust class of diagnostic biomarkers. Over 95% of human genes are estimated to be alternatively spliced as a powerful means of producing functionally diverse proteins from a single gene. The emergence of next-generation sequencing technologies, e...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03824-8

    authors: Zhang F,Deng CK,Wang M,Deng B,Barber R,Huang G

    updated:2020-12-03 00:00:00

  • Smith-Waterman peak alignment for comprehensive two-dimensional gas chromatography-mass spectrometry.

    abstract:BACKGROUND:Comprehensive two-dimensional gas chromatography coupled with mass spectrometry (GC × GC-MS) is a powerful technique which has gained increasing attention over the last two decades. The GC × GC-MS provides much increased separation capacity, chemical selectivity and sensitivity for complex sample analysis an...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-235

    authors: Kim S,Koo I,Fang A,Zhang X

    updated:2011-06-15 00:00:00

  • SLR: a scaffolding algorithm based on long reads and contig classification.

    abstract:BACKGROUND:Scaffolding is an important step in genome assembly that orders and orients the contigs produced by assemblers. However, repetitive regions in contigs usually prevent scaffolding from producing accurate results. How to solve the problem of repetitive regions has received a great deal of attention. In the pas...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3114-9

    authors: Luo J,Lyu M,Chen R,Zhang X,Luo H,Yan C

    updated:2019-10-30 00:00:00

  • Clinical phenotype-based gene prioritization: an initial study using semantic similarity and the human phenotype ontology.

    abstract:BACKGROUND:Exome sequencing is a promising method for diagnosing patients with a complex phenotype. However, variant interpretation relative to patient phenotype can be challenging in some scenarios, particularly clinical assessment of rare complex phenotypes. Each patient's sequence reveals many possibly damaging vari...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-248

    authors: Masino AJ,Dechene ET,Dulik MC,Wilkens A,Spinner NB,Krantz ID,Pennington JW,Robinson PN,White PS

    updated:2014-07-21 00:00:00

  • Accelerating a cross-correlation score function to search modifications using a single GPU.

    abstract:BACKGROUND:A cross-correlation (XCorr) score function is one of the most popular score functions utilized to search peptide identifications in databases, and many computer programs, such as SEQUEST, Comet, and Tide, currently use this score function. Recently, the HiXCorr algorithm was developed to speed up this score ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2559-6

    authors: Kim H,Han S,Um JH,Park K

    updated:2018-12-12 00:00:00

  • Reporting and connecting cell type names and gating definitions through ontologies.

    abstract:BACKGROUND:Human immunology studies often rely on the isolation and quantification of cell populations from an input sample based on flow cytometry and related techniques. Such techniques classify cells into populations based on the detection of a pattern of markers. The description of the cell populations targeted in ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2725-5

    authors: Overton JA,Vita R,Dunn P,Burel JG,Bukhari SAC,Cheung KH,Kleinstein SH,Diehl AD,Peters B

    updated:2019-04-25 00:00:00