Reuse of imputed data in microarray analysis increases imputation efficiency.

Abstract:

BACKGROUND: Imputing missing values is necessary for the efficient use of DNA microarray data, because many clustering algorithms and some statistical analyses require a complete data set. A few imputation methods for DNA microarray data have been introduced, but their efficiency was low and the validity of the values they impute had not been fully checked.

RESULTS: We developed a new cluster-based imputation method called the sequential K-nearest neighbor (SKNN) method. It imputes missing values sequentially, starting from the gene with the fewest missing values, and uses the imputed values in later imputations. Although it reuses imputed values, the new method greatly improves on the conventional KNN-based method and on other methods based on maximum likelihood estimation in both accuracy and computational complexity. SKNN outperformed the other imputation methods in particular on data with high missing rates and a large number of experiments. Applying Expectation Maximization (EM) to the SKNN method improved accuracy but increased computational time in proportion to the number of iterations. The Multiple Imputation (MI) method, which is well known but had not previously been applied to microarray data, showed accuracy similar to that of the SKNN method, with slightly higher dependency on the type of data set.

CONCLUSIONS: Sequential reuse of imputed data in KNN-based imputation greatly increases the efficiency of imputation. The SKNN method should be practically useful for salvaging data from microarray experiments that have many missing entries. It generates reliable imputed values that can be used for further cluster-based analysis of microarray data.
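The sequential procedure described in the abstract can be illustrated with a short Python sketch. This is not the authors' implementation: the function name `sknn_impute`, the default `k=10`, the Euclidean distance over observed columns, and the inverse-distance weighting are all assumptions made for illustration. The sketch only shows the core idea, namely that genes are imputed in order of increasing missing count and each imputed gene joins the reference pool used for later genes.

```python
import numpy as np


def sknn_impute(data, k=10):
    """Sketch of sequential KNN imputation (rows = genes, columns = experiments).

    Missing entries are NaN. Distance metric and weighting are assumptions.
    """
    x = np.array(data, dtype=float)  # work on a copy of the input
    missing = np.isnan(x)
    n_missing = missing.sum(axis=1)

    complete = np.flatnonzero(n_missing == 0)
    if complete.size == 0:
        raise ValueError("at least one gene without missing values is required")

    # The reference pool starts with the complete genes only.
    pool = [x[i] for i in complete]

    # Process incomplete genes from the fewest to the most missing values.
    incomplete = np.flatnonzero(n_missing > 0)
    order = incomplete[np.argsort(n_missing[incomplete], kind="stable")]

    for i in order:
        miss = missing[i]
        ref = np.vstack(pool)
        # Euclidean distance computed over the columns observed in gene i.
        diff = ref[:, ~miss] - x[i, ~miss]
        dist = np.sqrt((diff ** 2).sum(axis=1))
        nearest = np.argsort(dist, kind="stable")[:k]
        # Inverse-distance weights (assumed); epsilon avoids division by zero.
        weights = 1.0 / (dist[nearest] + 1e-12)
        x[i, miss] = weights @ ref[nearest][:, miss] / weights.sum()
        # Reuse: the newly completed gene can now serve as a neighbor.
        pool.append(x[i])

    return x
```

For example, `sknn_impute([[1, 2, 3], [1, 2, float("nan")], [1, float("nan"), float("nan")]], k=1)` fills the second gene from the first, and the third gene can then draw on both.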

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Kim KY,Kim BJ,Yi GS

doi

10.1186/1471-2105-5-160

pub_date

2004-10-26 00:00:00

pages

160

issn

1471-2105

pii

1471-2105-5-160

journal_volume

5

pub_type

Journal Article
  • NPBSS: a new PacBio sequencing simulator for generating the continuous long reads with an empirical model.

    abstract:BACKGROUND:PacBio sequencing platform offers longer read lengths than the second-generation sequencing technologies. It has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. Due to its extremely wide range of application areas, fast sequencing simulation syste...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2208-0

    authors: Wei ZG,Zhang SW

    update_date:2018-05-22 00:00:00

  • ProCKSI: a decision support system for Protein (structure) Comparison, Knowledge, Similarity and Information.

    abstract:BACKGROUND:We introduce the decision support system for Protein (Structure) Comparison, Knowledge, Similarity and Information (ProCKSI). ProCKSI integrates various protein similarity measures through an easy to use interface that allows the comparison of multiple proteins simultaneously. It employs the Universal Simila...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-416

    authors: Barthel D,Hirst JD,Błazewicz J,Burke EK,Krasnogor N

    update_date:2007-10-26 00:00:00

  • A benchmark study of sequence alignment methods for protein clustering.

    abstract:BACKGROUND:Protein sequence alignment analyses have become a crucial step for many bioinformatics studies during the past decades. Multiple sequence alignment (MSA) and pair-wise sequence alignment (PSA) are two major approaches in sequence alignment. Former benchmark studies revealed drawbacks of MSA methods on nucleo...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2524-4

    authors: Wang Y,Wu H,Cai Y

    update_date:2018-12-31 00:00:00

  • Coordinates and intervals in graph-based reference genomes.

    abstract:BACKGROUND:It has been proposed that future reference genomes should be graph structures in order to better represent the sequence diversity present in a species. However, there is currently no standard method to represent genomic intervals, such as the positions of genes or transcription factor binding sites, on graph...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1678-9

    authors: Rand KD,Grytten I,Nederbragt AJ,Storvik GO,Glad IK,Sandve GK

    update_date:2017-05-18 00:00:00

  • HAT: hypergeometric analysis of tiling-arrays with application to promoter-GeneChip data.

    abstract:BACKGROUND:Tiling-arrays are applicable to multiple types of biological research questions. Due to its advantages (high sensitivity, resolution, unbiased), the technology is often employed in genome-wide investigations. A major challenge in the analysis of tiling-array data is to define regions-of-interest, i.e., conti...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-275

    authors: Taskesen E,Beekman R,de Ridder J,Wouters BJ,Peeters JK,Touw IP,Reinders MJ,Delwel R

    update_date:2010-05-21 00:00:00

  • ImmunoGlobe: enabling systems immunology with a manually curated intercellular immune interaction network.

    abstract:BACKGROUND:While technological advances have made it possible to profile the immune system at high resolution, translating high-throughput data into knowledge of immune mechanisms has been challenged by the complexity of the interactions underlying immune processes. Tools to explore the immune network are critical for ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03702-3

    authors: Atallah MB,Tandon V,Hiam KJ,Boyce H,Hori M,Atallah W,Spitzer MH,Engleman E,Mallick P

    update_date:2020-08-10 00:00:00

  • methCancer-gen: a DNA methylome dataset generator for user-specified cancer type based on conditional variational autoencoder.

    abstract:BACKGROUND:Recently, DNA methylation has drawn great attention due to its strong correlation with abnormal gene activities and informative representation of the cancer status. As a number of studies focus on DNA methylation signatures in cancer, demand for utilizing publicly available methylome dataset has been increas...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3516-8

    authors: Choi J,Chae H

    update_date:2020-05-11 00:00:00

  • Combining sequence and network information to enhance protein-protein interaction prediction.

    abstract:BACKGROUND:Protein-protein interactions (PPIs) are of great importance in cellular systems of organisms, since they are the basis of cellular structure and function and many essential cellular processes are related to that. Most proteins perform their functions by interacting with other proteins, so predicting PPIs acc...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03896-6

    authors: Liu L,Zhu X,Ma Y,Piao H,Yang Y,Hao X,Fu Y,Wang L,Peng J

    update_date:2020-12-16 00:00:00

  • Trees on networks: resolving statistical patterns of phylogenetic similarities among interacting proteins.

    abstract:BACKGROUND:Phylogenies capture the evolutionary ancestry linking extant species. Correlations and similarities among a set of species are mediated by and need to be understood in terms of the phylogenic tree. In a similar way it has been argued that biological networks also induce correlations among sets of interacting...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-470

    authors: Kelly WP,Stumpf MP

    update_date:2010-09-20 00:00:00

  • ESTIMA, a tool for EST management in a multi-project environment.

    abstract:BACKGROUND:Single-pass, partial sequencing of complementary DNA (cDNA) libraries generates thousands of chromatograms that are processed into high quality expressed sequence tags (ESTs), and then assembled into contigs representative of putative genes. Usually, to be of value, ESTs and contigs must be associated with m...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-5-176

    authors: Kumar CG,LeDuc R,Gong G,Roinishivili L,Lewin HA,Liu L

    update_date:2004-11-04 00:00:00

  • Graph-based prediction of Protein-protein interactions with attributed signed graph embedding.

    abstract:BACKGROUND:Protein-protein interactions (PPIs) are central to many biological processes. Considering that the experimental methods for identifying PPIs are time-consuming and expensive, it is important to develop automated computational methods to better predict PPIs. Various machine learning methods have been proposed...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03646-8

    authors: Yang F,Fan K,Song D,Lin H

    update_date:2020-07-21 00:00:00

  • Improving interoperability between microbial information and sequence databases.

    abstract:BACKGROUND:Biological resources are essential tools for biomedical research. Their availability is promoted through on-line catalogues. Common Access to Biological Resources and Information (CABRI) is a service for distribution of biological resources and related data collected by 28 European culture collections. Linki...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-S4-S23

    authors: Romano P,Dawyndt P,Piersigilli F,Swings J

    update_date:2005-12-01 00:00:00

  • Shared data science infrastructure for genomics data.

    abstract:BACKGROUND:Creating a scalable computational infrastructure to analyze the wealth of information contained in data repositories is difficult due to significant barriers in organizing, extracting and analyzing relevant data. Shared data science infrastructures like Boag are needed to efficiently process and parse data co...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2967-2

    authors: Bagheri H,Muppirala U,Masonbrink RE,Severin AJ,Rajan H

    update_date:2019-08-22 00:00:00

  • BCDForest: a boosting cascade deep forest model towards the classification of cancer subtypes based on gene expression data.

    abstract:BACKGROUND:The classification of cancer subtypes is of great importance to cancer disease diagnosis and therapy. Many supervised learning approaches have been applied to cancer subtype classification in the past few years, especially of deep learning based approaches. Recently, the deep forest model has been proposed a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2095-4

    authors: Guo Y,Liu S,Li Z,Shang X

    update_date:2018-04-11 00:00:00

  • NeurphologyJ: an automatic neuronal morphology quantification method and its application in pharmacological discovery.

    abstract:BACKGROUND:Automatic quantification of neuronal morphology from images of fluorescence microscopy plays an increasingly important role in high-content screenings. However, there exist very few freeware tools and methods which provide automatic neuronal morphology quantification for pharmacological discovery. RESULTS:T...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-230

    authors: Ho SY,Chao CY,Huang HL,Chiu TW,Charoenkwan P,Hwang E

    update_date:2011-06-08 00:00:00

  • HH-suite3 for fast remote homology detection and deep protein annotation.

    abstract:BACKGROUND:HH-suite is a widely used open source software suite for sensitive sequence similarity searches and protein fold recognition. It is based on pairwise alignment of profile Hidden Markov models (HMMs), which represent multiple sequence alignments of homologous proteins. RESULTS:We developed a single-instructi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3019-7

    authors: Steinegger M,Meier M,Mirdita M,Vöhringer H,Haunsberger SJ,Söding J

    update_date:2019-09-14 00:00:00

  • Automated NMR relaxation dispersion data analysis using NESSY.

    abstract:BACKGROUND:Proteins are dynamic molecules with motions ranging from picoseconds to longer than seconds. Many protein functions, however, appear to occur on the micro to millisecond timescale and therefore there has been intense research of the importance of these motions in catalysis and molecular interactions. Nuclear...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-421

    authors: Bieri M,Gooley PR

    update_date:2011-10-27 00:00:00

  • A machine learning strategy for predicting localization of post-translational modification sites in protein-protein interacting regions.

    abstract:BACKGROUND:One very important functional domain of proteins is the protein-protein interacting region (PPIR), which forms the binding interface between interacting polypeptide chains. Post-translational modifications (PTMs) that occur in the PPIR can either interfere with or facilitate the interaction between proteins....

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1165-8

    authors: Saethang T,Payne DM,Avihingsanon Y,Pisitkun T

    update_date:2016-08-17 00:00:00

  • Detecting disease-associated genotype patterns.

    abstract:BACKGROUND:In addition to single-locus (main) effects of disease variants, there is a growing consensus that gene-gene and gene-environment interactions may play important roles in disease etiology. However, for the very large numbers of genetic markers currently in use, it has proven difficult to develop suitable and ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S1-S75

    authors: Long Q,Zhang Q,Ott J

    update_date:2009-01-30 00:00:00

  • Meta-eQTL: a tool set for flexible eQTL meta-analysis.

    abstract:BACKGROUND:An increasing number of eQTL (Expression Quantitative Trait Loci) datasets facilitate genetics and systems biology research. Meta-analysis tools are needed to jointly analyze datasets of same or similar issue types to improve statistical power, especially in trans-eQTL mapping. Meta-analysis framework is also n...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-014-0392-0

    authors: Di Narzo AF,Cheng H,Lu J,Hao K

    update_date:2014-11-28 00:00:00

  • Bayesian semiparametric regression models to characterize molecular evolution.

    abstract:BACKGROUND:Statistical models and methods that associate changes in the physicochemical properties of amino acids with natural selection at the molecular level typically do not take into account the correlations between such properties. We propose a Bayesian hierarchical regression model with a generalization of the Di...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-278

    authors: Datta S,Rodriguez A,Prado R

    update_date:2012-10-30 00:00:00

  • Developing optimal input design strategies in cancer systems biology with applications to microfluidic device engineering.

    abstract:BACKGROUND:Mechanistic models are becoming more and more popular in Systems Biology; identification and control of models underlying biochemical pathways of interest in oncology is a primary goal in this field. Unfortunately the scarce availability of data still limits our understanding of the intrinsic characteristics...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S12-S4

    authors: Menolascina F,Bellomo D,Maiwald T,Bevilacqua V,Ciminelli C,Paradiso A,Tommasi S

    update_date:2009-10-15 00:00:00

  • GLOSSI: a method to assess the association of genetic loci-sets with complex diseases.

    abstract:BACKGROUND:The developments of high-throughput genotyping technologies, which enable the simultaneous genotyping of hundreds of thousands of single nucleotide polymorphisms (SNP) have the potential to increase the benefits of genetic epidemiology studies. Although the enhanced resolution of these platforms increases th...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-102

    authors: Chai HS,Sicotte H,Bailey KR,Turner ST,Asmann YW,Kocher JP

    update_date:2009-04-03 00:00:00

  • BAGEL: a computational framework for identifying essential genes from pooled library screens.

    abstract:BACKGROUND:The adaptation of the CRISPR-Cas9 system to pooled library gene knockout screens in mammalian cells represents a major technological leap over RNA interference, the prior state of the art. New methods for analyzing the data and evaluating results are needed. RESULTS:We offer BAGEL (Bayesian Analysis of Gene...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1015-8

    authors: Hart T,Moffat J

    update_date:2016-04-16 00:00:00

  • SplicerAV: a tool for mining microarray expression data for changes in RNA processing.

    abstract:BACKGROUND:Over the past two decades more than fifty thousand unique clinical and biological samples have been assayed using the Affymetrix HG-U133 and HG-U95 GeneChip microarray platforms. This substantial repository has been used extensively to characterize changes in gene expression between biological samples, but h...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-108

    authors: Robinson TJ,Dinan MA,Dewhirst M,Garcia-Blanco MA,Pearson JL

    update_date:2010-02-25 00:00:00

  • The EnzymeTracker: an open-source laboratory information management system for sample tracking.

    abstract:BACKGROUND:In many laboratories, researchers store experimental data on their own workstation using spreadsheets. However, this approach poses a number of problems, ranging from sharing issues to inefficient data-mining. Standard spreadsheets are also error-prone, as data do not undergo any validation process. To overc...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-15

    authors: Triplet T,Butler G

    update_date:2012-01-26 00:00:00

  • FastqPuri: high-performance preprocessing of RNA-seq data.

    abstract:BACKGROUND:RNA sequencing (RNA-seq) has become the standard means of analyzing gene and transcript expression in high-throughput. While previously sequence alignment was a time demanding step, fast alignment methods and even more so transcript counting methods which avoid mapping and quantify gene and transcript expres...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2799-0

    authors: Pérez-Rubio P,Lottaz C,Engelmann JC

    update_date:2019-05-03 00:00:00

  • PreBIND and Textomy--mining the biomedical literature for protein-protein interactions using a support vector machine.

    abstract:BACKGROUND:The majority of experimentally verified molecular interaction and biological pathway data are present in the unstructured text of biomedical journal articles where they are inaccessible to computational methods. The Biomolecular interaction network database (BIND) seeks to capture these data in a machine-rea...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-4-11

    authors: Donaldson I,Martin J,de Bruijn B,Wolting C,Lay V,Tuekam B,Zhang S,Baskin B,Bader GD,Michalickova K,Pawson T,Hogue CW

    update_date:2003-03-27 00:00:00

  • Insertion and deletion correcting DNA barcodes based on watermarks.

    abstract:BACKGROUND:Barcode multiplexing is a key strategy for sharing the rising capacity of next-generation sequencing devices: Synthetic DNA tags, called barcodes, are attached to natural DNA fragments within the library preparation procedure. Different libraries can individually be labeled with barcodes for a joint sequenc...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0482-7

    authors: Kracht D,Schober S

    update_date:2015-02-18 00:00:00

  • DeepSort: deep convolutional networks for sorting haploid maize seeds.

    abstract:BACKGROUND:Maize is a leading crop in the modern agricultural industry that accounts for more than 40% of grain production worldwide. The double haploid technique that uses fewer breeding generations for generating a maize line has accelerated the pace of development of superior commercial seed varieties and has been tran...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2267-2

    authors: Veeramani B,Raymond JW,Chanda P

    update_date:2018-08-13 00:00:00