Unsupervised fuzzy pattern discovery in gene expression data.

Abstract:

BACKGROUND:Discovering patterns from gene expression levels is regarded as a classification problem when tissue classes of the samples are given and solved as a discrete-data problem by discretizing the expression levels of each gene into intervals maximizing the interdependence between that gene and the class labels. However, when class information is unavailable, discovering gene expression patterns becomes difficult. METHODS:For a gene pool with large number of genes, we first cluster the genes into smaller groups. In each group, we use the representative gene, one with highest interdependence with others in the group, to drive the discretization of the gene expression levels of other genes. Treating intervals as discrete events, association patterns of events can be discovered. If the gene groups obtained are crisp gene clusters, significant patterns overlapping different gene clusters cannot be found. This paper presents a new method of "fuzzifying" the crisp gene clusters to overcome such problem. RESULTS:To evaluate the effectiveness of our approach, we first apply the above described procedure on a synthetic data set and then a gene expression data set with known class labels. The class labels are not being used in both analyses but used later as the ground truth in a classificatory problem for assessing the algorithm's effectiveness in fuzzy gene clustering and discretization. The results show the efficacy of the proposed method. The existence of correlation among continuous valued gene expression levels suggests that certain genes in the gene groups have high interdependence with other genes in the group. Fuzzification of a crisp gene cluster allows the cluster to take in genes from other clusters so that overlapping relationship among gene clusters could be uncovered. Hence, previously unknown hidden patterns resided in overlapping gene clusters are discovered. From the experimental results, the high order patterns discovered reveal multiple gene interaction patterns in cancerous tissues not found in normal tissues. It was also found that for the colon cancer experiment, 70% of the top patterns and most of the discriminative patterns between cancerous and normal tissues are among those spanning across different crisp gene clusters. CONCLUSIONS:We show that the proposed method for analyzing the error-prone microarray is effective even without the presence of tissue class information. A unified framework is presented, allowing fast and accurate pattern discovery for gene expression data. For a large gene set, to discover a comprehensive set of patterns, gene clustering, gene expression discretization and gene cluster fuzzification are absolutely necessary.

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Wu GP,Chan KC,Wong AK

doi

10.1186/1471-2105-12-S5-S5

subject

Has Abstract

pub_date

2011-01-01 00:00:00

pages

S5

issn

1471-2105

pii

1471-2105-12-S5-S5

journal_volume

12 Suppl 5

pub_type

杂志文章
  • iMEGES: integrated mental-disorder GEnome score by deep neural network for prioritizing the susceptibility genes for mental disorders in personal genomes.

    abstract:BACKGROUND:A range of rare and common genetic variants have been discovered to be potentially associated with mental diseases, but many more have not been uncovered. Powerful integrative methods are needed to systematically prioritize both variants and genes that confer susceptibility to mental diseases in personal gen...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2469-7

    authors: Khan A,Liu Q,Wang K

    更新日期:2018-12-28 00:00:00

  • Amino acid sequence associated with bacteriophage recombination site helps to reveal genes potentially acquired through horizontal gene transfer.

    abstract:BACKGROUND:Horizontal gene transfer, i.e. the acquisition of genetic material from nonparent organism, is considered an important force driving species evolution. Many cases of horizontal gene transfer from prokaryotes to eukaryotes have been registered, but no transfer mechanism has been deciphered so far, although vi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03599-y

    authors: Daugavet MA,Shabelnikov SV,Podgornaya OI

    更新日期:2020-07-24 00:00:00

  • PseUI: Pseudouridine sites identification based on RNA sequence information.

    abstract:BACKGROUND:Pseudouridylation is the most prevalent type of posttranscriptional modification in various stable RNAs of all organisms, which significantly affects many cellular processes that are regulated by RNA. Thus, accurate identification of pseudouridine (Ψ) sites in RNA will be of great benefit for understanding t...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2321-0

    authors: He J,Fang T,Zhang Z,Huang B,Zhu X,Xiong Y

    更新日期:2018-08-29 00:00:00

  • Identification of CD8+ T cell epitopes through proteasome cleavage site predictions.

    abstract:BACKGROUND:We previously introduced PCPS (Proteasome Cleavage Prediction Server), a web-based tool to predict proteasome cleavage sites using n-grams. Here, we evaluated the ability of PCPS immunoproteasome cleavage model to discriminate CD8+ T cell epitopes. RESULTS:We first assembled an epitope dataset consisting of...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03782-1

    authors: Gomez-Perosanz M,Ras-Carmona A,Lafuente EM,Reche PA

    更新日期:2020-12-14 00:00:00

  • Mining published lists of cancer related microarray experiments: identification of a gene expression signature having a critical role in cell-cycle control.

    abstract:BACKGROUND:Routine application of gene expression microarray technology is rapidly producing large amounts of data that necessitate new approaches of analysis. The analysis of a specific microarray experiment profits enormously from cross-comparing to other experiments. This process is generally performed by numerical ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-S4-S14

    authors: Finocchiaro G,Mancuso F,Muller H

    更新日期:2005-12-01 00:00:00

  • Combining calls from multiple somatic mutation-callers.

    abstract:BACKGROUND:Accurate somatic mutation-calling is essential for insightful mutation analyses in cancer studies. Several mutation-callers are publicly available and more are likely to appear. Nonetheless, mutation-calling is still challenging and there is unlikely to be one established caller that systematically outperfor...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-15-154

    authors: Kim SY,Jacob L,Speed TP

    更新日期:2014-05-21 00:00:00

  • EVA: Exome Variation Analyzer, an efficient and versatile tool for filtering strategies in medical genomics.

    abstract:BACKGROUND:Whole exome sequencing (WES) has become the strategy of choice to identify a coding allelic variant for a rare human monogenic disorder. This approach is a revolution in medical genetics history, impacting both fundamental research, and diagnostic methods leading to personalized medicine. A plethora of effic...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-S14-S9

    authors: Coutant S,Cabot C,Lefebvre A,Léonard M,Prieur-Gaston E,Campion D,Lecroq T,Dauchel H

    更新日期:2012-01-01 00:00:00

  • Identifying cancer prognostic modules by module network analysis.

    abstract:BACKGROUND:The identification of prognostic genes that can distinguish the prognostic risks of cancer patients remains a significant challenge. Previous works have proven that functional gene sets were more reliable for this task than the gene signature. However, few works have considered the cross-talk among functiona...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-2674-z

    authors: Zhou XH,Chu XY,Xue G,Xiong JH,Zhang HY

    更新日期:2019-02-18 00:00:00

  • RWRMTN: a tool for predicting disease-associated microRNAs based on a microRNA-target gene network.

    abstract:BACKGROUND:The misregulation of microRNA (miRNA) has been shown to cause diseases. Recently, we have proposed a computational method based on a random walk framework on a miRNA-target gene network to predict disease-associated miRNAs. The prediction performance of our method is better than that of some existing state-o...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03578-3

    authors: Le DH,Tran TTH

    更新日期:2020-06-15 00:00:00

  • Performance of a genetic algorithm for mass spectrometry proteomics.

    abstract:BACKGROUND:Recently, mass spectrometry data have been mined using a genetic algorithm to produce discriminatory models that distinguish healthy individuals from those with cancer. This algorithm is the basis for claims of 100% sensitivity and specificity in two related publicly available datasets. To date, no detailed ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-5-180

    authors: Jeffries NO

    更新日期:2004-11-19 00:00:00

  • antaRNA--Multi-objective inverse folding of pseudoknot RNA using ant-colony optimization.

    abstract:BACKGROUND:Many functional RNA molecules fold into pseudoknot structures, which are often essential for the formation of an RNA's 3D structure. Currently the design of RNA molecules, which fold into a specific structure (known as RNA inverse folding) within biotechnological applications, is lacking the feature of incor...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-015-0815-6

    authors: Kleinkauf R,Houwaart T,Backofen R,Mann M

    更新日期:2015-11-18 00:00:00

  • Finding motif pairs in the interactions between heterogeneous proteins via bootstrapping and boosting.

    abstract:BACKGROUND:Supervised learning and many stochastic methods for predicting protein-protein interactions require both negative and positive interactions in the training data set. Unlike positive interactions, negative interactions cannot be readily obtained from interaction data, so these must be generated. In protein-pr...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-S1-S57

    authors: Kim J,Huang DS,Han K

    更新日期:2009-01-30 00:00:00

  • Reordering based integrative expression profiling for microarray classification.

    abstract:BACKGROUND:Current network-based microarray analysis uses the information of interactions among concerned genes/gene products, but still considers each gene expression individually. We propose an organized knowledge-supervised approach - Integrative eXpression Profiling (IXP), to improve microarray classification accur...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-S2-S1

    authors: Wu X,Huang H,Sonachalam M,Reinhard S,Shen J,Pandey R,Chen JY

    更新日期:2012-03-13 00:00:00

  • Discovering biological connections between experimental conditions based on common patterns of differential gene expression.

    abstract:BACKGROUND:Identifying similarities between patterns of differential gene expression provides an opportunity to identify similarities between the experimental and biological conditions that give rise to these gene expression alterations. The growing volume of gene expression data in open data repositories such as the N...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-381

    authors: Gower AC,Spira A,Lenburg ME

    更新日期:2011-09-27 00:00:00

  • TMB-Hunt: an amino acid composition based method to screen proteomes for beta-barrel transmembrane proteins.

    abstract:BACKGROUND:Beta-barrel transmembrane (bbtm) proteins are a functionally important and diverse group of proteins expressed in the outer membranes of bacteria (both gram negative and acid fast gram positive), mitochondria and chloroplasts. Despite recent publications describing reasonable levels of accuracy for discrimin...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-56

    authors: Garrow AG,Agnew A,Westhead DR

    更新日期:2005-03-15 00:00:00

  • SemaTyP: a knowledge graph based literature mining method for drug discovery.

    abstract:BACKGROUND:Drug discovery is the process through which potential new medicines are identified. High-throughput screening and computer-aided drug discovery/design are the two main drug discovery methods for now, which have successfully discovered a series of drugs. However, development of new drugs is still an extremely...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2167-5

    authors: Sang S,Yang Z,Wang L,Liu X,Lin H,Wang J

    更新日期:2018-05-30 00:00:00

  • Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns.

    abstract:BACKGROUND:It is well known that the search for homologous RNAs is more effective if both sequence and structure information is incorporated into the search. However, current tools for searching with RNA sequence-structure patterns cannot fully handle mutations occurring on both these levels or are simply not fast enou...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-14-226

    authors: Meyer F,Kurtz S,Beckstette M

    更新日期:2013-07-17 00:00:00

  • ProMEX: a mass spectral reference database for proteins and protein phosphorylation sites.

    abstract:BACKGROUND:In the last decade, techniques were established for the large scale genome-wide analysis of proteins, RNA, and metabolites, and database solutions have been developed to manage the generated data sets. The Golm Metabolome Database for metabolite data (GMD) represents one such effort to make these data broadl...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-216

    authors: Hummel J,Niemann M,Wienkoop S,Schulze W,Steinhauser D,Selbig J,Walther D,Weckwerth W

    更新日期:2007-06-23 00:00:00

  • Inclusion of the fitness sharing technique in an evolutionary algorithm to analyze the fitness landscape of the genetic code adaptability.

    abstract:BACKGROUND:The canonical code, although prevailing in complex genomes, is not universal. It was shown the canonical genetic code superior robustness compared to random codes, but it is not clearly determined how it evolved towards its current form. The error minimization theory considers the minimization of point mutat...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1608-x

    authors: Santos J,Monteagudo Á

    更新日期:2017-03-27 00:00:00

  • knnAUC: an open-source R package for detecting nonlinear dependence between one continuous variable and one binary variable.

    abstract:BACKGROUND:Testing the dependence of two variables is one of the fundamental tasks in statistics. In this work, we developed an open-source R package (knnAUC) for detecting nonlinear dependence between one continuous variable X and one binary dependent variables Y (0 or 1). RESULTS:We addressed this problem by using k...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2427-4

    authors: Li Y,Liu X,Ma Y,Wang Y,Zhou W,Hao M,Yuan Z,Liu J,Xiong M,Shugart YY,Wang J,Jin L

    更新日期:2018-11-22 00:00:00

  • Influence of microarrays experiments missing values on the stability of gene groups by hierarchical clustering.

    abstract:BACKGROUND:Microarray technologies produced large amount of data. The hierarchical clustering is commonly used to identify clusters of co-expressed genes. However, microarray datasets often contain missing values (MVs) representing a major drawback for the use of the clustering methods. Usually the MVs are not treated,...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-5-114

    authors: de Brevern AG,Hazout S,Malpertuy A

    更新日期:2004-08-23 00:00:00

  • Natural computation meta-heuristics for the in silico optimization of microbial strains.

    abstract:BACKGROUND:One of the greatest challenges in Metabolic Engineering is to develop quantitative models and algorithms to identify a set of genetic manipulations that will result in a microbial strain with a desirable metabolic phenotype which typically means having a high yield/productivity. This challenge is not only du...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-499

    authors: Rocha M,Maia P,Mendes R,Pinto JP,Ferreira EC,Nielsen J,Patil KR,Rocha I

    更新日期:2008-11-27 00:00:00

  • Evaluating metagenomics tools for genome binning with real metagenomic datasets and CAMI datasets.

    abstract:BACKGROUND:Shotgun metagenomics based on untargeted sequencing can explore the taxonomic profile and the function of unknown microorganisms in samples, and complement the shortage of amplicon sequencing. Binning assembled sequences into individual groups, which represent microbial genomes, is the key step and a major c...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03667-3

    authors: Yue Y,Huang H,Qi Z,Dou HM,Liu XY,Han TF,Chen Y,Song XJ,Zhang YH,Tu J

    更新日期:2020-07-28 00:00:00

  • dupRadar: a Bioconductor package for the assessment of PCR artifacts in RNA-Seq data.

    abstract:BACKGROUND:PCR clonal artefacts originating from NGS library preparation can affect both genomic as well as RNA-Seq applications when protocols are pushed to their limits. In RNA-Seq however the artifactual reads are not easy to tell apart from normal read duplication due to natural over-sequencing of highly expressed ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-016-1276-2

    authors: Sayols S,Scherzinger D,Klein H

    更新日期:2016-10-21 00:00:00

  • Cascaded classifiers for confidence-based chemical named entity recognition.

    abstract:BACKGROUND:Chemical named entities represent an important facet of biomedical text. RESULTS:We have developed a system to use character-based n-grams, Maximum Entropy Markov Models and rescoring to recognise chemical names and other such entities, and to make confidence estimates for the extracted entities. An adjusta...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-S11-S4

    authors: Corbett P,Copestake A

    更新日期:2008-11-19 00:00:00

  • GLIDERS--a web-based search engine for genome-wide linkage disequilibrium between HapMap SNPs.

    abstract:BACKGROUND:A number of tools for the examination of linkage disequilibrium (LD) patterns between nearby alleles exist, but none are available for quickly and easily investigating LD at longer ranges (>500 kb). We have developed a web-based query tool (GLIDERS: Genome-wide LInkage DisEquilibrium Repository and Search en...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-367

    authors: Lawrence R,Day-Williams AG,Mott R,Broxholme J,Cardon LR,Zeggini E

    更新日期:2009-10-31 00:00:00

  • Ferret: a sentence-based literature scanning system.

    abstract:BACKGROUND:The rapid pace of bioscience research makes it very challenging to track relevant articles in one's area of interest. MEDLINE, a primary source for biomedical literature, offers access to more than 20 million citations with three-quarters of a million new ones added each year. Thus it is not surprising to se...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-015-0630-0

    authors: Srinivasan P,Zhang XN,Bouten R,Chang C

    更新日期:2015-06-20 00:00:00

  • Prediction of MHC class I binding peptides, using SVMHC.

    abstract:BACKGROUND:T-cells are key players in regulating a specific immune response. Activation of cytotoxic T-cells requires recognition of specific peptides bound to Major Histocompatibility Complex (MHC) class I molecules. MHC-peptide complexes are potential tools for diagnosis and treatment of pathogens and cancer, as well...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-3-25

    authors: Dönnes P,Elofsson A

    更新日期:2002-09-11 00:00:00

  • Correction to: Similarities and differences between variants called with human reference genome HG19 or HG38.

    abstract::After publication of this supplement article. ...

    journal_title:BMC bioinformatics

    pub_type: 已发布勘误

    doi:10.1186/s12859-019-2776-7

    authors: Pan B,Kusko R,Xiao W,Zheng Y,Liu Z,Xiao C,Sakkiah S,Guo W,Gong P,Zhang C,Ge W,Shi L,Tong W,Hong H

    更新日期:2019-05-15 00:00:00

  • Asymmetric bagging and feature selection for activities prediction of drug molecules.

    abstract:BACKGROUND:Activities of drug molecules can be predicted by QSAR (quantitative structure activity relationship) models, which overcomes the disadvantages of high cost and long cycle by employing the traditional experimental method. With the fact that the number of drug molecules with positive activity is rather fewer t...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-S6-S7

    authors: Li GZ,Meng HH,Lu WC,Yang JY,Yang MQ

    更新日期:2008-05-28 00:00:00