Abstract:
BACKGROUND:Although high-throughput microarray based molecular diagnostic technologies show a great promise in cancer diagnosis, it is still far from a clinical application due to its low and instable sensitivities and specificities in cancer molecular pattern recognition. In fact, high-dimensional and heterogeneous tumor profiles challenge current machine learning methodologies for its small number of samples and large or even huge number of variables (genes). This naturally calls for the use of an effective feature selection in microarray data classification. METHODS:We propose a novel feature selection method: multi-resolution independent component analysis (MICA) for large-scale gene expression data. This method overcomes the weak points of the widely used transform-based feature selection methods such as principal component analysis (PCA), independent component analysis (ICA), and nonnegative matrix factorization (NMF) by avoiding their global feature-selection mechanism. In addition to demonstrating the effectiveness of the multi-resolution independent component analysis in meaningful biomarker discovery, we present a multi-resolution independent component analysis based support vector machines (MICA-SVM) and linear discriminant analysis (MICA-LDA) to attain high-performance classifications in low-dimensional spaces. RESULTS:We have demonstrated the superiority and stability of our algorithms by performing comprehensive experimental comparisons with nine state-of-the-art algorithms on six high-dimensional heterogeneous profiles under cross validations. Our classification algorithms, especially, MICA-SVM, not only accomplish clinical or near-clinical level sensitivities and specificities, but also show strong performance stability over its peers in classification. Software that implements the major algorithm and data sets on which this paper focuses are freely available at https://sites.google.com/site/heyaumapbc2011/. CONCLUSIONS:This work suggests a new direction to accelerate microarray technologies into a clinical routine through building a high-performance classifier to attain clinical-level sensitivities and specificities by treating an input profile as a 'profile-biomarker'. The multi-resolution data analysis based redundant global feature suppressing and effective local feature extraction also have a positive impact on large scale 'omics' data mining.
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Han H,Li XLdoi
10.1186/1471-2105-12-S1-S7subject
Has Abstractpub_date
2011-02-15 00:00:00pages
S7issn
1471-2105pii
1471-2105-12-S1-S7journal_volume
12 Suppl 1pub_type
杂志文章abstract:BACKGROUND:Tiling-arrays are applicable to multiple types of biological research questions. Due to its advantages (high sensitivity, resolution, unbiased), the technology is often employed in genome-wide investigations. A major challenge in the analysis of tiling-array data is to define regions-of-interest, i.e., conti...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-275
更新日期:2010-05-21 00:00:00
abstract:BACKGROUND:Molecular recognition in proteins occurs due to appropriate arrangements of physical, chemical, and geometric properties of an atomic surface. Similar surface regions should create similar binding interfaces. Effective methods for comparing surface regions can be used in identifying similar regions, and to p...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-314
更新日期:2012-11-24 00:00:00
abstract:BACKGROUND:The locations and shapes of synapses are important in reconstructing connectomes and analyzing synaptic plasticity. However, current synapse detection and segmentation methods are still not adequate for accurately acquiring the synaptic connectivity, and they cannot effectively alleviate the burden of synaps...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2232-0
更新日期:2018-07-13 00:00:00
abstract:BACKGROUND:Microarray experiments enable simultaneous measurement of the expression levels of virtually all transcripts present in cells, thereby providing a 'molecular picture' of the cell state. On the other hand, the genomic responses to a pharmacological or hormonal stimulus are dynamic molecular processes, where t...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-S2-S12
更新日期:2008-03-26 00:00:00
abstract:BACKGROUND:In real-time PCR data analysis, the cycle threshold (CT) method is currently the gold standard. This method is based on an assumption of equal PCR efficiency in all reactions, and precision may suffer if this condition is not met. Nonlinear regression analysis (NLR) or curve fitting has therefore been sugges...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-107
更新日期:2006-03-03 00:00:00
abstract:BACKGROUND:Systems biology modeling from microarray data requires the most contemporary structural and functional array annotation. However, microarray annotations, especially for non-commercial, non-traditional biomedical model organisms, are often dated. In addition, most microarray analysis tools do not readily acce...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-30
更新日期:2009-01-23 00:00:00
abstract:BACKGROUND:In recent years, successful contact prediction methods and contact-guided ab initio protein structure prediction methods have highlighted the importance of incorporating contact information into protein structure prediction methods. It is also observed that for almost all globular proteins, the quality of co...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-1404-z
更新日期:2016-12-07 00:00:00
abstract::Identifying segments in the genome of different individuals that are identical-by-descent (IBD) is a fundamental element of genetics. IBD data is used for numerous applications including demographic inference, heritability estimation, and mapping disease loci. Simultaneous detection of IBD over multiple haplotypes has...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-16-S5-S9
更新日期:2015-01-01 00:00:00
abstract:BACKGROUND:Heritability of a phenotypic or molecular trait measures the proportion of variance that is attributable to genotypic variance. It is an important concept in breeding and genetics. Few methods are available for calculating heritability for traits derived from high-throughput sequencing. RESULTS:We propose s...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1539-6
更新日期:2017-03-02 00:00:00
abstract:BACKGROUND:HH-suite is a widely used open source software suite for sensitive sequence similarity searches and protein fold recognition. It is based on pairwise alignment of profile Hidden Markov models (HMMs), which represent multiple sequence alignments of homologous proteins. RESULTS:We developed a single-instructi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3019-7
更新日期:2019-09-14 00:00:00
abstract:BACKGROUND:Integrating data from multiple global assays and curated databases is essential to understand the spatio-temporal interactions within cells. Different experiments measure cellular processes at various widths and depths, while databases contain biological information based on established facts or published da...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-203
更新日期:2008-04-21 00:00:00
abstract:BACKGROUND:Many Bioinformatics studies begin with a multiple sequence alignment as the foundation for their research. This is because multiple sequence alignment can be a useful technique for studying molecular evolution and analyzing sequence structure relationships. RESULTS:In this paper, we have proposed a Vertical...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-353
更新日期:2011-08-25 00:00:00
abstract:BACKGROUND:The Bacteria Biotope (BB) task is a biomedical relation extraction (RE) that aims to study the interaction between bacteria and their locations. This task is considered to pertain to fundamental knowledge in applied microbiology. Some previous investigations conducted the study by applying feature-based mode...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3217-3
更新日期:2019-12-03 00:00:00
abstract:BACKGROUND:Activities of drug molecules can be predicted by QSAR (quantitative structure activity relationship) models, which overcomes the disadvantages of high cost and long cycle by employing the traditional experimental method. With the fact that the number of drug molecules with positive activity is rather fewer t...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-S6-S7
更新日期:2008-05-28 00:00:00
abstract:BACKGROUND:Many functional RNA molecules fold into pseudoknot structures, which are often essential for the formation of an RNA's 3D structure. Currently the design of RNA molecules, which fold into a specific structure (known as RNA inverse folding) within biotechnological applications, is lacking the feature of incor...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0815-6
更新日期:2015-11-18 00:00:00
abstract:BACKGROUND:Recent years have seen an increased amount of natural language processing (NLP) work on full text biomedical journal publications. Much of this work is done with Open Access journal articles. Such work assumes that Open Access articles are representative of biomedical publications in general and that methods...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-183
更新日期:2009-06-15 00:00:00
abstract:BACKGROUND:Cross-species comparisons of gene neighborhoods (also called genomic contexts) in microbes may provide insight into determining functionally related or co-regulated sets of genes, suggest annotations of previously un-annotated genes, and help to identify horizontal gene transfer events across microbial speci...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-18
更新日期:2013-01-16 00:00:00
abstract:BACKGROUND:Tight clustering arose recently from a desire to obtain tighter and potentially more informative clusters in gene expression studies. Scattered genes with relatively loose correlations should be excluded from the clusters. However, in the literature there is little work dedicated to this area of research. On...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-287
更新日期:2008-06-18 00:00:00
abstract:BACKGROUND:Guide-trees are used as part of an essential heuristic to enable the calculation of multiple sequence alignments. They have been the focus of much method development but there has been little effort at determining systematically, which guide-trees, if any, give the best alignments. Some guide-tree constructi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-338
更新日期:2014-10-04 00:00:00
abstract:BACKGROUND:Phylogenetic profiles record the occurrence of homologs of genes across fully sequenced organisms. Proteins with similar profiles are typically components of protein complexes or metabolic pathways. Various existing methods measure similarity between two profiles and, hence, the likelihood that the two prote...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-S4-S7
更新日期:2007-05-22 00:00:00
abstract:BACKGROUND:The Damerau-Levenshtein (DL) distance metric has been widely used in the biological science. It tries to identify the similar region of DNA,RNA and protein sequences by transforming one sequence to the another using the substitution, insertion, deletion and transposition operations. Lowrance and Wagner have ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3184-8
更新日期:2020-12-09 00:00:00
abstract:BACKGROUND:Population stratification is a systematic difference in allele frequencies between subpopulations. This can lead to spurious association findings in the case-control genome wide association studies (GWASs) used to identify single nucleotide polymorphisms (SNPs) associated with disease-linked phenotypes. Meth...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-61
更新日期:2013-02-22 00:00:00
abstract:BACKGROUND:Multidimensional protein identification technology (MudPIT)-based shot-gun proteomics has been proven to be an effective platform for functional proteomics. In particular, the various sample preparation methods and bioinformatics tools can be integrated to improve the proteomics platform for applications lik...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-S15-S8
更新日期:2012-01-01 00:00:00
abstract:BACKGROUND:High-resolution tandem mass spectra can now be readily acquired with hybrid instruments, such as LTQ-Orbitrap and LTQ-FT, in high-throughput shotgun proteomics workflows. The improved spectral quality enables more accurate de novo sequencing for identification of post-translational modifications and amino ac...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-118
更新日期:2010-03-05 00:00:00
abstract:BACKGROUND:Fluorescent reporter genes have become widely used for monitoring gene expression in living cells. When a microbial strain carrying a reporter gene is grown in a microplate reader, the fluorescence and the absorbance (optical density) of the culture can be automatically measured every few minutes in a highly...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-2920-4
更新日期:2019-06-11 00:00:00
abstract:BACKGROUND:Biologists often conduct multiple but different cDNA microarray studies that all target the same biological system or pathway. Within each study, replicate slides within repeated identical experiments are often produced. Pooling information across studies can help more accurately identify true target genes. ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-247
更新日期:2006-05-05 00:00:00
abstract:BACKGROUND:Biological data have traditionally been stored and made publicly available through a variety of on-line databases, whereas biological knowledge has traditionally been found in the printed literature. With journals now on-line and providing an increasing amount of open access content, often free of copyright ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-220
更新日期:2010-04-29 00:00:00
abstract:BACKGROUND:DNA methylation of CpG dinucleotides is an essential epigenetic modification that plays a key role in transcription. Widely used DNA enrichment-based methods offer high coverage for measuring methylated CpG dinucleotides, with the lowest cost per CpG covered genome-wide. However, these methods measure the DN...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2574-7
更新日期:2018-12-22 00:00:00
abstract:BACKGROUND:The scientific literature contains millions of microbial gene identifiers within the full text and tables, but these annotations rarely get incorporated into public sequence databases. We propose to utilize the Open Access (OA) subset of PubMed Central (PMC) as a gene annotation database and have developed a...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-43
更新日期:2014-02-05 00:00:00
abstract:BACKGROUND:While biomedical text mining is emerging as an important research area, practical results have proven difficult to achieve. We believe that an important first step towards more accurate text-mining lies in the ability to identify and characterize text that satisfies various types of information needs. We rep...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-356
更新日期:2006-07-25 00:00:00