Abstract:
BACKGROUND:Microarray technology can acquire information about thousands of genes simultaneously. We analyzed published breast cancer microarray databases to predict five-year recurrence and compared the performance of three data mining algorithms of artificial neural networks (ANN), decision trees (DT) and logistic regression (LR) and two composite models of DT-ANN and DT-LR. The collection of microarray datasets from the Gene Expression Omnibus, four breast cancer datasets were pooled for predicting five-year breast cancer relapse. After data compilation, 757 subjects, 5 clinical variables and 13,452 genetic variables were aggregated. The bootstrap method, Mann-Whitney U test and 20-fold cross-validation were performed to investigate candidate genes with 100 most-significant p-values. The predictive powers of DT, LR and ANN models were assessed using accuracy and the area under ROC curve. The associated genes were evaluated using Cox regression. RESULTS:The DT models exhibited the lowest predictive power and the poorest extrapolation when applied to the test samples. The ANN models displayed the best predictive power and showed the best extrapolation. The 21 most-associated genes, as determined by integration of each model, were analyzed using Cox regression with a 3.53-fold (95% CI: 2.24-5.58) increased risk of breast cancer five-year recurrence. CONCLUSIONS:The 21 selected genes can predict breast cancer recurrence. Among these genes, CCNB1, PLK1 and TOP2A are in the cell cycle G2/M DNA damage checkpoint pathway. Oncologists can offer the genetic information for patients when understanding the gene expression profiles on breast cancer recurrence.
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Chou HL,Yao CT,Su SL,Lee CY,Hu KY,Terng HJ,Shih YW,Chang YT,Lu YF,Chang CW,Wahlqvist ML,Wetter T,Chu CMdoi
10.1186/1471-2105-14-100subject
Has Abstractpub_date
2013-03-19 00:00:00pages
100issn
1471-2105pii
1471-2105-14-100journal_volume
14pub_type
杂志文章abstract:BACKGROUND:Heritability of a phenotypic or molecular trait measures the proportion of variance that is attributable to genotypic variance. It is an important concept in breeding and genetics. Few methods are available for calculating heritability for traits derived from high-throughput sequencing. RESULTS:We propose s...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1539-6
更新日期:2017-03-02 00:00:00
abstract:BACKGROUND:Observed co-expression of a group of genes is frequently attributed to co-regulation by shared transcription factors. This assumption has led to the hypothesis that promoters of co-expressed genes should share common regulatory motifs, which forms the basis for numerous computational tools that search for th...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-267
更新日期:2010-05-20 00:00:00
abstract:BACKGROUND:In recent years, substantial effort has been applied to de novo regulatory motif discovery. At this time, more than 150 software tools exist to detect regulatory binding sites given a set of genomic sequences. As the number of software packages increases, it becomes more important to identify the tools with ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-S9-S6
更新日期:2008-08-12 00:00:00
abstract:BACKGROUND:Bistability and ability to switch between two stable states is the hallmark of cellular responses. Cellular signaling pathways often contain bistable switches that regulate the transmission of the extracellular information to the nucleus where important biological functions are executed. RESULTS:In this wor...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3155-0
更新日期:2019-11-28 00:00:00
abstract:BACKGROUND:This study addresses a recurrent biological problem, that is to define a formal clustering structure for a set of tissues on the basis of the relative abundance of multiple alternatively spliced isoforms mRNAs generated by the same gene. To this aim, we have used a model-based clustering approach, based on a...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0689-7
更新日期:2015-09-15 00:00:00
abstract:BACKGROUND:The BioCreative text mining evaluation investigated the application of text mining methods to the task of automatically extracting information from text in biomedical research articles. We participated in Task 2 of the evaluation. For this task, we built a system to automatically annotate a given protein wit...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-6-S1-S18
更新日期:2005-01-01 00:00:00
abstract:BACKGROUND:Molecular signatures are sets of genes, proteins, genetic variants or other variables that can be used as markers for a particular phenotype. Reliable signature discovery methods could yield valuable insight into cell biology and mechanisms of human disease. However, it is currently not clear how to control ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-38
更新日期:2009-01-29 00:00:00
abstract:BACKGROUND:Bioinformatics software quality assurance is essential in genomic medicine. Systematic verification and validation of bioinformatics software is difficult because it is often not possible to obtain a realistic "gold standard" for systematic evaluation. Here we apply a technique that originates from the softw...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-S16-S15
更新日期:2014-01-01 00:00:00
abstract:BACKGROUND:The large increase in the size of patent collections has led to the need of efficient search strategies. But the development of advanced text-mining applications dedicated to patents of the biomedical field remains rare, in particular to address the needs of the pharmaceutical & biotech industry, which inten...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-S1-S15
更新日期:2014-01-01 00:00:00
abstract:BACKGROUND:Proteins are comprised of one or several building blocks, known as domains. Such domains can be classified into families according to their evolutionary origin. Whereas sequencing technologies have advanced immensely in recent years, there are no matching computational methodologies for large-scale determina...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-277
更新日期:2006-06-02 00:00:00
abstract:BACKGROUND:The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties defines a multidimensional solution space, where seq...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1751-4
更新日期:2017-07-17 00:00:00
abstract:BACKGROUND:Our knowledge of global protein-protein interaction (PPI) networks in complex organisms such as humans is hindered by technical limitations of current methods. RESULTS:On the basis of short co-occurring polypeptide regions, we developed a tool called MP-PIPE capable of predicting a global human PPI network ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-014-0383-1
更新日期:2014-12-10 00:00:00
abstract:BACKGROUND:One of the important goals in the post-genomic era is to determine the regulatory elements within the non-coding DNA of a given organism's genome. The identification of functional cis-regulatory modules has proven difficult since the component factor binding sites are small and the rules governing their arra...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-4-57
更新日期:2003-11-20 00:00:00
abstract:BACKGROUND:Gene expression patterns of olfactory receptors (ORs) are an important component of the signal encoding mechanism in the olfactory system since they determine the interactions between odorant ligands and sensory neurons. We have developed the Olfactory Receptor Microarray Database (ORMD) to house OR gene exp...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-231
更新日期:2007-06-30 00:00:00
abstract:BACKGROUND:Although protein-protein interaction networks determined with high-throughput methods are incomplete, they are commonly used to infer the topology of the complete interactome. These partial networks often show a scale-free behavior with only a few proteins having many and the majority having only a few conne...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-519
更新日期:2006-11-30 00:00:00
abstract:BACKGROUND:Two-dimensional electrophoresis (2DE) is one of the most widely applied techniques in comparative proteomics. The basic task of 2DE is to identify differential protein expression by quantitative analysis of 2DE images. To reduce the errors of spot quantification in 2DE images, a novel brightness correction m...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-3432-y
更新日期:2020-03-19 00:00:00
abstract:BACKGROUND:Knowing the subcellular location of proteins provides clues to their function as well as the interconnectivity of biological processes. Dozens of tools are available for predicting protein location in the eukaryotic cell. Each tool performs well on certain data sets, but their predictions often disagree for ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-420
更新日期:2007-10-29 00:00:00
abstract:BACKGROUND:Recent studies have shown that copy number variations (CNVs) are frequent in higher eukaryotes and associated with a substantial portion of inherited and acquired risk for various human diseases. The increasing availability of high-resolution genome surveillance platforms provides opportunity for rapidly ass...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-74
更新日期:2010-02-04 00:00:00
abstract:BACKGROUND:Proteins having similar functions from different sources can be identified by the occurrence in their sequences, a conserved cluster of amino acids referred to as pattern, motif, signature or fingerprint. The wide usage of protein sequence analysis in par with the growth of databases signifies the importance...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-5-127
更新日期:2004-09-09 00:00:00
abstract:BACKGROUND:Frequent pattern mining analysis applied on microarray dataset appears to be a promising strategy for identifying relationships between gene expression levels. Unfortunately, too many itemsets (co-expressed genes) are identified by this analysis method since it does not consider the importance of each gene w...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-230
更新日期:2013-07-21 00:00:00
abstract:BACKGROUND:Associating literature with pathways poses new challenges to the Text Mining (TM) community. There are three main challenges to this task: (1) the identification of the mapping position of a specific entity or reaction in a given pathway, (2) the recognition of the causal relationships among multiple reactio...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-S3-S5
更新日期:2008-04-11 00:00:00
abstract:BACKGROUND:Analysis of sequence composition is a routine task in genome research. Organisms are characterized by their base composition, dinucleotide relative abundance, codon usage, and so on. Unique subsequences are markers of special interest in genome comparison, expression profiling, and genetic engineering. Relat...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-167
更新日期:2008-03-26 00:00:00
abstract:BACKGROUND:Identifying protein complexes from protein-protein interaction (PPI) network is one of the most important tasks in proteomics. Existing computational methods try to incorporate a variety of biological evidences to enhance the quality of predicted complexes. However, it is still a challenge to integrate diffe...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2555-x
更新日期:2018-12-20 00:00:00
abstract:BACKGROUND:MicroRNAs (miRNAs) are single-stranded non-coding RNAs known to regulate a wide range of cellular processes by silencing the gene expression at the protein and/or mRNA levels. Computational prediction of miRNA targets is essential for elucidating the detailed functions of miRNA. However, the prediction speci...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-476
更新日期:2010-09-22 00:00:00
abstract:BACKGROUND:Human breast cancer resistance protein (BCRP) is an ATP-binding cassette (ABC) efflux transporter that confers multidrug resistance in cancers and also plays an important role in the absorption, distribution and elimination of drugs. Prediction as to if drugs or new molecular entities are BCRP substrates sho...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-130
更新日期:2013-04-15 00:00:00
abstract:BACKGROUND:Long short-term memory (LSTM) is one of the most attractive deep learning methods to learn time series or contexts of input data. Increasing studies, including biological sequence analyses in bioinformatics, utilize this architecture. Amino acid sequence profiles are widely used for bioinformatics studies, s...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2284-1
更新日期:2018-07-18 00:00:00
abstract:BACKGROUND:Computer simulation is a resource which can be employed to identify optimal breeding strategies to effectively and efficiently achieve specific goals in developing improved cultivars. In some instances, it is crucial to assess in silico the options as well as the impact of various crossing schemes and breedi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-0906-z
更新日期:2016-02-06 00:00:00
abstract:BACKGROUND:The rate of protein structures being deposited in the Protein Data Bank surpasses the capacity to experimentally characterise them and therefore computational methods to analyse these structures have become increasingly important. Identifying the region of the protein most likely to be involved in function i...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-379
更新日期:2009-11-18 00:00:00
abstract:BACKGROUND:Under conditions of no strand bias the number of Gs is equal to that of Cs for each DNA strand; similarly, the total number of Ts is equal to that of As. However, within each strand there are considerable local deviations from the A = T and G = C equality. These asymmetries in nucleotide composition have bee...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-21
更新日期:2007-01-23 00:00:00
abstract:BACKGROUND:Community structure is ubiquitous in biological networks. There has been an increased interest in unraveling the community structure of biological systems as it may provide important insights into a system's functional components and the impact of local structures on dynamics at a global scale. Choosing an a...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-220
更新日期:2014-06-25 00:00:00