Abstract:
A class-imbalanced classifier is a decision rule that predicts the class membership of new samples from an available data set in which the class sizes differ considerably. When the class sizes are very different, most standard classification algorithms tend to favor the larger (majority) class, which results in poor prediction accuracy for the minority class. A class-imbalanced classifier typically accounts for the differing class sizes by modifying a standard classifier, either through a correction strategy or by incorporating a new strategy in the training phase. This article reviews and evaluates some of the most important methods for class prediction of high-dimensional imbalanced data. The evaluation addresses the fundamental issues of class-imbalanced classification: imbalance ratio, small disjuncts and overlap complexity, lack of data, and feature selection. Four class-imbalanced classifiers are considered: three standard classification algorithms, each coupled with an ensemble correction strategy, and one support vector machine (SVM)-based correction classifier. The three algorithms are (i) diagonal linear discriminant analysis (DLDA), (ii) random forests (RFs) and (iii) SVMs. The SVM-based correction classifier is SVM threshold adjustment (SVM-THR). A Monte Carlo simulation and five genomic data sets were used to illustrate the analysis and address these issues. The SVM-ensemble classifier appears to perform best when the class imbalance is not too severe. SVM-THR performs well when the imbalance is severe and the predictors are highly correlated. DLDA with feature selection can perform well without the ensemble correction.
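The ensemble correction idea can be illustrated with a minimal pure-Python sketch. It assumes a two-class problem and one common form of ensemble correction: the majority class is repeatedly undersampled to the size of the minority class, one DLDA model is fitted per balanced subset, and predictions are combined by majority vote. The exact resampling and voting scheme evaluated in the article may differ. The function names are hypothetical, and the RF, SVM and SVM-THR classifiers are not shown.

```python
import random
from collections import Counter


def fit_dlda(X, y):
    """Fit diagonal LDA: per-class feature means and a pooled per-feature variance.

    Off-diagonal covariances are ignored, which keeps DLDA usable when the
    number of features greatly exceeds the number of samples.
    """
    classes = sorted(set(y))
    p = len(X[0])
    means = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    dof = max(len(X) - len(classes), 1)  # pooled within-class degrees of freedom
    var = []
    for j in range(p):
        ss = sum((x[j] - means[label][j]) ** 2 for x, label in zip(X, y))
        var.append(max(ss / dof, 1e-12))  # guard against zero variance
    return {"classes": classes, "means": means, "var": var}


def predict_dlda(model, x):
    """Assign x to the class with the smallest variance-scaled squared distance
    to its mean (equal class priors assumed)."""
    def score(c):
        return sum((xj - mj) ** 2 / vj
                   for xj, mj, vj in zip(x, model["means"][c], model["var"]))
    return min(model["classes"], key=score)


def fit_undersampled_ensemble(X, y, n_models=11, seed=0):
    """Hypothetical ensemble correction: each member is a DLDA model trained on
    all minority samples plus an equal-sized random draw (without replacement)
    from the majority class."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority, majority = sorted(counts, key=counts.get)  # assumes exactly two classes
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label == majority]
    models = []
    for _ in range(n_models):
        idx = min_idx + rng.sample(maj_idx, len(min_idx))
        models.append(fit_dlda([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_ensemble(models, x):
    """Combine member predictions by simple majority vote."""
    votes = Counter(predict_dlda(m, x) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 9:1 imbalanced data: 90 majority samples near (0, 0), 10 minority near (3, 3).
    rng = random.Random(1)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
    X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)]
    y = [0] * 90 + [1] * 10
    models = fit_undersampled_ensemble(X, y)
    print(predict_ensemble(models, [3.0, 3.0]), predict_ensemble(models, [0.0, 0.0]))
```

Each member sees equal class sizes, so no single member is dominated by the majority class, and the vote averages over the random majority draws. An odd `n_models` avoids tied votes in the two-class case.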
journal_name: Brief Bioinform
journal_title: Briefings in bioinformatics
authors: Lin WJ, Chen JJ
doi: 10.1093/bib/bbs006
subject: Has Abstract
pub_date: 2013-01-01 00:00:00
pages: 13-26
issue: 1
eissn: 1467-5463
issn: 1477-4054
pii: bbs006
journal_volume: 14
pub_type: Journal Article, Review
abstract::In clinical cancer treatment, genomic alterations often affect the response of patients to anticancer drugs. Studies have shown that molecular features of tumors could be biomarkers predictive of sensitivity or resistance to anticancer agents, but the identification of actionable mutations is often constrained ...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbz109
update_date:2020-12-01 00:00:00
abstract::Atomic charges play a very important role in drug-target recognition. However, computation of atomic charges with high-level quantum mechanics (QM) calculations is very time-consuming. A number of machine learning (ML)-based atomic charge prediction methods have been proposed to speed up the calculation of high-accura...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa183
update_date:2020-08-25 00:00:00
abstract::Since the small RNA-sequencing (sRNA-seq) technology became available, it has allowed the discovery of thousands of new microRNAs (miRNAs) in humans and many other species, providing new data on these small RNAs (sRNAs) of high biological and translational relevance. MiRNA discovery has not yet reached saturation, even in th...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbx148
update_date:2019-05-21 00:00:00
abstract::Exploring protein-ligand interactions is a subject of immense interest, as it provides deeper insights into molecular recognition, mechanism of interaction and subsequent functions. Predicting an accurate model for a protein-ligand interaction is a challenging task. Molecular docking is a computational method used for...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa279
update_date:2020-10-26 00:00:00
abstract::While the number of sequenced genes is increasing dramatically, the number of different protein structural families is expected to be more limited. Changes in enzymatic activity or protein interactions can dramatically modify the role of homologous proteins in different organisms or mutants. However, experimental data...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/2.1.30
update_date:2001-03-01 00:00:00
abstract::The adoption of agent technologies and multi-agent systems constitutes an emerging area in bioinformatics. In this article, we report on the activity of the Working Group on Agents in Bioinformatics (BIOAGENTS) founded during the first AgentLink III Technical Forum meeting on the 2nd of July, 2004, in Rome. The meetin...
journal_title:Briefings in bioinformatics
pub_type:
doi:10.1093/bib/bbl014
update_date:2007-01-01 00:00:00
abstract::Functional annotation of protein sequences with high accuracy has become one of the most important issues in modern biomedical studies, and computational approaches that significantly accelerate the analysis process and enhance accuracy are greatly desired. Although a variety of methods have been developed to elevate prote...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbz081
update_date:2020-07-15 00:00:00
abstract:BACKGROUND:Whole genome sequencing (WGS) is increasingly used for Mycobacterium tuberculosis (Mtb) research. Countries with the highest tuberculosis (TB) burden face important challenges to integrate WGS into surveillance and research. METHODS:We assessed the global status of Mtb WGS and developed a 3-week training co...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa246
update_date:2020-10-03 00:00:00
abstract::Despite gene expression programs being notoriously complex, RNA abundance is usually assumed to be a proxy for transcriptional activity. Recently developed approaches, able to disentangle transcriptional and post-transcriptional regulatory processes, have revealed a more complex scenario. It is now possible to work out h...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa389
update_date:2020-12-22 00:00:00
abstract::Effectively representing Medical Subject Headings (MeSH) headings (terms) such as disease and drug as discriminative vectors could greatly improve the performance of downstream computational prediction models. However, these terms are often abstract and difficult to quantify. In this paper, we converted the MeSH tree ...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa037
update_date:2020-03-31 00:00:00
abstract::The contribution of transposable elements (TEs) to genome structure and evolution as well as their impact on genome sequencing, assembly, annotation and alignment has generated increasing interest in developing new methods for their computational analysis. Here we review the diversity of innovative approaches to ident...
journal_title:Briefings in bioinformatics
pub_type: Journal Article, Review
doi:10.1093/bib/bbm048
update_date:2007-11-01 00:00:00
abstract::Computational aspects of host-parasite phylogenies form part of a set of general associations between areas and organisms, hosts and parasites, and species and genes. The problem is not new and the commonalities of exploring vicariance biogeography (organisms tracking areas) and host-parasite co-speciation (parasites ...
journal_title:Briefings in bioinformatics
pub_type: Journal Article, Review
doi:10.1093/bib/5.4.339
update_date:2004-12-01 00:00:00
abstract::Phenotypes have gained increased notoriety in the clinical and biological domain owing to their application in numerous areas such as the discovery of disease genes and drug targets, phylogenetics and pharmacogenomics. Phenotypes, defined as observable characteristics of organisms, can be seen as one of the bridges th...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbv083
update_date:2016-09-01 00:00:00
abstract::Understanding the functioning of biological systems depends on tackling complexity spanning spatial scales from genome to organ to whole organism. The basic unit of life, the cell, acts to co-ordinate information received across these scales and processes the myriad of signals to produce an integrated cellular respons...
journal_title:Briefings in bioinformatics
pub_type: Journal Article, Review
doi:10.1093/bib/bbp010
update_date:2009-07-01 00:00:00
abstract::Cell lines are widely used as in vitro models of tumorigenesis. However, an increasing number of researchers have found that cell lines differ from their sourced tumour samples after long-term cell culture. The application of unsuitable cell lines in experiments will affect the experimental accuracy and the treatment ...
journal_title:Briefings in bioinformatics
pub_type: Journal Article, Review
doi:10.1093/bib/bbw082
update_date:2017-05-01 00:00:00
abstract::RNA-sequencing (RNA-seq) has rapidly become a popular tool to characterize transcriptomes. A fundamental research problem in many RNA-seq studies is the identification of reliable molecular markers that show differential expression between distinct sample groups. Together with the growing popularity of RNA-seq, a numb...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbt086
update_date:2015-01-01 00:00:00
abstract::Accurately identifying the interactions between genomic factors and the response of cancer drugs plays important roles in drug discovery, drug repositioning and cancer treatment. A number of studies revealed that interactions between genes and drugs were 'many-genes-to-many drugs' interactions, i.e. common modules, op...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa087
update_date:2020-06-26 00:00:00
abstract:BACKGROUND:The most frequently mutated gene pairs in pancreatic adenocarcinoma (PAAD) are KRAS and TP53, and our goal is to illustrate the multiomics and molecular dynamics landscapes of KRAS/TP53 mutation and also to obtain prospective novel drugs for KRAS- and TP53-mutated PAAD patients. Moreover, we also made an att...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa149
update_date:2020-07-31 00:00:00
abstract::The iteratively reweighted least square (IRLS) method is mostly identical to maximum likelihood (ML) method in terms of parameter estimation and power of quantitative trait locus (QTL) detection. But the IRLS is greatly superior to ML in terms of computing speed and the robustness of parameter estimation. In conjuncti...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbs062
update_date:2014-01-01 00:00:00
abstract::High-throughput omics data are now generated almost without limit. It becomes increasingly important to integrate different omics data types to disentangle the molecular machinery of complex diseases with the hope for better disease prevention and treatment. Since the relationship among different omics data featu...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bby115
update_date:2018-11-29 00:00:00
abstract::Mass spectrometry (MS)-based proteomics has undergone rapid advancements in recent years, creating challenging problems for bioinformatics. We focus on four aspects where bioinformatics plays a crucial role (and proteomics is needed for clinical application): peptide-spectra matching (PSM) based on the new data-indepe...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbx128
update_date:2019-01-18 00:00:00
abstract::Occurrence and development of cancers are governed by complex networks of interacting intercellular and intracellular signals. The technology of single-cell RNA sequencing (scRNA-seq) provides an unprecedented opportunity for dissecting the interplay between the cancer cells and the associated microenvironment. Here w...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbz040
update_date:2020-05-21 00:00:00
abstract::Deoxyribonucleic acid replication is one of the most crucial tasks taking place in the cell, and it has to be precisely regulated. This process is initiated in the replication origins (ORIs), and thus it is essential to identify such sites for a deeper understanding of the cellular processes and functions related to t...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa304
update_date:2020-11-25 00:00:00
abstract:OBJECTIVE:Development of novel informatics methods focused on improving pregnancy outcomes remains an active area of research. The purpose of this study is to systematically review the ways that artificial intelligence (AI) and machine learning (ML), including deep learning (DL), methodologies can inform patient care d...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbaa369
update_date:2021-01-06 00:00:00
abstract::Next generation sequencers have greatly improved our ability to mine polymorphisms and mutations out of entire (or portions of) genomes. The reliability of their outputs, though, has proved to be closely related to the sequencing chemistry and to deeply affect the quality of the downstream analyses. We focus here on the two-...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbs048
update_date:2013-11-01 00:00:00
abstract::The discipline of bioinformatics has developed rapidly since the complete sequencing of the first genomes in the 1990s. The development of many high-throughput techniques during the last decades has ensured that bioinformatics has grown into a discipline that overlaps with, and is required for, the modern practice of ...
journal_title:Briefings in bioinformatics
pub_type: Historical Article, Journal Article
doi:10.1093/bib/bbu022
update_date:2015-03-01 00:00:00
abstract::Synonymous mutations do not change the encoded amino acids but may alter the structure or function of an mRNA in ways that impact gene function. Advances in next generation sequencing technologies have detected numerous synonymous mutations in the human genome. Several computational models have been proposed to predic...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbz047
update_date:2020-05-21 00:00:00
abstract::Regulation of proteolysis plays a critical role in a myriad of important cellular processes. The key to better understanding the mechanisms that control this process is to identify the specific substrates that each protease targets. To address this, we have developed iProt-Sub, a powerful bioinformatics tool for the a...
journal_title:Briefings in bioinformatics
pub_type: Journal Article, Review
doi:10.1093/bib/bby028
update_date:2019-03-25 00:00:00
abstract::Information Integrator is an extension to IBM's relational database DB2, which uses data federation to provide benefits to molecular biology researchers through two unique capabilities: increased flexibility in combining data from disparate sources, and SQL access to non-SQL data, easing the task of automating data an...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/4.4.375
update_date:2003-12-01 00:00:00
abstract::This Briefing reviews the widely used, currently active, up-to-date databases derived from the worldwide Protein Data Bank (PDB) to facilitate browsing, finding and exploring its entries. These databases contain visualization and analysis tools tailored to specific kinds of molecules and interactions, often including ...
journal_title:Briefings in bioinformatics
pub_type: Journal Article
doi:10.1093/bib/bbw049
update_date:2017-07-01 00:00:00