Class-imbalanced classifiers for high-dimensional data.

Abstract:

A class-imbalanced classifier is a decision rule for predicting the class membership of new samples from an available data set in which the class sizes differ considerably. When the class sizes are very different, most standard classification algorithms may favor the larger (majority) class, resulting in poor accuracy in minority class prediction. A class-imbalanced classifier typically modifies a standard classifier with a correction strategy, or incorporates a new strategy in the training phase, to account for differential class sizes. This article reviews and evaluates some of the most important methods for class prediction of high-dimensional imbalanced data. The evaluation addresses the fundamental issues of the class-imbalanced classification problem: imbalance ratio, small disjuncts and overlap complexity, lack of data and feature selection. Four class-imbalanced classifiers are considered: three standard classification algorithms, each coupled with an ensemble correction strategy, and one support vector machine (SVM)-based correction classifier. The three algorithms are (i) diagonal linear discriminant analysis (DLDA), (ii) random forests (RFs) and (iii) SVMs. The SVM-based correction classifier is SVM threshold adjustment (SVM-THR). A Monte Carlo simulation and five genomic data sets were used to illustrate the analysis and address the issues. The SVM-ensemble classifier appears to perform best when the class imbalance is not too severe. SVM-THR performs well if the imbalance is severe and the predictors are highly correlated. DLDA with feature selection can perform well without the ensemble correction.
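
As a rough illustration of what an ensemble correction can look like, the sketch below trains several DLDA classifiers, each on a balanced subsample obtained by undersampling the larger classes, and combines them by majority vote. This is only one common form of ensemble correction; whether it matches the procedure evaluated in the article is an assumption. The function names (`fit_dlda`, `predict_dlda`, `fit_balanced_ensemble`, `predict_ensemble`) and the default of 25 ensemble members are hypothetical choices made for this example.

```python
import random
from statistics import mean


def fit_dlda(X, y):
    """Fit diagonal LDA: per-class feature means and a pooled per-feature variance."""
    classes = sorted(set(y))
    p = len(X[0])
    means = {
        c: [mean(x[j] for x, lab in zip(X, y) if lab == c) for j in range(p)]
        for c in classes
    }
    dof = max(len(X) - len(classes), 1)  # guard against a zero denominator
    var = []
    for j in range(p):
        ss = sum((x[j] - means[lab][j]) ** 2 for x, lab in zip(X, y))
        var.append(max(ss / dof, 1e-8))  # floor avoids division by zero later
    return means, var


def predict_dlda(model, x):
    """Assign x to the class whose mean is nearest in variance-scaled distance."""
    means, var = model

    def distance(c):
        return sum((x[j] - means[c][j]) ** 2 / var[j] for j in range(len(x)))

    return min(means, key=distance)


def fit_balanced_ensemble(X, y, n_members=25, seed=0):
    """Train n_members DLDA models, each on a class-balanced random subsample."""
    rng = random.Random(seed)
    by_class = {}
    for i, lab in enumerate(y):
        by_class.setdefault(lab, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())  # minority class size
    models = []
    for _ in range(n_members):
        # Draw n_min samples from every class (undersampling the larger ones).
        idx = [i for members in by_class.values() for i in rng.sample(members, n_min)]
        models.append(fit_dlda([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_ensemble(models, x):
    """Combine the member predictions by majority vote."""
    votes = [predict_dlda(m, x) for m in models]
    return max(set(votes), key=votes.count)
```

An odd number of members avoids tied votes in the two-class case. A threshold-adjustment approach such as SVM-THR would instead keep a single classifier and shift its decision boundary toward the majority class.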

journal_name

Brief Bioinform

authors

Lin WJ,Chen JJ

doi

10.1093/bib/bbs006

subject

Has Abstract

pub_date

2013-01-01 00:00:00

pages

13-26

issue

1

eissn

1477-4054

issn

1467-5463

pii

bbs006

journal_volume

14

pub_type

Journal Article, Review
  • Methods and resources to access mutation-dependent effects on cancer drug treatment.

    abstract::In clinical cancer treatment, genomic alterations would often affect the response of patients to anticancer drugs. Studies have shown that molecular features of tumors could be biomarkers predictive of sensitivity or resistance to anticancer agents, but the identification of actionable mutations are often constrained ...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbz109

    authors: Yao H,Liang Q,Qian X,Wang J,Sham PC,Li MJ

    update_date:2020-12-01 00:00:00

  • DeepAtomicCharge: a new graph convolutional network-based architecture for accurate prediction of atomic charges.

    abstract::Atomic charges play a very important role in drug-target recognition. However, computation of atomic charges with high-level quantum mechanics (QM) calculations is very time-consuming. A number of machine learning (ML)-based atomic charge prediction methods have been proposed to speed up the calculation of high-accura...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbaa183

    authors: Wang J,Cao D,Tang C,Xu L,He Q,Yang B,Chen X,Sun H,Hou T

    update_date:2020-08-25 00:00:00

  • A survey of software tools for microRNA discovery and characterization using RNA-seq.

    abstract::Since the small RNA-sequencing (sRNA-seq) technology became available, it allowed the discovery of thousands new microRNAs (miRNAs) in humans and many other species, providing new data on these small RNAs (sRNAs) of high biological and translational relevance. MiRNA discovery has not yet reached saturation, even in th...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbx148

    authors: Bortolomeazzi M,Gaffo E,Bortoluzzi S

    update_date:2019-05-21 00:00:00

  • InstaDock: A single-click graphical user interface for molecular docking-based virtual high-throughput screening.

    abstract::Exploring protein-ligand interactions is a subject of immense interest, as it provides deeper insights into molecular recognition, mechanism of interaction and subsequent functions. Predicting an accurate model for a protein-ligand interaction is a challenging task. Molecular docking is a computational method used for...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbaa279

    authors: Mohammad T,Mathur Y,Hassan MI

    update_date:2020-10-26 00:00:00

  • Links between kinetic data and sequences in the alpha/beta-hydrolases fold database.

    abstract::While the number of sequenced genes is increasing dramatically, the number of different protein structural families is expected to be more limited. Changes in enzymatic activity or protein interactions can dramatically modify the role of homologous proteins in different organisms or mutants. However, experimental data...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/2.1.30

    authors: Chatonnet A,Cousin X,Robinson A

    update_date:2001-03-01 00:00:00

  • Agents in bioinformatics, computational and systems biology.

    abstract::The adoption of agent technologies and multi-agent systems constitutes an emerging area in bioinformatics. In this article, we report on the activity of the Working Group on Agents in Bioinformatics (BIOAGENTS) founded during the first AgentLink III Technical Forum meeting on the 2nd of July, 2004, in Rome. The meetin...

    journal_title:Briefings in bioinformatics

    pub_type:

    doi:10.1093/bib/bbl014

    authors: Merelli E,Armano G,Cannata N,Corradini F,d'Inverno M,Doms A,Lord P,Martin A,Milanesi L,Möller S,Schroeder M,Luck M

    update_date:2007-01-01 00:00:00

  • Protein functional annotation of simultaneously improved stability, accuracy and false discovery rate achieved by a sequence-based deep learning.

    abstract::Functional annotation of protein sequence with high accuracy has become one of the most important issues in modern biomedical studies, and computational approaches of significantly accelerated analysis process and enhanced accuracy are greatly desired. Although a variety of methods have been developed to elevate prote...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbz081

    authors: Hong J,Luo Y,Zhang Y,Ying J,Xue W,Xie T,Tao L,Zhu F

    update_date:2020-07-15 00:00:00

  • Capacity building for whole genome sequencing of Mycobacterium tuberculosis and bioinformatics in high TB burden countries.

    abstract:BACKGROUND:Whole genome sequencing (WGS) is increasingly used for Mycobacterium tuberculosis (Mtb) research. Countries with the highest tuberculosis (TB) burden face important challenges to integrate WGS into surveillance and research. METHODS:We assessed the global status of Mtb WGS and developed a 3-week training co...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbaa246

    authors: Rivière E,Heupink TH,Ismail N,Dippenaar A,Clarke C,Abebe G,Heusden P,Warren R,Meehan CJ,Van Rie A

    update_date:2020-10-03 00:00:00

  • Dynamics of transcriptional and post-transcriptional regulation.

    abstract::Despite gene expression programs being notoriously complex, RNA abundance is usually assumed as a proxy for transcriptional activity. Recently developed approaches, able to disentangle transcriptional and post-transcriptional regulatory processes, have revealed a more complex scenario. It is now possible to work out h...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbaa389

    authors: Furlan M,de Pretis S,Pelizzola M

    update_date:2020-12-22 00:00:00

  • MeSHHeading2vec: a new method for representing MeSH headings as vectors based on graph embedding algorithm.

    abstract::Effectively representing Medical Subject Headings (MeSH) headings (terms) such as disease and drug as discriminative vectors could greatly improve the performance of downstream computational prediction models. However, these terms are often abstract and difficult to quantify. In this paper, we converted the MeSH tree ...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbaa037

    authors: Guo ZH,You ZH,Huang DS,Yi HC,Zheng K,Chen ZH,Wang YB

    update_date:2020-03-31 00:00:00

  • Discovering and detecting transposable elements in genome sequences.

    abstract::The contribution of transposable elements (TEs) to genome structure and evolution as well as their impact on genome sequencing, assembly, annotation and alignment has generated increasing interest in developing new methods for their computational analysis. Here we review the diversity of innovative approaches to ident...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article, Review

    doi:10.1093/bib/bbm048

    authors: Bergman CM,Quesneville H

    update_date:2007-11-01 00:00:00

  • Computational aspects of host-parasite phylogenies.

    abstract::Computational aspects of host-parasite phylogenies form part of a set of general associations between areas and organisms, hosts and parasites, and species and genes. The problem is not new and the commonalities of exploring vicariance biogeography (organisms tracking areas) and host-parasite co-speciation (parasites ...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article, Review

    doi:10.1093/bib/5.4.339

    authors: Stevens J

    update_date:2004-12-01 00:00:00

  • The digital revolution in phenotyping.

    abstract::Phenotypes have gained increased notoriety in the clinical and biological domain owing to their application in numerous areas such as the discovery of disease genes and drug targets, phylogenetics and pharmacogenomics. Phenotypes, defined as observable characteristics of organisms, can be seen as one of the bridges th...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbv083

    authors: Oellrich A,Collier N,Groza T,Rebholz-Schuhmann D,Shah N,Bodenreider O,Boland MR,Georgiev I,Liu H,Livingston K,Luna A,Mallon AM,Manda P,Robinson PN,Rustici G,Simon M,Wang L,Winnenburg R,Dumontier M

    update_date:2016-09-01 00:00:00

  • The virtual cell--a candidate co-ordinator for 'middle-out' modelling of biological systems.

    abstract::Understanding the functioning of biological systems depends on tackling complexity spanning spatial scales from genome to organ to whole organism. The basic unit of life, the cell, acts to co-ordinate information received across these scales and processes the myriad of signals to produce an integrated cellular respons...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article, Review

    doi:10.1093/bib/bbp010

    authors: Walker DC,Southgate J

    update_date:2009-07-01 00:00:00

  • Optimization of cell lines as tumour models by integrating multi-omics data.

    abstract::Cell lines are widely used as in vitro models of tumorigenesis. However, an increasing number of researchers have found that cell lines differ from their sourced tumour samples after long-term cell culture. The application of unsuitable cell lines in experiments will affect the experimental accuracy and the treatment ...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article, Review

    doi:10.1093/bib/bbw082

    authors: Zhao N,Liu Y,Wei Y,Yan Z,Zhang Q,Wu C,Chang Z,Xu Y

    update_date:2017-05-01 00:00:00

  • Comparison of software packages for detecting differential expression in RNA-seq studies.

    abstract::RNA-sequencing (RNA-seq) has rapidly become a popular tool to characterize transcriptomes. A fundamental research problem in many RNA-seq studies is the identification of reliable molecular markers that show differential expression between distinct sample groups. Together with the growing popularity of RNA-seq, a numb...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbt086

    authors: Seyednasrollah F,Laiho A,Elo LL

    update_date:2015-01-01 00:00:00

  • Evaluation of gene-drug common module identification methods using pharmacogenomics data.

    abstract::Accurately identifying the interactions between genomic factors and the response of cancer drugs plays important roles in drug discovery, drug repositioning and cancer treatment. A number of studies revealed that interactions between genes and drugs were 'many-genes-to-many drugs' interactions, i.e. common modules, op...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbaa087

    authors: Huang J,Chen J,Zhang B,Zhu L,Cai H

    update_date:2020-06-26 00:00:00

  • Irinotecan and vandetanib create synergies for treatment of pancreatic cancer patients with concomitant TP53 and KRAS mutations.

    abstract:BACKGROUND:The most frequently mutated gene pairs in pancreatic adenocarcinoma (PAAD) are KRAS and TP53, and our goal is to illustrate the multiomics and molecular dynamics landscapes of KRAS/TP53 mutation and also to obtain prospective novel drugs for KRAS- and TP53-mutated PAAD patients. Moreover, we also made an att...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbaa149

    authors: Kaushik AC,Wang YJ,Wang X,Wei DQ

    update_date:2020-07-31 00:00:00

  • Iteratively reweighted LASSO for mapping multiple quantitative trait loci.

    abstract::The iteratively reweighted least square (IRLS) method is mostly identical to maximum likelihood (ML) method in terms of parameter estimation and power of quantitative trait locus (QTL) detection. But the IRLS is greatly superior to ML in terms of computing speed and the robustness of parameter estimation. In conjuncti...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbs062

    authors: Liu Y,Yang T,Li H,Yang R

    update_date:2014-01-01 00:00:00

  • Multilevel heterogeneous omics data integration with kernel fusion.

    abstract::High-throughput omics data are generated almost with no limit nowadays. It becomes increasingly important to integrate different omics data types to disentangle the molecular machinery of complex diseases with the hope for better disease prevention and treatment. Since the relationship among different omics data featu...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bby115

    authors: Yang H,Cao H,He T,Wang T,Cui Y

    update_date:2018-11-29 00:00:00

  • Advanced bioinformatics methods for practical applications in proteomics.

    abstract::Mass spectrometry (MS)-based proteomics has undergone rapid advancements in recent years, creating challenging problems for bioinformatics. We focus on four aspects where bioinformatics plays a crucial role (and proteomics is needed for clinical application): peptide-spectra matching (PSM) based on the new data-indepe...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbx128

    authors: Goh WWB,Wong L

    update_date:2019-01-18 00:00:00

  • Single-cell transcriptome-based multilayer network biomarker for predicting prognosis and therapeutic response of gliomas.

    abstract::Occurrence and development of cancers are governed by complex networks of interacting intercellular and intracellular signals. The technology of single-cell RNA sequencing (scRNA-seq) provides an unprecedented opportunity for dissecting the interplay between the cancer cells and the associated microenvironment. Here w...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbz040

    authors: Zhang J,Guan M,Wang Q,Zhang J,Zhou T,Sun X

    update_date:2020-05-21 00:00:00

  • Computational prediction of species-specific yeast DNA replication origin via iterative feature representation.

    abstract::Deoxyribonucleic acid replication is one of the most crucial tasks taking place in the cell, and it has to be precisely regulated. This process is initiated in the replication origins (ORIs), and thus it is essential to identify such sites for a deeper understanding of the cellular processes and functions related to t...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbaa304

    authors: Manavalan B,Basith S,Shin TH,Lee G

    update_date:2020-11-25 00:00:00

  • Towards deep phenotyping pregnancy: a systematic review on artificial intelligence and machine learning methods to improve pregnancy outcomes.

    abstract:OBJECTIVE:Development of novel informatics methods focused on improving pregnancy outcomes remains an active area of research. The purpose of this study is to systematically review the ways that artificial intelligence (AI) and machine learning (ML), including deep learning (DL), methodologies can inform patient care d...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbaa369

    authors: Davidson L,Boland MR

    update_date:2021-01-06 00:00:00

  • A solid quality-control analysis of AB SOLiD short-read sequencing data.

    abstract::Next generation sequencers have greatly improved our ability to mine polymorphisms and mutations out of entire (or portions of) genomes. The reliability of their outputs, though, showed to be very related to the sequencing chemistry and to deeply affect the quality of the downstream analyses. We focus here on the two-...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbs048

    authors: Castellana S,Romani M,Valente EM,Mazza T

    update_date:2013-11-01 00:00:00

  • Bioinformatics education--perspectives and challenges out of Africa.

    abstract::The discipline of bioinformatics has developed rapidly since the complete sequencing of the first genomes in the 1990s. The development of many high-throughput techniques during the last decades has ensured that bioinformatics has grown into a discipline that overlaps with, and is required for, the modern practice of ...

    journal_title:Briefings in bioinformatics

    pub_type: Historical Article, Journal Article

    doi:10.1093/bib/bbu022

    authors: Tastan Bishop Ö,Adebiyi EF,Alzohairy AM,Everett D,Ghedira K,Ghouila A,Kumuthini J,Mulder NJ,Panji S,Patterton HG,H3ABioNet Consortium.,H3Africa Consortium.

    update_date:2015-03-01 00:00:00

  • Comparison and integration of computational methods for deleterious synonymous mutation prediction.

    abstract::Synonymous mutations do not change the encoded amino acids but may alter the structure or function of an mRNA in ways that impact gene function. Advances in next generation sequencing technologies have detected numerous synonymous mutations in the human genome. Several computational models have been proposed to predic...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbz047

    authors: Cheng N,Li M,Zhao L,Zhang B,Yang Y,Zheng CH,Xia J

    update_date:2020-05-21 00:00:00

  • iProt-Sub: a comprehensive package for accurately mapping and predicting protease-specific substrates and cleavage sites.

    abstract::Regulation of proteolysis plays a critical role in a myriad of important cellular processes. The key to better understanding the mechanisms that control this process is to identify the specific substrates that each protease targets. To address this, we have developed iProt-Sub, a powerful bioinformatics tool for the a...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article, Review

    doi:10.1093/bib/bby028

    authors: Song J,Wang Y,Li F,Akutsu T,Rawlings ND,Webb GI,Chou KC

    update_date:2019-03-25 00:00:00

  • Federating data with Information Integrator.

    abstract::Information Integrator is an extension to IBM's relational database DB2, which uses data federation to provide benefits to molecular biology researchers through two unique capabilities: increased flexibility in combining data from disparate sources, and SQL access to non-SQL data, easing the task of automating data an...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/4.4.375

    authors: Arenson AD

    update_date:2003-12-01 00:00:00

  • Structural database resources for biological macromolecules.

    abstract::This Briefing reviews the widely used, currently active, up-to-date databases derived from the worldwide Protein Data Bank (PDB) to facilitate browsing, finding and exploring its entries. These databases contain visualization and analysis tools tailored to specific kinds of molecules and interactions, often including ...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbw049

    authors: Abriata LA

    update_date:2017-07-01 00:00:00