Predicting human splicing branchpoints by combining sequence-derived features and multi-label learning methods.

Abstract:

BACKGROUND:Alternative splicing is the critical process in a single gene coding, which removes introns and joins exons, and splicing branchpoints are indicators for the alternative splicing. Wet experiments have identified a great number of human splicing branchpoints, but many branchpoints are still unknown. In order to guide wet experiments, we develop computational methods to predict human splicing branchpoints. RESULTS:Considering the fact that an intron may have multiple branchpoints, we transform the branchpoint prediction as the multi-label learning problem, and attempt to predict branchpoint sites from intron sequences. First, we investigate a variety of intron sequence-derived features, such as sparse profile, dinucleotide profile, position weight matrix profile, Markov motif profile and polypyrimidine tract profile. Second, we consider several multi-label learning methods: partial least squares regression, canonical correlation analysis and regularized canonical correlation analysis, and use them as the basic classification engines. Third, we propose two ensemble learning schemes which integrate different features and different classifiers to build ensemble learning systems for the branchpoint prediction. One is the genetic algorithm-based weighted average ensemble method; the other is the logistic regression-based ensemble method. CONCLUSIONS:In the computational experiments, two ensemble learning methods outperform benchmark branchpoint prediction methods, and can produce high-accuracy results on the benchmark dataset.

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Zhang W,Zhu X,Fu Y,Tsuji J,Weng Z

doi

10.1186/s12859-017-1875-6

subject

Has Abstract

pub_date

2017-12-01 00:00:00

pages

464

issue

Suppl 13

issn

1471-2105

pii

10.1186/s12859-017-1875-6

journal_volume

18

pub_type

杂志文章
  • A unifying model of genome evolution under parsimony.

    abstract:BACKGROUND:Parsimony and maximum likelihood methods of phylogenetic tree estimation and parsimony methods for genome rearrangements are central to the study of genome evolution yet to date they have largely been pursued in isolation. RESULTS:We present a data structure called a history graph that offers a practical ba...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-15-206

    authors: Paten B,Zerbino DR,Hickey G,Haussler D

    更新日期:2014-06-19 00:00:00

  • GObar: a gene ontology based analysis and visualization tool for gene sets.

    abstract:BACKGROUND:Microarray experiments, as well as other genomic analyses, often result in large gene sets containing up to several hundred genes. The biological significance of such sets of genes is, usually, not readily apparent. Identification of the functions of the genes in the set can help highlight features of intere...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-189

    authors: Lee JS,Katari G,Sachidanandam R

    更新日期:2005-07-25 00:00:00

  • Bioinformatics Resource Manager: a systems biology web tool for microRNA and omics data integration.

    abstract:BACKGROUND:The Bioinformatics Resource Manager (BRM) is a web-based tool developed to facilitate identifier conversion and data integration for Homo sapiens (human), Mus musculus (mouse), Rattus norvegicus (rat), Danio rerio (zebrafish), and Macaca mulatta (macaque), as well as perform orthologous conversions among the...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-2805-6

    authors: Brown J,Phillips AR,Lewis DA,Mans MA,Chang Y,Tanguay RL,Peterson ES,Waters KM,Tilton SC

    更新日期:2019-05-17 00:00:00

  • BicPAMS: software for biological data analysis with pattern-based biclustering.

    abstract:BACKGROUND:Biclustering has been largely applied for the unsupervised analysis of biological data, being recognised today as a key technique to discover putative modules in both expression data (subsets of genes correlated in subsets of conditions) and network data (groups of coherently interconnected biological entiti...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1493-3

    authors: Henriques R,Ferreira FL,Madeira SC

    更新日期:2017-02-02 00:00:00

  • BIOSMILE: a semantic role labeling system for biomedical verbs using a maximum-entropy model with automatically generated template features.

    abstract:BACKGROUND:Bioinformatics tools for automatic processing of biomedical literature are invaluable for both the design and interpretation of large-scale experiments. Many information extraction (IE) systems that incorporate natural language processing (NLP) techniques have thus been developed for use in the biomedical fi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-325

    authors: Tsai RT,Chou WC,Su YS,Lin YC,Sung CL,Dai HJ,Yeh IT,Ku W,Sung TY,Hsu WL

    更新日期:2007-09-01 00:00:00

  • Directed acyclic graph kernels for structural RNA analysis.

    abstract:BACKGROUND:Recent discoveries of a large variety of important roles for non-coding RNAs (ncRNAs) have been reported by numerous researchers. In order to analyze ncRNAs by kernel methods including support vector machines, we propose stem kernels as an extension of string kernels for measuring the similarities between tw...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-318

    authors: Sato K,Mituyama T,Asai K,Sakakibara Y

    更新日期:2008-07-22 00:00:00

  • Software for selecting the most informative sets of genomic loci for multi-target microbial typing.

    abstract:BACKGROUND:High-throughput sequencing can identify numerous potential genomic targets for microbial strain typing, but identification of the most informative combinations requires the use of computational screening tools. This paper describes novel software-- Automated Selection of Typing Target Subsets (AuSeTTS)--that...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-14-148

    authors: O'Sullivan MV,Sintchenko V,Gilbert GL

    更新日期:2013-05-01 00:00:00

  • MGOGP: a gene module-based heuristic algorithm for cancer-related gene prioritization.

    abstract:BACKGROUND:Prioritizing genes according to their associations with a cancer allows researchers to explore genes in more informed ways. By far, Gene-centric or network-centric gene prioritization methods are predominated. Genes and their protein products carry out cellular processes in the context of functional modules....

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2216-0

    authors: Su L,Liu G,Bai T,Meng X,Ma Q

    更新日期:2018-06-05 00:00:00

  • MTAP: the motif tool assessment platform.

    abstract:BACKGROUND:In recent years, substantial effort has been applied to de novo regulatory motif discovery. At this time, more than 150 software tools exist to detect regulatory binding sites given a set of genomic sequences. As the number of software packages increases, it becomes more important to identify the tools with ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-S9-S6

    authors: Quest D,Dempsey K,Shafiullah M,Bastola D,Ali H

    更新日期:2008-08-12 00:00:00

  • Simple binary segmentation frameworks for identifying variation in DNA copy number.

    abstract:BACKGROUND:Variation in DNA copy number, due to gains and losses of chromosome segments, is common. A first step for analyzing DNA copy number data is to identify amplified or deleted regions in individuals. To locate such regions, we propose a circular binary segmentation procedure, which is based on a sequence of nes...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-277

    authors: Yang TY

    更新日期:2012-10-30 00:00:00

  • Structural analysis on mutation residues and interfacial water molecules for human TIM disease understanding.

    abstract:BACKGROUND:Human triosephosphate isomerase (HsTIM) deficiency is a genetic disease caused often by the pathogenic mutation E104D. This mutation, located at the side of an abnormally large cluster of water in the inter-subunit interface, reduces the thermostability of the enzyme. Why and how these water molecules are di...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-14-S16-S11

    authors: Li Z,He Y,Liu Q,Zhao L,Wong L,Kwoh CK,Nguyen H,Li J

    更新日期:2013-01-01 00:00:00

  • Promoter prediction in E. coli based on SIDD profiles and Artificial Neural Networks.

    abstract:BACKGROUND:One of the major challenges in biology is the correct identification of promoter regions. Computational methods based on motif searching have been the traditional approach taken. Recent studies have shown that DNA structural properties, such as curvature, stacking energy, and stress-induced duplex destabiliz...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-S6-S17

    authors: Bland C,Newsome AS,Markovets AA

    更新日期:2010-10-07 00:00:00

  • MCA: Multiresolution Correlation Analysis, a graphical tool for subpopulation identification in single-cell gene expression data.

    abstract:BACKGROUND:Biological data often originate from samples containing mixtures of subpopulations, corresponding e.g. to distinct cellular phenotypes. However, identification of distinct subpopulations may be difficult if biological measurements yield distributions that are not easily separable. RESULTS:We present Multire...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-15-240

    authors: Feigelman J,Theis FJ,Marr C

    更新日期:2014-07-11 00:00:00

  • WellInverter: a web application for the analysis of fluorescent reporter gene data.

    abstract:BACKGROUND:Fluorescent reporter genes have become widely used for monitoring gene expression in living cells. When a microbial strain carrying a reporter gene is grown in a microplate reader, the fluorescence and the absorbance (optical density) of the culture can be automatically measured every few minutes in a highly...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-2920-4

    authors: Martin Y,Page M,Blanchet C,de Jong H

    更新日期:2019-06-11 00:00:00

  • Combining Pareto-optimal clusters using supervised learning for identifying co-expressed genes.

    abstract:BACKGROUND:The landscape of biological and biomedical research is being changed rapidly with the invention of microarrays which enables simultaneous view on the transcription levels of a huge number of genes across different experimental conditions or time points. Using microarray data sets, clustering algorithms have ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-27

    authors: Maulik U,Mukhopadhyay A,Bandyopadhyay S

    更新日期:2009-01-20 00:00:00

  • A novel method to identify high order gene-gene interactions in genome-wide association studies: gene-based MDR.

    abstract:BACKGROUND:Because common complex diseases are affected by multiple genes and environmental factors, it is essential to investigate gene-gene and/or gene-environment interactions to understand genetic architecture of complex diseases. After the great success of large scale genome-wide association (GWA) studies using th...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-S9-S5

    authors: Oh S,Lee J,Kwon MS,Weir B,Ha K,Park T

    更新日期:2012-06-11 00:00:00

  • Image-based classification of plant genus and family for trained and untrained plant species.

    abstract:BACKGROUND:Modern plant taxonomy reflects phylogenetic relationships among taxa based on proposed morphological and genetic similarities. However, taxonomical relation is not necessarily reflected by close overall resemblance, but rather by commonality of very specific morphological characters or similarity on the mole...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2474-x

    authors: Seeland M,Rzanny M,Boho D,Wäldchen J,Mäder P

    更新日期:2019-01-03 00:00:00

  • A simple and robust method for connecting small-molecule drugs using gene-expression signatures.

    abstract:BACKGROUND:Interaction of a drug or chemical with a biological system can result in a gene-expression profile or signature characteristic of the event. Using a suitably robust algorithm these signatures can potentially be used to connect molecules with similar pharmacological or toxicological properties by gene express...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-258

    authors: Zhang SD,Gant TW

    更新日期:2008-06-02 00:00:00

  • A two-phase procedure for non-normal quantitative trait genetic association study.

    abstract:BACKGROUND:The nonparametric trend test (NPT) is well suitable for identifying the genetic variants associated with quantitative traits when the trait values do not satisfy the normal distribution assumption. If the genetic model, defined according to the mode of inheritance, is known, the NPT derived under the given g...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-016-0888-x

    authors: Zhang W,Li H,Li Z,Li Q

    更新日期:2016-01-28 00:00:00

  • Augmented annotation and orthologue analysis for Oryctolagus cuniculus: Better Bunny.

    abstract:BACKGROUND:The rabbit is an important model organism used in a wide range of biomedical research. However, the rabbit genome is still sparsely annotated, thus prohibiting extensive functional analysis of gene sets derived from whole-genome experiments. We developed a web-based application that provides augmented annota...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-84

    authors: Craig DB,Kannan S,Dombkowski AA

    更新日期:2012-05-08 00:00:00

  • MetaMIS: a metagenomic microbial interaction simulator based on microbial community profiles.

    abstract:BACKGROUND:The complexity and dynamics of microbial communities are major factors in the ecology of a system. With the NGS technique, metagenomics data provides a new way to explore microbial interactions. Lotka-Volterra models, which have been widely used to infer animal interactions in dynamic systems, have recently ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-016-1359-0

    authors: Shaw GT,Pao YY,Wang D

    更新日期:2016-11-25 00:00:00

  • Predicting bacterial resistance from whole-genome sequences using k-mers and stability selection.

    abstract:BACKGROUND:Several studies demonstrated the feasibility of predicting bacterial antibiotic resistance phenotypes from whole-genome sequences, the prediction process usually amounting to detecting the presence of genes involved in antibiotic resistance mechanisms, or of specific mutations, previously identified from a t...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2403-z

    authors: Mahé P,Tournoud M

    更新日期:2018-10-17 00:00:00

  • Evaluating metagenomics tools for genome binning with real metagenomic datasets and CAMI datasets.

    abstract:BACKGROUND:Shotgun metagenomics based on untargeted sequencing can explore the taxonomic profile and the function of unknown microorganisms in samples, and complement the shortage of amplicon sequencing. Binning assembled sequences into individual groups, which represent microbial genomes, is the key step and a major c...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03667-3

    authors: Yue Y,Huang H,Qi Z,Dou HM,Liu XY,Han TF,Chen Y,Song XJ,Zhang YH,Tu J

    更新日期:2020-07-28 00:00:00

  • Overview of the Cancer Genetics and Pathway Curation tasks of BioNLP Shared Task 2013.

    abstract:BACKGROUND:Since their introduction in 2009, the BioNLP Shared Task events have been instrumental in advancing the development of methods and resources for the automatic extraction of information from the biomedical literature. In this paper, we present the Cancer Genetics (CG) and Pathway Curation (PC) tasks, two even...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-16-S10-S2

    authors: Pyysalo S,Ohta T,Rak R,Rowley A,Chun HW,Jung SJ,Choi SP,Tsujii J,Ananiadou S

    更新日期:2015-01-01 00:00:00

  • Network-based group variable selection for detecting expression quantitative trait loci (eQTL).

    abstract:BACKGROUND:Analysis of expression quantitative trait loci (eQTL) aims to identify the genetic loci associated with the expression level of genes. Penalized regression with a proper penalty is suitable for the high-dimensional biological data. Its performance should be enhanced when we incorporate biological knowledge o...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-269

    authors: Wang W,Zhang X

    更新日期:2011-06-30 00:00:00

  • Error, reproducibility and sensitivity: a pipeline for data processing of Agilent oligonucleotide expression arrays.

    abstract:BACKGROUND:Expression microarrays are increasingly used to obtain large scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, to design experiments and analyse the output. Furthermore, many of the more sophisticated mathematica...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-344

    authors: Chain B,Bowen H,Hammond J,Posch W,Rasaiyaah J,Tsang J,Noursadeghi M

    更新日期:2010-06-24 00:00:00

  • A computational diffusion model to study antibody transport within reconstructed tumor microenvironments.

    abstract:BACKGROUND:Antibodies revolutionized cancer treatment over the past decades. Despite their successfully application, there are still challenges to overcome to improve efficacy, such as the heterogeneous distribution of antibodies within tumors. Tumor microenvironment features, such as the distribution of tumor and othe...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03854-2

    authors: Cartaxo AL,Almeida J,Gualda EJ,Marsal M,Loza-Alvarez P,Brito C,Isidro IA

    更新日期:2020-11-17 00:00:00

  • Calibration and assessment of channel-specific biases in microarray data with extended dynamical range.

    abstract:BACKGROUND:Non-linearities in observed log-ratios of gene expressions, also known as intensity dependent log-ratios, can often be accounted for by global biases in the two channels being compared. Any step in a microarray process may introduce such offsets and in this article we study the biases introduced by the micro...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-5-177

    authors: Bengtsson H,Jönsson G,Vallon-Christersson J

    更新日期:2004-11-12 00:00:00

  • Shared data science infrastructure for genomics data.

    abstract:BACKGROUND:Creating a scalable computational infrastructure to analyze the wealth of information contained in data repositories is difficult due to significant barriers in organizing, extracting and analyzing relevant data. Shared data science infrastructures like Boag is needed to efficiently process and parse data co...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-2967-2

    authors: Bagheri H,Muppirala U,Masonbrink RE,Severin AJ,Rajan H

    更新日期:2019-08-22 00:00:00

  • Rearrangement analysis of multiple bacterial genomes.

    abstract:BACKGROUND:Genomes are subjected to rearrangements that change the orientation and ordering of genes during evolution. The most common rearrangements that occur in uni-chromosomal genomes are inversions (or reversals) to adapt to the changing environment. Since genome rearrangements are rarer than point mutations, gene...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3293-4

    authors: Noureen M,Tada I,Kawashima T,Arita M

    更新日期:2019-12-27 00:00:00