Improved identification of conserved cassette exons using Bayesian networks.

Abstract:

BACKGROUND: Alternative splicing is a major contributor to the diversity of eukaryotic transcriptomes and proteomes. Currently, large scale detection of alternative splicing using expressed sequence tags (ESTs) or microarrays does not capture all alternative splicing events. Moreover, for many species genomic data is being produced at a far greater rate than corresponding transcript data, hence in silico methods of predicting alternative splicing have to be improved.

RESULTS: Here, we show that the use of Bayesian networks (BNs) allows accurate prediction of evolutionarily conserved exon skipping events. At a stringent false positive rate of 0.5%, our BN achieves an improved true positive rate of 61%, compared to a previously reported 50% on the same dataset using support vector machines (SVMs). Incorporating several novel discriminative features such as intronic splicing regulatory elements leads to the improvement. Features related to mRNA secondary structure increase the prediction performance, corroborating previous findings that secondary structures are important for exon recognition. Random labelling tests rule out overfitting. Cross-validation on another dataset confirms the increased performance. When using the same dataset and the same set of features, the BN matches the performance of an SVM in earlier literature. Remarkably, we could show that about half of the exons which are labelled constitutive but receive a high probability of being alternative by the BN are in fact alternative exons according to the latest EST data. Finally, we predict exon skipping without using conservation-based features, and achieve a true positive rate of 29% at a false positive rate of 0.5%.

CONCLUSION: BNs can be used to achieve accurate identification of alternative exons and provide clues about possible dependencies between relevant features. The near-identical performance of the BN and SVM when using the same features shows that good classification depends more on features than on the choice of classifier. Conservation-based features continue to be the most informative, and hence distinguishing alternative exons from constitutive ones without using conservation-based features remains a challenging problem.
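
The abstract reports performance as a true positive rate at a fixed false positive rate (for example 61% at 0.5%). As a rough illustration of how such an operating point can be read off from classifier scores, here is a minimal sketch. The function name `tpr_at_fpr` and the toy scores are hypothetical and are not taken from the paper; the authors' actual evaluation procedure is not described in this record.

```python
def tpr_at_fpr(pos_scores, neg_scores, max_fpr):
    """Find the most permissive score threshold whose false positive
    rate stays at or below max_fpr, and return (threshold, tpr).

    pos_scores: classifier scores for truly alternative exons (assumed input)
    neg_scores: classifier scores for truly constitutive exons (assumed input)
    An exon is called "alternative" when its score >= threshold.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")

    # Try every distinct score as a threshold, from strictest to loosest.
    candidates = sorted(set(pos_scores) | set(neg_scores), reverse=True)

    best_threshold = float("inf")  # calls nothing positive: FPR = 0, TPR = 0
    best_tpr = 0.0
    for t in candidates:
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        if fpr > max_fpr:
            break  # FPR can only grow as the threshold is lowered
        best_threshold = t
        best_tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
    return best_threshold, best_tpr


# Toy scores, invented purely for illustration.
alternative = [0.9, 0.8, 0.7, 0.4]
constitutive = [0.85, 0.3, 0.2, 0.1]

threshold, tpr = tpr_at_fpr(alternative, constitutive, max_fpr=0.25)
print(f"threshold={threshold}, TPR={tpr:.2f}")
```

With a real dataset, `max_fpr=0.005` would correspond to the 0.5% false positive rate quoted above; a meaningful estimate at that level needs at least a few hundred negative examples.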

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Sinha R,Hiller M,Pudimat R,Gausmann U,Platzer M,Backofen R

doi

10.1186/1471-2105-9-477

subject

Has Abstract

pub_date

2008-11-12 00:00:00

pages

477

issn

1471-2105

pii

1471-2105-9-477

journal_volume

pub_type

Journal article
