Reranking candidate gene models with cross-species comparison for improved gene prediction.

Abstract:

BACKGROUND: Most gene finders score candidate gene models with state-based methods, typically HMMs, by combining local properties (coding potential, splice donor and acceptor patterns, etc.). Competing models with similar state-based scores may be distinguishable with additional information. In particular, functional and comparative genomics datasets may help to select among competing models of comparable probability by exploiting features likely to be associated with the correct gene models, such as conserved exon/intron structure or protein sequence features.

RESULTS: We have investigated the utility of a simple post-processing step for selecting among a set of alternative gene models, using global scoring rules to rerank competing models for more accurate prediction. For each gene locus, we first generate the K best candidate gene models using the gene finder Evigan, and then rerank these models using comparisons with putative orthologous genes from closely related species. Candidate gene models with lower scores in the original gene finder may be selected if they exhibit strong similarity to probable orthologs in coding sequence, splice site location, or signal peptide occurrence. Experiments on Drosophila melanogaster demonstrate that reranking based on cross-species comparison outperforms the best gene models identified by Evigan alone, and also outperforms the comparative gene finders GeneWise and Augustus+.

CONCLUSION: Reranking gene models with cross-species comparison improves gene prediction accuracy. This straightforward method can be readily adapted to incorporate additional lines of evidence, as it requires only a ranked source of candidate gene models.
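The reranking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear scoring rule, the feature names (coding_sim, splice_site_match, signal_peptide_match), the weights, and the candidate scores below are all hypothetical, chosen only to show how a candidate with a lower gene-finder score might be promoted when it agrees well with a putative ortholog.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    """One candidate gene model at a locus (all fields are illustrative)."""
    name: str
    base_score: float           # score assigned by the original gene finder
    features: Dict[str, float]  # similarity to a putative ortholog, each in [0, 1]


# Hypothetical weights; a real system would learn these from training data.
WEIGHTS = {
    "base": 1.0,
    "coding_sim": 2.0,
    "splice_site_match": 1.5,
    "signal_peptide_match": 0.5,
}


def rerank_score(c: Candidate, weights: Dict[str, float] = WEIGHTS) -> float:
    """Combine the gene-finder score with weighted cross-species features."""
    score = weights["base"] * c.base_score
    for feature, value in c.features.items():
        score += weights.get(feature, 0.0) * value
    return score


def rerank(candidates: List[Candidate],
           weights: Dict[str, float] = WEIGHTS) -> List[Candidate]:
    """Return the K candidates sorted from best to worst reranked score."""
    return sorted(candidates, key=lambda c: rerank_score(c, weights), reverse=True)


# Made-up K = 3 candidates for one locus, listed in gene-finder order.
candidates = [
    Candidate("model_1", -10.0, {"coding_sim": 0.60, "splice_site_match": 0.50,
                                 "signal_peptide_match": 0.0}),
    Candidate("model_2", -10.5, {"coding_sim": 0.95, "splice_site_match": 1.00,
                                 "signal_peptide_match": 1.0}),
    Candidate("model_3", -12.0, {"coding_sim": 0.90, "splice_site_match": 1.00,
                                 "signal_peptide_match": 1.0}),
]

best = rerank(candidates)[0]
print(best.name)
```

With these invented numbers, the second-ranked candidate overtakes the top gene-finder choice because its ortholog agreement outweighs the small gap in base score, while the third candidate's base score is too low to be rescued. Because the reranker needs only a ranked candidate list plus per-candidate features, other evidence sources could be added as extra feature keys.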

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Liu Q,Crammer K,Pereira FC,Roos DS

doi

10.1186/1471-2105-9-433

subject

Has Abstract

pub_date

2008-10-14 00:00:00

pages

433

issn

1471-2105

pii

1471-2105-9-433

journal_volume

9

pub_type

Journal article
  • Efficient use of unlabeled data for protein sequence classification: a comparative study.

    abstract:BACKGROUND:Recent studies in computational primary protein sequence analysis have leveraged the power of unlabeled data. For example, predictive models based on string kernels trained on sequences known to belong to particular folds or superfamilies, the so-called labeled data set, can attain significantly improved acc...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-10-S4-S2

    authors: Kuksa P,Huang PH,Pavlovic V

    update_date:2009-04-29 00:00:00

  • Construction and analysis of the protein-protein interaction networks for schizophrenia, bipolar disorder, and major depression.

    abstract:BACKGROUND:Schizophrenia, bipolar disorder, and major depression are devastating mental diseases, each with distinctive yet overlapping epidemiologic characteristics. Microarray and proteomics data have revealed genes which expressed abnormally in patients. Several single nucleotide polymorphisms (SNPs) and mutations a...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-12-S13-S20

    authors: Lee SA,Tsao TT,Yang KC,Lin H,Kuo YL,Hsu CH,Lee WK,Huang KC,Kao CY

    update_date:2011-01-01 00:00:00

  • Multi-scale structural community organisation of the human genome.

    abstract:BACKGROUND:Structural interaction frequency matrices between all genome loci are now experimentally achievable thanks to high-throughput chromosome conformation capture technologies. This ensues a new methodological challenge for computational biology which consists in objectively extracting from these data the structu...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-017-1616-x

    authors: Boulos RE,Tremblay N,Arneodo A,Borgnat P,Audit B

    update_date:2017-04-11 00:00:00

  • A fast indexing approach for protein structure comparison.

    abstract:BACKGROUND:Protein structure comparison is a fundamental task in structural biology. While the number of known protein structures has grown rapidly over the last decade, searching a large database of protein structures is still relatively slow using existing methods. There is a need for new techniques which can rapidly...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-11-S1-S46

    authors: Zhang L,Bailey J,Konagurthu AS,Ramamohanarao K

    update_date:2010-01-18 00:00:00

  • MPAgenomics: an R package for multi-patient analysis of genomic markers.

    abstract:BACKGROUND:Last generations of Single Nucleotide Polymorphism (SNP) arrays allow to study copy-number variations in addition to genotyping measures. RESULTS:MPAgenomics, standing for multi-patient analysis (MPA) of genomic markers, is an R-package devoted to: (i) efficient segmentation and (ii) selection of genomic ma...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-014-0394-y

    authors: Grimonprez Q,Celisse A,Blanck S,Cheok M,Figeac M,Marot G

    update_date:2014-12-14 00:00:00

  • Effects of Mecp2 loss of function in embryonic cortical neurons: a bioinformatics strategy to sort out non-neuronal cells variability from transcriptome profiling.

    abstract:BACKGROUND:Mecp2 null mice model Rett syndrome (RTT) a human neurological disorder affecting females after apparent normal pre- and peri-natal developmental periods. Neuroanatomical studies in cerebral cortex of RTT mouse models revealed delayed maturation of neuronal morphology and autonomous as well as non-cell auton...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-015-0859-7

    authors: Vacca M,Tripathi KP,Speranza L,Aiese Cigliano R,Scalabrì F,Marracino F,Madonna M,Sanseverino W,Perrone-Capano C,Guarracino MR,D'Esposito M

    update_date:2016-01-20 00:00:00

  • OpenMS - an open-source software framework for mass spectrometry.

    abstract:BACKGROUND:Mass spectrometry is an essential analytical technique for high-throughput analysis in proteomics and metabolomics. The development of new separation techniques, precise mass analyzers and experimental protocols is a very active field of research. This leads to more complex experimental setups yielding ever ...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-9-163

    authors: Sturm M,Bertsch A,Gröpl C,Hildebrandt A,Hussong R,Lange E,Pfeifer N,Schulz-Trieglaff O,Zerck A,Reinert K,Kohlbacher O

    update_date:2008-03-26 00:00:00

  • BiPOm: a rule-based ontology to represent and infer molecule knowledge from a biological process-centered viewpoint.

    abstract:BACKGROUND:Managing and organizing biological knowledge remains a major challenge, due to the complexity of living systems. Recently, systemic representations have been promising in tackling such a challenge at the whole-cell scale. In such representations, the cell is considered as a system composed of interlocked sub...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-020-03637-9

    authors: Henry V,Saïs F,Inizan O,Marchadier E,Dibie J,Goelzer A,Fromion V

    update_date:2020-07-23 00:00:00

  • Pushing the accuracy limit of shape complementarity for protein-protein docking.

    abstract:BACKGROUND:Protein-protein docking is a valuable computational approach for investigating protein-protein interactions. Shape complementarity is the most basic component of a scoring function and plays an important role in protein-protein docking. Despite significant progresses, shape representation remains an open que...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-019-3270-y

    authors: Yan Y,Huang SY

    update_date:2019-12-24 00:00:00

  • KinMap: a web-based tool for interactive navigation through human kinome data.

    abstract:BACKGROUND:Annotations of the phylogenetic tree of the human kinome is an intuitive way to visualize compound profiling data, structural features of kinases or functional relationships within this important class of proteins. The increasing volume and complexity of kinase-related data underlines the need for a tool tha...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-016-1433-7

    authors: Eid S,Turk S,Volkamer A,Rippmann F,Fulle S

    update_date:2017-01-05 00:00:00

  • Accucopy: accurate and fast inference of allele-specific copy number alterations from low-coverage low-purity tumor sequencing data.

    abstract:BACKGROUND:Copy number alterations (CNAs), due to their large impact on the genome, have been an important contributing factor to oncogenesis and metastasis. Detecting genomic alterations from the shallow-sequencing data of a low-purity tumor sample remains a challenging task. RESULTS:We introduce Accucopy, a method t...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-020-03924-5

    authors: Fan X,Luo G,Huang YS

    update_date:2021-01-15 00:00:00

  • Visualizing complex feature interactions and feature sharing in genomic deep neural networks.

    abstract:BACKGROUND:Visualization tools for deep learning models typically focus on discovering key input features without considering how such low level features are combined in intermediate layers to make decisions. Moreover, many of these methods examine a network's response to specific input examples that may be insufficien...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-019-2957-4

    authors: Liu G,Zeng H,Gifford DK

    update_date:2019-07-19 00:00:00

  • RocSampler: regularizing overlapping protein complexes in protein-protein interaction networks.

    abstract:BACKGROUND:In recent years, protein-protein interaction (PPI) networks have been well recognized as important resources to elucidate various biological processes and cellular mechanisms. In this paper, we address the problem of predicting protein complexes from a PPI network. This problem has two difficulties. One is r...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-017-1920-5

    authors: Maruyama O,Kuwahara Y

    update_date:2017-12-06 00:00:00

  • NASQAR: a web-based platform for high-throughput sequencing data analysis and visualization.

    abstract:BACKGROUND:As high-throughput sequencing applications continue to evolve, the rapid growth in quantity and variety of sequence-based data calls for the development of new software libraries and tools for data analysis and visualization. Often, effective use of these tools requires computational skills beyond those of m...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-020-03577-4

    authors: Yousif A,Drou N,Rowe J,Khalfan M,Gunsalus KC

    update_date:2020-06-29 00:00:00

  • Integrating multiple molecular sources into a clinical risk prediction signature by extracting complementary information.

    abstract:BACKGROUND:High-throughput technology allows for genome-wide measurements at different molecular levels for the same patient, e.g. single nucleotide polymorphisms (SNPs) and gene expression. Correspondingly, it might be beneficial to also integrate complementary information from different molecular levels when building...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-016-1183-6

    authors: Hieke S,Benner A,Schlenk RF,Schumacher M,Bullinger L,Binder H

    update_date:2016-08-30 00:00:00

  • Towards an automatic classification of protein structural domains based on structural similarity.

    abstract:BACKGROUND:Formal classification of a large collection of protein structures aids the understanding of evolutionary relationships among them. Classifications involving manual steps, such as SCOP and CATH, face the challenge of increasing volume of available structures. Automatic methods such as FSSP or Dali Domain Dict...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-9-74

    authors: Sam V,Tai CH,Garnier J,Gibrat JF,Lee B,Munson PJ

    update_date:2008-01-31 00:00:00

  • OmicsARules: a R package for integration of multi-omics datasets via association rules mining.

    abstract:BACKGROUND:The improvements of high throughput technologies have produced large amounts of multi-omics experiments datasets. Initial analysis of these data has revealed many concurrent gene alterations within single dataset or/and among multiple omics datasets. Although powerful bioinformatics pipelines have been devel...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-019-3171-0

    authors: Chen D,Zhang F,Zhao Q,Xu J

    update_date:2019-11-08 00:00:00

  • AntiBP2: improved version of antibacterial peptide prediction.

    abstract:BACKGROUND:Antibacterial peptides are one of the effecter molecules of innate immune system. Over the last few decades several antibacterial peptides have successfully approved as drug by FDA, which has prompted an interest in these antibacterial peptides. In our recent study we analyzed 999 antibacterial peptides, whi...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-11-S1-S19

    authors: Lata S,Mishra NK,Raghava GP

    update_date:2010-01-18 00:00:00

  • Comparison of scores for bimodality of gene expression distributions and genome-wide evaluation of the prognostic relevance of high-scoring genes.

    abstract:BACKGROUND:A major goal of the analysis of high-dimensional RNA expression data from tumor tissue is to identify prognostic signatures for discriminating patient subgroups. For this purpose genome-wide identification of bimodally expressed genes from gene array data is relevant because distinguishability of high and lo...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-11-276

    authors: Hellwig B,Hengstler JG,Schmidt M,Gehrmann MC,Schormann W,Rahnenführer J

    update_date:2010-05-25 00:00:00

  • The G protein-coupled receptors in the pufferfish Takifugu rubripes.

    abstract:BACKGROUND:Guanine protein-coupled receptors (GPCRs) constitute a eukaryotic transmembrane protein family and function as "molecular switches" in the second messenger cascades and are found in all organisms between yeast and humans. They form the single, biggest drug-target family due to their versatility of action and...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-12-S1-S3

    authors: Sarkar A,Kumar S,Sundar D

    update_date:2011-02-15 00:00:00

  • TMB-Hunt: an amino acid composition based method to screen proteomes for beta-barrel transmembrane proteins.

    abstract:BACKGROUND:Beta-barrel transmembrane (bbtm) proteins are a functionally important and diverse group of proteins expressed in the outer membranes of bacteria (both gram negative and acid fast gram positive), mitochondria and chloroplasts. Despite recent publications describing reasonable levels of accuracy for discrimin...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-6-56

    authors: Garrow AG,Agnew A,Westhead DR

    update_date:2005-03-15 00:00:00

  • SLDR: a computational technique to identify novel genetic regulatory relationships.

    abstract:We developed a new computational technique called Step-Level Differential Response (SLDR) to identify genetic regulatory relationships. Our technique takes advantages of functional genomics data for the same species under different perturbation conditions, therefore complementary to current popular computational techn...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-15-S11-S1

    authors: Yue Z,Wan P,Huang H,Xie Z,Chen JY

    update_date:2014-01-01 00:00:00

  • Optimal neighborhood indexing for protein similarity search.

    abstract:BACKGROUND:Similarity inference, one of the main bioinformatics tasks, has to face an exponential growth of the biological data. A classical approach used to cope with this data flow involves heuristics with large seed indexes. In order to speed up this technique, the index can be enhanced by storing additional informa...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-9-534

    authors: Peterlongo P,Noé L,Lavenier D,Nguyen VH,Kucherov G,Giraud M

    update_date:2008-12-16 00:00:00

  • A framework for space-efficient read clustering in metagenomic samples.

    abstract:BACKGROUND:A metagenomic sample is a set of DNA fragments, randomly extracted from multiple cells in an environment, belonging to distinct, often unknown species. Unsupervised metagenomic clustering aims at partitioning a metagenomic sample into sets that approximate taxonomic units, without using reference genomes. Si...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-017-1466-6

    authors: Alanko J,Cunial F,Belazzougui D,Mäkinen V

    update_date:2017-03-14 00:00:00

  • Progressive multiple sequence alignment with indel evolution.

    abstract:BACKGROUND:Sequence alignment is crucial in genomics studies. However, optimal multiple sequence alignment (MSA) is NP-hard. Thus, modern MSA methods employ progressive heuristics, breaking the problem into a series of pairwise alignments guided by a phylogeny. Changes between homologous characters are typically modell...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-018-2357-1

    authors: Maiolo M,Zhang X,Gil M,Anisimova M

    update_date:2018-09-21 00:00:00

  • Bioinformatics approach to predict target genes for dysregulated microRNAs in hepatocellular carcinoma: study on a chemically-induced HCC mouse model.

    abstract:BACKGROUND:Hepatocellular carcinoma (HCC) is an aggressive epithelial tumor which shows very poor prognosis and high rate of recurrence, representing an urgent problem for public healthcare. MicroRNAs (miRNAs/miRs) are a class of small, non-coding RNAs that attract great attention because of their role in regulation of...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-015-0836-1

    authors: Del Vecchio F,Gallo F,Di Marco A,Mastroiaco V,Caianiello P,Zazzeroni F,Alesse E,Tessitore A

    update_date:2015-12-10 00:00:00

  • ProMEX: a mass spectral reference database for proteins and protein phosphorylation sites.

    abstract:BACKGROUND:In the last decade, techniques were established for the large scale genome-wide analysis of proteins, RNA, and metabolites, and database solutions have been developed to manage the generated data sets. The Golm Metabolome Database for metabolite data (GMD) represents one such effort to make these data broadl...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-8-216

    authors: Hummel J,Niemann M,Wienkoop S,Schulze W,Steinhauser D,Selbig J,Walther D,Weckwerth W

    update_date:2007-06-23 00:00:00

  • MZmine 2: modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data.

    abstract:BACKGROUND:Mass spectrometry (MS) coupled with online separation methods is commonly applied for differential and quantitative profiling of biological samples in metabolomic as well as proteomic research. Such approaches are used for systems biology, functional genomics, and biomarker discovery, among others. An ongoin...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-11-395

    authors: Pluskal T,Castillo S,Villar-Briones A,Oresic M

    update_date:2010-07-23 00:00:00

  • CNV-WebStore: online CNV analysis, storage and interpretation.

    abstract:BACKGROUND:Microarray technology allows the analysis of genomic aberrations at an ever increasing resolution, making functional interpretation of these vast amounts of data the main bottleneck in routine implementation of high resolution array platforms, and emphasising the need for a centralised and easy to use CNV da...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-12-4

    authors: Vandeweyer G,Reyniers E,Wuyts W,Rooms L,Kooy RF

    update_date:2011-01-05 00:00:00

  • LEON-BIS: multiple alignment evaluation of sequence neighbours using a Bayesian inference system.

    abstract:BACKGROUND:A standard procedure in many areas of bioinformatics is to use a multiple sequence alignment (MSA) as the basis for various types of homology-based inference. Applications include 3D structure modelling, protein functional annotation, prediction of molecular interactions, etc. These applications, however sop...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-016-1146-y

    authors: Vanhoutreve R,Kress A,Legrand B,Gass H,Poch O,Thompson JD

    update_date:2016-07-07 00:00:00