Prediction of bioluminescent proteins by using sequence-derived features and lineage-specific scheme.

Abstract:

BACKGROUND:Bioluminescent proteins (BLPs) are widespread among living organisms. Because BLPs can emit light, they can serve as easily detected biomarkers in biomedical research, for example in gene expression analysis and the study of signal transduction pathways. Accurate identification of BLPs is therefore important for disease diagnosis and biomedical engineering. In this paper, we propose a novel, accurate sequence-based method named PredBLP (Prediction of BioLuminescent Proteins) to predict BLPs. RESULTS:We collect a series of sequence-derived features that have been shown to be involved in the structure and function of BLPs. These features include amino acid composition, dipeptide composition, sequence motifs and physicochemical properties. We further show that the combination of all four feature types outperforms any other combination as well as each individual feature type. To remove potentially irrelevant or redundant features, we also apply the Fisher Markov Selector together with a Sequential Backward Selection strategy to select the optimal feature subsets. Additionally, we design a lineage-specific scheme, which proves more effective than the traditional universal approach. CONCLUSION:Experiments on benchmark datasets demonstrate the robustness of PredBLP. We show that lineage-specific models significantly outperform universal ones. We also test the generalization capability of PredBLP on independent testing datasets as well as on BLPs newly deposited in UniProt. PredBLP outperforms many state-of-the-art methods. A web server named PredBLP, which implements the proposed method, is freely available for academic use.
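
Two of the feature types named in the abstract, amino acid composition and dipeptide composition, can be illustrated with a minimal sketch. This is not the PredBLP implementation: the function names, the 20-letter alphabet constant, the handling of non-standard residues (they are skipped) and the ordering of the 420 values are all assumptions made for illustration. Sequence motifs, physicochemical properties, feature selection and the lineage-specific models are not covered.

```python
from itertools import product

# Assumption: the 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the fraction of each standard residue in seq (20 values)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return {a: (residues.count(a) / n if n else 0.0) for a in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Return the fraction of each adjacent residue pair in seq (400 values)."""
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        # Pairs containing a non-standard residue are skipped (an assumption).
        if pair in counts:
            counts[pair] += 1
            total += 1
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


def feature_vector(seq):
    """Concatenate both compositions into one 420-dimensional list."""
    aac = amino_acid_composition(seq)
    dpc = dipeptide_composition(seq)
    return [aac[a] for a in AMINO_ACIDS] + [dpc[k] for k in sorted(dpc)]
```

A vector built this way could be passed to any standard classifier; which classifier and which selected feature subset PredBLP actually uses is not stated in this abstract.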

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Zhang J,Chai H,Yang G,Ma Z

doi

10.1186/s12859-017-1709-6

subject

Has Abstract

pub_date

2017-06-05 00:00:00

pages

294

issue

1

issn

1471-2105

pii

10.1186/s12859-017-1709-6

journal_volume

18

pub_type

Journal Article
  • Correction to: Effective machine-learning assembly for next-generation amplicon sequencing with very low coverage.

    abstract:Following publication of the original article [1], the author reported that there are several errors in the original article. ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article, Published Erratum

    doi:10.1186/s12859-019-3318-z

    authors: Ranjard L,Wong TKF,Rodrigo AG

    update_date:2020-01-22 00:00:00

  • Building blocks for automated elucidation of metabolites: natural product-likeness for candidate ranking.

    abstract:BACKGROUND:In metabolomics experiments, spectral fingerprints of metabolites with no known structural identity are detected routinely. Computer-assisted structure elucidation (CASE) has been used to determine the structural identities of unknown compounds. It is generally accepted that a single 1D NMR spectrum or mass ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-234

    authors: Jayaseelan KV,Steinbeck C

    update_date:2014-07-05 00:00:00

  • NeurphologyJ: an automatic neuronal morphology quantification method and its application in pharmacological discovery.

    abstract:BACKGROUND:Automatic quantification of neuronal morphology from images of fluorescence microscopy plays an increasingly important role in high-content screenings. However, there exist very few freeware tools and methods which provide automatic neuronal morphology quantification for pharmacological discovery. RESULTS:T...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-230

    authors: Ho SY,Chao CY,Huang HL,Chiu TW,Charoenkwan P,Hwang E

    update_date:2011-06-08 00:00:00

  • BIOSMILE: a semantic role labeling system for biomedical verbs using a maximum-entropy model with automatically generated template features.

    abstract:BACKGROUND:Bioinformatics tools for automatic processing of biomedical literature are invaluable for both the design and interpretation of large-scale experiments. Many information extraction (IE) systems that incorporate natural language processing (NLP) techniques have thus been developed for use in the biomedical fi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-325

    authors: Tsai RT,Chou WC,Su YS,Lin YC,Sung CL,Dai HJ,Yeh IT,Ku W,Sung TY,Hsu WL

    update_date:2007-09-01 00:00:00

  • A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis.

    abstract:BACKGROUND:Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machine (SVM) are the most effective and accurate methods for solving these problems. A key step to improve the performance of the SVM-based methods is to f...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-510

    authors: Liu B,Wang X,Lin L,Dong Q,Wang X

    update_date:2008-12-01 00:00:00

  • Partial correlation analysis indicates causal relationships between GC-content, exon density and recombination rate in the human genome.

    abstract:BACKGROUND:Several features are known to correlate with the GC-content in the human genome, including recombination rate, gene density and distance to telomere. However, by testing for pairwise correlation only, it is impossible to distinguish direct associations from indirect ones and to distinguish between causes and...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S1-S66

    authors: Freudenberg J,Wang M,Yang Y,Li W

    update_date:2009-01-30 00:00:00

  • Non-coding RNA detection methods combined to improve usability, reproducibility and precision.

    abstract:BACKGROUND:Non-coding RNAs gain more attention as their diverse roles in many cellular processes are discovered. At the same time, the need for efficient computational prediction of ncRNAs increases with the pace of sequencing technology. Existing tools are based on various approaches and techniques, but none of them p...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-491

    authors: Raasch P,Schmitz U,Patenge N,Vera J,Kreikemeyer B,Wolkenhauer O

    update_date:2010-09-29 00:00:00

  • The development of a comparison approach for Illumina bead chips unravels unexpected challenges applying newest generation microarrays.

    abstract:BACKGROUND:The MAQC project demonstrated that microarrays with comparable content show inter- and intra-platform reproducibility. However, since the content of gene databases still increases, the development of new generations of microarrays covering new content is mandatory. To better understand the potential challeng...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-186

    authors: Eggle D,Debey-Pascher S,Beyer M,Schultze JL

    update_date:2009-06-18 00:00:00

  • An improved distance measure between the expression profiles linking co-expression and co-regulation in mouse.

    abstract:BACKGROUND:Many statistical algorithms combine microarray expression data and genome sequence data to identify transcription factor binding motifs in the low eukaryotic genomes. Finding cis-regulatory elements in higher eukaryote genomes, however, remains a challenge, as searching in the promoter regions of genes with ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-44

    authors: Kim RS,Ji H,Wong WH

    update_date:2006-01-26 00:00:00

  • GLIDERS--a web-based search engine for genome-wide linkage disequilibrium between HapMap SNPs.

    abstract:BACKGROUND:A number of tools for the examination of linkage disequilibrium (LD) patterns between nearby alleles exist, but none are available for quickly and easily investigating LD at longer ranges (>500 kb). We have developed a web-based query tool (GLIDERS: Genome-wide LInkage DisEquilibrium Repository and Search en...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-367

    authors: Lawrence R,Day-Williams AG,Mott R,Broxholme J,Cardon LR,Zeggini E

    update_date:2009-10-31 00:00:00

  • Combining techniques for screening and evaluating interaction terms on high-dimensional time-to-event data.

    abstract:BACKGROUND:Molecular data, e.g. arising from microarray technology, is often used for predicting survival probabilities of patients. For multivariate risk prediction models on such high-dimensional data, there are established techniques that combine parameter estimation and variable selection. One big challenge is to i...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-58

    authors: Sariyar M,Hoffmann I,Binder H

    update_date:2014-02-26 00:00:00

  • Shared data science infrastructure for genomics data.

    abstract:BACKGROUND:Creating a scalable computational infrastructure to analyze the wealth of information contained in data repositories is difficult due to significant barriers in organizing, extracting and analyzing relevant data. Shared data science infrastructures like Boag is needed to efficiently process and parse data co...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2967-2

    authors: Bagheri H,Muppirala U,Masonbrink RE,Severin AJ,Rajan H

    update_date:2019-08-22 00:00:00

  • A novel parametric approach to mine gene regulatory relationship from microarray datasets.

    abstract:BACKGROUND:Microarray has been widely used to measure the gene expression level on the genome scale in the current decade. Many algorithms have been developed to reconstruct gene regulatory networks based on microarray data. Unfortunately, most of these models and algorithms focus on global properties of the expression...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-S11-S15

    authors: Liu W,Li D,Liu Q,Zhu Y,He F

    update_date:2010-12-14 00:00:00

  • Bacterial protein meta-interactomes predict cross-species interactions and protein function.

    abstract:BACKGROUND:Protein-protein interactions (PPIs) can offer compelling evidence for protein function, especially when viewed in the context of proteome-wide interactomes. Bacteria have been popular subjects of interactome studies: more than six different bacterial species have been the subjects of comprehensive interactom...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1585-0

    authors: Caufield JH,Wimble C,Shary S,Wuchty S,Uetz P

    update_date:2017-03-16 00:00:00

  • A simple and robust method for connecting small-molecule drugs using gene-expression signatures.

    abstract:BACKGROUND:Interaction of a drug or chemical with a biological system can result in a gene-expression profile or signature characteristic of the event. Using a suitably robust algorithm these signatures can potentially be used to connect molecules with similar pharmacological or toxicological properties by gene express...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-258

    authors: Zhang SD,Gant TW

    update_date:2008-06-02 00:00:00

  • Fregene: simulation of realistic sequence-level data in populations and ascertained samples.

    abstract:BACKGROUND:FREGENE simulates sequence-level data over large genomic regions in large populations. Because, unlike coalescent simulators, it works forwards through time, it allows complex scenarios of selection, demography, and recombination to be modelled simultaneously. Detailed tracking of sites under selection is im...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-364

    authors: Chadeau-Hyam M,Hoggart CJ,O'Reilly PF,Whittaker JC,De Iorio M,Balding DJ

    update_date:2008-09-08 00:00:00

  • PoGO: Prediction of Gene Ontology terms for fungal proteins.

    abstract:BACKGROUND:Automated protein function prediction methods are the only practical approach for assigning functions to genes obtained from model organisms. Many of the previously reported function annotation methods are of limited utility for fungal protein annotation. They are often trained only to one species, are not a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-215

    authors: Jung J,Yi G,Sukno SA,Thon MR

    update_date:2010-04-29 00:00:00

  • Fast and accurate clustering of noncoding RNAs using ensembles of sequence alignments and secondary structures.

    abstract:BACKGROUND:Clustering of unannotated transcripts is an important task to identify novel families of noncoding RNAs (ncRNAs). Several hierarchical clustering methods have been developed using similarity measures based on the scores of structural alignment. However, the high computational cost of exact structural alignme...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-S1-S48

    authors: Saito Y,Sato K,Sakakibara Y

    update_date:2011-02-15 00:00:00

  • Integrating gene expression and GO classification for PCA by preclustering.

    abstract:BACKGROUND:Gene expression data can be analyzed by summarizing groups of individual gene expression profiles based on GO annotation information. The mean expression profile per group can then be used to identify interesting GO categories in relation to the experimental settings. However, the expression profiles present...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-158

    authors: De Haan JR,Piek E,van Schaik RC,de Vlieg J,Bauerschmidt S,Buydens LM,Wehrens R

    update_date:2010-03-26 00:00:00

  • OscoNet: inferring oscillatory gene networks.

    abstract:BACKGROUND:Oscillatory genes, with periodic expression at the mRNA and/or protein level, have been shown to play a pivotal role in many biological contexts. However, with the exception of the circadian clock and cell cycle, only a few such genes are known. Detecting oscillatory genes from snapshot single-cell experimen...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03561-y

    authors: Cutillo L,Boukouvalas A,Marinopoulou E,Papalopulu N,Rattray M

    update_date:2020-08-21 00:00:00

  • High-Throughput GoMiner, an 'industrial-strength' integrative gene ontology tool for interpretation of multiple-microarray experiments, with application to studies of Common Variable Immune Deficiency (CVID).

    abstract:BACKGROUND:We previously developed GoMiner, an application that organizes lists of 'interesting' genes (for example, under-and overexpressed genes from a microarray experiment) for biological interpretation in the context of the Gene Ontology. The original version of GoMiner was oriented toward visualization and interp...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-168

    authors: Zeeberg BR,Qin H,Narasimhan S,Sunshine M,Cao H,Kane DW,Reimers M,Stephens RM,Bryant D,Burt SK,Elnekave E,Hari DM,Wynn TA,Cunningham-Rundles C,Stewart DM,Nelson D,Weinstein JN

    update_date:2005-07-05 00:00:00

  • Characterization of phylogenetic networks with NetTest.

    abstract:BACKGROUND:Typical evolutionary events like recombination, hybridization or gene transfer make necessary the use of phylogenetic networks to properly depict the evolution of DNA and protein sequences. Although several theoretical classes have been proposed to characterize these networks, they make stringent assumptions...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-268

    authors: Arenas M,Patricio M,Posada D,Valiente G

    update_date:2010-05-20 00:00:00

  • Boosting the discriminatory power of sparse survival models via optimization of the concordance index and stability selection.

    abstract:BACKGROUND:When constructing new biomarker or gene signature scores for time-to-event outcomes, the underlying aims are to develop a discrimination model that helps to predict whether patients have a poor or good prognosis and to identify the most influential variables for this task. In practice, this is often done fit...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1149-8

    authors: Mayr A,Hofner B,Schmid M

    update_date:2016-07-22 00:00:00

  • Conceptual-level workflow modeling of scientific experiments using NMR as a case study.

    abstract:BACKGROUND:Scientific workflows improve the process of scientific experiments by making computations explicit, underscoring data flow, and emphasizing the participation of humans in the process when intuition and human reasoning are required. Workflows for experiments also highlight transitions among experimental phase...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-31

    authors: Verdi KK,Ellis HJ,Gryk MR

    update_date:2007-01-30 00:00:00

  • Sigma-RF: prediction of the variability of spatial restraints in template-based modeling by random forest.

    abstract:BACKGROUND:In template-based modeling when using a single template, inter-atomic distances of an unknown protein structure are assumed to be distributed by Gaussian probability density functions, whose center peaks are located at the distances between corresponding atoms in the template structure. The width of the Gaus...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0526-z

    authors: Lee J,Lee K,Joung I,Joo K,Brooks BR,Lee J

    update_date:2015-03-21 00:00:00

  • Effects of Mecp2 loss of function in embryonic cortical neurons: a bioinformatics strategy to sort out non-neuronal cells variability from transcriptome profiling.

    abstract:BACKGROUND:Mecp2 null mice model Rett syndrome (RTT) a human neurological disorder affecting females after apparent normal pre- and peri-natal developmental periods. Neuroanatomical studies in cerebral cortex of RTT mouse models revealed delayed maturation of neuronal morphology and autonomous as well as non-cell auton...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0859-7

    authors: Vacca M,Tripathi KP,Speranza L,Aiese Cigliano R,Scalabrì F,Marracino F,Madonna M,Sanseverino W,Perrone-Capano C,Guarracino MR,D'Esposito M

    update_date:2016-01-20 00:00:00

  • Measure of synonymous codon usage diversity among genes in bacteria.

    abstract:BACKGROUND:In many bacteria, intragenomic diversity in synonymous codon usage among genes has been reported. However, no quantitative attempt has been made to compare the diversity levels among different genomes. Here, we introduce a mean dissimilarity-based index (Dmean) for quantifying the level of diversity in synon...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-167

    authors: Suzuki H,Saito R,Tomita M

    update_date:2009-06-01 00:00:00

  • SPECS: a non-parametric method to identify tissue-specific molecular features for unbalanced sample groups.

    abstract:BACKGROUND:To understand biology and differences among various tissues or cell types, one typically searches for molecular features that display characteristic abundance patterns. Several specificity metrics have been introduced to identify tissue-specific molecular features, but these either require an equal number of...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3407-z

    authors: Everaert C,Volders PJ,Morlion A,Thas O,Mestdagh P

    update_date:2020-02-17 00:00:00

  • MRCQuant- an accurate LC-MS relative isotopic quantification algorithm on TOF instruments.

    abstract:BACKGROUND:Relative isotope abundance quantification, which can be used for peptide identification and differential peptide quantification, plays an important role in liquid chromatography-mass spectrometry (LC-MS)-based proteomics. However, several major issues exist in the relative isotopic quantification of peptides...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-74

    authors: Haskins WE,Petritis K,Zhang J

    update_date:2011-03-15 00:00:00

  • Graph-representation of oxidative folding pathways.

    abstract:BACKGROUND:The process of oxidative folding combines the formation of native disulfide bond with conformational folding resulting in the native three-dimensional fold. Oxidative folding pathways can be described in terms of disulfide intermediate species (DIS) which can also be isolated and characterized. Each DIS corr...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-19

    authors: Agoston V,Cemazar M,Kaján L,Pongor S

    update_date:2005-01-27 00:00:00