Sequence-based identification of recombination spots using pseudo nucleic acid representation and recursive feature extraction by linear kernel SVM.

Abstract:

BACKGROUND: Identification of recombination hot/cold spots is critical for understanding the mechanism of recombination as well as the process of genome evolution. However, experimental identification of recombination spots is both time-consuming and costly. An accurate, automated method for identifying recombination spots reliably and quickly is therefore urgently needed. RESULTS: Here we propose a novel approach that fuses features from pseudo nucleic acid composition (PseNAC), including NAC, n-tier NAC and pseudo dinucleotide composition (PseDNC). A recursive feature extraction by linear kernel support vector machine (SVM) was then used to rank the integrated feature vectors and extract the optimal features. An SVM was adopted to identify recombination spots based on these optimal features. To evaluate the performance of the proposed method, a jackknife cross-validation test was employed on a benchmark dataset. The overall accuracy of this approach was 84.09%, which was 0.37% to 3.79% higher than those of state-of-the-art tools. CONCLUSIONS: The comparison results suggest that linear kernel SVM is a useful vehicle for identifying recombination hot/cold spots.
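The pipeline summarized in the abstract (composition features, recursive feature elimination with a linear SVM, jackknife evaluation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `n_tier_nac` assumes that n-tier NAC means the concatenated 1-mer to n-mer frequencies of a DNA sequence; the PseDNC correlation terms are omitted because they require physicochemical property tables not given here; and `train_linear_svm` is a hypothetical full-batch subgradient solver standing in for a production SVM library. All function names are hypothetical.

```python
from itertools import product


def kmer_composition(seq, k):
    """Normalized frequencies of all 4**k k-mers over the alphabet ACGT."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other symbols
            counts[word] += 1
    total = max(sum(counts.values()), 1)
    return [counts[w] / total for w in kmers]


def n_tier_nac(seq, n=3):
    """ASSUMPTION: n-tier NAC = concatenated 1-mer .. n-mer compositions."""
    seq = seq.upper()
    features = []
    for k in range(1, n + 1):
        features.extend(kmer_composition(seq, k))
    return features


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Hypothetical linear SVM: full-batch subgradient descent on
    (lam/2)*||w||^2 + mean hinge loss; labels y must be +1 or -1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge term is active for this sample
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def rfe_rank(X, y, n_keep=1):
    """Recursive feature elimination: retrain, then drop the feature whose
    linear-SVM weight has the smallest magnitude. Returns feature indices
    ordered from most to least important."""
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > n_keep:
        sub = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(sub, y)
        weakest = min(range(len(active)), key=lambda j: abs(w[j]))
        eliminated.append(active.pop(weakest))
    return active + eliminated[::-1]


def jackknife_accuracy(X, y):
    """Leave-one-out (jackknife) test: train on n-1 samples, predict the held-out one."""
    correct = 0
    for i in range(len(X)):
        w, b = train_linear_svm(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
        correct += (1 if score >= 0 else -1) == y[i]
    return correct / len(X)


if __name__ == "__main__":
    print(len(n_tier_nac("ACGTTGCA", n=3)))  # 84 features (4 + 16 + 64)
    # Toy data: only feature 0 separates the classes; features 1 and 2 are balanced noise.
    X = [[x0, x1, x2] for x0 in (-2, 2) for x1 in (-1, 1) for x2 in (-1, 1)]
    y = [1 if row[0] > 0 else -1 for row in X]
    print(rfe_rank(X, y))  # [0, 2, 1]
    print(jackknife_accuracy(X, y))
```

On the toy data the informative feature 0 survives elimination. For real sequences, a tested library implementation (for example scikit-learn's `RFE` wrapping `SVC(kernel="linear")`, evaluated with `LeaveOneOut`) would be the more robust choice than this hand-rolled solver.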

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Li L,Yu S,Xiao W,Li Y,Huang L,Zheng X,Zhou S,Yang H

doi

10.1186/1471-2105-15-340

subject

Has Abstract

pub_date

2014-11-20 00:00:00

pages

340

issn

1471-2105

pii

1471-2105-15-340

journal_volume

15

pub_type

Journal Article
  • LEON-BIS: multiple alignment evaluation of sequence neighbours using a Bayesian inference system.

    abstract:BACKGROUND:A standard procedure in many areas of bioinformatics is to use a multiple sequence alignment (MSA) as the basis for various types of homology-based inference. Applications include 3D structure modelling, protein functional annotation, prediction of molecular interactions, etc. These applications, however sop...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1146-y

    authors: Vanhoutreve R,Kress A,Legrand B,Gass H,Poch O,Thompson JD

    update_date:2016-07-07 00:00:00

  • Verification and validation of bioinformatics software without a gold standard: a case study of BWA and Bowtie.

    abstract:BACKGROUND:Bioinformatics software quality assurance is essential in genomic medicine. Systematic verification and validation of bioinformatics software is difficult because it is often not possible to obtain a realistic "gold standard" for systematic evaluation. Here we apply a technique that originates from the softw...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-S16-S15

    authors: Giannoulatou E,Park SH,Humphreys DT,Ho JW

    update_date:2014-01-01 00:00:00

  • MultiDCoX: Multi-factor analysis of differential co-expression.

    abstract:BACKGROUND:Differential co-expression (DCX) signifies change in degree of co-expression of a set of genes among different biological conditions. It has been used to identify differential co-expression networks or interactomes. Many algorithms have been developed for single-factor differential co-expression analysis and...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1963-7

    authors: Liany H,Rajapakse JC,Karuturi RKM

    update_date:2017-12-28 00:00:00

  • SCOPA and META-SCOPA: software for the analysis and aggregation of genome-wide association studies of multiple correlated phenotypes.

    abstract:BACKGROUND:Genome-wide association studies (GWAS) of single nucleotide polymorphisms (SNPs) have been successful in identifying loci contributing genetic effects to a wide range of complex human diseases and quantitative traits. The traditional approach to GWAS analysis is to consider each phenotype separately, despite...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1437-3

    authors: Mägi R,Suleimanov YV,Clarke GM,Kaakinen M,Fischer K,Prokopenko I,Morris AP

    update_date:2017-01-11 00:00:00

  • A method of predicting changes in human gene splicing induced by genetic variants in context of cis-acting elements.

    abstract:BACKGROUND:Polymorphic variants and mutations disrupting canonical splicing isoforms are among the leading causes of human hereditary disorders. While there is a substantial evidence of aberrant splicing causing Mendelian diseases, the implication of such events in multi-genic disorders is yet to be well understood. We...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-22

    authors: Churbanov A,Vorechovský I,Hicks C

    update_date:2010-01-12 00:00:00

  • Improving performance of mammalian microRNA target prediction.

    abstract:BACKGROUND:MicroRNAs (miRNAs) are single-stranded non-coding RNAs known to regulate a wide range of cellular processes by silencing the gene expression at the protein and/or mRNA levels. Computational prediction of miRNA targets is essential for elucidating the detailed functions of miRNA. However, the prediction speci...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-476

    authors: Liu H,Yue D,Chen Y,Gao SJ,Huang Y

    update_date:2010-09-22 00:00:00

  • Combining techniques for screening and evaluating interaction terms on high-dimensional time-to-event data.

    abstract:BACKGROUND:Molecular data, e.g. arising from microarray technology, is often used for predicting survival probabilities of patients. For multivariate risk prediction models on such high-dimensional data, there are established techniques that combine parameter estimation and variable selection. One big challenge is to i...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-58

    authors: Sariyar M,Hoffmann I,Binder H

    update_date:2014-02-26 00:00:00

  • Knowledge-based variable selection for learning rules from proteomic data.

    abstract:BACKGROUND:The incorporation of biological knowledge can enhance the analysis of biomedical data. We present a novel method that uses a proteomic knowledge base to enhance the performance of a rule-learning algorithm in identifying putative biomarkers of disease from high-dimensional proteomic mass spectral data. In pa...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S9-S16

    authors: Lustgarten JL,Visweswaran S,Bowser RP,Hogan WR,Gopalakrishnan V

    update_date:2009-09-17 00:00:00

  • Stepwise kinetic equilibrium models of quantitative polymerase chain reaction.

    abstract:BACKGROUND:Numerous models for use in interpreting quantitative PCR (qPCR) data are present in recent literature. The most commonly used models assume the amplification in qPCR is exponential and fit an exponential model with a constant rate of increase to a select part of the curve. Kinetic theory may be used to model...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-203

    authors: Cobbs G

    update_date:2012-08-16 00:00:00

  • Novel domain expansion methods to improve the computational efficiency of the Chemical Master Equation solution for large biological networks.

    abstract:BACKGROUND:Numerical solutions of the chemical master equation (CME) are important for understanding the stochasticity of biochemical systems. However, solving CMEs is a formidable task. This task is complicated due to the nonlinear nature of the reactions and the size of the networks which result in different realizat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03668-2

    authors: Kosarwal R,Kulasiri D,Samarasinghe S

    update_date:2020-11-11 00:00:00

  • BIOSMILE: a semantic role labeling system for biomedical verbs using a maximum-entropy model with automatically generated template features.

    abstract:BACKGROUND:Bioinformatics tools for automatic processing of biomedical literature are invaluable for both the design and interpretation of large-scale experiments. Many information extraction (IE) systems that incorporate natural language processing (NLP) techniques have thus been developed for use in the biomedical fi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-325

    authors: Tsai RT,Chou WC,Su YS,Lin YC,Sung CL,Dai HJ,Yeh IT,Ku W,Sung TY,Hsu WL

    update_date:2007-09-01 00:00:00

  • Fregene: simulation of realistic sequence-level data in populations and ascertained samples.

    abstract:BACKGROUND:FREGENE simulates sequence-level data over large genomic regions in large populations. Because, unlike coalescent simulators, it works forwards through time, it allows complex scenarios of selection, demography, and recombination to be modelled simultaneously. Detailed tracking of sites under selection is im...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-364

    authors: Chadeau-Hyam M,Hoggart CJ,O'Reilly PF,Whittaker JC,De Iorio M,Balding DJ

    update_date:2008-09-08 00:00:00

  • Learning by aggregating experts and filtering novices: a solution to crowdsourcing problems in bioinformatics.

    abstract:BACKGROUND:In many biomedical applications, there is a need for developing classification models based on noisy annotations. Recently, various methods addressed this scenario by relaying on unreliable annotations obtained from multiple sources. RESULTS:We proposed a probabilistic classification algorithm based on labe...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-S12-S5

    authors: Zhang P,Cao W,Obradovic Z

    update_date:2013-01-01 00:00:00

  • Widespread evidence of viral miRNAs targeting host pathways.

    abstract:BACKGROUND:MicroRNAs (miRNA) are regulatory genes that target and repress other RNA molecules via sequence-specific binding. Several biological processes are regulated across many organisms by evolutionarily conserved miRNAs. Plants and invertebrates employ their miRNA in defense against viruses by targeting and degrad...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-S2-S3

    authors: Carl JW Jr,Trgovcich J,Hannenhalli S

    update_date:2013-01-01 00:00:00

  • Modeling genomic data with type attributes, balancing stability and maintainability.

    abstract:BACKGROUND:Molecular biology (MB) is a dynamic research domain that benefits greatly from the use of modern software technology in preparing experiments, analyzing acquired data, and even performing "in-silico" analyses. As ever new findings change the face of this domain, software for MB has to be sufficiently flexibl...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-97

    authors: Busch N,Wedemann G

    update_date:2009-03-27 00:00:00

  • BAGEL: a computational framework for identifying essential genes from pooled library screens.

    abstract:BACKGROUND:The adaptation of the CRISPR-Cas9 system to pooled library gene knockout screens in mammalian cells represents a major technological leap over RNA interference, the prior state of the art. New methods for analyzing the data and evaluating results are needed. RESULTS:We offer BAGEL (Bayesian Analysis of Gene...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1015-8

    authors: Hart T,Moffat J

    update_date:2016-04-16 00:00:00

  • XLPM: efficient algorithm for the analysis of protein-protein contacts using chemical cross-linking mass spectrometry.

    abstract:BACKGROUND:Chemical cross-linking is used for protein-protein contacts mapping and for structural analysis. One of the difficulties in cross-linking studies is the analysis of mass-spectrometry data and the assignment of the site of cross-link incorporation. The difficulties are due to higher charges of fragment ions, ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-S11-S16

    authors: Jaiswal M,Crabtree N,Bauer MA,Hall R,Raney KD,Zybailov BL

    update_date:2014-01-01 00:00:00

  • Using mechanistic Bayesian networks to identify downstream targets of the sonic hedgehog pathway.

    abstract:BACKGROUND:The topology of a biological pathway provides clues as to how a pathway operates, but rationally using this topology information with observed gene expression data remains a challenge. RESULTS:We introduce a new general-purpose analytic method called Mechanistic Bayesian Networks (MBNs) that allows for the ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-433

    authors: Shah A,Tenzen T,McMahon AP,Woolf PJ

    update_date:2009-12-18 00:00:00

  • Accucopy: accurate and fast inference of allele-specific copy number alterations from low-coverage low-purity tumor sequencing data.

    abstract:BACKGROUND:Copy number alterations (CNAs), due to their large impact on the genome, have been an important contributing factor to oncogenesis and metastasis. Detecting genomic alterations from the shallow-sequencing data of a low-purity tumor sample remains a challenging task. RESULTS:We introduce Accucopy, a method t...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03924-5

    authors: Fan X,Luo G,Huang YS

    update_date:2021-01-15 00:00:00

  • Systematic integration of experimental data and models in systems biology.

    abstract:BACKGROUND:The behaviour of biological systems can be deduced from their mathematical models. However, multiple sources of data in diverse forms are required in the construction of a model in order to define its components and their biochemical reactions, and corresponding parameters. Automating the assembly and use of...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-582

    authors: Li P,Dada JO,Jameson D,Spasic I,Swainston N,Carroll K,Dunn W,Khan F,Malys N,Messiha HL,Simeonidis E,Weichart D,Winder C,Wishart J,Broomhead DS,Goble CA,Gaskell SJ,Kell DB,Westerhoff HV,Mendes P,Paton NW

    update_date:2010-11-29 00:00:00

  • A linguistic rule-based approach to extract drug-drug interactions from pharmacological documents.

    abstract:BACKGROUND:A drug-drug interaction (DDI) occurs when one drug influences the level or activity of another drug. The increasing volume of the scientific literature overwhelms health care professionals trying to be kept up-to-date with all published studies on DDI. METHODS:This paper describes a hybrid linguistic approa...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-S2-S1

    authors: Segura-Bedmar I,Martínez P,de Pablo-Sánchez C

    update_date:2011-03-29 00:00:00

  • BicPAMS: software for biological data analysis with pattern-based biclustering.

    abstract:BACKGROUND:Biclustering has been largely applied for the unsupervised analysis of biological data, being recognised today as a key technique to discover putative modules in both expression data (subsets of genes correlated in subsets of conditions) and network data (groups of coherently interconnected biological entiti...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1493-3

    authors: Henriques R,Ferreira FL,Madeira SC

    update_date:2017-02-02 00:00:00

  • Protein complexes identification based on go attributed network embedding.

    abstract:BACKGROUND:Identifying protein complexes from protein-protein interaction (PPI) network is one of the most important tasks in proteomics. Existing computational methods try to incorporate a variety of biological evidences to enhance the quality of predicted complexes. However, it is still a challenge to integrate diffe...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2555-x

    authors: Xu B,Li K,Zheng W,Liu X,Zhang Y,Zhao Z,He Z

    update_date:2018-12-20 00:00:00

  • CONFOLD2: improved contact-driven ab initio protein structure modeling.

    abstract:BACKGROUND:Contact-guided protein structure prediction methods are becoming more and more successful because of the latest advances in residue-residue contact prediction. To support contact-driven structure prediction, effective tools that can quickly build tertiary structural models of good quality from predicted cont...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2032-6

    authors: Adhikari B,Cheng J

    update_date:2018-01-25 00:00:00

  • The development of PIPA: an integrated and automated pipeline for genome-wide protein function annotation.

    abstract:BACKGROUND:Automated protein function prediction methods are needed to keep pace with high-throughput sequencing. With the existence of many programs and databases for inferring different protein functions, a pipeline that properly integrates these resources will benefit from the advantages of each method. However, int...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-52

    authors: Yu C,Zavaljevski N,Desai V,Johnson S,Stevens FJ,Reifman J

    update_date:2008-01-25 00:00:00

  • Island method for estimating the statistical significance of profile-profile alignment scores.

    abstract:BACKGROUND:In the last decade, a significant improvement in detecting remote similarity between protein sequences has been made by utilizing alignment profiles in place of amino-acid strings. Unfortunately, no analytical theory is available for estimating the significance of a gapped alignment of two profiles. Many exp...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-112

    authors: Poleksic A

    update_date:2009-04-20 00:00:00

  • Standard machine learning approaches outperform deep representation learning on phenotype prediction from transcriptomics data.

    abstract:BACKGROUND:The ability to confidently predict health outcomes from gene expression would catalyze a revolution in molecular diagnostics. Yet, the goal of developing actionable, robust, and reproducible predictive signatures of phenotypes such as clinical outcome has not been attained in almost any disease area. Here, w...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3427-8

    authors: Smith AM,Walsh JR,Long J,Davis CB,Henstock P,Hodge MR,Maciejewski M,Mu XJ,Ra S,Zhao S,Ziemek D,Fisher CK

    update_date:2020-03-20 00:00:00

  • Optimal sequencing depth design for whole genome re-sequencing in pigs.

    abstract:BACKGROUND:As whole-genome sequencing is becoming a routine technique, it is important to identify a cost-effective depth of sequencing for such studies. However, the relationship between sequencing depth and biological results from the aspects of whole-genome coverage, variant discovery power and the quality of varian...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3164-z

    authors: Jiang Y,Jiang Y,Wang S,Zhang Q,Ding X

    update_date:2019-11-08 00:00:00

  • Comparing the performance of selected variant callers using synthetic data and genome segmentation.

    abstract:BACKGROUND:High-throughput sequencing has rapidly become an essential part of precision cancer medicine. But validating results obtained from analyzing and interpreting genomic data remains a rate-limiting factor. The gold standard, of course, remains manual validation by expert panels, which is not without its weaknes...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2440-7

    authors: Bian X,Zhu B,Wang M,Hu Y,Chen Q,Nguyen C,Hicks B,Meerzaman D

    update_date:2018-11-19 00:00:00

  • Simultaneous fitting of real-time PCR data with efficiency of amplification modeled as Gaussian function of target fluorescence.

    abstract:BACKGROUND:In real-time PCR, it is necessary to consider the efficiency of amplification (EA) of amplicons in order to determine initial target levels properly. EAs can be deduced from standard curves, but these involve extra effort and cost and may yield invalid EAs. Alternatively, EA can be extracted from individual ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-95

    authors: Batsch A,Noetel A,Fork C,Urban A,Lazic D,Lucas T,Pietsch J,Lazar A,Schömig E,Gründemann D

    update_date:2008-02-12 00:00:00