Abstract:
BACKGROUND:Identification of homologous genes is fundamental to comparative genomics, functional genomics and phylogenomics. Extensive public homology databases are of great value for investigating homology but need to be continually updated to incorporate new sequences. As new sequences are rapidly being generated, there is a need for efficient standalone tools to detect homologs in novel data. RESULTS:To address this, we present a fast method for detecting homology groups across a large number of individuals and/or species. We adopted a k-mer based approach which considerably reduces the number of pairwise protein alignments without sacrificing sensitivity. We demonstrate accuracy, scalability, efficiency and applicability of the presented method for detecting homology in large proteomes of bacteria, fungi, plants and Metazoa. CONCLUSIONS:We clearly observed the trade-off between recall and precision in our homology inference. Favoring recall or precision strongly depends on the application. The clustering behavior of our program can be optimized for particular applications by altering a few key parameters. The program is available for public use at https://github.com/sheikhizadeh/pantools as an extension to our pan-genomic analysis tool, PanTools.
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Sheikhizadeh Anari S,de Ridder D,Schranz ME,Smit Sdoi
10.1186/s12859-018-2362-4subject
Has Abstractpub_date
2018-09-26 00:00:00pages
340issue
1issn
1471-2105pii
10.1186/s12859-018-2362-4journal_volume
19pub_type
杂志文章abstract:BACKGROUND:A massive amount of proteomic data is generated on a daily basis, nonetheless annotating all sequences is costly and often unfeasible. As a countermeasure, machine learning methods have been used to automatically annotate new protein functions. More specifically, many studies have investigated hierarchical m...
journal_title:BMC bioinformatics
pub_type: 杂志文章,评审
doi:10.1186/s12859-019-3060-6
更新日期:2019-09-23 00:00:00
abstract:BACKGROUND:Complex networks are studied across many fields of science and are particularly important to understand biological processes. Motifs in networks are small connected sub-graphs that occur significantly in higher frequencies than in random networks. They have recently gathered much attention as a useful concep...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-318
更新日期:2009-10-04 00:00:00
abstract:BACKGROUND:Mammalian germ cells undergo meiosis to produce sperm or eggs, haploid cells that are primed to meet and propagate life. Meiosis is initiated by retinoic acid and meiotic prophase is the first and most complex stage of meiosis when homologous chromosomes pair to exchange genetic information. Errors in meiosi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-72
更新日期:2013-02-27 00:00:00
abstract:BACKGROUND:Visualization tools for deep learning models typically focus on discovering key input features without considering how such low level features are combined in intermediate layers to make decisions. Moreover, many of these methods examine a network's response to specific input examples that may be insufficien...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-2957-4
更新日期:2019-07-19 00:00:00
abstract:BACKGROUND:Innovations in biological and biomedical imaging produce complex high-content and multivariate image data. For decision-making and generation of hypotheses, scientists need novel information technology tools that enable them to visually explore and analyze the data and to discuss and communicate results or f...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-297
更新日期:2011-07-21 00:00:00
abstract:BACKGROUND:PSI-BLAST, an extremely popular tool for sequence similarity search, features the utilization of Position-Specific Scoring Matrix (PSSM) constructed from a multiple sequence alignment (MSA). PSSM allows the detection of more distant homologs than a general amino acid substitution matrix does. An accurate est...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1686-9
更新日期:2017-06-02 00:00:00
abstract:BACKGROUND:The information in large collections of phylogenetic trees is useful for many comparative genomic studies. Therefore, there is a need for flexible tools that allow exploration of such collections in order to retrieve relevant data as quickly as possible. RESULTS:In this paper, we present TPMS (Tree Pattern-...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-109
更新日期:2013-03-27 00:00:00
abstract:BACKGROUND:Analyses of molecular high-throughput data often lack in robustness, i.e. results are very sensitive to the addition or removal of a single observation. Therefore, the identification of extreme observations is an important step of quality control before doing further data analysis. Standard outlier detection...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1645-5
更新日期:2017-05-02 00:00:00
abstract:BACKGROUND:Mecp2 null mice model Rett syndrome (RTT) a human neurological disorder affecting females after apparent normal pre- and peri-natal developmental periods. Neuroanatomical studies in cerebral cortex of RTT mouse models revealed delayed maturation of neuronal morphology and autonomous as well as non-cell auton...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0859-7
更新日期:2016-01-20 00:00:00
abstract:BACKGROUND:We present a model for tagging gene and protein mentions from text using the probabilistic sequence tagging framework of conditional random fields (CRFs). Conditional random fields model the probability P(t/o) of a tag sequence given an observation sequence directly, and have previously been employed success...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-6-S1-S6
更新日期:2005-01-01 00:00:00
abstract:BACKGROUND:High-resolution tandem mass spectra can now be readily acquired with hybrid instruments, such as LTQ-Orbitrap and LTQ-FT, in high-throughput shotgun proteomics workflows. The improved spectral quality enables more accurate de novo sequencing for identification of post-translational modifications and amino ac...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-118
更新日期:2010-03-05 00:00:00
abstract:BACKGROUND:The analysis of sequence-structure relations of RNA is based on a specific notion and folding of RNA structure. The notion of coarse grained structure employed here is that of canonical RNA pseudoknot contact-structures with at most two mutually crossing bonds (3-noncrossing). These structures are folded by ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-S1-S39
更新日期:2009-01-30 00:00:00
abstract:BACKGROUND:Sequence alignment is a common tool in bioinformatics and comparative genomics. It is generally assumed that multiple sequence alignment yields better results than pair wise sequence alignment, but this assumption has rarely been tested, and never with the control provided by simulation analysis. This study ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-6-278
更新日期:2005-11-23 00:00:00
abstract:BACKGROUND:Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data analysis, little attention has been paid to uncertainty in the results obtained. RESULTS:We present an R/Bioconductor port of a fast novel algorithm for...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-242
更新日期:2009-08-06 00:00:00
abstract:BACKGROUND:Bovine Viral Diarrhea Virus (BVDV) infection is widespread in cattle worldwide, causing important economic losses. Pathogenesis of the disease caused by BVDV is complex, as each BVDV strain has two biotypes: non-cytopathic (ncp) and cytopathic (cp). BVDV can cause a persistent latent infection and immune sup...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-S6-S9
更新日期:2010-10-07 00:00:00
abstract:BACKGROUND:Meta-analysis (MA) is widely used to pool genome-wide association studies (GWASes) in order to a) increase the power to detect strong or weak genotype effects or b) as a result verification method. As a consequence of differing SNP panels among genotyping chips, imputation is the method of choice within GWAS...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-231
更新日期:2012-09-12 00:00:00
abstract:BACKGROUND:Two-dimensional electrophoresis (2DE) is one of the most widely applied techniques in comparative proteomics. The basic task of 2DE is to identify differential protein expression by quantitative analysis of 2DE images. To reduce the errors of spot quantification in 2DE images, a novel brightness correction m...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-3432-y
更新日期:2020-03-19 00:00:00
abstract:BACKGROUND:While biomedical text mining is emerging as an important research area, practical results have proven difficult to achieve. We believe that an important first step towards more accurate text-mining lies in the ability to identify and characterize text that satisfies various types of information needs. We rep...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-356
更新日期:2006-07-25 00:00:00
abstract:BACKGROUND:To understand biological processes and diseases, it is crucial to unravel the concerted interplay of transcription factors (TFs), microRNAs (miRNAs) and their targets within regulatory networks and fundamental sub-networks. An integrative computational resource generating a comprehensive view of these regula...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-67
更新日期:2011-03-04 00:00:00
abstract:BACKGROUND:Microorganisms display vast diversity, and each one has its own set of genes, cell components and metabolic reactions. To assess their huge unexploited metabolic potential in different ecosystems, we need high throughput tools, such as functional microarrays, that allow the simultaneous analysis of thousands...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-478
更新日期:2010-09-23 00:00:00
abstract:BACKGROUND:The amount of gene expression data available in public repositories has grown exponentially in the last years, now requiring new data mining tools to transform them in information easily accessible to biologists. RESULTS:By exploiting expression data publicly available in the Gene Expression Omnibus (GEO) d...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-S1-S6
更新日期:2014-01-01 00:00:00
abstract:BACKGROUND:Routine application of gene expression microarray technology is rapidly producing large amounts of data that necessitate new approaches of analysis. The analysis of a specific microarray experiment profits enormously from cross-comparing to other experiments. This process is generally performed by numerical ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-6-S4-S14
更新日期:2005-12-01 00:00:00
abstract:BACKGROUND:A drug-drug interaction (DDI) occurs when one drug influences the level or activity of another drug. The increasing volume of the scientific literature overwhelms health care professionals trying to be kept up-to-date with all published studies on DDI. METHODS:This paper describes a hybrid linguistic approa...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-S2-S1
更新日期:2011-03-29 00:00:00
abstract:BACKGROUND:Protein subcellular localization is crucial for genome annotation, protein function prediction, and drug discovery. Determination of subcellular localization using experimental approaches is time-consuming; thus, computational approaches become highly desirable. Extensive studies of localization prediction h...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-330
更新日期:2007-09-08 00:00:00
abstract:BACKGROUND:Many methods for species tree inference require data from a sufficiently large sample of genomic loci in order to produce accurate estimates. However, few studies have attempted to use analytical theory to quantify "sufficiently large". RESULTS:Using the multispecies coalescent model, we report a general an...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-1266-4
更新日期:2016-11-11 00:00:00
abstract:BACKGROUND:The sequencing of many genomes and tiling arrays consisting of millions of DNA segments spanning entire genomes have made high-resolution copy number analysis possible. Microarray-based comparative genomic hybridization (array CGH) has enabled the high-resolution detection of DNA copy number aberrations. Whi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-203
更新日期:2007-06-14 00:00:00
abstract:BACKGROUND:Detection of genomic DNA copy number variations (CNVs) can provide a complete and more comprehensive view of human disease. It is interesting to identify and represent relevant CNVs from a genome-wide data due to high data volume and the complexity of interactions. RESULTS:In this paper, we incorporate the ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-S5-S4
更新日期:2011-01-01 00:00:00
abstract:BACKGROUND:The Damerau-Levenshtein (DL) distance metric has been widely used in the biological science. It tries to identify the similar region of DNA,RNA and protein sequences by transforming one sequence to the another using the substitution, insertion, deletion and transposition operations. Lowrance and Wagner have ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3184-8
更新日期:2020-12-09 00:00:00
abstract:BACKGROUND:Network co-regulated modules are believed to have the functionality of packaging multiple biological entities, and can thus be assumed to coordinate many biological functions in their network neighbouring regions. RESULTS:Here, we weighted edges of a human protein interaction network and a transcriptional r...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-392
更新日期:2010-07-22 00:00:00
abstract:BACKGROUND:MicroRNAs (miRNA) are regulatory genes that target and repress other RNA molecules via sequence-specific binding. Several biological processes are regulated across many organisms by evolutionarily conserved miRNAs. Plants and invertebrates employ their miRNA in defense against viruses by targeting and degrad...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-S2-S3
更新日期:2013-01-01 00:00:00