xHMMER3x2: Utilizing HMMER3's speed and HMMER2's sensitivity and specificity in the glocal alignment mode for improved large-scale protein domain annotation.

Abstract:

BACKGROUND: While the local-mode HMMER3 is notable for its massive speed improvement, the slower glocal-mode HMMER2 is more exact for domain annotation because it enforces full domain-to-sequence alignments. Since a unit of domain necessarily implies a unit of function, local-mode HMMER3 alone remains insufficient for precise function annotation tasks. In addition, the E-values reported for the same domain model by different HMMER builds are not comparable, which makes it difficult to check domain annotation consistency on a large scale. RESULTS: In this work, the speed of HMMER3 and the glocal-mode alignment of HMMER2 are combined within the xHMMER3x2 framework to tackle the large-scale domain annotation task. Briefly, HMMER3 is used for initial domain detection so that HMMER2 can subsequently perform the glocal-mode, sequence-to-full-domain alignments for the detected HMMER3 hits. An E-value calibration procedure is required to ensure that the search space of HMMER2 is sufficiently replicated by HMMER3. We find that this is straightforwardly possible for ~80% of the models in the Pfam domain library (release 29). However, for the remaining ~20% of HMMER3 domain models, the respective HMMER2 counterparts are more sensitive. For these models, HMMER3 searches alone are insufficient to ensure sensitivity, and an HMMER2-based search needs to be initiated. When tested on the set of UniProt human sequences, xHMMER3x2 can be configured to be between 7× and 201× faster than HMMER2, but with domain detection sensitivity descending from 99.8% to 95.7% with respect to HMMER2 alone; HMMER3's sensitivity was 95.7%. At the extremes, xHMMER3x2 behaves either as the slow glocal-mode HMMER2 or as the fast HMMER3 with glocal-mode alignment. Finally, the mapping of E-values to false-positive rates (FPR) by xHMMER3x2 allows E-values of different model builds to be compared, so that any annotation discrepancies in a large-scale annotation exercise can be flagged for further examination by dissectHMMER.
CONCLUSION: The xHMMER3x2 workflow allows large-scale domain annotation speed to be drastically improved over HMMER2 without compromising domain-detection sensitivity or the completeness of sequence-to-domain alignments. The xHMMER3x2 code and its webserver (for Pfam releases 27, 28 and 29) are freely available at http://xhmmer3x2.bii.a-star.edu.sg/. REVIEWERS: Reviewed by Thomas Dandekar, L. Aravind, Oliviero Carugo and Shamil Sunyaev. For the full reviews, please go to the Reviewers' comments section.
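
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the xHMMER3x2 code: the names `Hit`, `cascade_annotate`, `fast_search`, `glocal_align`, `fast_cutoffs`, `fallback_models` and the `final_cutoff` value are all hypothetical, and the two search callables merely stand in for a local-mode HMMER3 search and a glocal-mode HMMER2 search. The sketch assumes that each model either has a calibrated prefilter E-value (the ~80% case) or is flagged for a direct glocal search over all sequences (the ~20% case).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Hit:
    sequence_id: str
    model_id: str
    evalue: float


# Hypothetical signature: score one (sequence, model) pair and return a Hit,
# or None when the search reports nothing for that pair.
SearchFn = Callable[[str, str], Optional[Hit]]


def cascade_annotate(
    sequence_ids: Iterable[str],
    model_ids: Iterable[str],
    fast_search: SearchFn,            # stand-in for a local-mode HMMER3 search
    glocal_align: SearchFn,           # stand-in for a glocal-mode HMMER2 search
    fast_cutoffs: Dict[str, float],   # assumed per-model calibrated prefilter E-values
    fallback_models: Set[str],        # models where the fast stage is not trusted
    final_cutoff: float = 0.1,        # illustrative acceptance threshold
) -> List[Hit]:
    """Prefilter with the fast search, then confirm with the glocal search."""
    sequences = list(sequence_ids)
    confirmed: List[Hit] = []
    for model in model_ids:
        if model in fallback_models:
            # The slow search is more sensitive for this model: scan everything.
            candidates = sequences
        else:
            cutoff = fast_cutoffs[model]
            candidates = []
            for seq in sequences:
                fast_hit = fast_search(seq, model)
                if fast_hit is not None and fast_hit.evalue <= cutoff:
                    candidates.append(seq)
        for seq in candidates:
            # Only the surviving pairs pay the cost of the glocal alignment.
            hit = glocal_align(seq, model)
            if hit is not None and hit.evalue <= final_cutoff:
                confirmed.append(hit)
    return confirmed
```

In this sketch, loosening the values in `fast_cutoffs` or enlarging `fallback_models` sends more pairs to the slow stage, which is one way to picture the speed versus sensitivity trade-off reported in the abstract.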

journal_name

Biol Direct

journal_title

Biology direct

authors

Yap CK,Eisenhaber B,Eisenhaber F,Wong WC

doi

10.1186/s13062-016-0163-0

subject

Has Abstract

pub_date

2016-11-29 00:00:00

pages

63

issue

1

issn

1745-6150

pii

10.1186/s13062-016-0163-0

journal_volume

11

pub_type

Journal Article
  • Issues associated with the use of phosphospecific antibodies to localise active and inactive pools of GSK-3 in cells.

    abstract:BACKGROUND:Glycogen synthase kinase-3 (GSK-3) is a ubiquitously expressed serine/threonine (Ser/Thr) kinase comprising two isoforms, GSK-3α and GSK-3β. Both enzymes are similarly inactivated by serine phosphorylation (GSK-3α at Ser21 and GSK-3β at Ser9) and activated by tyrosine phosphorylation (GSK-3α at Tyr279 and GS...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-6-4

    authors: Campa VM,Kypta RM

    update_date:2011-01-24 00:00:00

  • Hereditary profiles of disorderly transcription?

    abstract:BACKGROUND:Microscopic examination of living cells often reveals that cells from some cell strains appear to be in a permanent state of disarray without obvious reason. In all probability such a disorderly state affects cell functioning. The aim of this study was to establish whether a disorderly state could occur that...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-1-9

    authors: Simons JW

    update_date:2006-04-02 00:00:00

  • MutL homologs in restriction-modification systems and the origin of eukaryotic MORC ATPases.

    abstract::The provenance and biochemical roles of eukaryotic MORC proteins have remained poorly understood since the discovery of their prototype MORC1, which is required for meiotic nuclear division in animals. The MORC family contains a combination of a gyrase, histidine kinase, and MutL (GHKL) and S5 domains that together co...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-3-8

    authors: Iyer LM,Abhiman S,Aravind L

    update_date:2008-03-17 00:00:00

  • The fundamental units, processes and patterns of evolution, and the tree of life conundrum.

    abstract:BACKGROUND:The elucidation of the dominant role of horizontal gene transfer (HGT) in the evolution of prokaryotes led to a severe crisis of the Tree of Life (TOL) concept and intense debates on this subject. CONCEPT:Prompted by the crisis of the TOL, we attempt to define the primary units and the fundamental patterns ...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-4-33

    authors: Koonin EV,Wolf YI

    update_date:2009-09-29 00:00:00

  • Modeling the population dynamics of lemon sharks.

    abstract:BACKGROUND:Long-lived marine megavertebrates (e.g. sharks, turtles, mammals, and seabirds) are inherently vulnerable to anthropogenic mortality. Although some mathematical models have been applied successfully to manage these animals, more detailed treatments are often needed to assess potential drivers of population d...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-9-23

    authors: White ER,Nagy JD,Gruber SH

    update_date:2014-11-18 00:00:00

  • A novel superfamily containing the beta-grasp fold involved in binding diverse soluble ligands.

    abstract:BACKGROUND:Domains containing the beta-grasp fold are utilized in a great diversity of physiological functions but their role, if any, in soluble or small molecule ligand recognition is poorly studied. RESULTS:Using sensitive sequence and structure similarity searches we identify a novel superfamily containing the bet...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-2-4

    authors: Burroughs AM,Balaji S,Iyer LM,Aravind L

    update_date:2007-01-24 00:00:00

  • Is pre-Darwinian evolution plausible?

    abstract:BACKGROUND:This essay highlights critical aspects of the plausibility of pre-Darwinian evolution. It is based on a critical review of some better-known open, far-from-equilibrium system-based scenarios supposed to explain processes that took place before Darwinian evolution had emerged and that resulted in the origin o...

    journal_title:Biology direct

    pub_type: Journal Article, Review

    doi:10.1186/s13062-018-0216-7

    authors: Tessera M

    update_date:2018-09-21 00:00:00

  • The multiple personalities of Watson and Crick strands.

    abstract:BACKGROUND:In genetics it is customary to refer to double-stranded DNA as containing a "Watson strand" and a "Crick strand." However, there seems to be no consensus in the literature on the exact meaning of these two terms, and the many usages contradict one another as well as the original definition. Here, we review t...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-6-7

    authors: Cartwright RA,Graur D

    update_date:2011-02-08 00:00:00

  • Strong association between pseudogenization mechanisms and gene sequence length.

    abstract:UNLABELLED:Pseudogenes arise from the decay of gene copies following either RNA-mediated duplication (processed pseudogenes) or DNA-mediated duplication (nonprocessed pseudogenes). Here, we show that long protein-coding genes tend to produce more nonprocessed pseudogenes than short genes, whereas the opposite is true f...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-4-38

    authors: Khachane AN,Harrison PM

    update_date:2009-10-06 00:00:00

  • Comprehensive comparative-genomic analysis of type 2 toxin-antitoxin systems and related mobile stress response systems in prokaryotes.

    abstract:BACKGROUND:The prokaryotic toxin-antitoxin systems (TAS, also referred to as TA loci) are widespread, mobile two-gene modules that can be viewed as selfish genetic elements because they evolved mechanisms to become addictive for replicons and cells in which they reside, but also possess "normal" cellular functions in v...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-4-19

    authors: Makarova KS,Wolf YI,Koonin EV

    update_date:2009-06-03 00:00:00

  • Biased gene transfer and its implications for the concept of lineage.

    abstract:BACKGROUND:In the presence of horizontal gene transfer (HGT), the concepts of lineage and genealogy in the microbial world become more ambiguous because chimeric genomes trace their ancestry from a myriad of sources, both living and extinct. RESULTS:We present the evolutionary histories of three aminoacyl-tRNA synthet...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-6-47

    authors: Andam CP,Gogarten JP

    update_date:2011-09-23 00:00:00

  • Biochemistry and physiology within the framework of the extended synthesis of evolutionary biology.

    abstract::Functional biologists, like Claude Bernard, ask "How?", meaning that they investigate the mechanisms underlying the emergence of biological functions (proximal causes), while evolutionary biologists, like Charles Darwin, ask "Why?", meaning that they search the causes of adaptation, survival and evolution (remote cau...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-016-0109-6

    authors: Vianello A,Passamonti S

    update_date:2016-02-09 00:00:00

  • Impairment of translation in neurons as a putative causative factor for autism.

    abstract:BACKGROUND:A dramatic increase in the prevalence of autism and Autistic Spectrum Disorders (ASD) has been observed over the last two decades in USA, Europe and Asia. Given the accumulating data on the possible role of translation in the etiology of ASD, we analyzed potential effects of rare synonymous substitutions ass...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-9-16

    authors: Poliakov E,Koonin EV,Rogozin IB

    update_date:2014-07-10 00:00:00

  • The progene hypothesis: the nucleoprotein world and how life began.

    abstract::In this article, I review the results of studies on the origin of life distinct from the popular RNA world hypothesis. The alternate scenario postulates the origin of the first bimolecular genetic system (a polynucleotide gene and a polypeptide processive polymerase) with simultaneous replication and translation and i...

    journal_title:Biology direct

    pub_type: Journal Article, Review

    doi:10.1186/s13062-015-0096-z

    authors: Altstein AD

    update_date:2015-11-26 00:00:00

  • Unraveling the biochemistry and provenance of pupylation: a prokaryotic analog of ubiquitination.

    abstract:UNLABELLED:Recently Mycobacterium tuberculosis was shown to possess a novel protein modification, in which a small protein Pup is conjugated to the epsilon-amino groups of lysines in target proteins. Analogous to ubiquitin modification in eukaryotes, this remarkable modification recruits proteins for degradation via ar...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-3-45

    authors: Iyer LM,Burroughs AM,Aravind L

    update_date:2008-11-03 00:00:00

  • Rotational restriction of nascent peptides as an essential element of co-translational protein folding: possible molecular players and structural consequences.

    abstract:BACKGROUND:A basic tenet of protein science is that all information about the spatial structure of proteins is present in their sequences. Nonetheless, many proteins fail to attain native structure upon experimental denaturation and refolding in vitro, raising the question of the specific role of cellular machinery in ...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-017-0186-1

    authors: Sorokina I,Mushegian A

    update_date:2017-05-31 00:00:00

  • Stable feature selection and classification algorithms for multiclass microarray data.

    abstract:BACKGROUND:Recent studies suggest that gene expression profiles are a promising alternative for clinical cancer classification. One major problem in applying DNA microarrays for classification is the dimension of obtained data sets. In this paper we propose a multiclass gene selection method based on Partial Least Squa...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-7-33

    authors: Student S,Fujarewicz K

    update_date:2012-10-02 00:00:00

  • Domain enhanced lookup time accelerated BLAST.

    abstract:BACKGROUND:BLAST is a commonly-used software package for comparing a query sequence to a database of known sequences; in this study, we focus on protein sequences. Position-specific-iterated BLAST (PSI-BLAST) iteratively searches a protein sequence database, using the matches in round i to construct a position-specific...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-7-12

    authors: Boratyn GM,Schäffer AA,Agarwala R,Altschul SF,Lipman DJ,Madden TL

    update_date:2012-04-17 00:00:00

  • Diverse bacterial genomes encode an operon of two genes, one of which is an unusual class-I release factor that potentially recognizes atypical mRNA signals other than normal stop codons.

    abstract:BACKGROUND:While all codons that specify amino acids are universally recognized by tRNA molecules, codons signaling termination of translation are recognized by proteins known as class-I release factors (RF). In most eukaryotes and archaea a single RF accomplishes termination at all three stop codons. In most bacteria,...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-1-28

    authors: Baranov PV,Vestergaard B,Hamelryck T,Gesteland RF,Nyborg J,Atkins JF

    update_date:2006-09-13 00:00:00

  • LINEs of evidence: noncanonical DNA replication as an epigenetic determinant.

    abstract::LINE-1 (L1) retrotransposons are repetitive elements in mammalian genomes. They are capable of synthesizing DNA on their own RNA templates by harnessing reverse transcriptase (RT) that they encode. Abundantly expressed full-length L1s and their RT are found to globally influence gene expression profiles, differentiati...

    journal_title:Biology direct

    pub_type: Journal Article, Review

    doi:10.1186/1745-6150-8-22

    authors: Belan E

    update_date:2013-09-13 00:00:00

  • Elusive data underlying debate at the prokaryote-eukaryote divide.

    abstract:BACKGROUND:The origin of eukaryotic cells was an important transition in evolution. The factors underlying the origin and evolutionary success of the eukaryote lineage are still discussed. One camp argues that mitochondria were essential for eukaryote origin because of the unique configuration of internalized bioenerge...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-018-0221-x

    authors: Gerlitz M,Knopp M,Kapust N,Xavier JC,Martin WF

    update_date:2018-10-03 00:00:00

  • The manoeuvrability hypothesis to explain the maintenance of bilateral symmetry in animal evolution.

    abstract:BACKGROUND:The overwhelming majority of animal species exhibit bilateral symmetry. However, the precise evolutionary importance of bilateral symmetry is unknown, although elements of the understanding of the phenomenon have been present within the scientific community for decades. PRESENTATION OF THE HYPOTHESIS:Here w...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-7-22

    authors: Holló G,Novák M

    update_date:2012-07-12 00:00:00

  • Pseudo-chaotic oscillations in CRISPR-virus coevolution predicted by bifurcation analysis.

    abstract:BACKGROUND:The CRISPR-Cas systems of adaptive antivirus immunity are present in most archaea and many bacteria, and provide resistance to specific viruses or plasmids by inserting fragments of foreign DNA into the host genome and then utilizing transcripts of these spacers to inactivate the cognate foreign genome. The ...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-9-13

    authors: Berezovskaya FS,Wolf YI,Koonin EV,Karev GP

    update_date:2014-07-02 00:00:00

  • Systematic evaluation of supervised machine learning for sample origin prediction using metagenomic sequencing data.

    abstract:BACKGROUND:The advent of metagenomic sequencing provides microbial abundance patterns that can be leveraged for sample origin prediction. Supervised machine learning classification approaches have been reported to predict sample origin accurately when the origin has been previously sampled. Using metagenomic datasets p...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-020-00287-y

    authors: Chen JC,Tyler AD

    update_date:2020-12-10 00:00:00

  • Pathophysiology of Crohn's disease inflammation and recurrence.

    abstract::Crohn's disease is a chronic inflammatory intestinal disease, first described at the beginning of the last century. The disease is characterized by the alternation of periods of flares and remissions influenced by a complex pathogenesis in which inflammation plays a key role. Crohn's disease evolution is mediated by a...

    journal_title:Biology direct

    pub_type: Journal Article, Review

    doi:10.1186/s13062-020-00280-5

    authors: Petagna L,Antonelli A,Ganini C,Bellato V,Campanelli M,Divizia A,Efrati C,Franceschilli M,Guida AM,Ingallinella S,Montagnese F,Sensi B,Siragusa L,Sica GS

    update_date:2020-11-07 00:00:00

  • A computational approach to candidate gene prioritization for X-linked mental retardation using annotation-based binary filtering and motif-based linear discriminatory analysis.

    abstract:BACKGROUND:Several computational candidate gene selection and prioritization methods have recently been developed. These in silico selection and prioritization techniques are usually based on two central approaches--the examination of similarities to known disease genes and/or the evaluation of functional annotation of...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-6-30

    authors: Lombard Z,Park C,Makova KD,Ramsay M

    update_date:2011-06-13 00:00:00

  • Evolution before genes.

    abstract:BACKGROUND:Our current understanding of evolution is so tightly linked to template-dependent replication of DNA and RNA molecules that the old idea from Oparin of a self-reproducing 'garbage bag' ('coacervate') of chemicals that predated fully-fledged cell-like entities seems to be farfetched to most scientists today. ...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-7-1

    authors: Vasas V,Fernando C,Santos M,Kauffman S,Szathmáry E

    update_date:2012-01-05 00:00:00

  • The origins of phagocytosis and eukaryogenesis.

    abstract:BACKGROUND:Phagocytosis, that is, engulfment of large particles by eukaryotic cells, is found in diverse organisms and is often thought to be central to the very origin of the eukaryotic cell, in particular, for the acquisition of bacterial endosymbionts including the ancestor of the mitochondrion. RESULTS:Comparisons...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-4-9

    authors: Yutin N,Wolf MY,Wolf YI,Koonin EV

    update_date:2009-02-26 00:00:00

  • Description of plant tRNA-derived RNA fragments (tRFs) associated with argonaute and identification of their putative targets.

    abstract::tRNA-derived RNA fragments (tRFs) are 19mer small RNAs that associate with Argonaute (AGO) proteins in humans. However, in plants, it is unknown if tRFs bind with AGO proteins. Here, using public deep sequencing libraries of immunoprecipitated Argonaute proteins (AGO-IP) and bioinformatics approaches, we identified th...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-8-6

    authors: Loss-Morais G,Waterhouse PM,Margis R

    update_date:2013-02-12 00:00:00

  • Origin of the nuclear proteome on the basis of pre-existing nuclear localization signals in prokaryotic proteins.

    abstract:BACKGROUND:The origin of the selective nuclear protein import machinery, which consists of nuclear pore complexes and adaptor molecules interacting with the nuclear localization signals (NLSs) of cargo molecules, is one of the most important events in the evolution of eukaryotic cells. How proteins were selected for im...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-020-00263-6

    authors: Lisitsyna OM,Kurnaeva MA,Arifulin EA,Shubina MY,Musinova YR,Mironov AA,Sheval EV

    update_date:2020-04-28 00:00:00