Approaching the taxonomic affiliation of unidentified sequences in public databases--an example from the mycorrhizal fungi.

Abstract:

BACKGROUND:During the last few years, DNA sequence analysis has become one of the primary means of taxonomic identification of species, particularly so for species that are minute or otherwise lack distinct, readily obtainable morphological characters. Although the number of sequences available for comparison in public databases such as GenBank increases exponentially, only a minuscule fraction of all organisms have been sequenced, leaving taxon sampling a momentous problem for sequence-based taxonomic identification. When querying GenBank with a set of unidentified sequences, a considerable proportion typically lack fully identified matches, forming an ever-mounting pile of sequences that the researcher will have to monitor manually in the hope that new, clarifying sequences have been submitted by other researchers. To alleviate these concerns, a project to automatically monitor select unidentified sequences in GenBank for taxonomic progress through repeated local BLAST searches was initiated. Mycorrhizal fungi--a field where species identification often is prohibitively complex--and the much used ITS locus were chosen as test bed. RESULTS:A Perl script package called emerencia is presented. On a regular basis, it downloads select sequences from GenBank, separates the identified sequences from those insufficiently identified, and performs BLAST searches between these two datasets, storing all results in an SQL database. On the accompanying web-service http://emerencia.math.chalmers.se, users can monitor the taxonomic progress of insufficiently identified sequences over time, either through active searches or by signing up for e-mail notification upon disclosure of better matches. Other search categories, such as listing all insufficiently identified sequences (and their present best fully identified matches) publication-wise, are also available. DISCUSSION:The ever-increasing use of DNA sequences for identification purposes largely falls back on the assumption that public sequence databases contain a thorough sampling of taxonomically well-annotated sequences. Taxonomy, held by some to be an old-fashioned trade, has accordingly never been more important. emerencia does not automate the taxonomic process, but it does allow researchers to focus their efforts elsewhere than countless manual BLAST runs and arduous sieving of BLAST hit lists. The emerencia system is available on an open source basis for local installation with any organism and gene group as targets.

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Nilsson RH,Kristiansson E,Ryberg M,Larsson KH

doi

10.1186/1471-2105-6-178

keywords:

subject

Has Abstract

pub_date

2005-07-18 00:00:00

pages

178

issn

1471-2105

pii

1471-2105-6-178

journal_volume

6

pub_type

杂志文章
  • Analyzing miRNA co-expression networks to explore TF-miRNA regulation.

    abstract:BACKGROUND:Current microRNA (miRNA) research in progress has engendered rapid accumulation of expression data evolving from microarray experiments. Such experiments are generally performed over different tissues belonging to a specific species of metazoan. For disease diagnosis, microarray probes are also prepared with...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-163

    authors: Bandyopadhyay S,Bhattacharyya M

    更新日期:2009-05-28 00:00:00

  • DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark.

    abstract:BACKGROUND:XHMM is a widely used tool for copy-number variant (CNV) discovery from whole exome sequencing data but can require hours to days to run for large cohorts. A more scalable implementation would reduce the need for specialized computational resources and enable increased exploration of the configuration parame...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3108-7

    authors: Linderman MD,Chia D,Wallace F,Nothaft FA

    更新日期:2019-10-11 00:00:00

  • ModuleOrganizer: detecting modules in families of transposable elements.

    abstract:BACKGROUND:Most known eukaryotic genomes contain mobile copied elements called transposable elements. In some species, these elements account for the majority of the genome sequence. They have been subject to many mutations and other genomic events (copies, deletions, captures) during transposition. The identification ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-474

    authors: Tempel S,Rousseau C,Tahi F,Nicolas J

    更新日期:2010-09-22 00:00:00

  • The G protein-coupled receptors in the pufferfish Takifugu rubripes.

    abstract:BACKGROUND:Guanine protein-coupled receptors (GPCRs) constitute a eukaryotic transmembrane protein family and function as "molecular switches" in the second messenger cascades and are found in all organisms between yeast and humans. They form the single, biggest drug-target family due to their versatility of action and...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-S1-S3

    authors: Sarkar A,Kumar S,Sundar D

    更新日期:2011-02-15 00:00:00

  • Global rank-invariant set normalization (GRSN) to reduce systematic distortions in microarray data.

    abstract:BACKGROUND:Microarray technology has become very popular for globally evaluating gene expression in biological samples. However, non-linear variation associated with the technology can make data interpretation unreliable. Therefore, methods to correct this kind of technical variation are critical. Here we consider a me...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-520

    authors: Pelz CR,Kulesz-Martin M,Bagby G,Sears RC

    更新日期:2008-12-04 00:00:00

  • Genoviz Software Development Kit: Java tool kit for building genomics visualization applications.

    abstract:BACKGROUND:Visualization software can expose previously undiscovered patterns in genomic data and advance biological science. RESULTS:The Genoviz Software Development Kit (SDK) is an open source, Java-based framework designed for rapid assembly of visualization software applications for genomics. The Genoviz SDK frame...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-266

    authors: Helt GA,Nicol JW,Erwin E,Blossom E,Blanchard SG Jr,Chervitz SA,Harmon C,Loraine AE

    更新日期:2009-08-25 00:00:00

  • Combining techniques for screening and evaluating interaction terms on high-dimensional time-to-event data.

    abstract:BACKGROUND:Molecular data, e.g. arising from microarray technology, is often used for predicting survival probabilities of patients. For multivariate risk prediction models on such high-dimensional data, there are established techniques that combine parameter estimation and variable selection. One big challenge is to i...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-15-58

    authors: Sariyar M,Hoffmann I,Binder H

    更新日期:2014-02-26 00:00:00

  • Automated modelling of signal transduction networks.

    abstract:BACKGROUND:Intracellular signal transduction is achieved by networks of proteins and small molecules that transmit information from the cell surface to the nucleus, where they ultimately effect transcriptional changes. Understanding the mechanisms cells use to accomplish this important process requires a detailed molec...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-3-34

    authors: Steffen M,Petti A,Aach J,D'haeseleer P,Church G

    更新日期:2002-11-01 00:00:00

  • An unsupervised classification scheme for improving predictions of prokaryotic TIS.

    abstract:BACKGROUND:Although it is not difficult for state-of-the-art gene finders to identify coding regions in prokaryotic genomes, exact prediction of the corresponding translation initiation sites (TIS) is still a challenging problem. Recently a number of post-processing tools have been proposed for improving the annotation...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-121

    authors: Tech M,Meinicke P

    更新日期:2006-03-09 00:00:00

  • NextSV: a meta-caller for structural variants from low-coverage long-read sequencing data.

    abstract:BACKGROUND:Structural variants (SVs) in human genomes are implicated in a variety of human diseases. Long-read sequencing delivers much longer read lengths than short-read sequencing and may greatly improve SV detection. However, due to the relatively high cost of long-read sequencing, it is unclear what coverage is ne...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2207-1

    authors: Fang L,Hu J,Wang D,Wang K

    更新日期:2018-05-23 00:00:00

  • Comparative evaluation of gene set analysis approaches for RNA-Seq data.

    abstract:BACKGROUND:Over the last few years transcriptome sequencing (RNA-Seq) has almost completely taken over microarrays for high-throughput studies of gene expression. Currently, the most popular use of RNA-Seq is to identify genes which are differentially expressed between two or more conditions. Despite the importance of ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-014-0397-8

    authors: Rahmatallah Y,Emmert-Streib F,Glazko G

    更新日期:2014-12-05 00:00:00

  • Algebraic Dynamic Programming over general data structures.

    abstract:BACKGROUND:Dynamic programming algorithms provide exact solutions to many problems in computational biology, such as sequence alignment, RNA folding, hidden Markov models (HMMs), and scoring of phylogenetic trees. Structurally analogous algorithms compute optimal solutions, evaluate score distributions, and perform sto...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-16-S19-S2

    authors: zu Siederdissen CH,Prohaska SJ,Stadler PF

    更新日期:2015-01-01 00:00:00

  • CNV-seq, a new method to detect copy number variation using high-throughput sequencing.

    abstract:BACKGROUND:DNA copy number variation (CNV) has been recognized as an important source of genetic variation. Array comparative genomic hybridization (aCGH) is commonly used for CNV detection, but the microarray platform has a number of inherent limitations. RESULTS:Here, we describe a method to detect copy number varia...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-80

    authors: Xie C,Tammi MT

    更新日期:2009-03-06 00:00:00

  • Prediction of heart disease and classifiers' sensitivity analysis.

    abstract:BACKGROUND:Heart disease (HD) is one of the most common diseases nowadays, and an early diagnosis of such a disease is a crucial task for many health care providers to prevent their patients for such a disease and to save lives. In this paper, a comparative analysis of different classifiers was performed for the classi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03626-y

    authors: Almustafa KM

    更新日期:2020-07-02 00:00:00

  • Application of whole genome data for in silico evaluation of primers and probes routinely employed for the detection of viral species by RT-qPCR using dengue virus as a case study.

    abstract:BACKGROUND:Viral infection by dengue virus is a major public health problem in tropical countries. Early diagnosis and detection are increasingly based on quantitative reverse transcriptase real-time polymerase chain reaction (RT-qPCR) directed against genomic regions conserved between different isolates. Genetic varia...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2313-0

    authors: Vanneste K,Garlant L,Broeders S,Van Gucht S,Roosens NH

    更新日期:2018-09-04 00:00:00

  • Hotspot Hunter: a computational system for large-scale screening and selection of candidate immunological hotspots in pathogen proteomes.

    abstract:BACKGROUND:T-cell epitopes that promiscuously bind to multiple alleles of a human leukocyte antigen (HLA) supertype are prime targets for development of vaccines and immunotherapies because they are relevant to a large proportion of the human population. The presence of clusters of promiscuous T-cell epitopes, immunolo...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-S1-S19

    authors: Zhang GL,Khan AM,Srinivasan KN,Heiny A,Lee K,Kwoh CK,August JT,Brusic V

    更新日期:2008-01-01 00:00:00

  • Functional relevance of dynamic properties of Dimeric NADP-dependent Isocitrate Dehydrogenases.

    abstract:BACKGROUND:Isocitrate Dehydrogenases (IDHs) are important enzymes present in all living cells. Three subfamilies of functionally dimeric IDHs (subfamilies I, II, III) are known. Subfamily I are well-studied bacterial IDHs, like that of Escherischia coli. Subfamily II has predominantly eukaryotic members, but it also ha...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-S17-S2

    authors: Vinekar R,Verma C,Ghosh I

    更新日期:2012-01-01 00:00:00

  • Multiple virtual screening approaches for finding new hepatitis C virus RNA-dependent RNA polymerase inhibitors: structure-based screens and molecular dynamics for the pursue of new poly pharmacological inhibitors.

    abstract::The RNA polymerase NS5B of Hepatitis C virus (HCV) is a well-characterised drug target with an active site and four allosteric binding sites. This work presents a workflow for virtual screening and its application to Drug Bank screening targeting the Hepatitis C Virus (HCV) RNA polymerase non-nucleoside binding sites....

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-S17-S5

    authors: Elhefnawi M,ElGamacy M,Fares M

    更新日期:2012-01-01 00:00:00

  • BioIMAX: a Web 2.0 approach for easy exploratory and collaborative access to multivariate bioimage data.

    abstract:BACKGROUND:Innovations in biological and biomedical imaging produce complex high-content and multivariate image data. For decision-making and generation of hypotheses, scientists need novel information technology tools that enable them to visually explore and analyze the data and to discuss and communicate results or f...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-297

    authors: Loyek C,Rajpoot NM,Khan M,Nattkemper TW

    更新日期:2011-07-21 00:00:00

  • XenofilteR: computational deconvolution of mouse and human reads in tumor xenograft sequence data.

    abstract:BACKGROUND:Mouse xenografts from (patient-derived) tumors (PDX) or tumor cell lines are widely used as models to study various biological and preclinical aspects of cancer. However, analyses of their RNA and DNA profiles are challenging, because they comprise reads not only from the grafted human cancer but also from t...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2353-5

    authors: Kluin RJC,Kemper K,Kuilman T,de Ruiter JR,Iyer V,Forment JV,Cornelissen-Steijger P,de Rink I,Ter Brugge P,Song JY,Klarenbeek S,McDermott U,Jonkers J,Velds A,Adams DJ,Peeper DS,Krijgsman O

    更新日期:2018-10-04 00:00:00

  • Improving ontologies by automatic reasoning and evaluation of logical definitions.

    abstract:BACKGROUND:Ontologies are widely used to represent knowledge in biomedicine. Systematic approaches for detecting errors and disagreements are needed for large ontologies with hundreds or thousands of terms and semantic relationships. A recent approach of defining terms using logical definitions is now increasingly bein...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-418

    authors: Köhler S,Bauer S,Mungall CJ,Carletti G,Smith CL,Schofield P,Gkoutos GV,Robinson PN

    更新日期:2011-10-27 00:00:00

  • Enhancing HMM-based protein profile-profile alignment with structural features and evolutionary coupling information.

    abstract:BACKGROUND:Protein sequence profile-profile alignment is an important approach to recognizing remote homologs and generating accurate pairwise alignments. It plays an important role in protein sequence database search, protein structure prediction, protein function prediction, and phylogenetic analysis. RESULTS:In thi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-15-252

    authors: Deng X,Cheng J

    更新日期:2014-07-25 00:00:00

  • Restricted DCJ-indel model: sorting linear genomes with DCJ and indels.

    abstract:BACKGROUND:The double-cut-and-join (DCJ) is a model that is able to efficiently sort a genome into another, generalizing the typical mutations (inversions, fusions, fissions, translocations) to which genomes are subject, but allowing the existence of circular chromosomes at the intermediate steps. In the general model ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-S19-S14

    authors: da Silva PH,Machado R,Dantas S,Braga MD

    更新日期:2012-01-01 00:00:00

  • Stepwise kinetic equilibrium models of quantitative polymerase chain reaction.

    abstract:BACKGROUND:Numerous models for use in interpreting quantitative PCR (qPCR) data are present in recent literature. The most commonly used models assume the amplification in qPCR is exponential and fit an exponential model with a constant rate of increase to a select part of the curve. Kinetic theory may be used to model...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-203

    authors: Cobbs G

    更新日期:2012-08-16 00:00:00

  • Mining published lists of cancer related microarray experiments: identification of a gene expression signature having a critical role in cell-cycle control.

    abstract:BACKGROUND:Routine application of gene expression microarray technology is rapidly producing large amounts of data that necessitate new approaches of analysis. The analysis of a specific microarray experiment profits enormously from cross-comparing to other experiments. This process is generally performed by numerical ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-S4-S14

    authors: Finocchiaro G,Mancuso F,Muller H

    更新日期:2005-12-01 00:00:00

  • Frnakenstein: multiple target inverse RNA folding.

    abstract:BACKGROUND:RNA secondary structure prediction, or folding, is a classic problem in bioinformatics: given a sequence of nucleotides, the aim is to predict the base pairs formed in its three dimensional conformation. The inverse problem of designing a sequence folding into a particular target structure has only more rece...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-260

    authors: Lyngsø RB,Anderson JW,Sizikova E,Badugu A,Hyland T,Hein J

    更新日期:2012-10-09 00:00:00

  • Pairwise protein expression classifier for candidate biomarker discovery for early detection of human disease prognosis.

    abstract:BACKGROUND:An approach to molecular classification based on the comparative expression of protein pairs is presented. The method overcomes some of the present limitations in using peptide intensity data for class prediction for problems such as the detection of a disease, disease prognosis, or for predicting treatment ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-191

    authors: Kaur P,Schlatzer D,Cooke K,Chance MR

    更新日期:2012-08-07 00:00:00

  • PubFocus: semantic MEDLINE/PubMed citations analytics through integration of controlled biomedical dictionaries and ranking algorithm.

    abstract:BACKGROUND:Understanding research activity within any given biomedical field is important. Search outputs generated by MEDLINE/PubMed are not well classified and require lengthy manual citation analysis. Automation of citation analytics can be very useful and timesaving for both novices and experts. RESULTS:PubFocus w...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-424

    authors: Plikus MV,Zhang Z,Chuong CM

    更新日期:2006-10-02 00:00:00

  • A multiobjective approach to the genetic code adaptability problem.

    abstract:BACKGROUND:The organization of the canonical code has intrigued researches since it was first described. If we consider all codes mapping the 64 codes into 20 amino acids and one stop codon, there are more than 1.51×10(84) possible genetic codes. The main question related to the organization of the genetic code is why ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-015-0480-9

    authors: de Oliveira LL,de Oliveira PS,Tinós R

    更新日期:2015-02-19 00:00:00

  • Temporal dynamics of protein complexes in PPI networks: a case study using yeast cell cycle dynamics.

    abstract::Complexes of physically interacting proteins are one of the fundamental functional units responsible for driving key biological mechanisms within the cell. With the advent of high-throughput techniques, significant amount of protein interaction (PPI) data has been catalogued for organisms such as yeast, which has in t...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-S17-S16

    authors: Srihari S,Leong HW

    更新日期:2012-01-01 00:00:00