Abstract:
Environmental shotgun sequencing (ESS) has the potential to give greater insight into microbial communities than targeted sequencing of 16S regions, but requires much higher sequence coverage. The advent of next-generation sequencing has made it feasible for the Human Microbiome Project and other initiatives to generate ESS data on a large scale, but computationally efficient methods for analysing such data sets are needed. Here we present metaBEETL, a fast taxonomic classifier for environmental shotgun sequences. It uses a Burrows-Wheeler Transform (BWT) index of the sequencing reads and an indexed database of microbial reference sequences. Unlike other BWT-based tools, our method has no upper limit on the number or the total size of the reference sequences in its database. By capturing sequence relationships between strains, our reference index also allows us to classify reads which are not unique to an individual strain but are nevertheless specific to some higher phylogenetic order. Tested on datasets with known taxonomic composition, metaBEETL gave results that are competitive with existing similarity-based tools: due to normalization steps which other classifiers lack, the taxonomic profile computed by metaBEETL closely matched the true environmental profile. At the same time, its moderate running time and low memory footprint allow metaBEETL to scale well to large data sets. Code to construct the BWT indexed database and for the taxonomic classification is part of the BEETL library, available as a GitHub repository at git@github.com:BEETL/BEETL.git.
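For readers unfamiliar with the Burrows-Wheeler Transform mentioned in the abstract, the following is a minimal illustrative sketch of building a BWT and counting exact pattern occurrences by backward search. It is not metaBEETL's or BEETL's implementation: the function names `bwt` and `count_occurrences` are hypothetical, the construction is the naive sorted-rotations method, and it only works for small in-memory strings.

```python
def bwt(text: str) -> str:
    """Naive Burrows-Wheeler Transform via sorted rotations (illustrative only)."""
    s = text + "$"  # '$' is an assumed sentinel that sorts before A, C, G, T
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count exact occurrences of pattern in the original text by backward search."""
    # Tally how often each character appears in the BWT.
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1

    # first[c] = number of characters in the text that sort strictly before c.
    first = {}
    total = 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]

    # Half-open interval [lo, hi) of sorted rotations prefixed by the current suffix
    # of the pattern; it is narrowed one character at a time, right to left.
    lo, hi = 0, len(bwt_str)
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + bwt_str[:lo].count(ch)
        hi = first[ch] + bwt_str[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt("ACGTACGA")
    print(index)
    print(count_occurrences(index, "ACG"))
```

Production tools replace the linear-time `count` calls with precomputed rank structures and build the transform without materialising all rotations; the sketch only shows the principle.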
journal_name:BMC Bioinformatics
journal_title:BMC bioinformatics
authors:Ander C, Schulz-Trieglaff OB, Stoye J, Cox AJ
doi:10.1186/1471-2105-14-S5-S2
subject:Has Abstract
pub_date:2013-01-01 00:00:00
pages:S2
issn:1471-2105
pii:1471-2105-14-S5-S2
journal_volume:14 Suppl 5
pub_type: journal article
abstract:BACKGROUND:Automatic quantification of neuronal morphology from images of fluorescence microscopy plays an increasingly important role in high-content screenings. However, there exist very few freeware tools and methods which provide automatic neuronal morphology quantification for pharmacological discovery. RESULTS:T...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/1471-2105-12-230
update_date:2011-06-08 00:00:00
abstract:BACKGROUND:Routine application of gene expression microarray technology is rapidly producing large amounts of data that necessitate new approaches of analysis. The analysis of a specific microarray experiment profits enormously from cross-comparing to other experiments. This process is generally performed by numerical ...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/1471-2105-6-S4-S14
update_date:2005-12-01 00:00:00
abstract:BACKGROUND:It has been proposed that future reference genomes should be graph structures in order to better represent the sequence diversity present in a species. However, there is currently no standard method to represent genomic intervals, such as the positions of genes or transcription factor binding sites, on graph...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/s12859-017-1678-9
update_date:2017-05-18 00:00:00
abstract:BACKGROUND:We developed an extendable open-source Loop-mediated isothermal AMPlification (LAMP) signature design program called LAVA (LAMP Assay Versatile Analysis). LAVA was created in response to limitations of existing LAMP signature programs. RESULTS:LAVA identifies combinations of six primer regions for basic LAM...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/1471-2105-12-240
update_date:2011-06-16 00:00:00
abstract:BACKGROUND:Genome-wide expression quantitative trait loci (eQTL) studies have emerged as a powerful tool to understand the genetic basis of gene expression and complex traits. The traditional eQTL methods focus on testing the associations between individual single-nucleotide polymorphisms (SNPs) and gene expression tra...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/s12859-014-0421-z
update_date:2015-01-16 00:00:00
abstract:BACKGROUND:Automated protein function prediction methods are the only practical approach for assigning functions to genes obtained from model organisms. Many of the previously reported function annotation methods are of limited utility for fungal protein annotation. They are often trained only to one species, are not a...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/1471-2105-11-215
update_date:2010-04-29 00:00:00
abstract:BACKGROUND:Automated protein function prediction methods are needed to keep pace with high-throughput sequencing. With the existence of many programs and databases for inferring different protein functions, a pipeline that properly integrates these resources will benefit from the advantages of each method. However, int...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/1471-2105-9-52
update_date:2008-01-25 00:00:00
abstract:BACKGROUND:Elucidating gene regulatory networks is crucial for understanding normal cell physiology and complex pathologic phenotypes. Existing computational methods for the genome-wide "reverse engineering" of such networks have been successful only for lower eukaryotes with simple genomes. Here we present ARACNE, a n...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/1471-2105-7-S1-S7
update_date:2006-03-20 00:00:00
abstract:BACKGROUND:Ubiquitylation plays an important role in regulating protein functions. Recently, experimental methods were developed toward effective identification of ubiquitylation sites. To efficiently explore more undiscovered ubiquitylation sites, this study aims to develop an accurate sequence-based prediction method...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/1471-2105-9-310
update_date:2008-07-15 00:00:00
abstract:BACKGROUND:miRNAs regulate the expression of several genes with one miRNA able to target multiple genes and with one gene able to be simultaneously targeted by more than one miRNA. Therefore, it has become indispensable to shorten the long list of miRNA-target interactions to put in the spotlight in order to gain insig...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/s12859-019-3105-x
update_date:2019-11-04 00:00:00
abstract:BACKGROUND:Recent discoveries of a large variety of important roles for non-coding RNAs (ncRNAs) have been reported by numerous researchers. In order to analyze ncRNAs by kernel methods including support vector machines, we propose stem kernels as an extension of string kernels for measuring the similarities between tw...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/1471-2105-9-318
update_date:2008-07-22 00:00:00
abstract:BACKGROUND:Exome sequencing is a promising method for diagnosing patients with a complex phenotype. However, variant interpretation relative to patient phenotype can be challenging in some scenarios, particularly clinical assessment of rare complex phenotypes. Each patient's sequence reveals many possibly damaging vari...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/1471-2105-15-248
update_date:2014-07-21 00:00:00
abstract:BACKGROUND:Prioritizing genes according to their associations with a cancer allows researchers to explore genes in more informed ways. By far, Gene-centric or network-centric gene prioritization methods are predominated. Genes and their protein products carry out cellular processes in the context of functional modules....
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/s12859-018-2216-0
update_date:2018-06-05 00:00:00
abstract:BACKGROUND:The development of high-throughput sequencing and analysis has accelerated multi-omics studies of thousands of microbial species, metagenomes, and infectious disease pathogens. Omics studies are enabling genotype-phenotype association studies which identify genetic determinants of pathogen virulence and drug...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/s12859-018-2580-9
update_date:2019-01-07 00:00:00
abstract:BACKGROUND:Mechanotransduction in bone cells plays a pivotal role in osteoblast differentiation and bone remodelling. Mechanotransduction provides the link between modulation of the extracellular matrix by mechanical load and intracellular activity. By controlling the balance between the intracellular and extracellular...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/s12859-020-3394-0
update_date:2020-03-18 00:00:00
abstract:BACKGROUND:The molecular recognition based on the complementary base pairing of deoxyribonucleic acid (DNA) is the fundamental principle in the fields of genetics, DNA nanotechnology and DNA computing. We present an exhaustive DNA sequence design algorithm that allows to generate sets containing a maximum number of seq...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/1471-2105-13-138
update_date:2012-06-20 00:00:00
abstract:BACKGROUND:Single nucleotide polymorphisms (SNPs) are the most frequent type of sequence variation between individuals, and represent a promising tool for finding genetic determinants of complex diseases and understanding the differences in drug response. In this regard, it is of particular interest to study the effect...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/1471-2105-10-S8-S6
update_date:2009-08-27 00:00:00
abstract:BACKGROUND:Cluster analysis is the most common unsupervised method for finding hidden groups in data. Clustering presents two main challenges: (1) finding the optimal number of clusters, and (2) removing "outliers" among the objects being clustered. Few clustering algorithms currently deal directly with the outlier pro...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/s12859-017-1998-9
update_date:2018-01-08 00:00:00
abstract:BACKGROUND:Recently, DNA methylation has drawn great attention due to its strong correlation with abnormal gene activities and informative representation of the cancer status. As a number of studies focus on DNA methylation signatures in cancer, demand for utilizing publicly available methylome dataset has been increas...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/s12859-020-3516-8
update_date:2020-05-11 00:00:00
abstract:BACKGROUND:Protein-protein interactions (PPIs) are of great importance in cellular systems of organisms, since they are the basis of cellular structure and function and many essential cellular processes are related to that. Most proteins perform their functions by interacting with other proteins, so predicting PPIs acc...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/s12859-020-03896-6
update_date:2020-12-16 00:00:00
abstract:BACKGROUND:Over the last two decades, an innovative technology called Tissue Microarray (TMA), which combines multi-tissue and DNA microarray concepts, has been widely used in the field of histology. It consists of a collection of several (up to 1000 or more) tissue samples that are assembled onto a single support - ty...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/s12859-018-2111-8
update_date:2018-04-19 00:00:00
abstract:BACKGROUND:MHC/HLA class II molecules are important components of the immune system and play a critical role in processes such as phagocytosis. Understanding peptide recognition properties of the hundreds of MHC class II alleles is essential to appreciate determinants of antigenicity and ultimately to predict epitopes....
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/1471-2105-11-S1-S55
update_date:2010-01-18 00:00:00
abstract:BACKGROUND:Human triosephosphate isomerase (HsTIM) deficiency is a genetic disease caused often by the pathogenic mutation E104D. This mutation, located at the side of an abnormally large cluster of water in the inter-subunit interface, reduces the thermostability of the enzyme. Why and how these water molecules are di...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/1471-2105-14-S16-S11
update_date:2013-01-01 00:00:00
abstract:BACKGROUND:The Distributed Annotation System (DAS) allows merging of DNA sequence annotations from multiple sources and provides a single annotation view. A straightforward way to establish a DAS annotation server is to use the "Lightweight DAS" server (LDAS). Onto this type of server, annotations can be uploaded as fl...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/1471-2105-5-55
update_date:2004-05-07 00:00:00
abstract:BACKGROUND:Activation of naïve B lymphocytes by extracellular ligands, e.g. antigen, lipopolysaccharide (LPS) and CD40 ligand, induces a combination of common and ligand-specific phenotypic changes through complex signal transduction pathways. For example, although all three of these ligands induce proliferation, only ...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/1471-2105-7-237
update_date:2006-05-02 00:00:00
abstract:BACKGROUND:We establish that the occurrence of protein folds among genomes can be accurately described with a Weibull function. Systems which exhibit Weibull character can be interpreted with reliability theory commonly used in engineering analysis. For instance, Weibull distributions are widely used in reliability, ma...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/1471-2105-5-101
update_date:2004-07-26 00:00:00
abstract:BACKGROUND:S-glutathionylation is the formation of disulfide bonds between the tripeptide glutathione and cysteine residues of the protein, protecting them from irreversible oxidation and in some cases causing change in their functions. Regulatory glutathionylation of proteins is a controllable and reversible process a...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/s12859-020-03571-w
update_date:2020-09-14 00:00:00
abstract:BACKGROUND:Event sequences where different types of events often occur close together arise, e.g., when studying potential transcription factor binding sites (TFBS, events) of certain transcription factors (TF, types) in a DNA sequence. These events tend to occur in bursts: in some genomic regions there are more genes ...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/1471-2105-9-336
update_date:2008-08-08 00:00:00
abstract:BACKGROUND:Microarray technologies produced large amount of data. The hierarchical clustering is commonly used to identify clusters of co-expressed genes. However, microarray datasets often contain missing values (MVs) representing a major drawback for the use of the clustering methods. Usually the MVs are not treated,...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/1471-2105-5-114
update_date:2004-08-23 00:00:00
abstract:BACKGROUND:Deep mutational scanning is a technique to estimate the impacts of mutations on a gene by using deep sequencing to count mutations in a library of variants before and after imposing a functional selection. The impacts of mutations must be inferred from changes in their counts after selection. RESULTS:I desc...
journal_title:BMC bioinformatics
pub_type: journal article
doi:10.1186/s12859-015-0590-4
update_date:2015-05-20 00:00:00