Algal Functional Annotation Tool: a web-based analysis suite to functionally interpret large gene lists using integrated annotation and expression data.

Abstract:

BACKGROUND: Genome sequencing is advancing at an exponential pace, and several new algal genomes become available every year. One of the challenges facing the community is the association of protein sequences encoded in the genomes with biological function. While most genome assembly projects generate annotations for predicted protein sequences, these annotations are usually sparse and draw functional terms from only a small number of databases. Another challenge is the use of annotations to interpret large lists of 'interesting' genes generated by genome-scale datasets. Previously, these gene lists had to be analyzed across several independent biological databases, often on a gene-by-gene basis. In contrast, several annotation databases, such as DAVID, integrate data from multiple functional databases and reveal underlying biological themes of large gene lists. While several such databases have been constructed for animals, none is currently available for the study of algae. Due to renewed interest in algae as potential sources of biofuels and the emergence of multiple algal genome sequences, a significant need has arisen for such a database to process the growing compendiums of algal genomic data.

DESCRIPTION: The Algal Functional Annotation Tool is a comprehensive web-based analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and future versions will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally related genes based on gene expression across these conditions is also provided. Other features include dynamic visualization of genes on KEGG pathway maps and batch gene identifier conversion.

CONCLUSIONS: The Algal Functional Annotation Tool aims to provide an integrated data-mining environment for algal genomics by combining data from multiple annotation databases into a centralized tool. This site is designed to expedite the process of functional annotation and the interpretation of gene lists, such as those derived from high-throughput RNA-seq experiments. The tool is publicly available at http://pathways.mcdb.ucla.edu.
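The abstract does not say which statistic the tool uses to score the enrichment of functional terms. As an illustration only, the sketch below assumes a one-sided hypergeometric test, a common choice for gene-list enrichment. The function name, parameter names, and the toy numbers are hypothetical and are not taken from the tool.

```python
from math import comb


def hypergeom_enrichment_p(genome_size, term_genes_in_genome,
                           list_size, term_genes_in_list):
    """Upper-tail hypergeometric probability P(X >= term_genes_in_list).

    X is the number of genes carrying a given functional term in a list of
    `list_size` genes drawn without replacement from a genome of
    `genome_size` genes, of which `term_genes_in_genome` carry the term.
    Assumption: this is an illustrative statistic, not the tool's documented one.
    """
    max_hits = min(term_genes_in_genome, list_size)
    total = comb(genome_size, list_size)
    tail = sum(
        comb(term_genes_in_genome, i)
        * comb(genome_size - term_genes_in_genome, list_size - i)
        for i in range(term_genes_in_list, max_hits + 1)
    )
    return tail / total


# Toy, made-up numbers: 1,000 annotated genes, 40 carry the term,
# and 8 of a 50-gene list carry it.
p_value = hypergeom_enrichment_p(1000, 40, 50, 8)
print(f"enrichment p-value: {p_value:.3g}")
```

A small p-value suggests the term is over-represented in the list relative to the genome background. When many terms are tested at once, a multiple-testing correction would normally be applied on top of this.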

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Lopez D,Casero D,Cokus SJ,Merchant SS,Pellegrini M

doi

10.1186/1471-2105-12-282

pub_date

2011-07-12 00:00:00

pages

282

issn

1471-2105

pii

1471-2105-12-282

journal_volume

12

pub_type

Journal Article
  • Analysis of cancer metabolism with high-throughput technologies.

    abstract:BACKGROUND:Recent advances in genomics and proteomics have allowed us to study the nuances of the Warburg effect--a long-standing puzzle in cancer energy metabolism--at an unprecedented level of detail. While modern next-generation sequencing technologies are extremely powerful, the lack of appropriate data analysis to...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-S10-S8

    authors: Markovets AA,Herman D

    update_date:2011-10-18 00:00:00

  • Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies.

    abstract:BACKGROUND:The increasing availability of Electronic Health Record (EHR) data and specifically free-text patient notes presents opportunities for phenotype extraction. Text-mining methods in particular can help disease modeling by mapping named-entities mentions to terminologies and clustering semantically related term...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-10

    authors: Cohen R,Elhadad M,Elhadad N

    update_date:2013-01-16 00:00:00

  • Integrating multiple molecular sources into a clinical risk prediction signature by extracting complementary information.

    abstract:BACKGROUND:High-throughput technology allows for genome-wide measurements at different molecular levels for the same patient, e.g. single nucleotide polymorphisms (SNPs) and gene expression. Correspondingly, it might be beneficial to also integrate complementary information from different molecular levels when building...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1183-6

    authors: Hieke S,Benner A,Schlenk RF,Schumacher M,Bullinger L,Binder H

    update_date:2016-08-30 00:00:00

  • Fast and accurate clustering of noncoding RNAs using ensembles of sequence alignments and secondary structures.

    abstract:BACKGROUND:Clustering of unannotated transcripts is an important task to identify novel families of noncoding RNAs (ncRNAs). Several hierarchical clustering methods have been developed using similarity measures based on the scores of structural alignment. However, the high computational cost of exact structural alignme...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-S1-S48

    authors: Saito Y,Sato K,Sakakibara Y

    update_date:2011-02-15 00:00:00

  • A Bayesian data fusion based approach for learning genome-wide transcriptional regulatory networks.

    abstract:BACKGROUND:Reverse engineering of transcriptional regulatory networks (TRN) from genomics data has always represented a computational challenge in System Biology. The major issue is modeling the complex crosstalk among transcription factors (TFs) and their target genes, with a method able to handle both the high number...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3510-1

    authors: Sauta E,Demartini A,Vitali F,Riva A,Bellazzi R

    update_date:2020-05-29 00:00:00

  • LEON-BIS: multiple alignment evaluation of sequence neighbours using a Bayesian inference system.

    abstract:BACKGROUND:A standard procedure in many areas of bioinformatics is to use a multiple sequence alignment (MSA) as the basis for various types of homology-based inference. Applications include 3D structure modelling, protein functional annotation, prediction of molecular interactions, etc. These applications, however sop...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1146-y

    authors: Vanhoutreve R,Kress A,Legrand B,Gass H,Poch O,Thompson JD

    update_date:2016-07-07 00:00:00

  • Identification and utilization of inter-species conserved (ISC) probesets on Affymetrix human GeneChip platforms for the optimization of the assessment of expression patterns in non human primate (NHP) samples.

    abstract:BACKGROUND:While researchers have utilized versions of the Affymetrix human GeneChip for the assessment of expression patterns in non human primate (NHP) samples, there has been no comprehensive sequence analysis study undertaken to demonstrate that the probe sequences designed to detect human transcripts are reliably ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-5-165

    authors: Wang Z,Lewis MG,Nau ME,Arnold A,Vahey MT

    update_date:2004-10-26 00:00:00

  • Finding motif pairs in the interactions between heterogeneous proteins via bootstrapping and boosting.

    abstract:BACKGROUND:Supervised learning and many stochastic methods for predicting protein-protein interactions require both negative and positive interactions in the training data set. Unlike positive interactions, negative interactions cannot be readily obtained from interaction data, so these must be generated. In protein-pr...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S1-S57

    authors: Kim J,Huang DS,Han K

    update_date:2009-01-30 00:00:00

  • Identifying and quantifying metabolites by scoring peaks of GC-MS data.

    abstract:BACKGROUND:Metabolomics is one of most recent omics technologies. It has been applied on fields such as food science, nutrition, drug discovery and systems biology. For this, gas chromatography-mass spectrometry (GC-MS) has been largely applied and many computational tools have been developed to support the analysis of...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-014-0374-2

    authors: Aggio RB,Mayor A,Reade S,Probert CS,Ruggiero K

    update_date:2014-12-10 00:00:00

  • tacg--a grep for DNA.

    abstract:BACKGROUND:Pattern matching is the core of bioinformatics; it is used in database searching, restriction enzyme mapping, and finding open reading frames. It is done repeatedly over increasingly long sequences, thus codes must be efficient and insensitive to sequence length. Such patterns of interest include simple moti...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-3-8

    authors: Mangalam HJ

    update_date:2002-01-01 00:00:00

  • Assessing stationary distributions derived from chromatin contact maps.

    abstract:BACKGROUND:The spatial configuration of chromosomes is essential to various cellular processes, notably gene regulation, while architecture related alterations, such as translocations and gene fusions, are often cancer drivers. Thus, eliciting chromatin conformation is important, yet challenging due to compaction, dyna...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3424-y

    authors: Segal MR,Fletez-Brant K

    update_date:2020-02-24 00:00:00

  • Sigma-RF: prediction of the variability of spatial restraints in template-based modeling by random forest.

    abstract:BACKGROUND:In template-based modeling when using a single template, inter-atomic distances of an unknown protein structure are assumed to be distributed by Gaussian probability density functions, whose center peaks are located at the distances between corresponding atoms in the template structure. The width of the Gaus...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0526-z

    authors: Lee J,Lee K,Joung I,Joo K,Brooks BR,Lee J

    update_date:2015-03-21 00:00:00

  • REGULATOR: a database of metazoan transcription factors and maternal factors for developmental studies.

    abstract:BACKGROUND:Genes encoding transcription factors that constitute gene-regulatory networks and maternal factors accumulating in egg cytoplasm are two classes of essential genes that play crucial roles in developmental processes. Transcription factors control the expression of their downstream target genes by interacting ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0552-x

    authors: Wang K,Nishida H

    update_date:2015-04-10 00:00:00

  • Accelerated large-scale multiple sequence alignment.

    abstract:BACKGROUND:Multiple sequence alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. Prior MSA acceleration attempts with reconfigurable computing have only addressed the first stage of progressive alignment and consequently exhibit performance limitations acco...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-466

    authors: Lloyd S,Snell QO

    update_date:2011-12-07 00:00:00

  • Promoter prediction in E. coli based on SIDD profiles and Artificial Neural Networks.

    abstract:BACKGROUND:One of the major challenges in biology is the correct identification of promoter regions. Computational methods based on motif searching have been the traditional approach taken. Recent studies have shown that DNA structural properties, such as curvature, stacking energy, and stress-induced duplex destabiliz...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-S6-S17

    authors: Bland C,Newsome AS,Markovets AA

    update_date:2010-10-07 00:00:00

  • Localizing triplet periodicity in DNA and cDNA sequences.

    abstract:BACKGROUND:The protein-coding regions (coding exons) of a DNA sequence exhibit a triplet periodicity (TP) due to fact that coding exons contain a series of three nucleotide codons that encode specific amino acid residues. Such periodicity is usually not observed in introns and intergenic regions. If a DNA sequence is d...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-550

    authors: Wang L,Stein LD

    update_date:2010-11-08 00:00:00

  • COPASAAR--a database for proteomic analysis of single amino acid repeats.

    abstract:BACKGROUND:Single amino acid repeats make up a significant proportion in all of the proteomes that have currently been determined. They have been shown to be functionally and medically significant, and are associated with cancers and neuro-degenerative diseases such as Huntington's Chorea, where a poly-glutamine repeat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-196

    authors: Depledge DP,Dalby AR

    update_date:2005-08-03 00:00:00

  • Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient.

    abstract:BACKGROUND:Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great "Tree of Life" (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-158

    authors: Stoltzfus A,Lapp H,Matasci N,Deus H,Sidlauskas B,Zmasek CM,Vaidya G,Pontelli E,Cranston K,Vos R,Webb CO,Harmon LJ,Pirrung M,O'Meara B,Pennell MW,Mirarab S,Rosenberg MS,Balhoff JP,Bik HM,Heath TA,Midford PE,Brown

    update_date:2013-05-13 00:00:00

  • BLAST+: architecture and applications.

    abstract:BACKGROUND:Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings i...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-421

    authors: Camacho C,Coulouris G,Avagyan V,Ma N,Papadopoulos J,Bealer K,Madden TL

    update_date:2009-12-15 00:00:00

  • Alternative mapping of probes to genes for Affymetrix chips.

    abstract:BACKGROUND:Short oligonucleotide arrays have several probes measuring the expression level of each target transcript. Therefore the selection of probes is a key component for the quality of measurements. However, once probes have been selected and synthesized on an array, it is still possible to re-evaluate the results...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-5-111

    authors: Gautier L,Møller M,Friis-Hansen L,Knudsen S

    update_date:2004-08-14 00:00:00

  • Method to represent the distribution of QTL additive and dominance effects associated with quantitative traits in computer simulation.

    abstract:BACKGROUND:Computer simulation is a resource which can be employed to identify optimal breeding strategies to effectively and efficiently achieve specific goals in developing improved cultivars. In some instances, it is crucial to assess in silico the options as well as the impact of various crossing schemes and breedi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-0906-z

    authors: Sun X,Mumm RH

    update_date:2016-02-06 00:00:00

  • A novel approach for predicting protein S-glutathionylation.

    abstract:BACKGROUND:S-glutathionylation is the formation of disulfide bonds between the tripeptide glutathione and cysteine residues of the protein, protecting them from irreversible oxidation and in some cases causing change in their functions. Regulatory glutathionylation of proteins is a controllable and reversible process a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03571-w

    authors: Anashkina AA,Poluektov YM,Dmitriev VA,Kuznetsov EN,Mitkevich VA,Makarov AA,Petrushanko IY

    update_date:2020-09-14 00:00:00

  • Insertion and deletion correcting DNA barcodes based on watermarks.

    abstract:BACKGROUND:Barcode multiplexing is a key strategy for sharing the rising capacity of next-generation sequencing devices: Synthetic DNA tags, called barcodes, are attached to natural DNA fragments within the library preparation procedure. Different libraries, can individually be labeled with barcodes for a joint sequenc...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0482-7

    authors: Kracht D,Schober S

    update_date:2015-02-18 00:00:00

  • Integrating gene expression and GO classification for PCA by preclustering.

    abstract:BACKGROUND:Gene expression data can be analyzed by summarizing groups of individual gene expression profiles based on GO annotation information. The mean expression profile per group can then be used to identify interesting GO categories in relation to the experimental settings. However, the expression profiles present...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-158

    authors: De Haan JR,Piek E,van Schaik RC,de Vlieg J,Bauerschmidt S,Buydens LM,Wehrens R

    update_date:2010-03-26 00:00:00

  • Quick, "imputation-free" meta-analysis with proxy-SNPs.

    abstract:BACKGROUND:Meta-analysis (MA) is widely used to pool genome-wide association studies (GWASes) in order to a) increase the power to detect strong or weak genotype effects or b) as a result verification method. As a consequence of differing SNP panels among genotyping chips, imputation is the method of choice within GWAS...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-231

    authors: Meesters C,Leber M,Herold C,Angisch M,Mattheisen M,Drichel D,Lacour A,Becker T

    update_date:2012-09-12 00:00:00

  • Disease gene prioritization by integrating tissue-specific molecular networks using a robust multi-network model.

    abstract:BACKGROUND:Accurately prioritizing candidate disease genes is an important and challenging problem. Various network-based methods have been developed to predict potential disease genes by utilizing the disease similarity network and molecular networks such as protein interaction or gene co-expression networks. Although...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1317-x

    authors: Ni J,Koyuturk M,Tong H,Haines J,Xu R,Zhang X

    update_date:2016-11-10 00:00:00

  • Knowledge-based variable selection for learning rules from proteomic data.

    abstract:BACKGROUND:The incorporation of biological knowledge can enhance the analysis of biomedical data. We present a novel method that uses a proteomic knowledge base to enhance the performance of a rule-learning algorithm in identifying putative biomarkers of disease from high-dimensional proteomic mass spectral data. In pa...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S9-S16

    authors: Lustgarten JL,Visweswaran S,Bowser RP,Hogan WR,Gopalakrishnan V

    update_date:2009-09-17 00:00:00

  • A web services choreography scenario for interoperating bioinformatics applications.

    abstract:BACKGROUND:Very often genome-wide data analysis requires the interoperation of multiple databases and analytic tools. A large number of genome databases and bioinformatics applications are available through the web, but it is difficult to automate interoperation because: 1) the platforms on which the applications run a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-5-25

    authors: de Knikker R,Guo Y,Li JL,Kwan AK,Yip KY,Cheung DW,Cheung KH

    update_date:2004-03-10 00:00:00

  • Bias detection and correction in RNA-Sequencing data.

    abstract:BACKGROUND:High throughput sequencing technology provides us unprecedented opportunities to study transcriptome dynamics. Compared to microarray-based gene expression profiling, RNA-Seq has many advantages, such as high resolution, low background, and ability to identify novel transcripts. Moreover, for genes with mult...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-290

    authors: Zheng W,Chung LM,Zhao H

    update_date:2011-07-19 00:00:00

  • MetAssimulo: simulation of realistic NMR metabolic profiles.

    abstract:BACKGROUND:Probing the complex fusion of genetic and environmental interactions, metabolic profiling (or metabolomics/metabonomics), the study of small molecules involved in metabolic reactions, is a rapidly expanding 'omics' field. A major technique for capturing metabolite data is 1H-NMR spectroscopy and this yields ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-496

    authors: Muncey HJ,Jones R,De Iorio M,Ebbels TM

    update_date:2010-10-06 00:00:00