Statistical modeling of biomedical corpora: mining the Caenorhabditis Genetic Center Bibliography for genes related to life span.

Abstract:

BACKGROUND:The statistical modeling of biomedical corpora could yield integrated, coarse-to-fine views of biological phenomena that complement discoveries made from analysis of molecular sequence and profiling data. Here, the potential of such modeling is demonstrated by examining the 5,225 free-text items in the Caenorhabditis Genetic Center (CGC) Bibliography using techniques from statistical information retrieval. Items in the CGC biomedical text corpus were modeled using the Latent Dirichlet Allocation (LDA) model. LDA is a hierarchical Bayesian model which represents a document as a random mixture over latent topics; each topic is characterized by a distribution over words. RESULTS:An LDA model estimated from CGC items had better predictive performance than two standard models (unigram and mixture of unigrams) trained using the same data. To illustrate the practical utility of LDA models of biomedical corpora, a trained CGC LDA model was used for a retrospective study of nematode genes known to be associated with life span modification. Corpus-, document-, and word-level LDA parameters were combined with terms from the Gene Ontology to enhance the explanatory value of the CGC LDA model, and to suggest additional candidates for age-related genes. A novel, pairwise document similarity measure based on the posterior distribution on the topic simplex was formulated and used to search the CGC database for "homologs" of a "query" document discussing the life span-modifying clk-2 gene. Inspection of these document homologs enabled and facilitated the production of hypotheses about the function and role of clk-2. CONCLUSION:Like other graphical models for genetic, genomic and other types of biological data, LDA provides a method for extracting unanticipated insights and generating predictions amenable to subsequent experimental validation.

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Blei DM,Franks K,Jordan MI,Mian IS

doi

10.1186/1471-2105-7-250

subject

Has Abstract

pub_date

2006-05-08 00:00:00

pages

250

issn

1471-2105

pii

1471-2105-7-250

journal_volume

7

pub_type

杂志文章
  • Colonyzer: automated quantification of micro-organism growth characteristics on solid agar.

    abstract:BACKGROUND:High-throughput screens comparing growth rates of arrays of distinct micro-organism cultures on solid agar are useful, rapid methods of quantifying genetic interactions. Growth rate is an informative phenotype which can be estimated by measuring cell densities at one or more times after inoculation. Precise ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-287

    authors: Lawless C,Wilkinson DJ,Young A,Addinall SG,Lydall DA

    更新日期:2010-05-28 00:00:00

  • Prediction of dinucleotide-specific RNA-binding sites in proteins.

    abstract:BACKGROUND:Regulation of gene expression, protein synthesis, replication and assembly of many viruses involve RNA-protein interactions. Although some successful computational tools have been reported to recognize RNA binding sites in proteins, the problem of specificity remains poorly investigated. After the nucleotide...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-S13-S5

    authors: Fernandez M,Kumagai Y,Standley DM,Sarai A,Mizuguchi K,Ahmad S

    更新日期:2011-01-01 00:00:00

  • Identification of a small optimal subset of CpG sites as bio-markers from high-throughput DNA methylation profiles.

    abstract:BACKGROUND:DNA methylation patterns have been shown to significantly correlate with different tissue types and disease states. High-throughput methylation arrays enable large-scale DNA methylation analysis to identify informative DNA methylation biomarkers. The identification of disease-specific methylation signatures ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-457

    authors: Meng H,Murrelle EL,Li G

    更新日期:2008-10-27 00:00:00

  • An automated method for rapid identification of putative gene family members in plants.

    abstract:BACKGROUND:Gene duplication events have played a significant role in genome evolution, particularly in plants. Exhaustive searches for all members of a known gene family as well as the identification of new gene families has become increasingly important. Subfunctionalization via changes in regulatory sequences followi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-S2-S19

    authors: Frank RL,Mane A,Ercal F

    更新日期:2006-09-06 00:00:00

  • Identifying target processes for microbial electrosynthesis by elementary mode analysis.

    abstract:BACKGROUND:Microbial electrosynthesis and electro fermentation are techniques that aim to optimize microbial production of chemicals and fuels by regulating the cellular redox balance via interaction with electrodes. While the concept is known for decades major knowledge gaps remain, which make it hard to evaluate its ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-014-0410-2

    authors: Kracke F,Krömer JO

    更新日期:2014-12-30 00:00:00

  • A two-phase procedure for non-normal quantitative trait genetic association study.

    abstract:BACKGROUND:The nonparametric trend test (NPT) is well suitable for identifying the genetic variants associated with quantitative traits when the trait values do not satisfy the normal distribution assumption. If the genetic model, defined according to the mode of inheritance, is known, the NPT derived under the given g...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-016-0888-x

    authors: Zhang W,Li H,Li Z,Li Q

    更新日期:2016-01-28 00:00:00

  • Bayesian neural networks for detecting epistasis in genetic association studies.

    abstract:BACKGROUND:Discovering causal genetic variants from large genetic association studies poses many difficult challenges. Assessing which genetic markers are involved in determining trait status is a computationally demanding task, especially in the presence of gene-gene interactions. RESULTS:A non-parametric Bayesian ap...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-014-0368-0

    authors: Beam AL,Motsinger-Reif A,Doyle J

    更新日期:2014-11-21 00:00:00

  • XenofilteR: computational deconvolution of mouse and human reads in tumor xenograft sequence data.

    abstract:BACKGROUND:Mouse xenografts from (patient-derived) tumors (PDX) or tumor cell lines are widely used as models to study various biological and preclinical aspects of cancer. However, analyses of their RNA and DNA profiles are challenging, because they comprise reads not only from the grafted human cancer but also from t...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2353-5

    authors: Kluin RJC,Kemper K,Kuilman T,de Ruiter JR,Iyer V,Forment JV,Cornelissen-Steijger P,de Rink I,Ter Brugge P,Song JY,Klarenbeek S,McDermott U,Jonkers J,Velds A,Adams DJ,Peeper DS,Krijgsman O

    更新日期:2018-10-04 00:00:00

  • Prediction of protein structural class with Rough Sets.

    abstract:BACKGROUND:A new method for the prediction of protein structural classes is constructed based on Rough Sets algorithm, which is a rule-based data mining method. Amino acid compositions and 8 physicochemical properties data are used as conditional attributes for the construction of decision system. After reducing the de...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-20

    authors: Cao Y,Liu S,Zhang L,Qin J,Wang J,Tang K

    更新日期:2006-01-14 00:00:00

  • Bioinformatics Resource Manager: a systems biology web tool for microRNA and omics data integration.

    abstract:BACKGROUND:The Bioinformatics Resource Manager (BRM) is a web-based tool developed to facilitate identifier conversion and data integration for Homo sapiens (human), Mus musculus (mouse), Rattus norvegicus (rat), Danio rerio (zebrafish), and Macaca mulatta (macaque), as well as perform orthologous conversions among the...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-2805-6

    authors: Brown J,Phillips AR,Lewis DA,Mans MA,Chang Y,Tanguay RL,Peterson ES,Waters KM,Tilton SC

    更新日期:2019-05-17 00:00:00

  • WellInverter: a web application for the analysis of fluorescent reporter gene data.

    abstract:BACKGROUND:Fluorescent reporter genes have become widely used for monitoring gene expression in living cells. When a microbial strain carrying a reporter gene is grown in a microplate reader, the fluorescence and the absorbance (optical density) of the culture can be automatically measured every few minutes in a highly...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-2920-4

    authors: Martin Y,Page M,Blanchet C,de Jong H

    更新日期:2019-06-11 00:00:00

  • On the comparison of regulatory sequences with multiple resolution Entropic Profiles.

    abstract:BACKGROUND:Enhancers are stretches of DNA (100-1000 bp) that play a major role in development gene expression, evolution and disease. It has been recently shown that in high-level eukaryotes enhancers rarely work alone, instead they collaborate by forming clusters of cis-regulatory modules (CRMs). Although the binding ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-016-0980-2

    authors: Comin M,Antonello M

    更新日期:2016-03-18 00:00:00

  • Stereochemical errors and their implications for molecular dynamics simulations.

    abstract:BACKGROUND:Biological molecules are often asymmetric with respect to stereochemistry, and correct stereochemistry is essential to their function. Molecular dynamics simulations of biomolecules have increasingly become an integral part of biophysical research. However, stereochemical errors in biomolecular structures ca...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-190

    authors: Schreiner E,Trabuco LG,Freddolino PL,Schulten K

    更新日期:2011-05-23 00:00:00

  • Deconvolution of gene expression from cell populations across the C. elegans lineage.

    abstract:BACKGROUND:Knowledge of when and in which cells each gene is expressed across multicellular organisms is critical in understanding both gene function and regulation of cell type diversity. However, methods for measuring expression typically involve a trade-off between imaging-based methods, which give the precise locat...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-14-204

    authors: Burdick JT,Murray JI

    更新日期:2013-06-22 00:00:00

  • Text-derived concept profiles support assessment of DNA microarray data for acute myeloid leukemia and for androgen receptor stimulation.

    abstract:BACKGROUND:High-throughput experiments, such as with DNA microarrays, typically result in hundreds of genes potentially relevant to the process under study, rendering the interpretation of these experiments problematic. Here, we propose and evaluate an approach to find functional associations between large numbers of g...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-14

    authors: Jelier R,Jenster G,Dorssers LC,Wouters BJ,Hendriksen PJ,Mons B,Delwel R,Kors JA

    更新日期:2007-01-18 00:00:00

  • Accelerating a cross-correlation score function to search modifications using a single GPU.

    abstract:BACKGROUND:A cross-correlation (XCorr) score function is one of the most popular score functions utilized to search peptide identifications in databases, and many computer programs, such as SEQUEST, Comet, and Tide, currently use this score function. Recently, the HiXCorr algorithm was developed to speed up this score ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2559-6

    authors: Kim H,Han S,Um JH,Park K

    更新日期:2018-12-12 00:00:00

  • A study on multi-omic oscillations in Escherichia coli metabolic networks.

    abstract:BACKGROUND:Two important challenges in the analysis of molecular biology information are data (multi-omic information) integration and the detection of patterns across large scale molecular networks and sequences. They are are actually coupled beause the integration of omic information may provide better means to detec...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2175-5

    authors: Bardozzo F,Lió P,Tagliaferri R

    更新日期:2018-07-09 00:00:00

  • Measure of synonymous codon usage diversity among genes in bacteria.

    abstract:BACKGROUND:In many bacteria, intragenomic diversity in synonymous codon usage among genes has been reported. However, no quantitative attempt has been made to compare the diversity levels among different genomes. Here, we introduce a mean dissimilarity-based index (Dmean) for quantifying the level of diversity in synon...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-167

    authors: Suzuki H,Saito R,Tomita M

    更新日期:2009-06-01 00:00:00

  • FastqPuri: high-performance preprocessing of RNA-seq data.

    abstract:BACKGROUND:RNA sequencing (RNA-seq) has become the standard means of analyzing gene and transcript expression in high-throughput. While previously sequence alignment was a time demanding step, fast alignment methods and even more so transcript counting methods which avoid mapping and quantify gene and transcript expres...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-2799-0

    authors: Pérez-Rubio P,Lottaz C,Engelmann JC

    更新日期:2019-05-03 00:00:00

  • Software for the analysis and visualization of deep mutational scanning data.

    abstract:BACKGROUND:Deep mutational scanning is a technique to estimate the impacts of mutations on a gene by using deep sequencing to count mutations in a library of variants before and after imposing a functional selection. The impacts of mutations must be inferred from changes in their counts after selection. RESULTS:I desc...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-015-0590-4

    authors: Bloom JD

    更新日期:2015-05-20 00:00:00

  • OscoNet: inferring oscillatory gene networks.

    abstract:BACKGROUND:Oscillatory genes, with periodic expression at the mRNA and/or protein level, have been shown to play a pivotal role in many biological contexts. However, with the exception of the circadian clock and cell cycle, only a few such genes are known. Detecting oscillatory genes from snapshot single-cell experimen...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03561-y

    authors: Cutillo L,Boukouvalas A,Marinopoulou E,Papalopulu N,Rattray M

    更新日期:2020-08-21 00:00:00

  • Computational approaches for isoform detection and estimation: good and bad news.

    abstract:BACKGROUND:The main goal of the whole transcriptome analysis is to correctly identify all expressed transcripts within a specific cell/tissue--at a particular stage and condition--to determine their structures and to measure their abundances. RNA-seq data promise to allow identification and quantification of transcript...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-15-135

    authors: Angelini C,De Canditiis D,De Feis I

    更新日期:2014-05-09 00:00:00

  • EGenBio: a data management system for evolutionary genomics and biodiversity.

    abstract:BACKGROUND:Evolutionary genomics requires management and filtering of large numbers of diverse genomic sequences for accurate analysis and inference on evolutionary processes of genomic and functional change. We developed Evolutionary Genomics and Biodiversity (EGenBio; http://egenbio.lsu.edu) to begin to address this....

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-S2-S7

    authors: Nahum LA,Reynolds MT,Wang ZO,Faith JJ,Jonna R,Jiang ZJ,Meyer TJ,Pollock DD

    更新日期:2006-09-06 00:00:00

  • CellSim: a novel software to calculate cell similarity and identify their co-regulation networks.

    abstract:BACKGROUND:Cell direct reprogramming technology has been rapidly developed with its low risk of tumor risk and avoidance of ethical issues caused by stem cells, but it is still limited to specific cell types. Direct reprogramming from an original cell to target cell type needs the cell similarity and cell specific regu...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-2699-3

    authors: Li L,Che D,Wang X,Zhang P,Rahman SU,Zhao J,Yu J,Tao S,Lu H,Liao M

    更新日期:2019-03-04 00:00:00

  • Promoting ranking diversity for genomics search with relevance-novelty combined model.

    abstract:BACKGROUND:In the biomedical domain, the desired information of a question (query) asked by biologists usually is a list of a certain type of entities covering different aspects that are related to the question, such as genes, proteins, diseases, mutations, etc. Hence it is important for a biomedical information retrie...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-S5-S8

    authors: Yin X,Li Z,Huang JX,Hu X

    更新日期:2011-01-01 00:00:00

  • Image-based classification of plant genus and family for trained and untrained plant species.

    abstract:BACKGROUND:Modern plant taxonomy reflects phylogenetic relationships among taxa based on proposed morphological and genetic similarities. However, taxonomical relation is not necessarily reflected by close overall resemblance, but rather by commonality of very specific morphological characters or similarity on the mole...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2474-x

    authors: Seeland M,Rzanny M,Boho D,Wäldchen J,Mäder P

    更新日期:2019-01-03 00:00:00

  • An ensemble framework for identifying essential proteins.

    abstract:BACKGROUND:Many centrality measures have been proposed to mine and characterize the correlations between network topological properties and protein essentiality. However, most of them show limited prediction accuracy, and the number of common predicted essential proteins by different methods is very small. RESULTS:In ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-016-1166-7

    authors: Zhang X,Xiao W,Acencio ML,Lemke N,Wang X

    更新日期:2016-08-25 00:00:00

  • Learning statistical models for annotating proteins with function information using biomedical text.

    abstract:BACKGROUND:The BioCreative text mining evaluation investigated the application of text mining methods to the task of automatically extracting information from text in biomedical research articles. We participated in Task 2 of the evaluation. For this task, we built a system to automatically annotate a given protein wit...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-S1-S18

    authors: Ray S,Craven M

    更新日期:2005-01-01 00:00:00

  • An integrated approach to the prediction of domain-domain interactions.

    abstract:BACKGROUND:The development of high-throughput technologies has produced several large scale protein interaction data sets for multiple species, and significant efforts have been made to analyze the data sets in order to understand protein activities. Considering that the basic units of protein interactions are domain i...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-269

    authors: Lee H,Deng M,Sun F,Chen T

    更新日期:2006-05-25 00:00:00

  • Colony size measurement of the yeast gene deletion strains for functional genomics.

    abstract:BACKGROUND:Numerous functional genomics approaches have been developed to study the model organism yeast, Saccharomyces cerevisiae, with the aim of systematically understanding the biology of the cell. Some of these techniques are based on yeast growth differences under different conditions, such as those generated by ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-117

    authors: Memarian N,Jessulat M,Alirezaie J,Mir-Rashed N,Xu J,Zareie M,Smith M,Golshani A

    更新日期:2007-04-04 00:00:00