Promoting ranking diversity for genomics search with relevance-novelty combined model.

Abstract:

BACKGROUND: In the biomedical domain, the information a biologist seeks with a question (query) is usually a list of entities of a certain type, such as genes, proteins, diseases or mutations, covering different aspects related to the question. It is therefore important for a biomedical information retrieval system to provide comprehensive and diverse answers that fulfill biologists' information needs. However, traditional retrieval models assume that the relevance of a document is independent of the relevance of other documents. This assumption may result in high redundancy and low diversity in the retrieved ranked lists.

RESULTS: In this paper, we propose a relevance-novelty combined model, named the RelNov model, based on the framework of an undirected graphical model. It consists of two component models, the aspect-term relevance model and the aspect-term novelty model, which model the relevance and the novelty of a document respectively. We show that our approach achieves a 16.4% improvement over the highest aspect-level MAP reported in the TREC 2007 Genomics track, and a 9.8% improvement over the highest passage-level MAP reported in the same track.

CONCLUSIONS: The proposed combination model, which models aspects, terms, topic relevance and document novelty as potential functions, is shown to be effective in promoting ranking diversity as well as in improving the relevance of ranked lists for genomics search. We also show that the use of aspects plays an important role in the model. Moreover, the proposed model can easily integrate various relevance and novelty measures.
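The contrast between independent relevance ranking and a relevance-novelty combination can be illustrated with a minimal sketch. It is not the RelNov graphical model described in the abstract: it substitutes a simple greedy reranker, and the function name `rerank`, the `weight` parameter, the aspect sets and the toy scores are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Set


def rerank(relevance: Dict[str, float],
           aspects: Dict[str, Set[str]],
           weight: float = 0.5) -> List[str]:
    """Greedy rerank: relevance plus a bonus for aspects not yet covered.

    Illustrative stand-in only; not the undirected graphical model
    proposed in the paper.
    """
    remaining = set(relevance)
    covered: Set[str] = set()
    ranking: List[str] = []
    while remaining:
        def score(doc: str) -> float:
            doc_aspects = aspects.get(doc, set())
            new = doc_aspects - covered
            # Novelty: fraction of this document's aspects not yet seen.
            novelty = len(new) / len(doc_aspects) if doc_aspects else 0.0
            return (1 - weight) * relevance[doc] + weight * novelty
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        ranking.append(best)
        covered |= aspects.get(best, set())
        remaining.remove(best)
    return ranking


# Toy data (hypothetical): d1 and d2 cover the same aspect, d3 a new one.
docs_relevance = {"d1": 0.9, "d2": 0.85, "d3": 0.6}
docs_aspects = {"d1": {"BRCA1"}, "d2": {"BRCA1"}, "d3": {"TP53"}}

relevance_only = sorted(docs_relevance, key=docs_relevance.get, reverse=True)
combined = rerank(docs_relevance, docs_aspects, weight=0.5)
```

On this toy input the relevance-only order places the two redundant documents first, whereas the combined score promotes `d3` because it covers an aspect that the ranked list has not yet shown.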

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Yin X,Li Z,Huang JX,Hu X

doi

10.1186/1471-2105-12-S5-S8

subject

Has Abstract

pub_date

2011-01-01 00:00:00

pages

S8

issn

1471-2105

pii

1471-2105-12-S5-S8

journal_volume

12 Suppl 5

pub_type

Journal Article
  • Automated modelling of signal transduction networks.

    abstract:BACKGROUND:Intracellular signal transduction is achieved by networks of proteins and small molecules that transmit information from the cell surface to the nucleus, where they ultimately effect transcriptional changes. Understanding the mechanisms cells use to accomplish this important process requires a detailed molec...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-3-34

    authors: Steffen M,Petti A,Aach J,D'haeseleer P,Church G

    update_date:2002-11-01 00:00:00

  • Reference-guided de novo assembly approach improves genome reconstruction for related species.

    abstract:BACKGROUND:The development of next-generation sequencing has made it possible to sequence whole genomes at a relatively low cost. However, de novo genome assemblies remain challenging due to short read length, missing data, repetitive regions, polymorphisms and sequencing errors. As more and more genomes are sequenced,...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1911-6

    authors: Lischer HEL,Shimizu KK

    update_date:2017-11-10 00:00:00

  • 3off2: A network reconstruction algorithm based on 2-point and 3-point information statistics.

    abstract:BACKGROUND:The reconstruction of reliable graphical models from observational data is important in bioinformatics and other computational fields applying network reconstruction methods to large, yet finite datasets. The main network reconstruction approaches are either based on Bayesian scores, which enable the ranking...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0856-x

    authors: Affeldt S,Verny L,Isambert H

    update_date:2016-01-20 00:00:00

  • A multiple-alignment based primer design algorithm for genetically highly variable DNA targets.

    abstract:BACKGROUND:Primer design for highly variable DNA sequences is difficult, and experimental success requires attention to many interacting constraints. The advent of next-generation sequencing methods allows the investigation of rare variants otherwise hidden deep in large populations, but requires attention to populatio...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-255

    authors: Brodin J,Krishnamoorthy M,Athreya G,Fischer W,Hraber P,Gleasner C,Green L,Korber B,Leitner T

    update_date:2013-08-21 00:00:00

  • CNV Workshop: an integrated platform for high-throughput copy number variation discovery and clinical diagnostics.

    abstract:BACKGROUND:Recent studies have shown that copy number variations (CNVs) are frequent in higher eukaryotes and associated with a substantial portion of inherited and acquired risk for various human diseases. The increasing availability of high-resolution genome surveillance platforms provides opportunity for rapidly ass...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-74

    authors: Gai X,Perin JC,Murphy K,O'Hara R,D'arcy M,Wenocur A,Xie HM,Rappaport EF,Shaikh TH,White PS

    update_date:2010-02-04 00:00:00

  • Molecular evolution of dihydrouridine synthases.

    abstract:BACKGROUND:Dihydrouridine (D) is a modified base found in conserved positions in the D-loop of tRNA in Bacteria, Eukaryota, and some Archaea. Despite the abundant occurrence of D, little is known about its biochemical roles in mediating tRNA function. It is assumed that D may destabilize the structure of tRNA and thus ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-153

    authors: Kasprzak JM,Czerwoniec A,Bujnicki JM

    update_date:2012-06-28 00:00:00

  • Inferring topology from clustering coefficients in protein-protein interaction networks.

    abstract:BACKGROUND:Although protein-protein interaction networks determined with high-throughput methods are incomplete, they are commonly used to infer the topology of the complete interactome. These partial networks often show a scale-free behavior with only a few proteins having many and the majority having only a few conne...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-519

    authors: Friedel CC,Zimmer R

    update_date:2006-11-30 00:00:00

  • A novel computational strategy for DNA methylation imputation using mixture regression model (MRM).

    abstract:BACKGROUND:DNA methylation is an important heritable epigenetic mark that plays a crucial role in transcriptional regulation and the pathogenesis of various human disorders. The commonly used DNA methylation measurement approaches, e.g., Illumina Infinium HumanMethylation-27 and -450 BeadChip arrays (27 K and 450 K arr...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03865-z

    authors: Yu F,Xu C,Deng HW,Shen H

    update_date:2020-12-01 00:00:00

  • Protein subcellular localization prediction based on compartment-specific features and structure conservation.

    abstract:BACKGROUND:Protein subcellular localization is crucial for genome annotation, protein function prediction, and drug discovery. Determination of subcellular localization using experimental approaches is time-consuming; thus, computational approaches become highly desirable. Extensive studies of localization prediction h...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-330

    authors: Su EC,Chiu HS,Lo A,Hwang JK,Sung TY,Hsu WL

    update_date:2007-09-08 00:00:00

  • IDconverter and IDClight: conversion and annotation of gene and protein IDs.

    abstract:BACKGROUND:Researchers involved in the annotation of large numbers of gene, clone or protein identifiers are usually required to perform a one-by-one conversion for each identifier. When the field of research is one such as microarray experiments, this number may be around 30,000. RESULTS:To help researchers map acces...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-9

    authors: Alibés A,Yankilevich P,Cañada A,Díaz-Uriarte R

    update_date:2007-01-10 00:00:00

  • A study on multi-omic oscillations in Escherichia coli metabolic networks.

    abstract:BACKGROUND:Two important challenges in the analysis of molecular biology information are data (multi-omic information) integration and the detection of patterns across large scale molecular networks and sequences. They are actually coupled because the integration of omic information may provide better means to detec...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2175-5

    authors: Bardozzo F,Lió P,Tagliaferri R

    update_date:2018-07-09 00:00:00

  • BicPAMS: software for biological data analysis with pattern-based biclustering.

    abstract:BACKGROUND:Biclustering has been largely applied for the unsupervised analysis of biological data, being recognised today as a key technique to discover putative modules in both expression data (subsets of genes correlated in subsets of conditions) and network data (groups of coherently interconnected biological entiti...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1493-3

    authors: Henriques R,Ferreira FL,Madeira SC

    update_date:2017-02-02 00:00:00

  • Mining locus tags in PubMed Central to improve microbial gene annotation.

    abstract:BACKGROUND:The scientific literature contains millions of microbial gene identifiers within the full text and tables, but these annotations rarely get incorporated into public sequence databases. We propose to utilize the Open Access (OA) subset of PubMed Central (PMC) as a gene annotation database and have developed a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-43

    authors: Stubben CJ,Challacombe JF

    update_date:2014-02-05 00:00:00

  • Predicting anatomic therapeutic chemical classification codes using tiered learning.

    abstract:BACKGROUND:The low success rate and high cost of drug discovery requires the development of new paradigms to identify molecules of therapeutic value. The Anatomical Therapeutic Chemical (ATC) Code System is a World Health Organization (WHO) proposed classification that assigns multi-level codes to compounds based on th...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1660-6

    authors: Olson T,Singh R

    update_date:2017-06-07 00:00:00

  • Predicting nucleosome positioning using a duration Hidden Markov Model.

    abstract:BACKGROUND:The nucleosome is the fundamental packing unit of DNAs in eukaryotic cells. Its detailed positioning on the genome is closely related to chromosome functions. Increasing evidence has shown that genomic DNA sequence itself is highly predictive of nucleosome positioning genome-wide. Therefore a fast software t...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-346

    authors: Xi L,Fondufe-Mittendorf Y,Xia L,Flatow J,Widom J,Wang JP

    update_date:2010-06-24 00:00:00

  • Rule-based knowledge aggregation for large-scale protein sequence analysis of influenza A viruses.

    abstract:BACKGROUND:The explosive growth of biological data provides opportunities for new statistical and comparative analyses of large information sets, such as alignments comprising tens of thousands of sequences. In such studies, sequence annotations frequently play an essential role, and reliable results depend on metadata...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-S1-S7

    authors: Miotto O,Tan TW,Brusic V

    update_date:2008-01-01 00:00:00

  • Reduction strategies for hierarchical multi-label classification in protein function prediction.

    abstract:BACKGROUND:Hierarchical Multi-Label Classification is a classification task where the classes to be predicted are hierarchically organized. Each instance can be assigned to classes belonging to more than one path in the hierarchy. This scenario is typically found in protein function prediction, considering that each pr...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1232-1

    authors: Cerri R,Barros RC,P L F de Carvalho AC,Jin Y

    update_date:2016-09-15 00:00:00

  • Probe-specific mixed-model approach to detect copy number differences using multiplex ligation-dependent probe amplification (MLPA).

    abstract:BACKGROUND:MLPA method is a potentially useful semi-quantitative method to detect copy number alterations in targeted regions. In this paper, we propose a method for the normalization procedure based on a non-linear mixed-model, as well as a new approach for determining the statistical significance of altered probes ba...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-261

    authors: González JR,Carrasco JL,Armengol L,Villatoro S,Jover L,Yasui Y,Estivill X

    update_date:2008-06-04 00:00:00

  • A novel computational model for predicting potential LncRNA-disease associations based on both direct and indirect features of LncRNA-disease pairs.

    abstract:BACKGROUND:Accumulating evidence has demonstrated that long non-coding RNAs (lncRNAs) are closely associated with human diseases, and it is useful for the diagnosis and treatment of diseases to get the relationships between lncRNAs and diseases. Due to the high costs and time complexity of traditional bio-experiments, ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03906-7

    authors: Xiao Y,Xiao Z,Feng X,Chen Z,Kuang L,Wang L

    update_date:2020-12-02 00:00:00

  • Benchmarking the HLA typing performance of Polysolver and Optitype in 50 Danish parental trios.

    abstract:BACKGROUND:The adaptive immune response intrinsically depends on hypervariable human leukocyte antigen (HLA) genes. Concomitantly, correct HLA phenotyping is crucial for successful donor-patient matching in organ transplantation. The cost and technical limitations of current laboratory techniques, together with advance...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2239-6

    authors: Matey-Hernandez ML,Danish Pan Genome Consortium,Brunak S,Izarzugaza JMG

    update_date:2018-06-25 00:00:00

  • TAMEE: data management and analysis for tissue microarrays.

    abstract:BACKGROUND:With the introduction of tissue microarrays (TMAs) researchers can investigate gene and protein expression in tissues on a high-throughput scale. TMAs generate a wealth of data calling for extended, high level data management. Enhanced data analysis and systematic data management are required for traceabilit...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-81

    authors: Thallinger GG,Baumgartner K,Pirklbauer M,Uray M,Pauritsch E,Mehes G,Buck CR,Zatloukal K,Trajanoski Z

    update_date:2007-03-07 00:00:00

  • ChemEx: information extraction system for chemical data curation.

    abstract:BACKGROUND:Manual chemical data curation from publications is error-prone and time-consuming, and it makes up-to-date data sets hard to maintain. Automatic information extraction can be used as a tool to reduce these problems. Since chemical structures are usually described in images, information extraction needs to combine structure ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-S17-S9

    authors: Tharatipyakul A,Numnark S,Wichadakul D,Ingsriswang S

    update_date:2012-01-01 00:00:00

  • Distilling structure in Taverna scientific workflows: a refactoring approach.

    abstract:BACKGROUND:Scientific workflows management systems are increasingly used to specify and manage bioinformatics experiments. Their programming model appeals to bioinformaticians, who can use them to easily specify complex data processing pipelines. Such a model is underpinned by a graph structure, where nodes represent b...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-S1-S12

    authors: Cohen-Boulakia S,Chen J,Missier P,Goble C,Williams AR,Froidevaux C

    update_date:2014-01-01 00:00:00

  • Survival Online: a web-based service for the analysis of correlations between gene expression and clinical and follow-up data.

    abstract:BACKGROUND:Complex microarray gene expression datasets can be used for many independent analyses and are particularly interesting for the validation of potential biomarkers and multi-gene classifiers. This article presents a novel method to perform correlations between microarray gene expression data and clinico-pathol...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S12-S10

    authors: Corradi L,Mirisola V,Porro I,Torterolo L,Fato M,Romano P,Pfeffer U

    update_date:2009-10-15 00:00:00

  • Pre-processing Agilent microarray data.

    abstract:BACKGROUND:Pre-processing methods for two-sample long oligonucleotide arrays, specifically the Agilent technology, have not been extensively studied. The goal of this study is to quantify some of the sources of error that affect measurement of expression using Agilent arrays and to compare Agilent's Feature Extraction ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-142

    authors: Zahurak M,Parmigiani G,Yu W,Scharpf RB,Berman D,Schaeffer E,Shabbeer S,Cope L

    update_date:2007-05-01 00:00:00

  • MCA: Multiresolution Correlation Analysis, a graphical tool for subpopulation identification in single-cell gene expression data.

    abstract:BACKGROUND:Biological data often originate from samples containing mixtures of subpopulations, corresponding e.g. to distinct cellular phenotypes. However, identification of distinct subpopulations may be difficult if biological measurements yield distributions that are not easily separable. RESULTS:We present Multire...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-240

    authors: Feigelman J,Theis FJ,Marr C

    update_date:2014-07-11 00:00:00

  • BLAST+: architecture and applications.

    abstract:BACKGROUND:Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings i...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-421

    authors: Camacho C,Coulouris G,Avagyan V,Ma N,Papadopoulos J,Bealer K,Madden TL

    update_date:2009-12-15 00:00:00

  • PIGS: improved estimates of identity-by-descent probabilities by probabilistic IBD graph sampling.

    abstract:Identifying segments in the genome of different individuals that are identical-by-descent (IBD) is a fundamental element of genetics. IBD data is used for numerous applications including demographic inference, heritability estimation, and mapping disease loci. Simultaneous detection of IBD over multiple haplotypes has...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-16-S5-S9

    authors: Park DS,Baran Y,Hormozdiari F,Eng C,Torgerson DG,Burchard EG,Zaitlen N

    update_date:2015-01-01 00:00:00

  • A novel substitution matrix fitted to the compositional bias in Mollicutes improves the prediction of homologous relationships.

    abstract:BACKGROUND:Substitution matrices are key parameters for the alignment of two protein sequences, and consequently for most comparative genomics studies. The composition of biological sequences can vary importantly between species and groups of species, and classical matrices such as those in the BLOSUM series fail to ac...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-457

    authors: Lemaitre C,Barré A,Citti C,Tardy F,Thiaucourt F,Sirand-Pugnet P,Thébault P

    update_date:2011-11-24 00:00:00

  • HTPheno: an image analysis pipeline for high-throughput plant phenotyping.

    abstract:BACKGROUND:In the last few years high-throughput analysis methods have become state-of-the-art in the life sciences. One of the latest developments is automated greenhouse systems for high-throughput plant phenotyping. Such systems allow the non-destructive screening of plants over a period of time by means of image ac...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-148

    authors: Hartmann A,Czauderna T,Hoffmann R,Stein N,Schreiber F

    update_date:2011-05-12 00:00:00