A novel similarity-measure for the analysis of genetic data in complex phenotypes.

Abstract:

BACKGROUND:Recent technological advances in DNA sequencing and genotyping have led to the accumulation of a remarkable quantity of data on genetic polymorphisms. However, the development of new statistical and computational tools for effective processing of these data has not been equally as fast. In particular, Machine Learning literature is limited to relatively few papers which are focused on the development and application of data mining methods for the analysis of genetic variability. On the other hand, these papers apply to genetic data procedures which had been developed for a different kind of analysis and do not take into account the peculiarities of population genetics. The aim of our study was to define a new similarity measure, specifically conceived for measuring the similarity between the genetic profiles of two groups of subjects (i.e., cases and controls) taking into account that genetic profiles are usually distributed in a population group according to the Hardy Weinberg equilibrium. RESULTS:We set up a new kernel function consisting of a similarity measure between groups of subjects genotyped for numerous genetic loci. This measure weighs different genetic profiles according to the estimates of gene frequencies at Hardy-Weinberg equilibrium in the population. We named this function the "Hardy-Weinberg kernel". The effectiveness of the Hardy-Weinberg kernel was compared to the performance of the well established linear kernel. We found that the Hardy-Weinberg kernel significantly outperformed the linear kernel in a number of experiments where we used either simulated data or real data. CONCLUSION:The "Hardy-Weinberg kernel" reported here represents one of the first attempts at incorporating genetic knowledge into the definition of a kernel function designed for the analysis of genetic data. We show that the best performance of the "Hardy-Weinberg kernel" is observed when rare genotypes have different frequencies in cases and controls. The ability to capture the effect of rare genotypes on phenotypic traits might be a very important and useful feature, as most of the current statistical tools loose most of their statistical power when rare genotypes are involved in the susceptibility to the trait under study.

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Lagani V,Montesanto A,Di Cianni F,Moreno V,Landi S,Conforti D,Rose G,Passarino G

doi

10.1186/1471-2105-10-S6-S24

subject

Has Abstract

pub_date

2009-06-16 00:00:00

pages

S24

issn

1471-2105

pii

1471-2105-10-S6-S24

journal_volume

10 Suppl 6

pub_type

杂志文章
  • Detecting intergene correlation changes in microarray analysis: a new approach to gene selection.

    abstract:BACKGROUND:Microarray technology is commonly used as a simple screening tool with a focus on selecting genes that exhibit extremely large differential expressions between different phenotypes. It lacks the ability to select genes that change their relationships with other genes in different biological conditions (diffe...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-20

    authors: Hu R,Qiu X,Glazko G,Klebanov L,Yakovlev A

    更新日期:2009-01-15 00:00:00

  • PVT: an efficient computational procedure to speed up next-generation sequence analysis.

    abstract:BACKGROUND:High-throughput Next-Generation Sequencing (NGS) techniques are advancing genomics and molecular biology research. This technology generates substantially large data which puts up a major challenge to the scientists for an efficient, cost and time effective solution to analyse such data. Further, for the dif...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-15-167

    authors: Maji RK,Sarkar A,Khatua S,Dasgupta S,Ghosh Z

    更新日期:2014-06-04 00:00:00

  • TGF-beta signaling proteins and the Protein Ontology.

    abstract:BACKGROUND:The Protein Ontology (PRO) is designed as a formal and principled Open Biomedical Ontologies (OBO) Foundry ontology for proteins. The components of PRO extend from a classification of proteins on the basis of evolutionary relationships at the homeomorphic level to the representation of the multiple protein f...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-S5-S3

    authors: Arighi CN,Liu H,Natale DA,Barker WC,Drabkin H,Blake JA,Smith B,Wu CH

    更新日期:2009-05-06 00:00:00

  • DLAD4U: deriving and prioritizing disease lists from PubMed literature.

    abstract:BACKGROUND:Due to recent technology advancements, disease related knowledge is growing rapidly. It becomes nontrivial to go through all published literature to identify associations between human diseases and genetic, environmental, and life style factors, disease symptoms, and treatment strategies. Here we report DLAD...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2463-0

    authors: Shen J,Vasaikar S,Zhang B

    更新日期:2018-12-28 00:00:00

  • Evaluation of methods for differential expression analysis on multi-group RNA-seq count data.

    abstract:BACKGROUND:RNA-seq is a powerful tool for measuring transcriptomes, especially for identifying differentially expressed genes or transcripts (DEGs) between sample groups. A number of methods have been developed for this task, and several evaluation studies have also been reported. However, those evaluations so far have...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-015-0794-7

    authors: Tang M,Sun J,Shimizu K,Kadota K

    更新日期:2015-11-04 00:00:00

  • DraGnET: software for storing, managing and analyzing annotated draft genome sequence data.

    abstract:BACKGROUND:New "next generation" DNA sequencing technologies offer individual researchers the ability to rapidly generate large amounts of genome sequence data at dramatically reduced costs. As a result, a need has arisen for new software tools for storage, management and analysis of genome sequence data. Although bioi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-100

    authors: Duncan S,Sirkanungo R,Miller L,Phillips GJ

    更新日期:2010-02-22 00:00:00

  • Identifying restrictions in the order of accumulation of mutations during tumor progression: effects of passengers, evolutionary models, and sampling.

    abstract:BACKGROUND:Cancer progression is caused by the sequential accumulation of mutations, but not all orders of accumulation are equally likely. When the fixation of some mutations depends on the presence of previous ones, identifying restrictions in the order of accumulation of mutations can lead to the discovery of therap...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-015-0466-7

    authors: Diaz-Uriarte R

    更新日期:2015-02-12 00:00:00

  • LEON-BIS: multiple alignment evaluation of sequence neighbours using a Bayesian inference system.

    abstract:BACKGROUND:A standard procedure in many areas of bioinformatics is to use a multiple sequence alignment (MSA) as the basis for various types of homology-based inference. Applications include 3D structure modelling, protein functional annotation, prediction of molecular interactions, etc. These applications, however sop...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-016-1146-y

    authors: Vanhoutreve R,Kress A,Legrand B,Gass H,Poch O,Thompson JD

    更新日期:2016-07-07 00:00:00

  • Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery.

    abstract:BACKGROUND:Although high-throughput microarray based molecular diagnostic technologies show a great promise in cancer diagnosis, it is still far from a clinical application due to its low and instable sensitivities and specificities in cancer molecular pattern recognition. In fact, high-dimensional and heterogeneous tu...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-S1-S7

    authors: Han H,Li XL

    更新日期:2011-02-15 00:00:00

  • Bison: bisulfite alignment on nodes of a cluster.

    abstract:BACKGROUND:DNA methylation changes are associated with a wide array of biological processes. Bisulfite conversion of DNA followed by high-throughput sequencing is increasingly being used to assess genome-wide methylation at single-base resolution. The relative slowness of most commonly used aligners for processing such...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-15-337

    authors: Ryan DP,Ehninger D

    更新日期:2014-10-18 00:00:00

  • Structural alignment of protein descriptors - a combinatorial model.

    abstract:BACKGROUND:Structural alignment of proteins is one of the most challenging problems in molecular biology. The tertiary structure of a protein strictly correlates with its function and computationally predicted structures are nowadays a main premise for understanding the latter. However, computationally derived 3D model...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-016-1237-9

    authors: Antczak M,Kasprzak M,Lukasiak P,Blazewicz J

    更新日期:2016-09-17 00:00:00

  • Integrating multiple protein-protein interaction networks to prioritize disease genes: a Bayesian regression approach.

    abstract:BACKGROUND:The identification of genes responsible for human inherited diseases is one of the most challenging tasks in human genetics. Recent studies based on phenotype similarity and gene proximity have demonstrated great success in prioritizing candidate genes for human diseases. However, most of these methods rely ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-S1-S11

    authors: Zhang W,Sun F,Jiang R

    更新日期:2011-02-15 00:00:00

  • The PowerAtlas: a power and sample size atlas for microarray experimental design and research.

    abstract:BACKGROUND:Microarrays permit biologists to simultaneously measure the mRNA abundance of thousands of genes. An important issue facing investigators planning microarray experiments is how to estimate the sample size required for good statistical power. What is the projected sample size or number of replicate chips need...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-84

    authors: Page GP,Edwards JW,Gadbury GL,Yelisetti P,Wang J,Trivedi P,Allison DB

    更新日期:2006-02-22 00:00:00

  • Genotype calling in tetraploid species from bi-allelic marker data using mixture models.

    abstract:BACKGROUND:Automated genotype calling in tetraploid species was until recently not possible, which hampered genetic analysis. Modern genotyping assays often produce two signals, one for each allele of a bi-allelic marker. While ample software is available to obtain genotypes (homozygous for either allele, or heterozygo...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-172

    authors: Voorrips RE,Gort G,Vosman B

    更新日期:2011-05-19 00:00:00

  • Recovering rearranged cancer chromosomes from karyotype graphs.

    abstract:BACKGROUND:Many cancer genomes are extensively rearranged with highly aberrant chromosomal karyotypes. Structural and copy number variations in cancer genomes can be determined via abnormal mapping of sequenced reads to the reference genome. Recently it became possible to reconcile both of these types of large-scale va...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3208-4

    authors: Aganezov S,Zban I,Aksenov V,Alexeev N,Schatz MC

    更新日期:2019-12-17 00:00:00

  • Indicators for the Data Usage Index (DUI): an incentive for publishing primary biodiversity data through global information infrastructure.

    abstract:BACKGROUND:A professional recognition mechanism is required to encourage expedited publishing of an adequate volume of 'fit-for-use' biodiversity data. As a component of such a recognition mechanism, we propose the development of the Data Usage Index (DUI) to demonstrate to data publishers that their efforts of creatin...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-S15-S3

    authors: Ingwersen P,Chavan V

    更新日期:2011-01-01 00:00:00

  • Pushing the accuracy limit of shape complementarity for protein-protein docking.

    abstract:BACKGROUND:Protein-protein docking is a valuable computational approach for investigating protein-protein interactions. Shape complementarity is the most basic component of a scoring function and plays an important role in protein-protein docking. Despite significant progresses, shape representation remains an open que...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3270-y

    authors: Yan Y,Huang SY

    更新日期:2019-12-24 00:00:00

  • In silico modelling of hormone response elements.

    abstract:BACKGROUND:An important step in understanding the conditions that specify gene expression is the recognition of gene regulatory elements. Due to high diversity of different types of transcription factors and their DNA binding preferences, it is a challenging problem to establish an accurate model for recognition of fun...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-S4-S27

    authors: Stepanova M,Lin F,Lin VC

    更新日期:2006-12-12 00:00:00

  • Integration of shot-gun proteomics and bioinformatics analysis to explore plant hormone responses.

    abstract:BACKGROUND:Multidimensional protein identification technology (MudPIT)-based shot-gun proteomics has been proven to be an effective platform for functional proteomics. In particular, the various sample preparation methods and bioinformatics tools can be integrated to improve the proteomics platform for applications lik...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-S15-S8

    authors: Zhang Y,Liu S,Dai SY,Yuan JS

    更新日期:2012-01-01 00:00:00

  • OmicsARules: a R package for integration of multi-omics datasets via association rules mining.

    abstract:BACKGROUND:The improvements of high throughput technologies have produced large amounts of multi-omics experiments datasets. Initial analysis of these data has revealed many concurrent gene alterations within single dataset or/and among multiple omics datasets. Although powerful bioinformatics pipelines have been devel...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3171-0

    authors: Chen D,Zhang F,Zhao Q,Xu J

    更新日期:2019-11-08 00:00:00

  • R2R--software to speed the depiction of aesthetic consensus RNA secondary structures.

    abstract:BACKGROUND:With continuing identification of novel structured noncoding RNAs, there is an increasing need to create schematic diagrams showing the consensus features of these molecules. RNA structural diagrams are typically made either with general-purpose drawing programs like Adobe Illustrator, or with automated or i...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-3

    authors: Weinberg Z,Breaker RR

    更新日期:2011-01-04 00:00:00

  • Visualizing complex feature interactions and feature sharing in genomic deep neural networks.

    abstract:BACKGROUND:Visualization tools for deep learning models typically focus on discovering key input features without considering how such low level features are combined in intermediate layers to make decisions. Moreover, many of these methods examine a network's response to specific input examples that may be insufficien...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-2957-4

    authors: Liu G,Zeng H,Gifford DK

    更新日期:2019-07-19 00:00:00

  • Estimating the individualized HIV-1 genetic barrier to resistance using a nelfinavir fitness landscape.

    abstract:BACKGROUND:Failure on Highly Active Anti-Retroviral Treatment is often accompanied with development of antiviral resistance to one or more drugs included in the treatment. In general, the virus is more likely to develop resistance to drugs with a lower genetic barrier. Previously, we developed a method to reverse engin...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-409

    authors: Theys K,Deforche K,Beheydt G,Moreau Y,van Laethem K,Lemey P,Camacho RJ,Rhee SY,Shafer RW,Van Wijngaerden E,Vandamme AM

    更新日期:2010-08-03 00:00:00

  • Inferring latent task structure for Multitask Learning by Multiple Kernel Learning.

    abstract:BACKGROUND:The lack of sufficient training data is the limiting factor for many Machine Learning applications in Computational Biology. If data is available for several different but related problem domains, Multitask Learning algorithms can be used to learn a model based on all available information. In Bioinformatics...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-S8-S5

    authors: Widmer C,Toussaint NC,Altun Y,Rätsch G

    更新日期:2010-10-26 00:00:00

  • ProLego: tool for extracting and visualizing topological modules in protein structures.

    abstract:BACKGROUND:In protein design, correct use of topology is among the initial and most critical feature. Meticulous selection of backbone topology aids in drastically reducing the structure search space. With ProLego, we present a server application to explore the component aspect of protein structures and provide an intu...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2171-9

    authors: Khan T,Panday SK,Ghosh I

    更新日期:2018-05-04 00:00:00

  • BLAST+: architecture and applications.

    abstract:BACKGROUND:Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings i...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-421

    authors: Camacho C,Coulouris G,Avagyan V,Ma N,Papadopoulos J,Bealer K,Madden TL

    更新日期:2009-12-15 00:00:00

  • Graph based fusion of miRNA and mRNA expression data improves clinical outcome prediction in prostate cancer.

    abstract:BACKGROUND:One of the main goals in cancer studies including high-throughput microRNA (miRNA) and mRNA data is to find and assess prognostic signatures capable of predicting clinical outcome. Both mRNA and miRNA expression changes in cancer diseases are described to reflect clinical characteristics like staging and pro...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-488

    authors: Gade S,Porzelius C,Fälth M,Brase JC,Wuttig D,Kuner R,Binder H,Sültmann H,Beissbarth T

    更新日期:2011-12-21 00:00:00

  • Francisella tularensis novicida proteomic and transcriptomic data integration and annotation based on semantic web technologies.

    abstract:BACKGROUND:This paper summarises the lessons and experiences gained from a case study of the application of semantic web technologies to the integration of data from the bacterial species Francisella tularensis novicida (Fn). Fn data sources are disparate and heterogeneous, as multiple laboratories across the world, us...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-S10-S3

    authors: Anwar N,Hunt E

    更新日期:2009-10-01 00:00:00

  • Prediction of virus-host infectious association by supervised learning methods.

    abstract:BACKGROUND:The study of virus-host infectious association is important for understanding the functions and dynamics of microbial communities. Both cellular and fractionated viral metagenomic data generate a large number of viral contigs with missing host information. Although relative simple methods based on the simila...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1473-7

    authors: Zhang M,Yang L,Ren J,Ahlgren NA,Fuhrman JA,Sun F

    更新日期:2017-03-14 00:00:00

  • Robustness of signal detection in cryo-electron microscopy via a bi-objective-function approach.

    abstract:BACKGROUND:The detection of weak signals and selection of single particles from low-contrast micrographs of frozen hydrated biomolecules by cryo-electron microscopy (cryo-EM) represents a major practical bottleneck in cryo-EM data analysis. Template-based particle picking by an objective function using fast local corre...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-2714-8

    authors: Wang WL,Yu Z,Castillo-Menendez LR,Sodroski J,Mao Y

    更新日期:2019-04-03 00:00:00