Comparison of methods to detect copy number alterations in cancer using simulated and real genotyping data.

Abstract:

BACKGROUND: The detection of genomic copy number alterations (CNA) in cancer based on SNP arrays requires methods that take into account tumour-specific factors such as normal cell contamination and tumour heterogeneity. A number of tools have recently been developed, but their performance has yet to be thoroughly assessed. To this end, a comprehensive model is indispensable: one that integrates normal cell contamination and intra-tumour heterogeneity and that can be translated into synthetic data on which to run benchmarks.

RESULTS: We propose such a model and implement it in an R package called CnaGen, which synthetically generates a wide range of alterations under different normal cell contamination levels. Six recently published methods for CNA and loss of heterozygosity (LOH) detection in tumour samples were assessed on this synthetic data and on a dilution series of a breast cancer cell line: ASCAT, GAP, GenoCNA, GPHMM, MixHMM and OncoSNP. We report recall rates as a function of normal cell contamination level and alteration characteristics (length, copy number and LOH state), as well as the false discovery rate distribution for each copy number under different normal cell contamination levels. The assessed methods are in general better at detecting alterations with low copy number and under low normal cell contamination. All methods except GPHMM, which failed to recognize the alteration pattern in the cell-line samples, provided similar results for the synthetic and cell-line sample sets. MixHMM and GenoCNA are the poorest-performing methods, while GAP generally performed better. This supports the viability of approaches other than the common hidden Markov model (HMM)-based ones.

CONCLUSIONS: We devised and implemented a comprehensive model to generate data that simulate tumoural samples genotyped using SNP arrays. The validity of the model is supported by the similarity of the results obtained with synthetic and real data. Based on these results and on the software implementation of the methods, we recommend GAP for advanced users and GPHMM for a fully driven analysis.

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Mosén-Ansorena D,Aransay AM,Rodríguez-Ezpeleta N

doi

10.1186/1471-2105-13-192

pub_date

2012-08-07 00:00:00

pages

192

issn

1471-2105

pii

1471-2105-13-192

journal_volume

13

pub_type

Journal Article
  • In silico modelling of hormone response elements.

    abstract:BACKGROUND:An important step in understanding the conditions that specify gene expression is the recognition of gene regulatory elements. Due to high diversity of different types of transcription factors and their DNA binding preferences, it is a challenging problem to establish an accurate model for recognition of fun...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-S4-S27

    authors: Stepanova M,Lin F,Lin VC

    updated:2006-12-12 00:00:00

  • Identifying cancer prognostic modules by module network analysis.

    abstract:BACKGROUND:The identification of prognostic genes that can distinguish the prognostic risks of cancer patients remains a significant challenge. Previous works have proven that functional gene sets were more reliable for this task than the gene signature. However, few works have considered the cross-talk among functiona...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2674-z

    authors: Zhou XH,Chu XY,Xue G,Xiong JH,Zhang HY

    updated:2019-02-18 00:00:00

  • mSpecs: a software tool for the administration and editing of mass spectral libraries in the field of metabolomics.

    abstract:BACKGROUND:Metabolome analysis with GC/MS has meanwhile been established as one of the "omics" techniques. Compound identification is done by comparison of the MS data with compound libraries. Mass spectral libraries in the field of metabolomics ought to connect the relevant mass traces of the metabolites to other rele...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-229

    authors: Thielen B,Heinen S,Schomburg D

    updated:2009-07-22 00:00:00

  • Scoredist: a simple and robust protein sequence distance estimator.

    abstract:BACKGROUND:Distance-based methods are popular for reconstructing evolutionary trees thanks to their speed and generality. A number of methods exist for estimating distances from sequence alignments, which often involves some sort of correction for multiple substitutions. The problem is to accurately estimate the number...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-108

    authors: Sonnhammer EL,Hollich V

    updated:2005-04-27 00:00:00

  • Simple adjustment of the sequence weight algorithm remarkably enhances PSI-BLAST performance.

    abstract:BACKGROUND:PSI-BLAST, an extremely popular tool for sequence similarity search, features the utilization of Position-Specific Scoring Matrix (PSSM) constructed from a multiple sequence alignment (MSA). PSSM allows the detection of more distant homologs than a general amino acid substitution matrix does. An accurate est...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1686-9

    authors: Oda T,Lim K,Tomii K

    updated:2017-06-02 00:00:00

  • Anatomy of enzyme channels.

    abstract:BACKGROUND:Enzyme active sites can be connected to the exterior environment by one or more channels passing through the protein. Despite our current knowledge of enzyme structure and function, surprisingly little is known about how often channels are present or about any structural features such channels may have in co...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-014-0379-x

    authors: Pravda L,Berka K,Svobodová Vařeková R,Sehnal D,Banáš P,Laskowski RA,Koča J,Otyepka M

    updated:2014-11-18 00:00:00

  • Evaluation of absolute quantitation by nonlinear regression in probe-based real-time PCR.

    abstract:BACKGROUND:In real-time PCR data analysis, the cycle threshold (CT) method is currently the gold standard. This method is based on an assumption of equal PCR efficiency in all reactions, and precision may suffer if this condition is not met. Nonlinear regression analysis (NLR) or curve fitting has therefore been sugges...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-107

    authors: Goll R,Olsen T,Cui G,Florholmen J

    updated:2006-03-03 00:00:00

  • InPrePPI: an integrated evaluation method based on genomic context for predicting protein-protein interactions in prokaryotic genomes.

    abstract:BACKGROUND:Although many genomic features have been used in the prediction of protein-protein interactions (PPIs), frequently only one is used in a computational method. After realizing the limited power in the prediction using only one genomic feature, investigators are now moving toward integration. So far, there hav...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-414

    authors: Sun J,Sun Y,Ding G,Liu Q,Wang C,He Y,Shi T,Li Y,Zhao Z

    updated:2007-10-26 00:00:00

  • A stepwise framework for the normalization of array CGH data.

    abstract:BACKGROUND:In two-channel competitive genomic hybridization microarray experiments, the ratio of the two fluorescent signal intensities at each spot on the microarray is commonly used to infer the relative amounts of the test and reference sample DNA levels. This ratio may be influenced by systematic measurement effect...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-274

    authors: Khojasteh M,Lam WL,Ward RK,MacAulay C

    updated:2005-11-18 00:00:00

  • DNAscan: personal computer compatible NGS analysis, annotation and visualisation.

    abstract:BACKGROUND:Next Generation Sequencing (NGS) is a commonly used technology for studying the genetic basis of biological processes and it underpins the aspirations of precision medicine. However, there are significant challenges when dealing with NGS data. Firstly, a huge number of bioinformatics tools for a wide range o...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2791-8

    authors: Iacoangeli A,Al Khleifat A,Sproviero W,Shatunov A,Jones AR,Morgan SL,Pittman A,Dobson RJ,Newhouse SJ,Al-Chalabi A

    updated:2019-04-27 00:00:00

  • Extracting predictors for lung adenocarcinoma based on Granger causality test and stepwise character selection.

    abstract:BACKGROUND:Lung adenocarcinoma is the most common type of lung cancer, with high mortality worldwide. Its occurrence and development were thoroughly studied by high-throughput expression microarray, which produced abundant data on gene expression, DNA methylation, and miRNA quantification. However, the hub genes, which...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2739-z

    authors: Fan X,Wang Y,Tang XQ

    updated:2019-05-01 00:00:00

  • XLPM: efficient algorithm for the analysis of protein-protein contacts using chemical cross-linking mass spectrometry.

    abstract:BACKGROUND:Chemical cross-linking is used for protein-protein contacts mapping and for structural analysis. One of the difficulties in cross-linking studies is the analysis of mass-spectrometry data and the assignment of the site of cross-link incorporation. The difficulties are due to higher charges of fragment ions, ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-S11-S16

    authors: Jaiswal M,Crabtree N,Bauer MA,Hall R,Raney KD,Zybailov BL

    updated:2014-01-01 00:00:00

  • Software for the analysis and visualization of deep mutational scanning data.

    abstract:BACKGROUND:Deep mutational scanning is a technique to estimate the impacts of mutations on a gene by using deep sequencing to count mutations in a library of variants before and after imposing a functional selection. The impacts of mutations must be inferred from changes in their counts after selection. RESULTS:I desc...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0590-4

    authors: Bloom JD

    updated:2015-05-20 00:00:00

  • Integrating multiple molecular sources into a clinical risk prediction signature by extracting complementary information.

    abstract:BACKGROUND:High-throughput technology allows for genome-wide measurements at different molecular levels for the same patient, e.g. single nucleotide polymorphisms (SNPs) and gene expression. Correspondingly, it might be beneficial to also integrate complementary information from different molecular levels when building...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1183-6

    authors: Hieke S,Benner A,Schlenk RF,Schumacher M,Bullinger L,Binder H

    updated:2016-08-30 00:00:00

  • ORdensity: user-friendly R package to identify differentially expressed genes.

    abstract:BACKGROUND:Microarray technology provides the expression level of many genes. Nowadays, an important issue is to select a small number of informative differentially expressed genes that provide biological knowledge and may be key elements for a disease. With the increasing volume of data generated by modern biomedical ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3463-4

    authors: Martínez-Otzeta JM,Irigoien I,Sierra B,Arenas C

    updated:2020-04-07 00:00:00

  • Simultaneous phylogeny reconstruction and multiple sequence alignment.

    abstract:BACKGROUND:A phylogeny is the evolutionary history of a group of organisms. To date, sequence data is still the most used data type for phylogenetic reconstruction. Before any sequences can be used for phylogeny reconstruction, they must be aligned, and the quality of the multiple sequence alignment has been shown to a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S1-S11

    authors: Yue F,Shi J,Tang J

    updated:2009-01-30 00:00:00

  • LNDriver: identifying driver genes by integrating mutation and expression data based on gene-gene interaction network.

    abstract:BACKGROUND:Cancer is a complex disease which is characterized by the accumulation of genetic alterations during the patient's lifetime. With the development of the next-generation sequencing technology, multiple omics data, such as cancer genomic, epigenomic and transcriptomic data etc., can be measured from each indiv...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1332-y

    authors: Wei PJ,Zhang D,Xia J,Zheng CH

    updated:2016-12-23 00:00:00

  • Functionally specified protein signatures distinctive for each of the different blue copper proteins.

    abstract:BACKGROUND:Proteins having similar functions from different sources can be identified by the occurrence in their sequences, a conserved cluster of amino acids referred to as pattern, motif, signature or fingerprint. The wide usage of protein sequence analysis in par with the growth of databases signifies the importance...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-5-127

    authors: Giri AV,Anishetty S,Gautam P

    updated:2004-09-09 00:00:00

  • A multiple-alignment based primer design algorithm for genetically highly variable DNA targets.

    abstract:BACKGROUND:Primer design for highly variable DNA sequences is difficult, and experimental success requires attention to many interacting constraints. The advent of next-generation sequencing methods allows the investigation of rare variants otherwise hidden deep in large populations, but requires attention to populatio...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-255

    authors: Brodin J,Krishnamoorthy M,Athreya G,Fischer W,Hraber P,Gleasner C,Green L,Korber B,Leitner T

    updated:2013-08-21 00:00:00

  • Prioritization, clustering and functional annotation of MicroRNAs using latent semantic indexing of MEDLINE abstracts.

    abstract:BACKGROUND:The amount of scientific information about MicroRNAs (miRNAs) is growing exponentially, making it difficult for researchers to interpret experimental results. In this study, we present an automated text mining approach using Latent Semantic Indexing (LSI) for prioritization, clustering and functional annotat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1223-2

    authors: Roy S,Curry BC,Madahian B,Homayouni R

    updated:2016-10-06 00:00:00

  • SpectralNET--an application for spectral graph analysis and visualization.

    abstract:BACKGROUND:Graph theory provides a computational framework for modeling a variety of datasets including those emerging from genomics, proteomics, and chemical genetics. Networks of genes, proteins, small molecules, or other objects of study can be represented as graphs of nodes (vertices) and interactions (edges) that ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-260

    authors: Forman JJ,Clemons PA,Schreiber SL,Haggarty SJ

    updated:2005-10-19 00:00:00

  • On the consistency of orthology relationships.

    abstract:BACKGROUND:Orthologs inference is the starting point of most comparative genomics studies, and a plethora of methods have been designed in the last decade to address this challenging task. In this paper we focus on the problems of deciding consistency with a species tree (known or not) of a partial set of orthology/par...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1267-3

    authors: Jones M,Paul C,Scornavacca C

    updated:2016-11-11 00:00:00

  • Method to represent the distribution of QTL additive and dominance effects associated with quantitative traits in computer simulation.

    abstract:BACKGROUND:Computer simulation is a resource which can be employed to identify optimal breeding strategies to effectively and efficiently achieve specific goals in developing improved cultivars. In some instances, it is crucial to assess in silico the options as well as the impact of various crossing schemes and breedi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-0906-z

    authors: Sun X,Mumm RH

    updated:2016-02-06 00:00:00

  • tcR: an R package for T cell receptor repertoire advanced data analysis.

    abstract:BACKGROUND:The Immunoglobulins (IG) and the T cell receptors (TR) play the key role in antigen recognition during the adaptive immune response. Recent progress in next-generation sequencing technologies has provided an opportunity for the deep T cell receptor repertoire profiling. However, a specialised software is req...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0613-1

    authors: Nazarov VI,Pogorelyy MV,Komech EA,Zvyagin IV,Bolotin DA,Shugay M,Chudakov DM,Lebedev YB,Mamedov IZ

    updated:2015-05-28 00:00:00

  • A novel method to identify cooperative functional modules: study of module coordination in the Saccharomyces cerevisiae cell cycle.

    abstract:BACKGROUND:Identifying key components in biological processes and their associations is critical for deciphering cellular functions. Recently, numerous gene expression and molecular interaction experiments have been reported in Saccharomyces cerevisiae, and these have enabled systematic studies. Although a number of ap...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-281

    authors: Hsu JT,Peng CH,Hsieh WP,Lan CY,Tang CY

    updated:2011-07-12 00:00:00

  • Comparison of public peak detection algorithms for MALDI mass spectrometry data analysis.

    abstract:BACKGROUND:In mass spectrometry (MS) based proteomic data analysis, peak detection is an essential step for subsequent analysis. Recently, there has been significant progress in the development of various peak detection algorithms. However, neither a comprehensive survey nor an experimental comparison of these algorith...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-4

    authors: Yang C,He Z,Yu W

    updated:2009-01-06 00:00:00

  • Evaluating eukaryotic secreted protein prediction.

    abstract:BACKGROUND:Improvements in protein sequence annotation and an increase in the number of annotated protein databases has fueled development of an increasing number of software tools to predict secreted proteins. Six software programs capable of high throughput and employing a wide range of prediction methods, SignalP 3....

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-256

    authors: Klee EW,Ellis LB

    updated:2005-10-14 00:00:00

  • Identifying module biomarker in type 2 diabetes mellitus by discriminative area of functional activity.

    abstract:BACKGROUND:Identifying diagnosis and prognosis biomarkers from expression profiling data is of great significance for achieving personalized medicine and designing therapeutic strategy in complex diseases. However, the reproducibility of identified biomarkers across tissues and experiments is still a challenge for this...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0519-y

    authors: Zhang X,Gao L,Liu ZP,Chen L

    updated:2015-03-18 00:00:00

  • A global optimization algorithm for protein surface alignment.

    abstract:BACKGROUND:A relevant problem in drug design is the comparison and recognition of protein binding sites. Binding sites recognition is generally based on geometry often combined with physico-chemical properties of the site since the conformation, size and chemical composition of the protein surface are all relevant for ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-488

    authors: Bertolazzi P,Guerra C,Liuzzi G

    updated:2010-09-29 00:00:00

  • CDKAM: a taxonomic classification tool using discriminative k-mers and approximate matching strategies.

    abstract:BACKGROUND:Current taxonomic classification tools use exact string matching algorithms that are effective to tackle the data from the next generation sequencing technology. However, the unique error patterns in the third generation sequencing (TGS) technologies could reduce the accuracy of these programs. RESULTS:We d...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03777-y

    authors: Bui VK,Wei C

    updated:2020-10-20 00:00:00