Learning smoothing models of copy number profiles using breakpoint annotations.

Abstract:

BACKGROUND: Many models have been proposed to detect copy number alterations in chromosomal copy number profiles, but it is usually not obvious which is most effective for a given data set. Furthermore, most methods have a smoothing parameter that determines the number of breakpoints and must be chosen using various heuristics. RESULTS: We present three contributions for copy number profile smoothing model selection. First, we propose to select the model and degree of smoothness that maximize agreement with visual breakpoint region annotations. Second, we develop cross-validation procedures to estimate the error of the trained models. Third, we apply these methods to compare 17 smoothing models on a new database of 575 annotated neuroblastoma copy number profiles, which we make available as a public benchmark for testing new algorithms. CONCLUSIONS: Whereas previous studies have been qualitative or limited to simulated data, our annotation-guided approach is quantitative and suggests which algorithms are fastest and most accurate in practice on real data. In the neuroblastoma data, the equivalent pelt.n and cghseg.k methods were the best breakpoint detectors, and exhibited reasonable computation times.
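
The first contribution, choosing the degree of smoothness that best agrees with breakpoint annotations, can be illustrated with a minimal sketch. This is not the authors' implementation: the annotation labels ("breakpoint" for a region expected to contain at least one change, "normal" for a region expected to contain none), the function names, and the tie-breaking rule are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical annotation format: (start, end, label), where label is
# "breakpoint" (at least one change expected inside the region) or
# "normal" (no change expected inside the region).
Annotation = Tuple[int, int, str]


def annotation_errors(breakpoints: List[int], annotations: List[Annotation]) -> int:
    """Count annotated regions that the predicted breakpoints disagree with."""
    errors = 0
    for start, end, label in annotations:
        n_inside = sum(start <= b <= end for b in breakpoints)
        if label == "breakpoint" and n_inside == 0:
            errors += 1  # false negative: an annotated change was missed
        elif label == "normal" and n_inside > 0:
            errors += 1  # false positive: a change was predicted in a flat region
    return errors


def select_smoothing(candidates: Dict[float, List[int]],
                     annotations: List[Annotation]) -> float:
    """Return the smoothing parameter with the fewest annotation errors.

    Ties are broken in favour of the smallest parameter value (an arbitrary
    choice made for this sketch).
    """
    return min(
        candidates,
        key=lambda lam: (annotation_errors(candidates[lam], annotations), lam),
    )


# Toy example: predicted breakpoint positions for three smoothing values.
candidates = {0.1: [10, 25, 40, 70], 1.0: [25, 70], 10.0: []}
annotations = [(20, 30, "breakpoint"), (35, 60, "normal"), (65, 75, "breakpoint")]
best = select_smoothing(candidates, annotations)
```

In this toy data the weakest smoothing places a breakpoint inside the "normal" region, and the strongest smoothing misses both annotated breakpoints, so the intermediate value is selected.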

journal_name

BMC Bioinformatics

authors

Hocking TD,Schleiermacher G,Janoueix-Lerosey I,Boeva V,Cappo J,Delattre O,Bach F,Vert JP

doi

10.1186/1471-2105-14-164

pub_date

2013-05-22 00:00:00

pages

164

issn

1471-2105

pii

1471-2105-14-164

journal_volume

14

pub_type

Journal Article
  • scDC: single cell differential composition analysis.

    abstract:BACKGROUND:Differences in cell-type composition across subjects and conditions often carry biological significance. Recent advancements in single cell sequencing technologies enable cell-types to be identified at the single cell level, and as a result, cell-type composition of tissues can now be studied in exquisite de...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3211-9

    authors: Cao Y,Lin Y,Ormerod JT,Yang P,Yang JYH,Lo KK

    update_date:2019-12-24 00:00:00

  • Uncovering packaging features of co-regulated modules based on human protein interaction and transcriptional regulatory networks.

    abstract:BACKGROUND:Network co-regulated modules are believed to have the functionality of packaging multiple biological entities, and can thus be assumed to coordinate many biological functions in their network neighbouring regions. RESULTS:Here, we weighted edges of a human protein interaction network and a transcriptional r...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-392

    authors: Chen L,Wang H,Zhang L,Li W,Wang Q,Shang Y,He Y,He W,Li X,Tai J,Li X

    update_date:2010-07-22 00:00:00

  • CNN-based ranking for biomedical entity normalization.

    abstract:BACKGROUND:Most state-of-the-art biomedical entity normalization systems, such as rule-based systems, merely rely on morphological information of entity mentions, but rarely consider their semantic information. In this paper, we introduce a novel convolutional neural network (CNN) architecture that regards biomedical e...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1805-7

    authors: Li H,Chen Q,Tang B,Wang X,Xu H,Wang B,Huang D

    update_date:2017-10-03 00:00:00

  • Improving ontologies by automatic reasoning and evaluation of logical definitions.

    abstract:BACKGROUND:Ontologies are widely used to represent knowledge in biomedicine. Systematic approaches for detecting errors and disagreements are needed for large ontologies with hundreds or thousands of terms and semantic relationships. A recent approach of defining terms using logical definitions is now increasingly bein...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-418

    authors: Köhler S,Bauer S,Mungall CJ,Carletti G,Smith CL,Schofield P,Gkoutos GV,Robinson PN

    update_date:2011-10-27 00:00:00

  • Homology modeling, molecular docking, and molecular dynamics simulations elucidated α-fetoprotein binding modes.

    abstract:BACKGROUND:An important mechanism of endocrine activity is chemicals entering target cells via transport proteins and then interacting with hormone receptors such as the estrogen receptor (ER). α-Fetoprotein (AFP) is a major transport protein in rodent serum that can bind and sequester estrogens, thus preventing entry ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-S14-S6

    authors: Shen J,Zhang W,Fang H,Perkins R,Tong W,Hong H

    update_date:2013-01-01 00:00:00

  • 2D electrophoresis image brightness correction based on gradient interval histogram.

    abstract:BACKGROUND:Two-dimensional electrophoresis (2DE) is one of the most widely applied techniques in comparative proteomics. The basic task of 2DE is to identify differential protein expression by quantitative analysis of 2DE images. To reduce the errors of spot quantification in 2DE images, a novel brightness correction m...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3432-y

    authors: Ou Q,Xiao J,Yu L,Wu K,Xiong B

    update_date:2020-03-19 00:00:00

  • PseUI: Pseudouridine sites identification based on RNA sequence information.

    abstract:BACKGROUND:Pseudouridylation is the most prevalent type of posttranscriptional modification in various stable RNAs of all organisms, which significantly affects many cellular processes that are regulated by RNA. Thus, accurate identification of pseudouridine (Ψ) sites in RNA will be of great benefit for understanding t...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2321-0

    authors: He J,Fang T,Zhang Z,Huang B,Zhu X,Xiong Y

    update_date:2018-08-29 00:00:00

  • Enhanced JBrowse plugins for epigenomics data visualization.

    abstract:BACKGROUND:New sequencing techniques require new visualization strategies, as is the case for epigenomics data such as DNA base modifications, small non-coding RNAs, and histone modifications. RESULTS:We present a set of plugins for the genome browser JBrowse that are targeted for epigenomics visualizations. Specifica...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2160-z

    authors: Hofmeister BT,Schmitz RJ

    update_date:2018-04-25 00:00:00

  • Prototype semantic infrastructure for automated small molecule classification and annotation in lipidomics.

    abstract:BACKGROUND:The development of high-throughput experimentation has led to astronomical growth in biologically relevant lipids and lipid derivatives identified, screened, and deposited in numerous online databases. Unfortunately, efforts to annotate, classify, and analyze these chemical entities have largely remained in ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-303

    authors: Chepelev LL,Riazanov A,Kouznetsov A,Low HS,Dumontier M,Baker CJ

    update_date:2011-07-26 00:00:00

  • Correction to: Effective machine-learning assembly for next-generation amplicon sequencing with very low coverage.

    abstract:Following publication of the original article [1], the author reported that there are several errors in the original article. ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article, Published Erratum

    doi:10.1186/s12859-019-3318-z

    authors: Ranjard L,Wong TKF,Rodrigo AG

    update_date:2020-01-22 00:00:00

  • Analyzing gene expression data for pediatric and adult cancer diagnosis using logic learning machine and standard supervised methods.

    abstract:BACKGROUND:Logic Learning Machine (LLM) is an innovative method of supervised analysis capable of constructing models based on simple and intelligible rules. In this investigation the performance of LLM in classifying patients with cancer was evaluated using a set of eight publicly available gene expression databases f...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2953-8

    authors: Verda D,Parodi S,Ferrari E,Muselli M

    update_date:2019-11-22 00:00:00

  • Building blocks for automated elucidation of metabolites: natural product-likeness for candidate ranking.

    abstract:BACKGROUND:In metabolomics experiments, spectral fingerprints of metabolites with no known structural identity are detected routinely. Computer-assisted structure elucidation (CASE) has been used to determine the structural identities of unknown compounds. It is generally accepted that a single 1D NMR spectrum or mass ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-234

    authors: Jayaseelan KV,Steinbeck C

    update_date:2014-07-05 00:00:00

  • Extended analysis of benchmark datasets for Agilent two-color microarrays.

    abstract:BACKGROUND:As part of its broad and ambitious mission, the MicroArray Quality Control (MAQC) project reported the results of experiments using External RNA Controls (ERCs) on five microarray platforms. For most platforms, several different methods of data processing were considered. However, there was no similar consid...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-371

    authors: Kerr KF

    update_date:2007-10-03 00:00:00

  • VKCDB: voltage-gated potassium channel database.

    abstract:BACKGROUND:The family of voltage-gated potassium channels comprises a functionally diverse group of membrane proteins. They help maintain and regulate the potassium ion-based component of the membrane potential and are thus central to many critical physiological processes. VKCDB (Voltage-gated potassium [K] Channel Dat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article, Review

    doi:10.1186/1471-2105-5-3

    authors: Li B,Gallin WJ

    update_date:2004-01-09 00:00:00

  • Sequence-structure relations of pseudoknot RNA.

    abstract:BACKGROUND:The analysis of sequence-structure relations of RNA is based on a specific notion and folding of RNA structure. The notion of coarse grained structure employed here is that of canonical RNA pseudoknot contact-structures with at most two mutually crossing bonds (3-noncrossing). These structures are folded by ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S1-S39

    authors: Huang FW,Li LY,Reidys CM

    update_date:2009-01-30 00:00:00

  • Graph based fusion of miRNA and mRNA expression data improves clinical outcome prediction in prostate cancer.

    abstract:BACKGROUND:One of the main goals in cancer studies including high-throughput microRNA (miRNA) and mRNA data is to find and assess prognostic signatures capable of predicting clinical outcome. Both mRNA and miRNA expression changes in cancer diseases are described to reflect clinical characteristics like staging and pro...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-488

    authors: Gade S,Porzelius C,Fälth M,Brase JC,Wuttig D,Kuner R,Binder H,Sültmann H,Beissbarth T

    update_date:2011-12-21 00:00:00

  • SPECS: a non-parametric method to identify tissue-specific molecular features for unbalanced sample groups.

    abstract:BACKGROUND:To understand biology and differences among various tissues or cell types, one typically searches for molecular features that display characteristic abundance patterns. Several specificity metrics have been introduced to identify tissue-specific molecular features, but these either require an equal number of...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3407-z

    authors: Everaert C,Volders PJ,Morlion A,Thas O,Mestdagh P

    update_date:2020-02-17 00:00:00

  • Linear predictive coding representation of correlated mutation for protein sequence alignment.

    abstract:BACKGROUND:Although both conservation and correlated mutation (CM) are important information reflecting the different sorts of context in multiple sequence alignment, most alignment methods use sequence profiles that only represent conservation. There is no general way to represent correlated mutation and incorporat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-S2-S2

    authors: Jeong CS,Kim D

    update_date:2010-04-16 00:00:00

  • A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies.

    abstract:BACKGROUND:Comparative analysis of whole genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While there are a number of excellent tools developed for performing this task, ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1142-2

    authors: Thakur S,Guttman DS

    update_date:2016-06-30 00:00:00

  • A computational evaluation of over-representation of regulatory motifs in the promoter regions of differentially expressed genes.

    abstract:BACKGROUND:Observed co-expression of a group of genes is frequently attributed to co-regulation by shared transcription factors. This assumption has led to the hypothesis that promoters of co-expressed genes should share common regulatory motifs, which forms the basis for numerous computational tools that search for th...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-267

    authors: Meng G,Mosig A,Vingron M

    update_date:2010-05-20 00:00:00

  • Development and tuning of an original search engine for patent libraries in medicinal chemistry.

    abstract:BACKGROUND:The large increase in the size of patent collections has led to the need of efficient search strategies. But the development of advanced text-mining applications dedicated to patents of the biomedical field remains rare, in particular to address the needs of the pharmaceutical & biotech industry, which inten...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-S1-S15

    authors: Pasche E,Gobeill J,Kreim O,Oezdemir-Zaech F,Vachon T,Lovis C,Ruch P

    update_date:2014-01-01 00:00:00

  • Nonparametric relevance-shifted multiple testing procedures for the analysis of high-dimensional multivariate data with small sample sizes.

    abstract:BACKGROUND:In many research areas it is necessary to find differences between treatment groups with several variables. For example, studies of microarray data seek to find a significant difference in location parameters from zero or one for ratios thereof for each variable. However, in some studies a significant deviat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-54

    authors: Frömke C,Hothorn LA,Kropf S

    update_date:2008-01-27 00:00:00

  • tcR: an R package for T cell receptor repertoire advanced data analysis.

    abstract:BACKGROUND:The Immunoglobulins (IG) and the T cell receptors (TR) play the key role in antigen recognition during the adaptive immune response. Recent progress in next-generation sequencing technologies has provided an opportunity for the deep T cell receptor repertoire profiling. However, a specialised software is req...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0613-1

    authors: Nazarov VI,Pogorelyy MV,Komech EA,Zvyagin IV,Bolotin DA,Shugay M,Chudakov DM,Lebedev YB,Mamedov IZ

    update_date:2015-05-28 00:00:00

  • An improved classification of G-protein-coupled receptors using sequence-derived features.

    abstract:BACKGROUND:G-protein-coupled receptors (GPCRs) play a key role in diverse physiological processes and are the targets of almost two-thirds of the marketed drugs. The 3 D structures of GPCRs are largely unavailable; however, a large number of GPCR primary sequences are known. To facilitate the identification and charact...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-420

    authors: Peng ZL,Yang JY,Chen X

    update_date:2010-08-09 00:00:00

  • The effect of rare variants on inflation of the test statistics in case-control analyses.

    abstract:BACKGROUND:The detection of bias due to cryptic population structure is an important step in the evaluation of findings of genetic association studies. The standard method of measuring this bias in a genetic association study is to compare the observed median association test statistic to the expected median test stati...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0496-1

    authors: Pirie A,Wood A,Lush M,Tyrer J,Pharoah PD

    update_date:2015-02-20 00:00:00

  • GraphCrunch: a tool for large network analyses.

    abstract:BACKGROUND:The recent explosion in biological and other real-world network data has created the need for improved tools for large network analyses. In addition to well established global network properties, several new mathematical techniques for analyzing local structural properties of large networks have been develop...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-70

    authors: Milenković T,Lai J,Przulj N

    update_date:2008-01-30 00:00:00

  • Sample size calculation while controlling false discovery rate for differential expression analysis with RNA-sequencing experiments.

    abstract:BACKGROUND:RNA-Sequencing (RNA-seq) experiments have been popularly applied to transcriptome studies in recent years. Such experiments are still relatively costly. As a result, RNA-seq experiments often employ a small number of replicates. Power analysis and sample size calculation are challenging in the context of dif...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-0994-9

    authors: Bi R,Liu P

    update_date:2016-03-31 00:00:00

  • Toward an interactive article: integrating journals and biological databases.

    abstract:BACKGROUND:Journal articles and databases are two major modes of communication in the biological sciences, and thus integrating these critical resources is of urgent importance to increase the pace of discovery. Projects focused on bridging the gap between journals and databases have been on the rise over the last five...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-175

    authors: Rangarajan A,Schedl T,Yook K,Chan J,Haenel S,Otis L,Faelten S,DePellegrin-Connelly T,Isaacson R,Skrzypek MS,Marygold SJ,Stefancsik R,Cherry JM,Sternberg PW,Müller HM

    update_date:2011-05-19 00:00:00

  • Model based heritability scores for high-throughput sequencing data.

    abstract:BACKGROUND:Heritability of a phenotypic or molecular trait measures the proportion of variance that is attributable to genotypic variance. It is an important concept in breeding and genetics. Few methods are available for calculating heritability for traits derived from high-throughput sequencing. RESULTS:We propose s...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1539-6

    authors: Rudra P,Shi WJ,Vestal B,Russell PH,Odell A,Dowell RD,Radcliffe RA,Saba LM,Kechris K

    update_date:2017-03-02 00:00:00

  • Cascaded classifiers for confidence-based chemical named entity recognition.

    abstract:BACKGROUND:Chemical named entities represent an important facet of biomedical text. RESULTS:We have developed a system to use character-based n-grams, Maximum Entropy Markov Models and rescoring to recognise chemical names and other such entities, and to make confidence estimates for the extracted entities. An adjusta...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-S11-S4

    authors: Corbett P,Copestake A

    update_date:2008-11-19 00:00:00