Simple binary segmentation frameworks for identifying variation in DNA copy number.

Abstract:

BACKGROUND: Variation in DNA copy number, due to gains and losses of chromosome segments, is common. A first step for analyzing DNA copy number data is to identify amplified or deleted regions in individuals. To locate such regions, we propose a circular binary segmentation procedure, which is based on a sequence of nested hypothesis tests, each using the Bayesian information criterion.

RESULTS: Our procedure is convenient for analyzing DNA copy number in two general situations: (1) when using data from multiple sources and (2) when using cohort analysis of multiple patients suffering from the same type of cancer. In the first case, data from multiple sources such as different platforms, labs, or preprocessing methods are used to study variation in copy number in the same individual. Combining these sources provides a higher resolution, which leads to a more detailed genome-wide survey of the individual. In this case, we provide a simple statistical framework to derive a consensus molecular signature. In the framework, the multiple sequences from various sources are integrated into a single sequence, and then the proposed segmentation procedure is applied to this sequence to detect aberrant regions. In the second case, cohort analysis of multiple patients is carried out to derive overall molecular signatures for the cohort. For this case, we provide another simple statistical framework in which data across multiple profiles are standardized before segmentation. The proposed segmentation procedure is then applied to the standardized profiles one at a time to detect aberrant regions. Any such regions that are common across two or more profiles are probably real and may play important roles in the cancer pathogenesis process.

CONCLUSIONS: The main advantages of the proposed procedure are flexibility and simplicity.
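A minimal Python sketch can show how a circular binary segmentation driven by nested BIC tests might work. This is not the author's implementation. It makes several assumptions:

- The data follow a Gaussian mean-shift model with one shared variance.
- The BIC penalty counts each mean, the variance, and each change-point location as a parameter.
- The abstract does not give the exact statistic or penalty, so both may differ from the paper.

Each test compares two models on an interval:

- H0 fits one mean to the whole interval.
- H1 fits an arc `[a, b)` whose mean differs from the rest of the interval. This is the "circular" alternative; an arc touching either end reduces to a single change point.

If H1 has the lower BIC, the interval is split and each piece is tested again. The function names and the `min_size` parameter are illustrative.

```python
import math
from typing import List, Tuple


def _prefix_sums(x: List[float]) -> Tuple[List[float], List[float]]:
    """Cumulative sums of x and x**2, so any segment's RSS costs O(1)."""
    s, s2 = [0.0], [0.0]
    for v in x:
        s.append(s[-1] + v)
        s2.append(s2[-1] + v * v)
    return s, s2


def _rss(total: float, total_sq: float, n: int) -> float:
    """Residual sum of squares around the mean, from sum, sum of squares and count."""
    return max(total_sq - total * total / n, 0.0)


def _bic(rss: float, n: int, n_params: int) -> float:
    """Gaussian BIC up to an additive constant: n*log(RSS/n) + p*log(n)."""
    return n * math.log(max(rss, 1e-12) / n) + n_params * math.log(n)


def circular_binary_segment(x: List[float], min_size: int = 2) -> List[Tuple[int, int]]:
    """Return half-open [start, end) index pairs of constant-mean segments."""
    if not x:
        return []
    s, s2 = _prefix_sums(x)
    segments: List[Tuple[int, int]] = []
    stack = [(0, len(x))]
    while stack:
        i, j = stack.pop()
        n = j - i
        if n < 2 * min_size:
            segments.append((i, j))
            continue
        seg_sum, seg_sq = s[j] - s[i], s2[j] - s2[i]
        # H0: one mean on [i, j) -> mean + variance = 2 parameters.
        bic0 = _bic(_rss(seg_sum, seg_sq, n), n, 2)
        # H1: search every arc [a, b) whose mean differs from the rest of [i, j).
        best_arc, best_rss = None, math.inf
        for a in range(i, j - min_size + 1):
            for b in range(a + min_size, j + 1):
                left, right = a - i, j - b
                # Skip arcs covering everything or leaving a too-short outer piece.
                if left + right == 0 or 0 < left < min_size or 0 < right < min_size:
                    continue
                in_sum, in_sq = s[b] - s[a], s2[b] - s2[a]
                rss = _rss(in_sum, in_sq, b - a) + _rss(
                    seg_sum - in_sum, seg_sq - in_sq, left + right)
                if rss < best_rss:
                    best_arc, best_rss = (a, b), rss
        # Assumed H1 size: two means + variance + two arc endpoints = 5 parameters.
        if best_arc is not None and _bic(best_rss, n, 5) < bic0:
            a, b = best_arc
            for lo, hi in ((i, a), (a, b), (b, j)):
                if hi > lo:
                    stack.append((lo, hi))  # test each resulting piece again
        else:
            segments.append((i, j))
    return sorted(segments)
```

Consider a noise-free profile such as `[0.0] * 10 + [5.0] * 10 + [0.0] * 10`. The function returns `[(0, 10), (10, 20), (20, 30)]`: one circular test isolates the middle gain directly, whereas ordinary binary segmentation would need two successive splits. The exhaustive arc search costs O(n²) per interval. That is acceptable for a sketch but slow for genome-scale arrays.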

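The two frameworks around the segmentation step can also be sketched. The abstract does not say how sources are merged or how profiles are standardized, so the choices below are assumptions:

- Sources are pooled and ordered by genomic position.
- Each profile is converted to z-scores.
- A region counts as recurrent where the aberrant calls of at least `min_profiles` profiles overlap.

The segmentation itself is left out. `recurrent_regions` takes each profile's already-called aberrant intervals as input. All function names and the `Probe` type are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Probe = Tuple[int, float]  # (genomic position, copy-number log-ratio)


def integrate_sources(sources: Sequence[Sequence[Probe]]) -> List[Probe]:
    """Multiple-source case (assumed): pool every source into one position-ordered sequence."""
    pooled = [probe for source in sources for probe in source]
    # list.sort is stable, so probes at the same position keep their source order.
    pooled.sort(key=lambda probe: probe[0])
    return pooled


def standardize(profile: Sequence[float]) -> List[float]:
    """Cohort case (assumed): z-score one profile so all profiles share a common scale."""
    mu = statistics.fmean(profile)
    sd = statistics.pstdev(profile)
    if sd == 0:
        return [0.0 for _ in profile]
    return [(v - mu) / sd for v in profile]


def recurrent_regions(
    calls: Sequence[Sequence[Tuple[int, int]]], min_profiles: int = 2
) -> List[Tuple[int, int]]:
    """Half-open [start, end) intervals covered by calls from >= min_profiles profiles.

    Assumes the calls within one profile do not overlap, as segmentation output would not.
    """
    delta: Dict[int, int] = defaultdict(int)
    for profile_calls in calls:
        for start, end in profile_calls:
            delta[start] += 1
            delta[end] -= 1
    regions: List[Tuple[int, int]] = []
    depth, open_at = 0, None
    # Sweep boundaries left to right, applying all changes at one position at once.
    for pos in sorted(delta):
        depth += delta[pos]
        if depth >= min_profiles and open_at is None:
            open_at = pos
        elif depth < min_profiles and open_at is not None:
            regions.append((open_at, pos))
            open_at = None
    return regions
```

For example, take three profiles with calls `[(100, 200)]`, `[(150, 250)]` and `[(180, 300), (400, 500)]`. These give `[(150, 250)]` as the region shared by at least two profiles. Applying every boundary change at a position together keeps touching calls, such as `(0, 5)` and `(5, 10)`, from being reported as two adjacent regions.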
journal_name: BMC Bioinformatics
authors: Yang TY
doi: 10.1186/1471-2105-13-277
pub_date: 2012-10-30
pages: 277
issn: 1471-2105
pii: 1471-2105-13-277
journal_volume: 13
pub_type: Journal Article
  • Exploring matrix factorization techniques for significant genes identification of Alzheimer's disease microarray gene expression data.

    abstract:BACKGROUND:The wide use of high-throughput DNA microarray technology provides an increasingly detailed view of the human transcriptome, from hundreds to thousands of genes. Although biomedical researchers typically design microarray experiments to explore specific biological contexts, the relationships between genes are hard...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-S5-S7

    authors: Kong W,Mou X,Hu X

    updated: 2011-01-01

  • Global rank-invariant set normalization (GRSN) to reduce systematic distortions in microarray data.

    abstract:BACKGROUND:Microarray technology has become very popular for globally evaluating gene expression in biological samples. However, non-linear variation associated with the technology can make data interpretation unreliable. Therefore, methods to correct this kind of technical variation are critical. Here we consider a me...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-520

    authors: Pelz CR,Kulesz-Martin M,Bagby G,Sears RC

    updated: 2008-12-04

  • Pairwise protein expression classifier for candidate biomarker discovery for early detection of human disease prognosis.

    abstract:BACKGROUND:An approach to molecular classification based on the comparative expression of protein pairs is presented. The method overcomes some of the present limitations in using peptide intensity data for class prediction for problems such as the detection of a disease, disease prognosis, or for predicting treatment ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-191

    authors: Kaur P,Schlatzer D,Cooke K,Chance MR

    updated: 2012-08-07

  • Method to represent the distribution of QTL additive and dominance effects associated with quantitative traits in computer simulation.

    abstract:BACKGROUND:Computer simulation is a resource which can be employed to identify optimal breeding strategies to effectively and efficiently achieve specific goals in developing improved cultivars. In some instances, it is crucial to assess in silico the options as well as the impact of various crossing schemes and breedi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-0906-z

    authors: Sun X,Mumm RH

    updated: 2016-02-06

  • NOXclass: prediction of protein-protein interaction types.

    abstract:BACKGROUND:Structural models determined by X-ray crystallography play a central role in understanding protein-protein interactions at the molecular level. Interpretation of these models requires the distinction between non-specific crystal packing contacts and biologically relevant interactions. This has been investiga...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-27

    authors: Zhu H,Domingues FS,Sommer I,Lengauer T

    updated: 2006-01-19

  • A MATLAB tool for pathway enrichment using a topology-based pathway regulation score.

    abstract:BACKGROUND:Handling the vast amount of gene expression data generated by genome-wide transcriptional profiling techniques is a challenging task, demanding an informed combination of pre-processing, filtering and analysis methods if meaningful biological conclusions are to be drawn. For example, a range of traditional s...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-014-0358-2

    authors: Ibrahim M,Jassim S,Cawthorne MA,Langlands K

    updated: 2014-11-04

  • Enhanced CellClassifier: a multi-class classification tool for microscopy images.

    abstract:BACKGROUND:Light microscopy is of central importance in cell biology. The recent introduction of automated high content screening has expanded this technology towards automation of experiments and performing large scale perturbation assays. Nevertheless, evaluation of microscopy data continues to be a bottleneck in man...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-30

    authors: Misselwitz B,Strittmatter G,Periaswamy B,Schlumberger MC,Rout S,Horvath P,Kozak K,Hardt WD

    updated: 2010-01-14

  • Amino acid sequence associated with bacteriophage recombination site helps to reveal genes potentially acquired through horizontal gene transfer.

    abstract:BACKGROUND:Horizontal gene transfer, i.e. the acquisition of genetic material from a nonparent organism, is considered an important force driving species evolution. Many cases of horizontal gene transfer from prokaryotes to eukaryotes have been registered, but no transfer mechanism has been deciphered so far, although vi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03599-y

    authors: Daugavet MA,Shabelnikov SV,Podgornaya OI

    updated: 2020-07-24

  • NEAT: an efficient network enrichment analysis test.

    abstract:BACKGROUND:Network enrichment analysis is a powerful method, which allows one to integrate gene enrichment analysis with the information on relationships between genes that is provided by gene networks. Existing tests for network enrichment analysis deal only with undirected networks; they can be computationally slow and a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1203-6

    authors: Signorelli M,Vinciotti V,Wit EC

    updated: 2016-09-05

  • CNN-based ranking for biomedical entity normalization.

    abstract:BACKGROUND:Most state-of-the-art biomedical entity normalization systems, such as rule-based systems, merely rely on morphological information of entity mentions, but rarely consider their semantic information. In this paper, we introduce a novel convolutional neural network (CNN) architecture that regards biomedical e...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1805-7

    authors: Li H,Chen Q,Tang B,Wang X,Xu H,Wang B,Huang D

    updated: 2017-10-03

  • LDpop: an interactive online tool to calculate and visualize geographic LD patterns.

    abstract:BACKGROUND:Linkage disequilibrium (LD)-the non-random association of alleles at different loci-defines population-specific haplotypes which vary by genomic ancestry. Assessment of allelic frequencies and LD patterns from a variety of ancestral populations enables researchers to better understand population histories as...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3340-1

    authors: Alexander TA,Machiela MJ

    updated: 2020-01-10

  • Colony size measurement of the yeast gene deletion strains for functional genomics.

    abstract:BACKGROUND:Numerous functional genomics approaches have been developed to study the model organism yeast, Saccharomyces cerevisiae, with the aim of systematically understanding the biology of the cell. Some of these techniques are based on yeast growth differences under different conditions, such as those generated by ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-117

    authors: Memarian N,Jessulat M,Alirezaie J,Mir-Rashed N,Xu J,Zareie M,Smith M,Golshani A

    updated: 2007-04-04

  • Large scale tissue histopathology image classification, segmentation, and visualization via deep convolutional activation features.

    abstract:BACKGROUND:Histopathology image analysis is a gold standard for cancer recognition and diagnosis. Automatic analysis of histopathology images can help pathologists diagnose tumor and cancer subtypes, alleviating the workload of pathologists. There are two basic types of tasks in digital histopathology image analysis: i...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1685-x

    authors: Xu Y,Jia Z,Wang LB,Ai Y,Zhang F,Lai M,Chang EI

    updated: 2017-05-26

  • Tandem repeats discovery service (TReaDS) applied to finding novel cis-acting factors in repeat expansion diseases.

    abstract:BACKGROUND:Tandem repeats are multiple duplications of substrings in the DNA that occur contiguously, or at a short distance, and may involve some mutations (such as substitutions, insertions, and deletions). Tandem repeats have also been extensively studied for their association with the class of repeat expansion dise...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-S4-S3

    authors: Pellegrini M,Renda ME,Vecchio A

    updated: 2012-03-28

  • SPECS: a non-parametric method to identify tissue-specific molecular features for unbalanced sample groups.

    abstract:BACKGROUND:To understand biology and differences among various tissues or cell types, one typically searches for molecular features that display characteristic abundance patterns. Several specificity metrics have been introduced to identify tissue-specific molecular features, but these either require an equal number of...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3407-z

    authors: Everaert C,Volders PJ,Morlion A,Thas O,Mestdagh P

    updated: 2020-02-17

  • Natural computation meta-heuristics for the in silico optimization of microbial strains.

    abstract:BACKGROUND:One of the greatest challenges in Metabolic Engineering is to develop quantitative models and algorithms to identify a set of genetic manipulations that will result in a microbial strain with a desirable metabolic phenotype which typically means having a high yield/productivity. This challenge is not only du...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-499

    authors: Rocha M,Maia P,Mendes R,Pinto JP,Ferreira EC,Nielsen J,Patil KR,Rocha I

    updated: 2008-11-27

  • A multiple-alignment based primer design algorithm for genetically highly variable DNA targets.

    abstract:BACKGROUND:Primer design for highly variable DNA sequences is difficult, and experimental success requires attention to many interacting constraints. The advent of next-generation sequencing methods allows the investigation of rare variants otherwise hidden deep in large populations, but requires attention to populatio...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-255

    authors: Brodin J,Krishnamoorthy M,Athreya G,Fischer W,Hraber P,Gleasner C,Green L,Korber B,Leitner T

    updated: 2013-08-21

  • The ontology of biological sequences.

    abstract:BACKGROUND:Biological sequences play a major role in molecular and computational biology. They are studied as information-bearing entities that make up DNA, RNA or proteins. The Sequence Ontology, which is part of the OBO Foundry, contains descriptions and definitions of sequences and their properties. Yet the most bas...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-377

    authors: Hoehndorf R,Kelso J,Herre H

    updated: 2009-11-18

  • An integrative method to normalize RNA-Seq data.

    abstract:BACKGROUND:Transcriptome sequencing is a powerful tool for measuring gene expression, but, as with some other technologies, various artifacts and biases affect the quantification. In order to correct some of them, several normalization approaches have emerged, differing both in the statistical strategy employed and i...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-188

    authors: Filloux C,Cédric M,Romain P,Lionel F,Christophe K,Dominique R,Abderrahman M,Daniel P

    updated: 2014-06-14

  • Integrated olfactory receptor and microarray gene expression databases.

    abstract:BACKGROUND:Gene expression patterns of olfactory receptors (ORs) are an important component of the signal encoding mechanism in the olfactory system since they determine the interactions between odorant ligands and sensory neurons. We have developed the Olfactory Receptor Microarray Database (ORMD) to house OR gene exp...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-231

    authors: Liu N,Crasto CJ,Ma M

    updated: 2007-06-30

  • Using mechanistic Bayesian networks to identify downstream targets of the sonic hedgehog pathway.

    abstract:BACKGROUND:The topology of a biological pathway provides clues as to how a pathway operates, but rationally using this topology information with observed gene expression data remains a challenge. RESULTS:We introduce a new general-purpose analytic method called Mechanistic Bayesian Networks (MBNs) that allows for the ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-433

    authors: Shah A,Tenzen T,McMahon AP,Woolf PJ

    updated: 2009-12-18

  • Variable cellular decision-making behavior in a constant synthetic network topology.

    abstract:BACKGROUND:Modules of interacting components arranged in specific network topologies have evolved to perform a diverse array of cellular functions. For a network with a constant topological structure, its function within a cell may still be tuned by changing the number of instances of a particular component (e.g., gene...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2866-6

    authors: Shah NA,Sarkar CA

    updated: 2019-05-14

  • Visualization methods for statistical analysis of microarray clusters.

    abstract:BACKGROUND:The most common method of identifying groups of functionally related genes in microarray data is to apply a clustering algorithm. However, it is impossible to determine which clustering algorithm is most appropriate to apply, and it is difficult to verify the results of any algorithm due to the lack of a gol...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-115

    authors: Hibbs MA,Dirksen NC,Li K,Troyanskaya OG

    updated: 2005-05-12

  • An algorithm for automated closure during assembly.

    abstract:BACKGROUND:Finishing is the process of improving the quality and utility of draft genome sequences generated by shotgun sequencing and computational assembly. Finishing can involve targeted sequencing. Finishing reads may be incorporated by manual or automated means. One automated method uses targeted addition by local...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-457

    authors: Koren S,Miller JR,Walenz BP,Sutton G

    updated: 2010-09-10

  • CollapsABEL: an R library for detecting compound heterozygote alleles in genome-wide association studies.

    abstract:BACKGROUND:Compound Heterozygosity (CH) in classical genetics is the presence of two different recessive mutations at a particular gene locus. A relaxed form of CH alleles may account for an essential proportion of the missing heritability, i.e. heritability of phenotypes so far not accounted for by single genetic vari...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1006-9

    authors: Zhong K,Karssen LC,Kayser M,Liu F

    updated: 2016-04-08

  • Identification of discriminative characteristics for clusters from biologic data with InforBIO software.

    abstract:BACKGROUND:There are a number of different methods for generation of trees and algorithms for phylogenetic analysis in the study of bacterial taxonomy. Genotypic information, such as SSU rRNA gene sequences, now plays a more prominent role in microbial systematics than does phenotypic information. However, the integrat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-281

    authors: Tanaka N,Uchino M,Miyazaki S,Sugawara H

    updated: 2007-08-02

  • Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data.

    abstract:BACKGROUND:A fundamental fact in biology states that genes do not operate in isolation, and yet, methods that infer regulatory networks for single cell gene expression data have been slow to emerge. With single cell sequencing methods now becoming accessible, general network inference algorithms that were initially dev...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2217-z

    authors: Chen S,Mar JC

    updated: 2018-06-19

  • Bayesian Unidimensional Scaling for visualizing uncertainty in high dimensional datasets with latent ordering of observations.

    abstract:BACKGROUND:Detecting patterns in high-dimensional multivariate datasets is non-trivial. Clustering and dimensionality reduction techniques often help in discerning inherent structures. In biological datasets such as microbial community composition or gene expression data, observations can be generated from a continuous...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1790-x

    authors: Nguyen LH,Holmes S

    updated: 2017-09-13

  • Genomic prediction of tuberculosis drug-resistance: benchmarking existing databases and prediction algorithms.

    abstract:BACKGROUND:It is possible to predict whether a tuberculosis (TB) patient will fail to respond to specific antibiotics by sequencing the genome of the infecting Mycobacterium tuberculosis (Mtb) and observing whether the pathogen carries specific mutations at drug-resistance sites. This advancement has led to the collati...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2658-z

    authors: Ngo TM,Teo YY

    updated: 2019-02-08

  • Attenuating dependence on structural data in computing protein energy landscapes.

    abstract:BACKGROUND:Nearly all cellular processes involve proteins structurally rearranging to accommodate molecular partners. The energy landscape underscores the inherent nature of proteins as dynamic molecules interconverting between structures with varying energies. In principle, reconstructing a protein's energy landscape ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2822-5

    authors: Morris D,Maximova T,Plaku E,Shehu A

    updated: 2019-06-06