XenofilteR: computational deconvolution of mouse and human reads in tumor xenograft sequence data.

Abstract:

BACKGROUND: Mouse xenografts from (patient-derived) tumors (PDX) or tumor cell lines are widely used as models to study various biological and preclinical aspects of cancer. However, analyses of their RNA and DNA profiles are challenging because they comprise reads not only from the grafted human cancer but also from the murine host. Reads of murine origin result in false positives in mutation analysis of DNA samples and obscure gene expression levels when sequencing RNA. Currently available algorithms are limited, and improvements in accuracy and ease of use are necessary.

RESULTS: We developed the R package XenofilteR, which separates mouse from human sequence reads based on the edit distance between a sequence read and the reference genome. To assess the accuracy of XenofilteR, we generated sequence data by in silico mixing of mouse and human DNA sequence data. These analyses revealed that XenofilteR removes >99.9% of sequence reads of mouse origin while retaining human sequences. This allowed for mutation analysis of xenograft samples with accurate variant allele frequencies, and retrieved all non-synonymous somatic tumor mutations.

CONCLUSIONS: XenofilteR accurately dissects RNA and DNA sequences of mouse and human origin, thereby outperforming currently available tools. XenofilteR is open source and available at https://github.com/PeeperLab/XenofilteR.
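The edit-distance idea in the abstract can be illustrated with a minimal Python sketch. This is not XenofilteR's code (the package is written in R and works on aligned reads); the function names `edit_distance` and `keep_as_human`, the toy sequences, and the rule that ties and reads unmapped to human are discarded are all assumptions made for illustration.

```python
from typing import Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def keep_as_human(human_dist: Optional[int], mouse_dist: Optional[int]) -> bool:
    """Decide whether a read is retained as human.

    A distance of None means the read did not align to that reference.
    Assumed rule (hypothetical): keep the read only if it aligns to human
    and is strictly closer to the human reference than to the mouse one.
    """
    if human_dist is None:
        return False
    if mouse_dist is None:
        return True
    return human_dist < mouse_dist


# Toy example: reference segments a read might align to (made-up sequences).
read = "ACGTTGCA"
human_segment = "ACGTTGCA"
mouse_segment = "ACGATGCC"

human_dist = edit_distance(read, human_segment)
mouse_dist = edit_distance(read, mouse_segment)
print(human_dist, mouse_dist, keep_as_human(human_dist, mouse_dist))
```

In practice the distances would come from aligning each read to both reference genomes rather than from comparing short strings directly; the sketch only shows the comparison step.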

journal_name

BMC Bioinformatics

authors

Kluin RJC,Kemper K,Kuilman T,de Ruiter JR,Iyer V,Forment JV,Cornelissen-Steijger P,de Rink I,Ter Brugge P,Song JY,Klarenbeek S,McDermott U,Jonkers J,Velds A,Adams DJ,Peeper DS,Krijgsman O

doi

10.1186/s12859-018-2353-5

pub_date

2018-10-04 00:00:00

pages

366

issue

1

issn

1471-2105

pii

10.1186/s12859-018-2353-5

journal_volume

19

pub_type

Journal article

  • Predicting protein functions by relaxation labelling protein interaction network.

    abstract:BACKGROUND:One of the key issues in the post-genomic era is to assign functions to uncharacterized proteins. Proteins seldom act alone; rather, they must interact with other biomolecular units to execute their functions. Thus, the functions of unknown proteins may be discovered through studying their interactions wit...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-11-S1-S64

    authors: Hu P,Jiang H,Emili A

    update_date: 2010-01-18 00:00:00

  • Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations.

    abstract:BACKGROUND:Estimation of individual ancestry from genetic data is useful for the analysis of disease association studies, understanding human population history and interpreting personal genomic variation. New, computationally efficient methods are needed for ancestry inference that can effectively utilize existing inf...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-014-0418-7

    authors: Bansal V,Libiger O

    update_date: 2015-01-16 00:00:00

  • Fregene: simulation of realistic sequence-level data in populations and ascertained samples.

    abstract:BACKGROUND:FREGENE simulates sequence-level data over large genomic regions in large populations. Because, unlike coalescent simulators, it works forwards through time, it allows complex scenarios of selection, demography, and recombination to be modelled simultaneously. Detailed tracking of sites under selection is im...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-9-364

    authors: Chadeau-Hyam M,Hoggart CJ,O'Reilly PF,Whittaker JC,De Iorio M,Balding DJ

    update_date: 2008-09-08 00:00:00

  • Mutation status coupled with RNA-sequencing data can efficiently identify important non-significantly mutated genes serving as diagnostic biomarkers of endometrial cancer.

    abstract:BACKGROUND:Endometrial cancers (ECs) are one of the most common types of malignant tumor in females. Substantial efforts have been made to identify significantly mutated genes (SMGs) in ECs and use them as biomarkers for the classification of histological subtypes and the prediction of clinical outcomes. However, the im...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-017-1891-6

    authors: Liu K,He L,Liu Z,Xu J,Liu Y,Kuang Q,Wen Z,Li M

    update_date: 2017-12-28 00:00:00

  • The GMOseek matrix: a decision support tool for optimizing the detection of genetically modified plants.

    abstract:BACKGROUND:Since their first commercialization, the diversity of taxa and the genetic composition of transgene sequences in genetically modified plants (GMOs) are constantly increasing. To date, the detection of GMOs and derived products is commonly performed by PCR-based methods targeting specific DNA sequences introd...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-14-256

    authors: Block A,Debode F,Grohmann L,Hulin J,Taverniers I,Kluga L,Barbau-Piednoir E,Broeders S,Huber I,Van den Bulcke M,Heinze P,Berben G,Busch U,Roosens N,Janssen E,Žel J,Gruden K,Morisset D

    update_date: 2013-08-22 00:00:00

  • PDB-UF: database of predicted enzymatic functions for unannotated protein structures from structural genomics.

    abstract:BACKGROUND:The number of protein structures from structural genomics centers dramatically increases in the Protein Data Bank (PDB). Many of these structures are functionally unannotated because they have no sequence similarity to proteins of known function. However, it is possible to successfully infer function using o...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-7-53

    authors: von Grotthuss M,Plewczynski D,Ginalski K,Rychlewski L,Shakhnovich EI

    update_date: 2006-02-06 00:00:00

  • A comparative study of conservation and variation scores.

    abstract:BACKGROUND:Conservation and variation scores are used when evaluating sites in a multiple sequence alignment, in order to identify residues critical for structure or function. A variety of scores are available today but it is not clear how different scores relate to each other. RESULTS:We applied 25 conservation and v...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-11-388

    authors: Johansson F,Toh H

    update_date: 2010-07-21 00:00:00

  • Comparison of public peak detection algorithms for MALDI mass spectrometry data analysis.

    abstract:BACKGROUND:In mass spectrometry (MS) based proteomic data analysis, peak detection is an essential step for subsequent analysis. Recently, there has been significant progress in the development of various peak detection algorithms. However, neither a comprehensive survey nor an experimental comparison of these algorith...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-10-4

    authors: Yang C,He Z,Yu W

    update_date: 2009-01-06 00:00:00

  • Using iterative cluster merging with improved gap statistics to perform online phenotype discovery in the context of high-throughput RNAi screens.

    abstract:BACKGROUND:The recent emergence of high-throughput automated image acquisition technologies has forever changed how cell biologists collect and analyze data. Historically, the interpretation of cellular phenotypes in different experimental conditions has been dependent upon the expert opinions of well-trained biologist...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-9-264

    authors: Yin Z,Zhou X,Bakal C,Li F,Sun Y,Perrimon N,Wong ST

    update_date: 2008-06-05 00:00:00

  • Hierarchical structure and modules in the Escherichia coli transcriptional regulatory network revealed by a new top-down approach.

    abstract:BACKGROUND:Cellular functions are coordinately carried out by groups of genes forming functional modules. Identifying such modules in the transcriptional regulatory network (TRN) of organisms is important for understanding the structure and function of these fundamental cellular networks and essential for the emerging ...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-5-199

    authors: Ma HW,Buer J,Zeng AP

    update_date: 2004-12-16 00:00:00

  • MergeAlign: improving multiple sequence alignment performance by dynamic reconstruction of consensus multiple sequence alignments.

    abstract:BACKGROUND:The generation of multiple sequence alignments (MSAs) is a crucial step for many bioinformatic analyses. Thus improving MSA accuracy and identifying potential errors in MSAs is important for a wide range of post-genomic research. We present a novel method called MergeAlign which constructs consensus MSAs fro...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-13-117

    authors: Collingridge PW,Kelly S

    update_date: 2012-05-30 00:00:00

  • Structural characterization of genomes by large scale sequence-structure threading: application of reliability analysis in structural genomics.

    abstract:BACKGROUND:We establish that the occurrence of protein folds among genomes can be accurately described with a Weibull function. Systems which exhibit Weibull character can be interpreted with reliability theory commonly used in engineering analysis. For instance, Weibull distributions are widely used in reliability, ma...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-5-101

    authors: Cherkasov A,Ho Sui SJ,Brunham RC,Jones SJ

    update_date: 2004-07-26 00:00:00

  • A theorem proving approach for automatically synthesizing visualizations of flow cytometry data.

    abstract:BACKGROUND:Polychromatic flow cytometry is a popular technique that has wide usage in the medical sciences, especially for studying phenotypic properties of cells. The high-dimensionality of data generated by flow cytometry usually makes it difficult to visualize. The naive solution of simply plotting two-dimensional g...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-017-1662-4

    authors: Raj S,Hussain F,Husein Z,Torosdagli N,Turgut D,Deo N,Pattanaik S,Chang CJ,Jha SK

    update_date: 2017-06-07 00:00:00

  • In silico design of targeted SRM-based experiments.

    abstract:Selected reaction monitoring (SRM)-based proteomics approaches enable highly sensitive and reproducible assays for profiling of thousands of peptides in one experiment. The development of such assays involves the determination of retention time, detectability and fragmentation properties of peptides, followed by an op...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-13-S16-S8

    authors: Nahnsen S,Kohlbacher O

    update_date: 2012-01-01 00:00:00

  • Assessment of k-mer spectrum applicability for metagenomic dissimilarity analysis.

    abstract:BACKGROUND:A rapidly increasing flow of genomic data requires the development of efficient methods for obtaining its compact representation. Feature extraction facilitates classification, clustering and model analysis for testing and refining biological hypotheses. "Shotgun" metagenome is an analytically challenging ty...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-015-0875-7

    authors: Dubinkina VB,Ischenko DS,Ulyantsev VI,Tyakht AV,Alexeev DG

    update_date: 2016-01-16 00:00:00

  • Widespread evidence of viral miRNAs targeting host pathways.

    abstract:BACKGROUND:MicroRNAs (miRNA) are regulatory genes that target and repress other RNA molecules via sequence-specific binding. Several biological processes are regulated across many organisms by evolutionarily conserved miRNAs. Plants and invertebrates employ their miRNA in defense against viruses by targeting and degrad...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-14-S2-S3

    authors: Carl JW Jr,Trgovcich J,Hannenhalli S

    update_date: 2013-01-01 00:00:00

  • MetaMIS: a metagenomic microbial interaction simulator based on microbial community profiles.

    abstract:BACKGROUND:The complexity and dynamics of microbial communities are major factors in the ecology of a system. With the NGS technique, metagenomics data provides a new way to explore microbial interactions. Lotka-Volterra models, which have been widely used to infer animal interactions in dynamic systems, have recently ...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-016-1359-0

    authors: Shaw GT,Pao YY,Wang D

    update_date: 2016-11-25 00:00:00

  • Inferring functional modules of protein families with probabilistic topic models.

    abstract:BACKGROUND:Genome and metagenome studies have identified thousands of protein families whose functions are poorly understood and for which techniques for functional characterization provide only partial information. For such proteins, the genome context can give further information about their functional context. RESU...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-12-141

    authors: Konietzny SG,Dietz L,McHardy AC

    update_date: 2011-05-09 00:00:00

  • Detection of transposable elements by their compositional bias.

    abstract:BACKGROUND:Transposable elements (TE) are mobile genetic entities present in nearly all genomes. Previous work has shown that TEs tend to have a different nucleotide composition than the host genes, either considering codon usage bias or dinucleotide frequencies. We show here how these compositional differences can be ...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-5-94

    authors: Andrieu O,Fiston AS,Anxolabéhère D,Quesneville H

    update_date: 2004-07-13 00:00:00

  • MGC: a metagenomic gene caller.

    abstract:BACKGROUND:Computational gene finding algorithms have proven their robustness in identifying genes in complete genomes. However, metagenomic sequencing has presented new challenges due to the incomplete and fragmented nature of the data. During the last few years, attempts have been made to extract complete and incompl...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-14-S9-S6

    authors: El Allali A,Rose JR

    update_date: 2013-01-01 00:00:00

  • Using affinity propagation for identifying subspecies among clonal organisms: lessons from M. tuberculosis.

    abstract:BACKGROUND:Classification and naming is a key step in the analysis, understanding and adequate management of living organisms. However, where to set limits between groups can be puzzling especially in clonal organisms. Within the Mycobacterium tuberculosis complex (MTC), the etiological agent of tuberculosis (TB), expe...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-12-224

    authors: Borile C,Labarre M,Franz S,Sola C,Refrégier G

    update_date: 2011-06-02 00:00:00

  • NPBSS: a new PacBio sequencing simulator for generating the continuous long reads with an empirical model.

    abstract:BACKGROUND:PacBio sequencing platform offers longer read lengths than the second-generation sequencing technologies. It has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. Due to its extremely wide range of application areas, fast sequencing simulation syste...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-018-2208-0

    authors: Wei ZG,Zhang SW

    update_date: 2018-05-22 00:00:00

  • A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies.

    abstract:BACKGROUND:Comparative analysis of whole genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While there are a number of excellent tools developed for performing this task, ...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-016-1142-2

    authors: Thakur S,Guttman DS

    update_date: 2016-06-30 00:00:00

  • Natural computation meta-heuristics for the in silico optimization of microbial strains.

    abstract:BACKGROUND:One of the greatest challenges in Metabolic Engineering is to develop quantitative models and algorithms to identify a set of genetic manipulations that will result in a microbial strain with a desirable metabolic phenotype which typically means having a high yield/productivity. This challenge is not only du...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-9-499

    authors: Rocha M,Maia P,Mendes R,Pinto JP,Ferreira EC,Nielsen J,Patil KR,Rocha I

    update_date: 2008-11-27 00:00:00

  • Colony size measurement of the yeast gene deletion strains for functional genomics.

    abstract:BACKGROUND:Numerous functional genomics approaches have been developed to study the model organism yeast, Saccharomyces cerevisiae, with the aim of systematically understanding the biology of the cell. Some of these techniques are based on yeast growth differences under different conditions, such as those generated by ...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-8-117

    authors: Memarian N,Jessulat M,Alirezaie J,Mir-Rashed N,Xu J,Zareie M,Smith M,Golshani A

    update_date: 2007-04-04 00:00:00

  • Evolutionary Pareto-optimization of stably folding peptides.

    abstract:BACKGROUND:As a rule, peptides are more flexible and unstructured than proteins with their substantial stabilizing hydrophobic cores. Nevertheless, a few stably folding peptides have been discovered. This raises the question whether there may be more such peptides that are unknown as yet. These molecules could be helpf...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-9-109

    authors: Gronwald W,Hohm T,Hoffmann D

    update_date: 2008-02-19 00:00:00

  • Functional clustering of yeast proteins from the protein-protein interaction network.

    abstract:BACKGROUND:The abundant data available for protein interaction networks have not yet been fully understood. New types of analyses are needed to reveal organizational principles of these networks to investigate the details of functional and regulatory clusters of proteins. RESULTS:In the present work, individual cluste...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-7-355

    authors: Sen TZ,Kloczkowski A,Jernigan RL

    update_date: 2006-07-24 00:00:00

  • Repliscan: a tool for classifying replication timing regions.

    abstract:BACKGROUND:Replication timing experiments that use label incorporation and high throughput sequencing produce peaked data similar to ChIP-Seq experiments. However, the differences in experimental design, coverage density, and possible results make traditional ChIP-Seq analysis methods inappropriate for use with replica...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/s12859-017-1774-x

    authors: Zynda GJ,Song J,Concia L,Wear EE,Hanley-Bowdoin L,Thompson WF,Vaughn MW

    update_date: 2017-08-07 00:00:00

  • A global optimization algorithm for protein surface alignment.

    abstract:BACKGROUND:A relevant problem in drug design is the comparison and recognition of protein binding sites. Binding sites recognition is generally based on geometry often combined with physico-chemical properties of the site since the conformation, size and chemical composition of the protein surface are all relevant for ...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-11-488

    authors: Bertolazzi P,Guerra C,Liuzzi G

    update_date: 2010-09-29 00:00:00

  • Identification of discriminative characteristics for clusters from biologic data with InforBIO software.

    abstract:BACKGROUND:There are a number of different methods for generation of trees and algorithms for phylogenetic analysis in the study of bacterial taxonomy. Genotypic information, such as SSU rRNA gene sequences, now plays a more prominent role in microbial systematics than does phenotypic information. However, the integrat...

    journal_title:BMC bioinformatics

    pub_type: Journal article

    doi:10.1186/1471-2105-8-281

    authors: Tanaka N,Uchino M,Miyazaki S,Sugawara H

    update_date: 2007-08-02 00:00:00