XenofilteR: computational deconvolution of mouse and human reads in tumor xenograft sequence data.


BACKGROUND: Mouse xenografts from (patient-derived) tumors (PDX) or tumor cell lines are widely used as models to study various biological and preclinical aspects of cancer. However, analyses of their RNA and DNA profiles are challenging, because they comprise reads not only from the grafted human cancer but also from the murine host. The reads of murine origin result in false positives in mutation analysis of DNA samples and obscure gene expression levels when sequencing RNA. Currently available algorithms are limited, and improvements in accuracy and ease of use are necessary.

RESULTS: We developed the R-package XenofilteR, which separates mouse from human sequence reads based on the edit-distance between a sequence read and reference genome. To assess the accuracy of XenofilteR, we generated sequence data by in silico mixing of mouse and human DNA sequence data. These analyses revealed that XenofilteR removes >99.9% of sequence reads of mouse origin while retaining human sequences. This allowed for mutation analysis of xenograft samples with accurate variant allele frequencies, and retrieved all non-synonymous somatic tumor mutations.

CONCLUSIONS: XenofilteR accurately dissects RNA and DNA sequences of mouse and human origin, thereby outperforming currently available tools. XenofilteR is open source and available at https://github.com/PeeperLab/XenofilteR.
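
The edit-distance idea in the abstract can be illustrated with a short Python sketch. This is not the XenofilteR implementation (which is an R package); it only shows the general principle of assigning a read to the reference it matches with fewer edits. It assumes each read has already been aligned to both a human and a mouse reference and that an edit distance per alignment is available. The names `ReadAlignment`, `classify_read` and `filter_human_reads`, and the handling of ties and unmapped reads, are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReadAlignment:
    """Edit distances of one read against each reference (hypothetical structure)."""
    name: str
    human_edit_distance: Optional[int]  # None if the read did not map to human
    mouse_edit_distance: Optional[int]  # None if the read did not map to mouse


def classify_read(read: ReadAlignment) -> str:
    """Assign a read to the reference it matches with the fewest edits."""
    h, m = read.human_edit_distance, read.mouse_edit_distance
    if h is None and m is None:
        return "unmapped"
    if m is None:
        return "human"
    if h is None:
        return "mouse"
    if h < m:
        return "human"
    if m < h:
        return "mouse"
    # Equal distances: the origin cannot be decided from edit distance alone.
    # Reporting these separately is an assumption of this sketch.
    return "ambiguous"


def filter_human_reads(reads: List[ReadAlignment]) -> List[ReadAlignment]:
    """Keep only the reads classified as human."""
    return [r for r in reads if classify_read(r) == "human"]


if __name__ == "__main__":
    example = [
        ReadAlignment("read1", 0, 5),     # closer to human
        ReadAlignment("read2", 4, 1),     # closer to mouse
        ReadAlignment("read3", 3, None),  # maps to human only
    ]
    for r in example:
        print(r.name, classify_read(r))
```

In practice the edit distance would come from the aligner output for each read, and a real filter would also need a policy for paired-end reads; both are outside the scope of this sketch.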


BMC Bioinformatics




Kluin RJC, Kemper K, Kuilman T, de Ruiter JR, Iyer V, Forment JV, Cornelissen-Steijger P, de Rink I, Ter Brugge P, Song JY, Klarenbeek S, McDermott U, Jonkers J, Velds A, Adams DJ, Peeper DS, Krijgsman O






2018-10-04 00:00:00












  • BicPAMS: software for biological data analysis with pattern-based biclustering.

    abstract:BACKGROUND:Biclustering has been largely applied for the unsupervised analysis of biological data, being recognised today as a key technique to discover putative modules in both expression data (subsets of genes correlated in subsets of conditions) and network data (groups of coherently interconnected biological entiti...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Henriques R, Ferreira FL, Madeira SC

    update date: 2017-02-02 00:00:00

  • Approaching the taxonomic affiliation of unidentified sequences in public databases--an example from the mycorrhizal fungi.

    abstract:BACKGROUND:During the last few years, DNA sequence analysis has become one of the primary means of taxonomic identification of species, particularly so for species that are minute or otherwise lack distinct, readily obtainable morphological characters. Although the number of sequences available for comparison in public...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Nilsson RH, Kristiansson E, Ryberg M, Larsson KH

    update date: 2005-07-18 00:00:00

  • InPrePPI: an integrated evaluation method based on genomic context for predicting protein-protein interactions in prokaryotic genomes.

    abstract:BACKGROUND:Although many genomic features have been used in the prediction of protein-protein interactions (PPIs), frequently only one is used in a computational method. After realizing the limited power in the prediction using only one genomic feature, investigators are now moving toward integration. So far, there hav...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Sun J, Sun Y, Ding G, Liu Q, Wang C, He Y, Shi T, Li Y, Zhao Z

    update date: 2007-10-26 00:00:00

  • Deconvolution of gene expression from cell populations across the C. elegans lineage.

    abstract:BACKGROUND:Knowledge of when and in which cells each gene is expressed across multicellular organisms is critical in understanding both gene function and regulation of cell type diversity. However, methods for measuring expression typically involve a trade-off between imaging-based methods, which give the precise locat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Burdick JT, Murray JI

    update date: 2013-06-22 00:00:00

  • The effect of rare variants on inflation of the test statistics in case-control analyses.

    abstract:BACKGROUND:The detection of bias due to cryptic population structure is an important step in the evaluation of findings of genetic association studies. The standard method of measuring this bias in a genetic association study is to compare the observed median association test statistic to the expected median test stati...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Pirie A, Wood A, Lush M, Tyrer J, Pharoah PD

    update date: 2015-02-20 00:00:00

  • Kavosh: a new algorithm for finding network motifs.

    abstract:BACKGROUND:Complex networks are studied across many fields of science and are particularly important to understand biological processes. Motifs in networks are small connected sub-graphs that occur significantly in higher frequencies than in random networks. They have recently gathered much attention as a useful concep...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Kashani ZR, Ahrabian H, Elahi E, Nowzari-Dalini A, Ansari ES, Asadi S, Mohammadi S, Schreiber F, Masoudi-Nejad A

    update date: 2009-10-04 00:00:00

  • Biotite: a unifying open source computational biology framework in Python.

    abstract:BACKGROUND:As molecular biology is creating an increasing amount of sequence and structure data, the multitude of software to analyze this data is also rising. Most of the programs are made for a specific task, hence the user often needs to combine multiple programs in order to reach a goal. This can make the data proc...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Kunzmann P, Hamacher K

    update date: 2018-10-01 00:00:00

  • Application of whole genome data for in silico evaluation of primers and probes routinely employed for the detection of viral species by RT-qPCR using dengue virus as a case study.

    abstract:BACKGROUND:Viral infection by dengue virus is a major public health problem in tropical countries. Early diagnosis and detection are increasingly based on quantitative reverse transcriptase real-time polymerase chain reaction (RT-qPCR) directed against genomic regions conserved between different isolates. Genetic varia...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Vanneste K, Garlant L, Broeders S, Van Gucht S, Roosens NH

    update date: 2018-09-04 00:00:00

  • Correction to: Effective machine-learning assembly for next-generation amplicon sequencing with very low coverage.

    abstract:Following publication of the original article [1], the author reported that there are several errors in the original article. ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article, Published Erratum

    authors: Ranjard L, Wong TKF, Rodrigo AG

    update date: 2020-01-22 00:00:00

  • Pripper: prediction of caspase cleavage sites from whole proteomes.

    abstract:BACKGROUND:Caspases are a family of proteases that have central functions in programmed cell death (apoptosis) and inflammation. Caspases mediate their effects through aspartate-specific cleavage of their target proteins, and at present almost 400 caspase substrates are known. There are several methods developed to pre...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Piippo M, Lietzén N, Nevalainen OS, Salmi J, Nyman TA

    update date: 2010-06-15 00:00:00

  • Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method.

    abstract:BACKGROUND:Many processes in molecular biology involve the recognition of short sequences of nucleic-or amino acids, such as the binding of immunogenic peptides to major histocompatibility complex (MHC) molecules. From experimental data, a model of the sequence specificity of these processes can be constructed, such as...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Peters B, Sette A

    update date: 2005-05-31 00:00:00

  • The exploration of disease-specific gene regulatory networks in esophageal carcinoma and stomach adenocarcinoma.

    abstract:BACKGROUND:Feed-forward loops (FFLs), consisting of miRNAs, transcription factors (TFs) and their common target genes, have been validated to be important for the initialization and development of complex diseases, including cancer. Esophageal Carcinoma (ESCA) and Stomach Adenocarcinoma (STAD) are two types of malignan...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Qin G, Yang L, Ma Y, Liu J, Huo Q

    update date: 2019-12-30 00:00:00

  • A semi-parametric statistical model for integrating gene expression profiles across different platforms.

    abstract:BACKGROUND:Determining differentially expressed genes (DEGs) between biological samples is the key to understand how genotype gives rise to phenotype. RNA-seq and microarray are two main technologies for profiling gene expression levels. However, considerable discrepancy has been found between DEGs detected using the t...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Lyu Y, Li Q

    update date: 2016-01-11 00:00:00

  • De novo profile generation based on sequence context specificity with the long short-term memory network.

    abstract:BACKGROUND:Long short-term memory (LSTM) is one of the most attractive deep learning methods to learn time series or contexts of input data. Increasing studies, including biological sequence analyses in bioinformatics, utilize this architecture. Amino acid sequence profiles are widely used for bioinformatics studies, s...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Yamada KD, Kinoshita K

    update date: 2018-07-18 00:00:00

  • TPMS: a set of utilities for querying collections of gene trees.

    abstract:BACKGROUND:The information in large collections of phylogenetic trees is useful for many comparative genomic studies. Therefore, there is a need for flexible tools that allow exploration of such collections in order to retrieve relevant data as quickly as possible. RESULTS:In this paper, we present TPMS (Tree Pattern-...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Bigot T, Daubin V, Lassalle F, Perrière G

    update date: 2013-03-27 00:00:00

  • An SVD-based comparison of nine whole eukaryotic genomes supports a coelomate rather than ecdysozoan lineage.

    abstract:BACKGROUND:Eukaryotic whole genome sequences are accumulating at an impressive rate. Effective methods for comparing multiple whole eukaryotic genomes on a large scale are needed. Most attempted solutions involve the production of large scale alignments, and many of these require a high stringency pre-screen for putati...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Stuart GW, Berry MW

    update date: 2004-12-17 00:00:00

  • MultiDCoX: Multi-factor analysis of differential co-expression.

    abstract:BACKGROUND:Differential co-expression (DCX) signifies change in degree of co-expression of a set of genes among different biological conditions. It has been used to identify differential co-expression networks or interactomes. Many algorithms have been developed for single-factor differential co-expression analysis and...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Liany H, Rajapakse JC, Karuturi RKM

    update date: 2017-12-28 00:00:00

  • Robustness of signal detection in cryo-electron microscopy via a bi-objective-function approach.

    abstract:BACKGROUND:The detection of weak signals and selection of single particles from low-contrast micrographs of frozen hydrated biomolecules by cryo-electron microscopy (cryo-EM) represents a major practical bottleneck in cryo-EM data analysis. Template-based particle picking by an objective function using fast local corre...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Wang WL, Yu Z, Castillo-Menendez LR, Sodroski J, Mao Y

    update date: 2019-04-03 00:00:00

  • Reporting and connecting cell type names and gating definitions through ontologies.

    abstract:BACKGROUND:Human immunology studies often rely on the isolation and quantification of cell populations from an input sample based on flow cytometry and related techniques. Such techniques classify cells into populations based on the detection of a pattern of markers. The description of the cell populations targeted in ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Overton JA, Vita R, Dunn P, Burel JG, Bukhari SAC, Cheung KH, Kleinstein SH, Diehl AD, Peters B

    update date: 2019-04-25 00:00:00

  • EGenBio: a data management system for evolutionary genomics and biodiversity.

    abstract:BACKGROUND:Evolutionary genomics requires management and filtering of large numbers of diverse genomic sequences for accurate analysis and inference on evolutionary processes of genomic and functional change. We developed Evolutionary Genomics and Biodiversity (EGenBio; http://egenbio.lsu.edu) to begin to address this....

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Nahum LA, Reynolds MT, Wang ZO, Faith JJ, Jonna R, Jiang ZJ, Meyer TJ, Pollock DD

    update date: 2006-09-06 00:00:00

  • Decoding HMMs using the k best paths: algorithms and applications.

    abstract:BACKGROUND:Traditional algorithms for hidden Markov model decoding seek to maximize either the probability of a state path or the number of positions of a sequence assigned to the correct state. These algorithms provide only a single answer and in practice do not produce good results. RESULTS:We explore an alternative...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Brown DG, Golod D

    update date: 2010-01-18 00:00:00

  • Ontology driven integration platform for clinical and translational research.

    abstract:Semantic Web technologies offer a promising framework for integration of disparate biomedical data. In this paper we present the semantic information integration platform under development at the Center for Clinical and Translational Sciences (CCTS) at the University of Texas Health Science Center at Houston (UTHSC-H)...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Mirhaji P, Zhu M, Vagnoni M, Bernstam EV, Zhang J, Smith JW

    update date: 2009-02-05 00:00:00

  • ImiRP: a computational approach to microRNA target site mutation.

    abstract:BACKGROUND:MicroRNAs (miRNAs) are small ~22 nucleotide non-coding RNAs that function as post-transcriptional regulators of messenger RNA (mRNA) through base-pairing to 6-8 nucleotide long target sites, usually located within the mRNA 3' untranslated region. A common approach to validate and probe microRNA-mRNA interact...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Ryan BC, Werner TS, Howard PL, Chow RL

    update date: 2016-04-27 00:00:00

  • Privacy-preserving search for chemical compound databases.

    abstract:BACKGROUND:Searching for similar compounds in a database is the most important process for in-silico drug screening. Since a query compound is an important starting point for the new drug, a query holder, who is afraid of the query being monitored by the database server, usually downloads all the records in the databas...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Shimizu K, Nuida K, Arai H, Mitsunari S, Attrapadung N, Hamada M, Tsuda K, Hirokawa T, Sakuma J, Hanaoka G, Asai K

    update date: 2015-01-01 00:00:00

  • MapMi: automated mapping of microRNA loci.

    abstract:BACKGROUND:A large effort to discover microRNAs (miRNAs) has been under way. Currently miRBase is their primary repository, providing annotations of primary sequences, precursors and probable genomic loci. In many cases miRNAs are identical or very similar between related (or in some cases more distant) species. Howeve...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Guerra-Assunção JA, Enright AJ

    update date: 2010-03-16 00:00:00

  • MZmine 2: modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data.

    abstract:BACKGROUND:Mass spectrometry (MS) coupled with online separation methods is commonly applied for differential and quantitative profiling of biological samples in metabolomic as well as proteomic research. Such approaches are used for systems biology, functional genomics, and biomarker discovery, among others. An ongoin...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Pluskal T, Castillo S, Villar-Briones A, Oresic M

    update date: 2010-07-23 00:00:00

  • GraphCrunch: a tool for large network analyses.

    abstract:BACKGROUND:The recent explosion in biological and other real-world network data has created the need for improved tools for large network analyses. In addition to well established global network properties, several new mathematical techniques for analyzing local structural properties of large networks have been develop...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Milenković T, Lai J, Przulj N

    update date: 2008-01-30 00:00:00

  • SplicerAV: a tool for mining microarray expression data for changes in RNA processing.

    abstract:BACKGROUND:Over the past two decades more than fifty thousand unique clinical and biological samples have been assayed using the Affymetrix HG-U133 and HG-U95 GeneChip microarray platforms. This substantial repository has been used extensively to characterize changes in gene expression between biological samples, but h...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Robinson TJ, Dinan MA, Dewhirst M, Garcia-Blanco MA, Pearson JL

    update date: 2010-02-25 00:00:00

  • Alternative mapping of probes to genes for Affymetrix chips.

    abstract:BACKGROUND:Short oligonucleotide arrays have several probes measuring the expression level of each target transcript. Therefore the selection of probes is a key component for the quality of measurements. However, once probes have been selected and synthesized on an array, it is still possible to re-evaluate the results...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Gautier L, Møller M, Friis-Hansen L, Knudsen S

    update date: 2004-08-14 00:00:00

  • Prediction of protein structural class with Rough Sets.

    abstract:BACKGROUND:A new method for the prediction of protein structural classes is constructed based on Rough Sets algorithm, which is a rule-based data mining method. Amino acid compositions and 8 physicochemical properties data are used as conditional attributes for the construction of decision system. After reducing the de...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Cao Y, Liu S, Zhang L, Qin J, Wang J, Tang K

    update date: 2006-01-14 00:00:00