NextSV: a meta-caller for structural variants from low-coverage long-read sequencing data.

Abstract:

BACKGROUND:Structural variants (SVs) in human genomes are implicated in a variety of human diseases. Long-read sequencing delivers much longer read lengths than short-read sequencing and may greatly improve SV detection. However, due to the relatively high cost of long-read sequencing, it is unclear what coverage is needed and how to optimally use the aligners and SV callers. RESULTS:In this study, we developed NextSV, a meta-caller to perform SV calling from low coverage long-read sequencing data. NextSV integrates three aligners and three SV callers and generates two integrated call sets (sensitive/stringent) for different analysis purposes. We evaluated SV calling performance of NextSV under different PacBio coverages on two personal genomes, NA12878 and HX1. Our results showed that, compared with running any single SV caller, NextSV stringent call set had higher precision and balanced accuracy (F1 score) while NextSV sensitive call set had a higher recall. At 10X coverage, the recall of NextSV sensitive call set was 93.5 to 94.1% for deletions and 87.9 to 93.2% for insertions, indicating that ~10X coverage might be an optimal coverage to use in practice, considering the balance between the sequencing costs and the recall rates. We further evaluated the Mendelian errors on an Ashkenazi Jewish trio dataset. CONCLUSIONS:Our results provide useful guidelines for SV detection from low coverage whole-genome PacBio data and we expect that NextSV will facilitate the analysis of SVs on long-read sequencing data.

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Fang L,Hu J,Wang D,Wang K

doi

10.1186/s12859-018-2207-1

subject

Has Abstract

pub_date

2018-05-23 00:00:00

pages

180

issue

1

issn

1471-2105

pii

10.1186/s12859-018-2207-1

journal_volume

19

pub_type

杂志文章
  • Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines.

    abstract:BACKGROUND:Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-537

    authors: González AJ,Liao L

    更新日期:2010-10-29 00:00:00

  • Pre-processing Agilent microarray data.

    abstract:BACKGROUND:Pre-processing methods for two-sample long oligonucleotide arrays, specifically the Agilent technology, have not been extensively studied. The goal of this study is to quantify some of the sources of error that affect measurement of expression using Agilent arrays and to compare Agilent's Feature Extraction ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-142

    authors: Zahurak M,Parmigiani G,Yu W,Scharpf RB,Berman D,Schaeffer E,Shabbeer S,Cope L

    更新日期:2007-05-01 00:00:00

  • GOmotif: A web server for investigating the biological role of protein sequence motifs.

    abstract:BACKGROUND:Many proteins contain conserved sequence patterns (motifs) that contribute to their functionality. The process of experimentally identifying and validating novel protein motifs can be difficult, expensive, and time consuming. A means for helping to identify in advance the possible function of a novel motif i...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-379

    authors: Bristow F,He R,Van Domselaar G

    更新日期:2011-09-26 00:00:00

  • Integration of shot-gun proteomics and bioinformatics analysis to explore plant hormone responses.

    abstract:BACKGROUND:Multidimensional protein identification technology (MudPIT)-based shot-gun proteomics has been proven to be an effective platform for functional proteomics. In particular, the various sample preparation methods and bioinformatics tools can be integrated to improve the proteomics platform for applications lik...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-S15-S8

    authors: Zhang Y,Liu S,Dai SY,Yuan JS

    更新日期:2012-01-01 00:00:00

  • DraGnET: software for storing, managing and analyzing annotated draft genome sequence data.

    abstract:BACKGROUND:New "next generation" DNA sequencing technologies offer individual researchers the ability to rapidly generate large amounts of genome sequence data at dramatically reduced costs. As a result, a need has arisen for new software tools for storage, management and analysis of genome sequence data. Although bioi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-100

    authors: Duncan S,Sirkanungo R,Miller L,Phillips GJ

    更新日期:2010-02-22 00:00:00

  • A simple method for assessing sample sizes in microarray experiments.

    abstract:BACKGROUND:In this short article, we discuss a simple method for assessing sample size requirements in microarray experiments. RESULTS:Our method starts with the output from a permutation-based analysis for a set of pilot data, e.g. from the SAM package. Then for a given hypothesized mean difference and various sample...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-106

    authors: Tibshirani R

    更新日期:2006-03-02 00:00:00

  • Using iterative cluster merging with improved gap statistics to perform online phenotype discovery in the context of high-throughput RNAi screens.

    abstract:BACKGROUND:The recent emergence of high-throughput automated image acquisition technologies has forever changed how cell biologists collect and analyze data. Historically, the interpretation of cellular phenotypes in different experimental conditions has been dependent upon the expert opinions of well-trained biologist...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-264

    authors: Yin Z,Zhou X,Bakal C,Li F,Sun Y,Perrimon N,Wong ST

    更新日期:2008-06-05 00:00:00

  • GC/MS based metabolomics: development of a data mining system for metabolite identification by using soft independent modeling of class analogy (SIMCA).

    abstract:BACKGROUND:The goal of metabolomics analyses is a comprehensive and systematic understanding of all metabolites in biological samples. Many useful platforms have been developed to achieve this goal. Gas chromatography coupled to mass spectrometry (GC/MS) is a well-established analytical method in metabolomics study, an...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-131

    authors: Tsugawa H,Tsujimoto Y,Arita M,Bamba T,Fukusaki E

    更新日期:2011-05-04 00:00:00

  • Automated peptide mapping and protein-topographical annotation of proteomics data.

    abstract:BACKGROUND:In quantitative proteomics, peptide mapping is a valuable approach to combine positional quantitative information with topographical and domain information of proteins. Quantitative proteomic analysis of cell surface shedding is an exemplary application area of this approach. RESULTS:We developed ImproViser...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-15-207

    authors: Videm P,Gunasekaran D,Schröder B,Mayer B,Biniossek ML,Schilling O

    更新日期:2014-06-19 00:00:00

  • PIGS: improved estimates of identity-by-descent probabilities by probabilistic IBD graph sampling.

    abstract::Identifying segments in the genome of different individuals that are identical-by-descent (IBD) is a fundamental element of genetics. IBD data is used for numerous applications including demographic inference, heritability estimation, and mapping disease loci. Simultaneous detection of IBD over multiple haplotypes has...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-16-S5-S9

    authors: Park DS,Baran Y,Hormozdiari F,Eng C,Torgerson DG,Burchard EG,Zaitlen N

    更新日期:2015-01-01 00:00:00

  • μHEM for identification of differentially expressed miRNAs using hypercuboid equivalence partition matrix.

    abstract:BACKGROUND:The miRNAs, a class of short approximately 22-nucleotide non-coding RNAs, often act post-transcriptionally to inhibit mRNA expression. In effect, they control gene expression by targeting mRNA. They also help in carrying out normal functioning of a cell as they play an important role in various cellular proc...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-14-266

    authors: Paul S,Maji P

    更新日期:2013-09-04 00:00:00

  • Extracting transcription factor binding sites from unaligned gene sequences with statistical models.

    abstract:BACKGROUND:Transcription factor binding sites (TFBSs) are crucial in the regulation of gene transcription. Recently, chromatin immunoprecipitation followed by cDNA microarray hybridization (ChIP-chip array) has been used to identify potential regulatory sequences, but the procedure can only map the probable protein-DNA...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-S12-S7

    authors: Lu CC,Yuan WH,Chen TM

    更新日期:2008-12-12 00:00:00

  • From SNPs to pathways: integration of functional effect of sequence variations on models of cell signalling pathways.

    abstract:BACKGROUND:Single nucleotide polymorphisms (SNPs) are the most frequent type of sequence variation between individuals, and represent a promising tool for finding genetic determinants of complex diseases and understanding the differences in drug response. In this regard, it is of particular interest to study the effect...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-S8-S6

    authors: Bauer-Mehren A,Furlong LI,Rautschka M,Sanz F

    更新日期:2009-08-27 00:00:00

  • Benchmarking the HLA typing performance of Polysolver and Optitype in 50 Danish parental trios.

    abstract:BACKGROUND:The adaptive immune response intrinsically depends on hypervariable human leukocyte antigen (HLA) genes. Concomitantly, correct HLA phenotyping is crucial for successful donor-patient matching in organ transplantation. The cost and technical limitations of current laboratory techniques, together with advance...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2239-6

    authors: Matey-Hernandez ML,Danish Pan Genome Consortium.,Brunak S,Izarzugaza JMG

    更新日期:2018-06-25 00:00:00

  • An assessment of catalytic residue 3D ensembles for the prediction of enzyme function.

    abstract:BACKGROUND:The central element of each enzyme is the catalytic site, which commonly catalyzes a single biochemical reaction with high specificity. It was unclear to us how often sites that catalyze the same or highly similar reactions evolved on different, i. e. non-homologous protein folds and how similar their 3D pos...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-015-0807-6

    authors: Žváček C,Friedrichs G,Heizinger L,Merkl R

    更新日期:2015-11-04 00:00:00

  • Detecting broad domains and narrow peaks in ChIP-seq data with hiddenDomains.

    abstract:BACKGROUND:Correctly identifying genomic regions enriched with histone modifications and transcription factors is key to understanding their regulatory and developmental roles. Conceptually, these regions are divided into two categories, narrow peaks and broad domains, and different algorithms are used to identify each...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-016-0991-z

    authors: Starmer J,Magnuson T

    更新日期:2016-03-24 00:00:00

  • Graph-representation of oxidative folding pathways.

    abstract:BACKGROUND:The process of oxidative folding combines the formation of native disulfide bond with conformational folding resulting in the native three-dimensional fold. Oxidative folding pathways can be described in terms of disulfide intermediate species (DIS) which can also be isolated and characterized. Each DIS corr...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-19

    authors: Agoston V,Cemazar M,Kaján L,Pongor S

    更新日期:2005-01-27 00:00:00

  • Accurate determination of node and arc multiplicities in de bruijn graphs using conditional random fields.

    abstract:BACKGROUND:De Bruijn graphs are key data structures for the analysis of next-generation sequencing data. They efficiently represent the overlap between reads and hence, also the underlying genome sequence. However, sequencing errors and repeated subsequences render the identification of the true underlying sequence dif...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03740-x

    authors: Steyaert A,Audenaert P,Fostier J

    更新日期:2020-09-14 00:00:00

  • Predicting the interactome of Xanthomonas oryzae pathovar oryzae for target selection and DB service.

    abstract:BACKGROUND:Protein-protein interactions (PPIs) play key roles in various cellular functions. In addition, some critical inter-species interactions such as host-pathogen interactions and pathogenicity occur through PPIs. Phytopathogenic bacteria infect hosts through attachment to host tissue, enzyme secretion, exopolysa...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-41

    authors: Kim JG,Park D,Kim BC,Cho SW,Kim YT,Park YJ,Cho HJ,Park H,Kim KB,Yoon KO,Park SJ,Lee BM,Bhak J

    更新日期:2008-01-24 00:00:00

  • FITBAR: a web tool for the robust prediction of prokaryotic regulons.

    abstract:BACKGROUND:The binding of regulatory proteins to their specific DNA targets determines the accurate expression of the neighboring genes. The in silico prediction of new binding sites in completely sequenced genomes is a key aspect in the deeper understanding of gene regulatory networks. Several algorithms have been des...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-554

    authors: Oberto J

    更新日期:2010-11-11 00:00:00

  • Detecting disease-associated genotype patterns.

    abstract:BACKGROUND:In addition to single-locus (main) effects of disease variants, there is a growing consensus that gene-gene and gene-environment interactions may play important roles in disease etiology. However, for the very large numbers of genetic markers currently in use, it has proven difficult to develop suitable and ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-S1-S75

    authors: Long Q,Zhang Q,Ott J

    更新日期:2009-01-30 00:00:00

  • Bounded search for de novo identification of degenerate cis-regulatory elements.

    abstract:BACKGROUND:The identification of statistically overrepresented sequences in the upstream regions of coregulated genes should theoretically permit the identification of potential cis-regulatory elements. However, in practice many cis-regulatory elements are highly degenerate, precluding the use of an exhaustive word-cou...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-254

    authors: Carlson JM,Chakravarty A,Khetani RS,Gross RH

    更新日期:2006-05-15 00:00:00

  • Epiviz: a view inside the design of an integrated visual analysis software for genomics.

    abstract:BACKGROUND:Computational and visual data analysis for genomics has traditionally involved a combination of tools and resources, of which the most ubiquitous consist of genome browsers, focused mainly on integrative visualization of large numbers of big datasets, and computational environments, focused on data modeling ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-16-S11-S4

    authors: Chelaru F,Corrada Bravo H

    更新日期:2015-01-01 00:00:00

  • Comparison of scores for bimodality of gene expression distributions and genome-wide evaluation of the prognostic relevance of high-scoring genes.

    abstract:BACKGROUND:A major goal of the analysis of high-dimensional RNA expression data from tumor tissue is to identify prognostic signatures for discriminating patient subgroups. For this purpose genome-wide identification of bimodally expressed genes from gene array data is relevant because distinguishability of high and lo...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-276

    authors: Hellwig B,Hengstler JG,Schmidt M,Gehrmann MC,Schormann W,Rahnenführer J

    更新日期:2010-05-25 00:00:00

  • Metabolite coupling in genome-scale metabolic networks.

    abstract:BACKGROUND:Biochemically detailed stoichiometric matrices have now been reconstructed for various bacteria, yeast, and for the human cardiac mitochondrion based on genomic and proteomic data. These networks have been manually curated based on legacy data and elementally and charge balanced. Comparative analysis of thes...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-111

    authors: Becker SA,Price ND,Palsson BØ

    更新日期:2006-03-06 00:00:00

  • BioIMAX: a Web 2.0 approach for easy exploratory and collaborative access to multivariate bioimage data.

    abstract:BACKGROUND:Innovations in biological and biomedical imaging produce complex high-content and multivariate image data. For decision-making and generation of hypotheses, scientists need novel information technology tools that enable them to visually explore and analyze the data and to discuss and communicate results or f...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-297

    authors: Loyek C,Rajpoot NM,Khan M,Nattkemper TW

    更新日期:2011-07-21 00:00:00

  • Predicting peptide presentation by major histocompatibility complex class I: an improved machine learning approach to the immunopeptidome.

    abstract:BACKGROUND:To further our understanding of immunopeptidomics, improved tools are needed to identify peptides presented by major histocompatibility complex class I (MHC-I). Many existing tools are limited by their reliance upon chemical affinity data, which is less biologically relevant than sampling by mass spectrometr...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2561-z

    authors: Boehm KM,Bhinder B,Raja VJ,Dephoure N,Elemento O

    更新日期:2019-01-05 00:00:00

  • GeneLibrarian: an effective gene-information summarization and visualization system.

    abstract:BACKGROUND:Abundant information about gene products is stored in online searchable databases such as annotation or literature. To efficiently obtain and digest such information, there is a pressing need for automated information-summarization and functional-similarity clustering of genes. RESULTS:We have developed a n...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-392

    authors: Chiang JH,Shin JW,Liu HH,Chin CL

    更新日期:2006-08-29 00:00:00

  • Computational algorithms to predict Gene Ontology annotations.

    abstract:BACKGROUND:Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as the ones provided by the Gene Ontology Consortium, are used to design novel biologi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-16-S6-S4

    authors: Pinoli P,Chicco D,Masseroli M

    更新日期:2015-01-01 00:00:00

  • The impact of quantitative optimization of hybridization conditions on gene expression analysis.

    abstract:BACKGROUND:With the growing availability of entire genome sequences, an increasing number of scientists can exploit oligonucleotide microarrays for genome-scale expression studies. While probe-design is a major research area, relatively little work has been reported on the optimization of microarray protocols. RESULTS...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-73

    authors: Sykacek P,Kreil DP,Meadows LA,Auburn RP,Fischer B,Russell S,Micklem G

    更新日期:2011-03-14 00:00:00