SV-AUTOPILOT: optimized, automated construction of structural variation discovery and benchmarking pipelines.

Abstract:

BACKGROUND: Many tools exist to predict structural variants (SVs), utilizing a variety of algorithms. However, they have largely been developed and tested on human germline or somatic (e.g. cancer) variation. It seems appropriate to exploit this wealth of technology, developed for humans, for other species as well. Objectives of this work included: a) creating an automated, standardized pipeline for SV prediction; b) identifying the best tool(s) for SV prediction through benchmarking; c) providing a statistically sound method for merging SV calls.

RESULTS: The SV-AUTOPILOT meta-tool platform is an automated pipeline for standardization of SV prediction and SV tool development in paired-end next-generation sequencing (NGS) analysis. SV-AUTOPILOT comes in the form of a virtual machine, which includes all datasets, tools and algorithms presented here. The virtual machine makes it easy to add, replace and update genomes, SV callers and post-processing routines, and therefore provides an out-of-the-box environment for complex SV discovery tasks. SV-AUTOPILOT was used to make a direct comparison between 7 popular SV tools on the Arabidopsis thaliana genome using the Landsberg (Ler) ecotype as a standardized dataset. Recall and precision measurements suggest that Pindel and Clever were the most adaptable to this dataset across all size ranges, while Delly performed well for SVs larger than 250 nucleotides. A novel, statistically sound merging process, which can control the false discovery rate, reduced the false positive rate on the Arabidopsis benchmark dataset used here by >60%.

CONCLUSION: SV-AUTOPILOT provides a meta-tool platform for future SV tool development and the benchmarking of tools on other genomes using a standardized pipeline. It optimizes detection of SVs in non-human genomes using statistically robust merging. The benchmarking in this study has demonstrated the power of 7 different SV tools for analyzing different size classes and types of structural variants. The optional merge feature enriches the call set and reduces false positives, providing added benefit to researchers planning to validate SVs. SV-AUTOPILOT is a powerful new meta-tool for biologists as well as SV tool developers.
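
The abstract reports recall and precision for each SV caller but does not say how a call is matched to a benchmark variant. The following Python sketch shows one plausible way to compute the two measures. The `SV` record, the `matches` rule (same chromosome and type, both breakpoints within a tolerance) and the default tolerance of 50 bp are illustrative assumptions, not the criteria used by SV-AUTOPILOT.

```python
from typing import NamedTuple, Sequence, Tuple


class SV(NamedTuple):
    """Hypothetical minimal SV record (not the SV-AUTOPILOT format)."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL" or "INS"


def matches(call: SV, truth: SV, tol: int = 50) -> bool:
    """Assumed matching rule: same chromosome and type,
    and both breakpoints within `tol` bases of the truth."""
    return (
        call.chrom == truth.chrom
        and call.svtype == truth.svtype
        and abs(call.start - truth.start) <= tol
        and abs(call.end - truth.end) <= tol
    )


def precision_recall(
    calls: Sequence[SV], truths: Sequence[SV], tol: int = 50
) -> Tuple[float, float]:
    """Precision = fraction of calls that hit a true SV;
    recall = fraction of true SVs hit by at least one call."""
    true_calls = sum(any(matches(c, t, tol) for t in truths) for c in calls)
    found = sum(any(matches(c, t, tol) for c in calls) for t in truths)
    precision = true_calls / len(calls) if calls else 0.0
    recall = found / len(truths) if truths else 0.0
    return precision, recall


# Toy data, invented purely for illustration.
truths = [
    SV("1", 100, 400, "DEL"),
    SV("1", 1000, 1500, "DEL"),
    SV("2", 50, 90, "INS"),
]
calls = [
    SV("1", 110, 395, "DEL"),    # within tolerance of the first truth
    SV("2", 5000, 5200, "DEL"),  # no matching truth: a false positive
]
precision, recall = precision_recall(calls, truths)
```

On this toy input, one of the two calls is correct and one of the three true SVs is recovered. A merging step that discards unsupported calls would be expected to raise precision, possibly at some cost in recall.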

journal_name

BMC Genomics

authors

Leung WY,Marschall T,Paudel Y,Falquet L,Mei H,Schönhuth A,Maoz Moss TY

doi

10.1186/s12864-015-1376-9

pub_date

2015-03-25 00:00:00

pages

238

issn

1471-2164

journal_volume

16

pub_type

Journal Article
  • Tracking the adoption of bread wheat varieties in Afghanistan using DNA fingerprinting.

    abstract:BACKGROUND:Wheat is the most important staple crop in Afghanistan and accounts for the main part of cereal production. However, wheat production has been unstable during the last decades and the country depends on seed imports. Wheat research in Afghanistan has emphasized releases of new, high-yielding and disease resi...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-6015-4

    authors: Dreisigacker S,Sharma RK,Huttner E,Karimov A,Obaidi MQ,Singh PK,Sansaloni C,Shrestha R,Sonder K,Braun HJ

    update_date:2019-08-19 00:00:00

  • Comparison of gene expression of Paramecium bursaria with and without Chlorella variabilis symbionts.

    abstract:BACKGROUND:The ciliate Paramecium bursaria harbors several hundred cells of the green-alga Chlorella sp. in their cytoplasm. Irrespective of the mutual relation between P. bursaria and the symbiotic algae, both cells retain the ability to grow without the partner. They can easily reestablish endosymbiosis when put in c...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-183

    authors: Kodama Y,Suzuki H,Dohra H,Sugii M,Kitazume T,Yamaguchi K,Shigenobu S,Fujishima M

    update_date:2014-03-10 00:00:00

  • Identification of transcription factors potential related to brown planthopper resistance in rice via microarray expression profiling.

    abstract:BACKGROUND:Brown planthopper (BPH), Nilaparvata lugens Stål, is one of the most destructive insect pests of rice. The molecular responses of plants to sucking insects resemble responses to pathogen infection. However, the molecular mechanism of BPH-resistance in rice remains unclear. Transcription factors (TF) are up-s...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-687

    authors: Wang Y,Guo H,Li H,Zhang H,Miao X

    update_date:2012-12-10 00:00:00

  • Genome sequence of an aflatoxigenic pathogen of Argentinian peanut, Aspergillus arachidicola.

    abstract:BACKGROUND:Aspergillus arachidicola is an aflatoxigenic fungal species, first isolated from the leaves of a wild peanut species native to Argentina. It has since been reported in maize, Brazil nut and human sputum samples. This aflatoxigenic species is capable of secreting both B and G aflatoxins, similar to A. parasit...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-4576-2

    authors: Moore GG,Mack BM,Beltz SB,Puel O

    update_date:2018-03-09 00:00:00

  • Tandem repeats derived from centromeric retrotransposons.

    abstract:BACKGROUND:Tandem repeats are ubiquitous and abundant in higher eukaryotic genomes and constitute, along with transposable elements, much of DNA underlying centromeres and other heterochromatic domains. In maize, centromeric satellite repeat (CentC) and centromeric retrotransposons (CR), a class of Ty3/gypsy retrotrans...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-142

    authors: Sharma A,Wolfgruber TK,Presting GG

    update_date:2013-03-04 00:00:00

  • Transcriptome sequencing of a keystone aquatic herbivore yields insights on the temperature-dependent metabolism of essential lipids.

    abstract:BACKGROUND:Nutritional quality of phytoplankton is a major determinant of the trophic transfer efficiency at the plant-herbivore interface in freshwater food webs. In particular, the phytoplankton's content of the essential polyunsaturated omega-3 fatty acid eicosapentaenoic acid (EPA) has been repeatedly shown to limi...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-6268-y

    authors: Windisch HS,Fink P

    update_date:2019-11-21 00:00:00

  • Estimating the total genome length of a metagenomic sample using k-mers.

    abstract:BACKGROUND:Metagenomic sequencing is a powerful technology for studying the mixture of microbes or the microbiomes on human and in the environment. One basic task of analyzing metagenomic data is to identify the component genomes in the community. This task is challenging due to the complexity of microbiome composition...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-5467-x

    authors: Hua K,Zhang X

    update_date:2019-04-04 00:00:00

  • Insights into the Musa genome: syntenic relationships to rice and between Musa species.

    abstract:BACKGROUND:Musa species (Zingiberaceae, Zingiberales) including bananas and plantains are collectively the fourth most important crop in developing countries. Knowledge concerning Musa genome structure and the origin of distinct cultivars has greatly increased over the last few years. Until now, however, no large-scale...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-58

    authors: Lescot M,Piffanelli P,Ciampi AY,Ruiz M,Blanc G,Leebens-Mack J,da Silva FR,Santos CM,D'Hont A,Garsmeur O,Vilarinhos AD,Kanamori H,Matsumoto T,Ronning CM,Cheung F,Haas BJ,Althoff R,Arbogast T,Hine E,Pappas GJ Jr,Sas

    update_date:2008-01-30 00:00:00

  • Differential gene expression and alternative splicing in insect immune specificity.

    abstract:BACKGROUND:Ecological studies routinely show genotype-genotype interactions between insects and their parasites. The mechanisms behind these interactions are not clearly understood. Using the bumblebee Bombus terrestris/trypanosome Crithidia bombi model system (two bumblebee colonies by two Crithidia strains), we have ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-1031

    authors: Riddell CE,Lobaton Garces JD,Adams S,Barribeau SM,Twell D,Mallon EB

    update_date:2014-11-27 00:00:00

  • Celiac disease T-cell epitopes from gamma-gliadins: immunoreactivity depends on the genome of origin, transcript frequency, and flanking protein variation.

    abstract:BACKGROUND:Celiac disease (CD) is caused by an uncontrolled immune response to gluten, a heterogeneous mixture of wheat storage proteins. The CD-toxicity of these proteins and their derived peptides is depending on the presence of specific T-cell epitopes (9-mer peptides; CD epitopes) that mediate the stimulation of HL...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-277

    authors: Salentijn EM,Mitea DC,Goryunova SV,van der Meer IM,Padioleau I,Gilissen LJ,Koning F,Smulders MJ

    update_date:2012-06-22 00:00:00

  • Gene expression patterns that predict sensitivity to epidermal growth factor receptor tyrosine kinase inhibitors in lung cancer cell lines and human lung tumors.

    abstract:BACKGROUND:Increased focus surrounds identifying patients with advanced non-small cell lung cancer (NSCLC) who will benefit from treatment with epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors (TKI). EGFR mutation, gene copy number, coexpression of ErbB proteins and ligands, and epithelial to mesenchy...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-7-289

    authors: Balko JM,Potti A,Saunders C,Stromberg A,Haura EB,Black EP

    update_date:2006-11-10 00:00:00

  • Combining gene expression, demographic and clinical data in modeling disease: a case study of bipolar disorder and schizophrenia.

    abstract:BACKGROUND:This paper presents a retrospective statistical study on the newly-released data set by the Stanley Neuropathology Consortium on gene expression in bipolar disorder and schizophrenia. This data set contains gene expression data as well as limited demographic and clinical data for each subject. Previous studi...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-531

    authors: Struyf J,Dobrin S,Page D

    update_date:2008-11-07 00:00:00

  • KSP: an integrated method for predicting catalyzing kinases of phosphorylation sites in proteins.

    abstract:BACKGROUND:Protein phosphorylation by kinases plays crucial roles in various biological processes including signal transduction and tumorigenesis, thus a better understanding of protein phosphorylation events in cells is fundamental for studying protein functions and designing drugs to treat diseases caused by the malf...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-06895-2

    authors: Ma H,Li G,Su Z

    update_date:2020-08-04 00:00:00

  • Genome-wide survey of two-component signal transduction systems in the plant growth-promoting bacterium Azospirillum.

    abstract:BACKGROUND:Two-component systems (TCS) play critical roles in sensing and responding to environmental cues. Azospirillum is a plant growth-promoting rhizobacterium living in the rhizosphere of many important crops. Despite numerous studies about its plant beneficial properties, little is known about how the bacterium s...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1962-x

    authors: Borland S,Oudart A,Prigent-Combaret C,Brochier-Armanet C,Wisniewski-Dyé F

    update_date:2015-10-22 00:00:00

  • SCUD: Saccharomyces cerevisiae ubiquitination database.

    abstract:BACKGROUND:Ubiquitination is an important post-translational modification involved in diverse biological processes. Therefore, genomewide representation of the ubiquitination system for a species is important. DESCRIPTION:SCUD is a web-based database for the ubiquitination system in Saccharomyces cerevisiae (Baker's y...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-440

    authors: Lee WC,Lee M,Jung JW,Kim KP,Kim D

    update_date:2008-09-24 00:00:00

  • Investigation of protein secretion and secretion stress in Ashbya gossypii.

    abstract:BACKGROUND:Ashbya gossypii is a filamentous Saccharomycete used for the industrial production of riboflavin that has been recently explored as a host system for recombinant protein production. To gain insight into the protein secretory pathway of this biotechnologically relevant fungus, we undertook genome-wide analyse...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-1137

    authors: Aguiar TQ,Ribeiro O,Arvas M,Wiebe MG,Penttilä M,Domingues L

    update_date:2014-12-18 00:00:00

  • Microarray analysis of response of Salmonella during infection of HLA-B27-transfected human macrophage-like U937 cells.

    abstract:BACKGROUND:Human leukocyte antigen (HLA)-B27 is strongly associated with the development of reactive arthritis (ReA) in humans after salmonellosis. Human monocytic U937 cells transfected with HLA-B27 are less able to eliminate intracellular Salmonella enterica serovar Enteritidis than those transfected with control HLA...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-456

    authors: Ge S,Danino V,He Q,Hinton JC,Granfors K

    update_date:2010-07-30 00:00:00

  • Genomic analyses of pneumococci reveal a wide diversity of bacteriocins - including pneumocyclicin, a novel circular bacteriocin.

    abstract:BACKGROUND:One of the most important global pathogens infecting all age groups is Streptococcus pneumoniae (the 'pneumococcus'). Pneumococci reside in the paediatric nasopharynx, where they compete for space and resources, and one competition strategy is to produce a bacteriocin (antimicrobial peptide or protein) to at...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1729-4

    authors: Bogaardt C,van Tonder AJ,Brueggemann AB

    update_date:2015-07-28 00:00:00

  • Mitochondrial genome deletions and minicircles are common in lice (Insecta: Phthiraptera).

    abstract:BACKGROUND:The gene composition, gene order and structure of the mitochondrial genome are remarkably stable across bilaterian animals. Lice (Insecta: Phthiraptera) are a major exception to this genomic stability in that the canonical single chromosome with 37 genes found in almost all other bilaterians has been lost in...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-12-394

    authors: Cameron SL,Yoshizawa K,Mizukoshi A,Whiting MF,Johnson KP

    update_date:2011-08-04 00:00:00

  • Proteome-wide analysis of Anopheles culicifacies mosquito midgut: new insights into the mechanism of refractoriness.

    abstract:BACKGROUND:Midgut invasion, a major bottleneck for malaria parasites transmission is considered as a potential target for vector-parasite interaction studies. New intervention strategies are required to explore the midgut proteins and their potential role in refractoriness for malaria control in Anopheles mosquitoes. T...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-4729-3

    authors: Vijay S,Rawal R,Kadian K,Singh J,Adak T,Sharma A

    update_date:2018-05-08 00:00:00

  • Extensive structural variations between mitochondrial genomes of CMS and normal peppers (Capsicum annuum L.) revealed by complete nucleotide sequencing.

    abstract:BACKGROUND:Cytoplasmic male sterility (CMS) is an inability to produce functional pollen that is caused by mutation of the mitochondrial genome. Comparative analyses of mitochondrial genomes of lines with and without CMS in several species have revealed structural differences between genomes, including extensive rearra...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-561

    authors: Jo YD,Choi Y,Kim DH,Kim BD,Kang BC

    update_date:2014-07-04 00:00:00

  • Fine mapping and RNA-Seq unravels candidate genes for a major QTL controlling multiple fiber quality traits at the T1 region in upland cotton.

    abstract:BACKGROUND:Improving fiber quality is a major challenge in cotton breeding, since the molecular basis of fiber quality traits is poorly understood. Fine mapping and candidate gene prediction of quantitative trait loci (QTL) controlling cotton fiber quality traits can help to elucidate the molecular basis of fiber quali...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-2605-6

    authors: Liu D,Zhang J,Liu X,Wang W,Liu D,Teng Z,Fang X,Tan Z,Tang S,Yang J,Zhong J,Zhang Z

    update_date:2016-04-19 00:00:00

  • Genome-wide structural and evolutionary analysis of the P450 monooxygenase genes (P450ome) in the white rot fungus Phanerochaete chrysosporium: evidence for gene duplications and extensive gene clustering.

    abstract:BACKGROUND:Phanerochaete chrysosporium, the model white rot basidiomycetous fungus, has the extraordinary ability to mineralize (to CO2) lignin and detoxify a variety of chemical pollutants. Its cytochrome P450 monooxygenases have recently been implied in several of these biotransformations. Our initial P450 cloning ef...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-6-92

    authors: Doddapaneni H,Chakraborty R,Yadav JS

    update_date:2005-06-14 00:00:00

  • A deconvolution method and its application in analyzing the cellular fractions in acute myeloid leukemia samples.

    abstract:BACKGROUND:The identification of cell type-specific genes (markers) is an essential step for the deconvolution of the cellular fractions, primarily, from the gene expression data of a bulk sample. However, the genes with significant changes identified by pair-wise comparisons cannot indeed represent the specificity of ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-06888-1

    authors: Li H,Sharma A,Ming W,Sun X,Liu H

    update_date:2020-09-23 00:00:00

  • Automatic B cell lymphoma detection using flow cytometry data.

    abstract:BACKGROUND:Flow cytometry has been widely used for the diagnosis of various hematopoietic diseases. Although there have been advances in the number of biomarkers that can be analyzed simultaneously and technologies that enable fast performance, the diagnostic data are still interpreted by a manual gating strategy. The ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-S7-S1

    authors: Shih MC,Huang SH,Donohue R,Chang CC,Zu Y

    update_date:2013-01-01 00:00:00

  • Identification and analysis of long non-coding RNAs and mRNAs in chicken macrophages infected with avian infectious bronchitis coronavirus.

    abstract:BACKGROUND:Avian infectious bronchitis virus (IBV) is a gamma coronavirus that severely affects the poultry industry worldwide. Long non-coding RNAs (lncRNAs), a subset of non-coding RNAs with a length of more than 200 nucleotides, have been recently recognized as pivotal factors in the pathogenesis of viral infections...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-07359-3

    authors: Li H,Cui P,Fu X,Zhang L,Yan W,Zhai Y,Lei C,Wang H,Yang X

    update_date:2021-01-20 00:00:00

  • A fast detection of fusion genes from paired-end RNA-seq data.

    abstract:BACKGROUND:Fusion genes are known to be drivers of many common cancers, so they are potential markers for diagnosis, prognosis or therapy response. The advent of paired-end RNA sequencing enhances our ability to discover fusion genes. While there are available methods, routine analyses of large number of samples are st...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-5156-1

    authors: Vu TN,Deng W,Trac QT,Calza S,Hwang W,Pawitan Y

    update_date:2018-11-01 00:00:00

  • Genome-based analysis for the identification of genes involved in o-xylene degradation in Rhodococcus opacus R7.

    abstract:BACKGROUND:Bacteria belonging to the Rhodococcus genus play an important role in the degradation of many contaminants, including methylbenzenes. These bacteria, widely distributed in the environment, are known to be a powerhouse of numerous degradation functions, due to their ability to metabolize a wide range of organ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-4965-6

    authors: Di Canito A,Zampolli J,Orro A,D'Ursi P,Milanesi L,Sello G,Steinbüchel A,Di Gennaro P

    update_date:2018-08-06 00:00:00

  • Selective constraint, background selection, and mutation accumulation variability within and between human populations.

    abstract:BACKGROUND:Regions of the genome that are under evolutionary constraint across multiple species have previously been used to identify functional sequences in the human genome. Furthermore, it is known that there is an inverse relationship between evolutionary constraint and the allele frequency of a mutation segregatin...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-495

    authors: Hodgkinson A,Casals F,Idaghdour Y,Grenier JC,Hernandez RD,Awadalla P

    update_date:2013-07-23 00:00:00

  • GiSAO.db: a database for ageing research.

    abstract:BACKGROUND:Age-related gene expression patterns of Homo sapiens as well as of model organisms such as Mus musculus, Saccharomyces cerevisiae, Caenorhabditis elegans and Drosophila melanogaster are a basis for understanding the genetic mechanisms of ageing. For an effective analysis and interpretation of expression prof...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-12-262

    authors: Hofer E,Laschober GT,Hackl M,Thallinger GG,Lepperdinger G,Grillari J,Jansen-Dürr P,Trajanoski Z

    update_date:2011-05-24 00:00:00