UMI-Gen: A UMI-based read simulator for variant calling evaluation in paired-end sequencing NGS libraries.

Abstract:

Motivation: With Next Generation Sequencing (NGS) becoming more affordable every year, NGS technologies have established themselves as the fastest and most reliable way to detect Single Nucleotide Variants (SNV) and Copy Number Variations (CNV) in cancer patients. These technologies can sequence DNA at very high depths, making it possible to detect abnormalities present at very low frequencies in tumor cells. Multiple variant callers are publicly available and are usually efficient at calling variants. However, when frequencies drop below 1%, the specificity of these tools suffers greatly, as true variants at very low frequencies are easily confused with sequencing or PCR artifacts. The recent use of Unique Molecular Identifiers (UMI) in NGS experiments offers a way to accurately separate true variants from artifacts. UMI-based variant callers are slowly replacing raw-read-based variant callers as the standard method for accurate detection of variants at very low frequencies. However, the benchmarking reported in the tools' publications is usually performed on real biological data in which the true variants are not known, making it difficult to assess their accuracy.

Results: We present UMI-Gen, a UMI-based read simulator for targeted sequencing paired-end data. UMI-Gen generates reference reads covering the targeted regions at a user-customizable depth. Then, using a number of control files, it estimates the background error rate at each position and modifies the generated reads to mimic real biological data. Finally, it inserts real variants into the reads from a list provided by the user.

Availability: The entire pipeline is available at https://gitlab.com/vincent-sater/umigen under the MIT license.
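
The three steps named in the Results (reference reads at a chosen depth, per-position background noise estimated from controls, then insertion of known variants) can be illustrated with a toy Python sketch. This is not the UMI-Gen implementation: the function names, the (mismatches, depth) control format, and the variant dictionary are all assumptions made for illustration, and the sketch ignores UMI tags, paired-end layout, and base qualities.

```python
import random

BASES = "ACGT"

def estimate_background(control_counts):
    """Pool control samples into one error rate per position.

    control_counts is a hypothetical input format: one list per control
    sample, holding a (mismatches, depth) tuple for each position.
    """
    n_pos = len(control_counts[0])
    rates = []
    for pos in range(n_pos):
        mism = sum(sample[pos][0] for sample in control_counts)
        depth = sum(sample[pos][1] for sample in control_counts)
        rates.append(mism / depth if depth else 0.0)
    return rates

def simulate_reads(reference, depth, read_len, error_rates, variants, seed=0):
    """Return (start, sequence) tuples for single-end toy reads.

    variants maps a 0-based position to (alt_base, allele_frequency);
    this layout is an assumption, not the tool's input format.
    """
    rng = random.Random(seed)
    n_reads = depth * len(reference) // read_len

    # Step 1: error-free reads drawn from random windows of the reference.
    starts = [rng.randrange(len(reference) - read_len + 1) for _ in range(n_reads)]
    reads = [(s, list(reference[s:s + read_len])) for s in starts]

    # Step 2: add background noise according to the per-position error rate.
    for start, bases in reads:
        for i in range(read_len):
            if rng.random() < error_rates[start + i]:
                bases[i] = rng.choice([b for b in BASES if b != bases[i]])

    # Step 3: insert each known variant into a fraction of the covering reads.
    for start, bases in reads:
        for pos, (alt, freq) in variants.items():
            if start <= pos < start + read_len and rng.random() < freq:
                bases[pos - start] = alt

    return [(start, "".join(bases)) for start, bases in reads]
```

Because the inserted variants are known in advance, the output of such a simulation gives a ground truth against which a variant caller's sensitivity and specificity can be measured, which is the gap the Motivation describes for real biological data.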

authors

Sater V,Viailly PJ,Lecroq T,Ruminy P,Bérard C,Prieur-Gaston É,Jardin F

doi

10.1016/j.csbj.2020.08.011

subject

Has Abstract

pub_date

2020-08-27 00:00:00

pages

2270-2280

issn

2001-0370

pii

S2001-0370(20)30364-0

journal_volume

18

pub_type

Journal Article
  • Asymmetric Spontaneous Intercalation of Lutein into a Phospholipid Bilayer, a Computational Study.

    abstract::Lutein, a hydroxylated carotenoid, is a pigment synthesised by plants and bacteria. Animals are unable to synthesise lutein; nevertheless, it is present in animal tissues, where its only source is dietary intake. Both in plants and animals, carotenoids are associated mainly with membranes where they carry out importan...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article

    doi:10.1016/j.csbj.2019.04.001

    authors: Makuch K,Markiewicz M,Pasenkiewicz-Gierula M

    update_date:2019-04-06 00:00:00

  • Causal inference for the effect of environmental chemicals on chronic kidney disease.

    abstract::The impacts of environmental chemicals on the decline of kidney function have been suggested by a limited number of statistical and animal studies. Thus, those exposures may be modifiable risk factors for chronic kidney disease. Some of the chemicals, such as Perfluoroalkyl acid (PFA), are pervasive throughout our env...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article

    doi:10.1016/j.csbj.2019.12.001

    authors: Zhao J,Hinton P,Chen J,Jiang J

    update_date:2019-12-17 00:00:00

  • Alternative splicing: Human disease and quantitative analysis from high-throughput sequencing.

    abstract::Alternative splicing contributes to the majority of protein diversity in higher eukaryotes by allowing one gene to generate multiple distinct protein isoforms. It adds another regulation layer of gene expression. Up to 95% of human multi-exon genes undergo alternative splicing to encode proteins with different functio...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article, Review

    doi:10.1016/j.csbj.2020.12.009

    authors: Jiang W,Chen L

    update_date:2020-12-24 00:00:00

  • Disease, Models, Variants and Altered Pathways-Journeying RGD Through the Magnifying Glass.

    abstract::Understanding the pathogenesis of disease is instrumental in delineating its progression mechanisms and for envisioning ways to counteract it. In the process, animal models represent invaluable tools for identifying disease-related loci and their genetic components. Amongst them, the laboratory rat is used extensively...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article

    doi:10.1016/j.csbj.2015.11.006

    authors: Petri V,Hayman GT,Tutaj M,Smith JR,Laulederkind S,Wang SJ,Nigam R,De Pons J,Shimoyama M,Dwinell MR

    update_date:2015-11-26 00:00:00

  • Statistical methods for the analysis of high-throughput metabolomics data.

    abstract::Metabolomics is a relatively new high-throughput technology that aims at measuring all endogenous metabolites within a biological sample in an unbiased fashion. The resulting metabolic profiles may be regarded as functional signatures of the physiological state, and have been shown to comprise effects of genetic regul...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article, Review

    doi:10.5936/csbj.201301009

    authors: Bartel J,Krumsiek J,Theis FJ

    update_date:2013-03-22 00:00:00

  • ASJA: A Program for Assembling Splice Junctions Analysis.

    abstract::RNA splicing may generate different kinds of splice junctions, such as linear, back-splice and fusion junctions. Only a limited number of programs are available for detection and quantification of splice junctions. Here, we present Assembling Splice Junctions Analysis (ASJA), a software package that identifies and cha...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article

    doi:10.1016/j.csbj.2019.08.001

    authors: Zhao J,Li Q,Li Y,He X,Zheng Q,Huang S

    update_date:2019-08-07 00:00:00

  • Engineering microbes for plant polyketide biosynthesis.

    abstract::Polyketides are an important group of secondary metabolites, many of which have important industrial applications in the food and pharmaceutical industries. Polyketides are synthesized from one of three classes of enzymes differentiated by their biochemical features and product structure: type I, type II or type III p...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article, Review

    doi:10.5936/csbj.201210020

    authors: Lussier FX,Colatriano D,Wiltshire Z,Page JE,Martin VJ

    update_date:2013-02-22 00:00:00

  • Learning distributed representations of RNA and protein sequences and its application for predicting lncRNA-protein interactions.

    abstract::The long noncoding RNAs (lncRNAs) are ubiquitous in organisms and play a crucial role in a variety of biological processes and complex diseases. Emerging evidence suggests that lncRNAs interact with corresponding proteins to perform their regulatory functions. Therefore, identifying interacting lncRNA-protein pairs is t...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article

    doi:10.1016/j.csbj.2019.11.004

    authors: Yi HC,You ZH,Cheng L,Zhou X,Jiang TH,Li X,Wang YB

    update_date:2019-11-30 00:00:00

  • Virulence factor-related gut microbiota genes and immunoglobulin A levels as novel markers for machine learning-based classification of autism spectrum disorder.

    abstract::Autism spectrum disorder (ASD) is a neurodevelopmental condition for which early identification and intervention is crucial for optimum prognosis. Our previous work showed gut Immunoglobulin A (IgA) to be significantly elevated in the gut lumen of children with ASD compared to typically developing (TD) children. Gut m...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article

    doi:10.1016/j.csbj.2020.12.012

    authors: Wang M,Doenyas C,Wan J,Zeng S,Cai C,Zhou J,Liu Y,Yin Z,Zhou W

    update_date:2020-12-29 00:00:00

  • Nasal microbiome research in ANCA-associated vasculitis: Strengths, limitations, and future directions.

    abstract::The human nasal microbiome is characterized by biodiversity and undergoes changes during the span of life. In granulomatosis with polyangiitis (GPA), the persistent nasal colonization by Staphylococcus aureus (S. aureus) assessed by culture-based detection methods has been associated with increased relapse frequency. ...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article, Review

    doi:10.1016/j.csbj.2020.12.031

    authors: Kronbichler A,Harrison EM,Wagner J

    update_date:2020-12-27 00:00:00

  • Transcriptomics in the tropics: Total RNA-based profiling of Costa Rican bromeliad-associated communities.

    abstract::RNA-Seq was used to examine the microbial, eukaryotic, and viral communities in water catchments ('tanks') formed by tropical bromeliads from Costa Rica. In total, transcripts with taxonomic affiliation to a wide array of bacteria, archaea, and eukaryotes, were observed, as well as RNA-viruses that appeared related to...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article

    doi:10.1016/j.csbj.2014.12.001

    authors: Goffredi SK,Jang GE,Haroon MF

    update_date:2014-12-13 00:00:00

  • On fusion methods for knowledge discovery from multi-omics datasets.

    abstract::Recent years have witnessed the tendency of measuring a biological sample on multiple omics scales for a comprehensive understanding of how biological activities on varying levels are perturbed by genetic variants, environments, and their interactions. This new trend raises substantial challenges to data integration a...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article, Review

    doi:10.1016/j.csbj.2020.02.011

    authors: Baldwin E,Han J,Luo W,Zhou J,An L,Liu J,Zhang HH,Li H

    update_date:2020-03-05 00:00:00

  • Directional Switching Mechanism of the Bacterial Flagellar Motor.

    abstract::Bacteria sense temporal changes in extracellular stimuli via sensory signal transducers and move by rotating flagella towards a favorable environment for their survival. Each flagellum is a supramolecular motility machine consisting of a bi-directional rotary motor, a universal joint and a helical propeller. The ...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article, Review

    doi:10.1016/j.csbj.2019.07.020

    authors: Minamino T,Kinoshita M,Namba K

    update_date:2019-07-31 00:00:00

  • Discovery of a novel (R)-selective bacterial hydroxynitrile lyase from Acidobacterium capsulatum.

    abstract::Hydroxynitrile lyases (HNLs) are powerful carbon-carbon bond forming enzymes. The reverse of their natural reaction - the stereoselective addition of hydrogen cyanide (HCN) to carbonyls - yields chiral cyanohydrins, versatile building blocks for the pharmaceutical and chemical industry. Recently, bacterial HNLs have b...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article

    doi:10.1016/j.csbj.2014.07.002

    authors: Wiedner R,Gruber-Khadjawi M,Schwab H,Steiner K

    update_date:2014-07-08 00:00:00

  • Of mice and men: Dissecting the interaction between Listeria monocytogenes Internalin A and E-cadherin.

    abstract::We report a study of the interaction between internalin A (inlA) and human or murine E-cadherin (Ecad). inlA is used by Listeria monocytogenes to internalize itself into host cell, but the bacterium is unable to invade murine cells, which has been attributed to the difference in sequence between hEcad and mEcad. Using...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article

    doi:10.5936/csbj.201303022

    authors: Genheden S,Eriksson LA

    update_date:2013-12-15 00:00:00

  • Reactive oxygen species: A generalist in regulating development and pathogenicity of phytopathogenic fungi.

    abstract::Reactive oxygen species (ROS) are small molecules with high oxidative activity, and are usually produced as byproducts of metabolic processes in organisms. ROS play an important role during the interaction between plant hosts and pathogenic fungi. Phytopathogenic fungi have evolved sophisticated ROS producing and scav...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article, Review

    doi:10.1016/j.csbj.2020.10.024

    authors: Zhang Z,Chen Y,Li B,Chen T,Tian S

    update_date:2020-11-04 00:00:00

  • Classification and substrate head-group specificity of membrane fatty acid desaturases.

    abstract::Membrane fatty acid desaturases are a diverse superfamily of enzymes that catalyze the introduction of double bonds into fatty acids. They are essential in a range of metabolic processes, such as the production of omega-3 fatty acids. However, our structure-function understanding of this superfamily is still developin...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article

    doi:10.1016/j.csbj.2016.08.003

    authors: Li D,Moorman R,Vanhercke T,Petrie J,Singh S,Jackson CJ

    update_date:2016-09-12 00:00:00

  • Prediction of Ligand Transport along Hydrophobic Enzyme Nanochannels.

    abstract::Buried active sites of enzymes are connected to the bulk solvent through a network of hydrophobic channels. We developed a discretized model that can accurately predict ligand transport along hydrophobic channels up to six orders of magnitude faster than any other existing method. The non-dimensional nature of the mod...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article

    doi:10.1016/j.csbj.2019.06.001

    authors: Escalante DE,Aksan A

    update_date:2019-06-11 00:00:00

  • In Silico Prediction of Large-Scale Microbial Production Performance: Constraints for Getting Proper Data-Driven Models.

    abstract::Industrial bioreactors range from 10,000 to 700,000 L and characteristically show different zones of substrate availabilities, dissolved gas concentrations and pH values reflecting physical, technical and economic constraints of scale-up. Microbial producers are fluctuating inside the bioreactors thereby experiencing ...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article, Review

    doi:10.1016/j.csbj.2018.06.002

    authors: Zieringer J,Takors R

    update_date:2018-07-06 00:00:00

  • Internal Transcribed Spacer 1 (ITS1) based sequence typing reveals phylogenetically distinct Ascaris population.

    abstract::Taxonomic differentiation among morphologically identical Ascaris species is a debatable scientific issue in the context of Ascariasis epidemiology. To explain the disease epidemiology and also the taxonomic position of different Ascaris species, genome information of infecting strains from endemic areas throughout th...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article

    doi:10.1016/j.csbj.2015.08.006

    authors: Das K,Chowdhury P,Ganguly S

    update_date:2015-09-04 00:00:00

  • Catch me if you can: Leukemia Escape after CD19-Directed T Cell Immunotherapies.

    abstract::Immunotherapy is the revolution in cancer treatment of this last decade. Among multiple approaches able to harness the power of the immune system against cancer, T cell based immunotherapies represent one of the most successful examples. In particular, biotechnological engineering of protein structures, like the T cel...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article, Review

    doi:10.1016/j.csbj.2016.09.003

    authors: Ruella M,Maus MV

    update_date:2016-09-28 00:00:00

  • Leveraging biological replicates to improve analysis in ChIP-seq experiments.

    abstract::ChIP-seq experiments identify genome-wide profiles of DNA-binding molecules including transcription factors, enzymes and epigenetic marks. Biological replicates are critical for reliable site discovery and are required for the deposition of data in the ENCODE and modENCODE projects. While early reports suggested two r...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article

    doi:10.5936/csbj.201401002

    authors: Yang Y,Fear J,Hu J,Haecker I,Zhou L,Renne R,Bloom D,McIntyre LM

    update_date:2014-01-31 00:00:00

  • Effect of mutations on the thermostability of Aspergillus aculeatus β-1,4-galactanase.

    abstract::New variants of β-1,4-galactanase from the mesophilic organism Aspergillus aculeatus were designed using the structure of β-1,4-galactanase from the thermophile organism Myceliophthora thermophila as a template. Some of the variants were generated using PROPKA 3.0, a validated pKa prediction tool, to test its usefulne...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article

    doi:10.1016/j.csbj.2015.03.010

    authors: Torpenholt S,De Maria L,Olsson MH,Christensen LH,Skjøt M,Westh P,Jensen JH,Lo Leggio L

    update_date:2015-04-09 00:00:00

  • Computational drug repurposing for inflammatory bowel disease using genetic information.

    abstract::As knowledge of the genetics behind inflammatory bowel disease (IBD) has continually improved, there has been a demand for methods that can use this data in a clinically significant way. Genome-wide association analyses for IBD have identified 232 risk genetic loci for the disorder. While identification of these risk ...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article

    doi:10.1016/j.csbj.2019.01.001

    authors: Grenier L,Hu P

    update_date:2019-01-07 00:00:00

  • Network analysis of human post-mortem microarrays reveals novel genes, microRNAs, and mechanistic scenarios of potential importance in fighting huntington's disease.

    abstract::Huntington's disease is a progressive neurodegenerative disorder characterized by motor disturbances, cognitive decline, and neuropsychiatric symptoms. In this study, we utilized network-based analysis in an attempt to explore and understand the underlying molecular mechanism and to identify critical molecular players...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article

    doi:10.1016/j.csbj.2016.02.001

    authors: Chandrasekaran S,Bonchev D

    update_date:2016-02-10 00:00:00

  • An Artificial Neural Network Integrated Pipeline for Biomarker Discovery Using Alzheimer's Disease as a Case Study.

    abstract::The field of machine learning has allowed researchers to generate and analyse vast amounts of data using a wide variety of methodologies. Artificial Neural Networks (ANN) are some of the most commonly used statistical models and have been successful in biomarker discovery studies in multiple disease types. This review...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article, Review

    doi:10.1016/j.csbj.2018.02.001

    authors: Zafeiris D,Rutella S,Ball GR

    update_date:2018-02-21 00:00:00

  • Managing children with brain tumors during the COVID-19 era: Don't stop the care!

    abstract::The COVID-19 pandemic has substantially stressed health care systems globally, subsequently reducing cancer care services and delaying treatments. Pediatric populations infected by COVID-19 have shown mild clinical symptoms compared to adults, perhaps due to decreased susceptibility. Several scientific societies and g...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article, Review

    doi:10.1016/j.csbj.2021.01.005

    authors: Capozza MA,Triarico S,Attinà G,Romano A,Mastrangelo S,Maurizi P,Frassanito P,Bianchi F,Verdolotti T,Gessi M,Balducci M,Massimi L,Tamburrini G,Ruggiero A,Gemelli Pediatric Neuro-Oncology Tumor Board.

    update_date:2021-01-12 00:00:00

  • SNP2Structure: A Public and Versatile Resource for Mapping and Three-Dimensional Modeling of Missense SNPs on Human Protein Structures.

    abstract::One of the long-standing challenges in biology is to understand how non-synonymous single nucleotide polymorphisms (nsSNPs) change protein structure and further affect their function. While it is impractical to solve all the mutated protein structures experimentally, it is quite feasible to model the mutated structure...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article

    doi:10.1016/j.csbj.2015.09.002

    authors: Wang D,Song L,Singh V,Rao S,An L,Madhavan S

    update_date:2015-09-30 00:00:00

  • Combining Ramachandran plot and molecular dynamics simulation for structural-based variant classification: Using TP53 variants as model.

    abstract::The wide application of new DNA sequencing technologies is generating vast quantities of genetic variation data at unprecedented speed. Developing methodologies to decode the pathogenicity of the variants is imperatively demanding. We hypothesized that as deleterious variants may function through disturbing structural...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article

    doi:10.1016/j.csbj.2020.11.041

    authors: Tam B,Sinha S,Wang SM

    update_date:2020-12-02 00:00:00

  • Protein electrostatics: From computational and structural analysis to discovery of functional fingerprints and biotechnological design.

    abstract::Computationally driven engineering of proteins aims to allow them to withstand an extended range of conditions and to mediate modified or novel functions. Therefore, it is crucial to the biotechnological industry, to biomedicine and to afford new challenges in environmental sciences, such as biocatalysis for green che...

    journal_title:Computational and structural biotechnology journal

    pub_type: Journal Article, Review

    doi:10.1016/j.csbj.2020.06.029

    authors: Vascon F,Gasparotto M,Giacomello M,Cendron L,Bergantino E,Filippini F,Righetto I

    update_date:2020-06-30 00:00:00