Elimination of PCR duplicates in RNA-seq and small RNA-seq using unique molecular identifiers.

Abstract:

BACKGROUND: RNA-seq and small RNA-seq are powerful, quantitative tools to study gene regulation and function. Common high-throughput sequencing methods rely on polymerase chain reaction (PCR) to expand the starting material, but not every molecule amplifies equally, causing some to be overrepresented. Unique molecular identifiers (UMIs) can be used to distinguish undesirable PCR duplicates derived from a single molecule and identical but biologically meaningful reads from different molecules.

RESULTS: We have incorporated UMIs into RNA-seq and small RNA-seq protocols and developed tools to analyze the resulting data. Our UMIs contain stretches of random nucleotides whose lengths sufficiently capture diverse molecule species in both RNA-seq and small RNA-seq libraries generated from mouse testis. Our approach yields high-quality data while allowing unique tagging of all molecules in high-depth libraries.

CONCLUSIONS: Using simulated and real datasets, we demonstrate that our methods increase the reproducibility of RNA-seq and small RNA-seq data. Notably, we find that the amount of starting material and sequencing depth, but not the number of PCR cycles, determine PCR duplicate frequency. Finally, we show that computational removal of PCR duplicates based only on their mapping coordinates introduces substantial bias into data analysis.
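
The contrast the abstract draws between UMI-aware and coordinate-only duplicate removal can be illustrated with a minimal Python sketch. This is not the authors' tool: the `Read` record, the function names, and the toy reads are hypothetical, and the sketch assumes each aligned read has already been reduced to its mapping coordinates plus an error-free UMI string.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal record for one aligned read."""
    chrom: str
    pos: int
    strand: str
    umi: str


def count_molecules_with_umi(reads: Iterable[Read]) -> Counter:
    """Count distinct UMIs per mapping coordinate.

    Reads sharing coordinates AND UMI are treated as PCR duplicates
    of one molecule and collapsed; reads sharing coordinates but
    carrying different UMIs are kept as separate molecules.
    """
    unique_reads = set(reads)  # collapses exact (coordinate, UMI) repeats
    return Counter((r.chrom, r.pos, r.strand) for r in unique_reads)


def count_molecules_coordinate_only(reads: Iterable[Read]) -> Counter:
    """Coordinate-only deduplication: at most one molecule per coordinate."""
    return Counter({(r.chrom, r.pos, r.strand): 1 for r in reads})


# Toy data (invented for illustration): three molecules share chr1:100:+,
# and one of them was amplified into three identical reads.
reads = [
    Read("chr1", 100, "+", "ACGT"),
    Read("chr1", 100, "+", "ACGT"),
    Read("chr1", 100, "+", "ACGT"),
    Read("chr1", 100, "+", "TTAG"),
    Read("chr1", 100, "+", "GGCA"),
    Read("chr2", 500, "-", "ACGT"),
    Read("chr2", 500, "-", "ACGT"),
]

print(count_molecules_with_umi(reads))
print(count_molecules_coordinate_only(reads))
```

In this toy case the UMI-aware count keeps three molecules at chr1:100:+, while the coordinate-only count collapses them to one, which is the kind of undercounting of abundant species that the abstract describes as bias. Real pipelines would also need to handle sequencing errors within UMIs, which this sketch ignores.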

journal_name

BMC Genomics

journal_title

BMC genomics

authors

Fu Y,Wu PH,Beane T,Zamore PD,Weng Z

doi

10.1186/s12864-018-4933-1

pub_date

2018-07-13 00:00:00

pages

531

issue

1

issn

1471-2164

pii

10.1186/s12864-018-4933-1

journal_volume

19

pub_type

Journal Article
  • Identification of small RNAs in Francisella tularensis.

    abstract:BACKGROUND:Regulation of bacterial gene expression by small RNAs (sRNAs) has proved to be important for many biological processes. Francisella tularensis is a highly pathogenic Gram-negative bacterium that causes the disease tularaemia in humans and animals. Relatively little is known about the regulatory networks exi...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-625

    authors: Postic G,Frapy E,Dupuis M,Dubail I,Livny J,Charbit A,Meibom KL

    update_date:2010-11-10 00:00:00

  • Urinary proteomic and non-prefractionation quantitative phosphoproteomic analysis during pregnancy and non-pregnancy.

    abstract:BACKGROUND:Progress in the fields of protein separation and identification technologies has accelerated research into biofluids proteomics for protein biomarker discovery. Urine has become an ideal and rich source of biomarkers in clinical proteomics. Here we performed a proteomic analysis of urine samples from pregnan...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-777

    authors: Zheng J,Liu L,Wang J,Jin Q

    update_date:2013-11-11 00:00:00

  • In silico and in vivo splicing analysis of MLH1 and MSH2 missense mutations shows exon- and tissue-specific effects.

    abstract:BACKGROUND:Abnormalities of pre-mRNA splicing are increasingly recognized as an important mechanism through which gene mutations cause disease. However, apart from the mutations in the donor and acceptor sites, the effects on splicing of other sequence variations are difficult to predict. Loosely defined exonic and int...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-7-243

    authors: Lastella P,Surdo NC,Resta N,Guanti G,Stella A

    update_date:2006-09-22 00:00:00

  • Comparing Mycobacterium tuberculosis genomes using genome topology networks.

    abstract:BACKGROUND:Over the last decade, emerging research methods, such as comparative genomic analysis and phylogenetic study, have yielded new insights into genotypes and phenotypes of closely related bacterial strains. Several findings have revealed that genomic structural variations (SVs), including gene gain/loss, gene d...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1259-0

    authors: Jiang J,Gu J,Zhang L,Zhang C,Deng X,Dou T,Zhao G,Zhou Y

    update_date:2015-02-14 00:00:00

  • A high throughput screen for active human transposable elements.

    abstract:BACKGROUND:Transposable elements (TEs) are mobile genetic sequences that randomly propagate within their host's genome. This mobility has the potential to affect gene transcription and cause disease. However, TEs are technically challenging to identify, which complicates efforts to assess the impact of TE insertions on...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-4485-4

    authors: Kvikstad EM,Piazza P,Taylor JC,Lunter G

    update_date:2018-02-01 00:00:00

  • Host specialization of the blast fungus Magnaporthe oryzae is associated with dynamic gain and loss of genes linked to transposable elements.

    abstract:BACKGROUND:Magnaporthe oryzae (anamorph Pyricularia oryzae) is the causal agent of blast disease of Poaceae crops and their wild relatives. To understand the genetic mechanisms that drive host specialization of M. oryzae, we carried out whole genome resequencing of four M. oryzae isolates from rice (Oryza sativa), one ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-2690-6

    authors: Yoshida K,Saunders DG,Mitsuoka C,Natsume S,Kosugi S,Saitoh H,Inoue Y,Chuma I,Tosa Y,Cano LM,Kamoun S,Terauchi R

    update_date:2016-05-18 00:00:00

  • Transcriptome analysis of a respiratory Saccharomyces cerevisiae strain suggests the expression of its phenotype is glucose insensitive and predominantly controlled by Hap4, Cat8 and Mig1.

    abstract:BACKGROUND:We previously described the first respiratory Saccharomyces cerevisiae strain, KOY.TM6*P, by integrating the gene encoding a chimeric hexose transporter, Tm6*, into the genome of an hxt null yeast. Subsequently we transferred this respiratory phenotype in the presence of up to 50 g/L glucose to a yeast strai...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-365

    authors: Bonander N,Ferndahl C,Mostad P,Wilks MD,Chang C,Showe L,Gustafsson L,Larsson C,Bill RM

    update_date:2008-07-31 00:00:00

  • Genomic and systems evolution in Vibrionaceae species.

    abstract:BACKGROUND:The steadily increasing number of prokaryotic genomes has accelerated the study of genome evolution; in particular, the availability of sets of genomes from closely related bacteria has facilitated the exploration of the mechanisms underlying genome plasticity. The family Vibrionaceae is found in the Gammapr...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-10-S1-S11

    authors: Gu J,Neary J,Cai H,Moshfeghian A,Rodriguez SA,Lilburn TG,Wang Y

    update_date:2009-07-07 00:00:00

  • Benchmarking subcellular localization and variant tolerance predictors on membrane proteins.

    abstract:BACKGROUND:Membrane proteins constitute up to 30% of the human proteome. These proteins have special properties because the transmembrane segments are embedded into lipid bilayer while extramembranous parts are in different environments. Membrane proteins have several functions and are involved in numerous diseases. A ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-5865-0

    authors: Orioli T,Vihinen M

    update_date:2019-07-16 00:00:00

  • Comparative genomic, transcriptomic, and proteomic reannotation of human herpesvirus 6.

    abstract:BACKGROUND:Human herpesvirus-6A and -6B (HHV-6) are betaherpesviruses that reach > 90% seroprevalence in the adult population. Unique among human herpesviruses, HHV-6 can integrate into the subtelomeric regions of human chromosomes; when this occurs in germ line cells it causes a condition called inherited chromosomall...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-4604-2

    authors: Greninger AL,Knudsen GM,Roychoudhury P,Hanson DJ,Sedlak RH,Xie H,Guan J,Nguyen T,Peddu V,Boeckh M,Huang ML,Cook L,Depledge DP,Zerr DM,Koelle DM,Gantt S,Yoshikawa T,Caserta M,Hill JA,Jerome KR

    update_date:2018-03-20 00:00:00

  • InvBFM: finding genomic inversions from high-throughput sequence data based on feature mining.

    abstract:BACKGROUND:Genomic inversion is one type of structural variations (SVs) and is known to play an important biological role. An established problem in sequence data analysis is calling inversions from high-throughput sequence data. It is more difficult to detect inversions because they are surrounded by duplication or ot...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-6585-1

    authors: Wu Z,Wu Y,Gao J

    update_date:2020-03-05 00:00:00

  • RNA sequencing for global gene expression associated with muscle growth in a single male modern broiler line compared to a foundational Barred Plymouth Rock chicken line.

    abstract:BACKGROUND:Modern broiler chickens exhibit very rapid growth and high feed efficiency compared to unselected chicken breeds. The improved production efficiency in modern broiler chickens was achieved by the intensive genetic selection for meat production. This study was designed to investigate the genetic alterations a...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-3471-y

    authors: Kong BW,Hudson N,Seo D,Lee S,Khatri B,Lassiter K,Cook D,Piekarski A,Dridi S,Anthony N,Bottje W

    update_date:2017-01-13 00:00:00

  • Construction of a highly flexible and comprehensive gene collection representing the ORFeome of the human pathogen Chlamydia pneumoniae.

    abstract:BACKGROUND:The Gram-negative bacterium Chlamydia pneumoniae (Cpn) is the leading intracellular human pathogen responsible for respiratory infections such as pneumonia and bronchitis. Basic and applied research in pathogen biology, especially the elaboration of new mechanism-based anti-pathogen strategies, target discov...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-632

    authors: Maier CJ,Maier RH,Virok DP,Maass M,Hintner H,Bauer JW,Onder K

    update_date:2012-11-16 00:00:00

  • The host-pathogen interaction between wheat and yellow rust induces temporally coordinated waves of gene expression.

    abstract:BACKGROUND:Understanding how plants and pathogens modulate gene expression during the host-pathogen interaction is key to uncovering the molecular mechanisms that regulate disease progression. Recent advances in sequencing technologies have provided new opportunities to decode the complexity of such interactions. In th...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-2684-4

    authors: Dobon A,Bunting DC,Cabrera-Quio LE,Uauy C,Saunders DG

    update_date:2016-05-20 00:00:00

  • Horizontal transfer of OC1 transposons in the Tasmanian devil.

    abstract:BACKGROUND:There is growing recognition that horizontal DNA transfer, a process known to be common in prokaryotes, is also a significant source of genomic variation in eukaryotes. Horizontal transfer of transposable elements (HTT) may be especially prevalent in eukaryotes given the inherent mobility, widespread occurre...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-134

    authors: Gilbert C,Waters P,Feschotte C,Schaack S

    update_date:2013-02-27 00:00:00

  • A transcription map of the 6p22.3 reading disability locus identifying candidate genes.

    abstract:BACKGROUND:Reading disability (RD) is a common syndrome with a large genetic component. Chromosome 6 has been identified in several linkage studies as playing a significant role. A more recent study identified a peak of transmission disequilibrium to marker JA04 (G72384) on chromosome 6p22.3, suggesting that a gene is ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-4-25

    authors: Londin ER,Meng H,Gruen JR

    update_date:2003-06-30 00:00:00

  • A comprehensive study on cellular RNA editing activity in response to infections with different subtypes of influenza A viruses.

    abstract:BACKGROUND:RNA editing is an important mechanism that expands the diversity and complexity of genetic codes. The conversions of adenosine (A) to inosine (I) and cytosine (C) to uridine (U) are two prominent types of RNA editing in animals. The roles of RNA editing events have been implicated in important biological pat...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-4330-1

    authors: Cao Y,Cao R,Huang Y,Zhou H,Liu Y,Li X,Zhong W,Hao P

    update_date:2018-01-19 00:00:00

  • Genome-wide association analysis identified splicing single nucleotide polymorphism in CFLAR predictive of triptolide chemo-sensitivity.

    abstract:BACKGROUND:Triptolide is a therapeutic diterpenoid derived from the Chinese herb Tripterygium wilfordii Hook f. Triptolide has been shown to induce apoptosis by activation of pro-apoptotic proteins, inhibiting NFkB and c-KIT pathways, suppressing the Jak2 transcription, activating MAPK8/JNK signaling and modulating the...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1614-1

    authors: Chauhan L,Jenkins GD,Bhise N,Feldberg T,Mitra-Ghosh T,Fridley BL,Lamba JK

    update_date:2015-06-30 00:00:00

  • Genome-wide expression profiling shows transcriptional reprogramming in Fusarium graminearum by Fusarium graminearum virus 1-DK21 infection.

    abstract:BACKGROUND:Fusarium graminearum virus 1 strain-DK21 (FgV1-DK21) is a mycovirus that confers hypovirulence to F. graminearum, which is the primary phytopathogenic fungus that causes Fusarium head blight (FHB) disease in many cereals. Understanding the interaction between mycoviruses and plant pathogenic fungi is necessa...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-173

    authors: Cho WK,Yu J,Lee KM,Son M,Min K,Lee YW,Kim KH

    update_date:2012-05-06 00:00:00

  • Single nucleotide polymorphism discovery from expressed sequence tags in the waterflea Daphnia magna.

    abstract:BACKGROUND:Daphnia (Crustacea: Cladocera) plays a central role in standing aquatic ecosystems, has a well known ecology and is widely used in population studies and environmental risk assessments. Daphnia magna is, especially in Europe, intensively used to study stress responses of natural populations to pollutants, cl...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-12-309

    authors: Orsini L,Jansen M,Souche EL,Geldof S,De Meester L

    update_date:2011-06-13 00:00:00

  • Glycogenome expression dynamics during mouse C2C12 myoblast differentiation suggests a sequential reorganization of membrane glycoconjugates.

    abstract:BACKGROUND:Several global transcriptomic and proteomic approaches have been applied in order to obtain new molecular insights on skeletal myogenesis, but none has generated any specific data on glycogenome expression, and thus on the role of glycan structures in this process, despite the involvement of glycoconjugates ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-10-483

    authors: Janot M,Audfray A,Loriol C,Germot A,Maftah A,Dupuy F

    update_date:2009-10-20 00:00:00

  • Differences in transcription between free-living and CO2-activated third-stage larvae of Haemonchus contortus.

    abstract:BACKGROUND:The disease caused by Haemonchus contortus, a blood-feeding nematode of small ruminants, is of major economic importance worldwide. The infective third-stage larva (L3) of this gastric nematode is enclosed in a cuticle (sheath) and, once ingested with herbage by the host, undergoes an exsheathment process th...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-266

    authors: Cantacessi C,Campbell BE,Young ND,Jex AR,Hall RS,Presidente PJ,Zawadzki JL,Zhong W,Aleman-Meza B,Loukas A,Sternberg PW,Gasser RB

    update_date:2010-04-27 00:00:00

  • Gene expression profiling of lymphoblastoid cell lines from monozygotic twins discordant in severity of autism reveals differential regulation of neurologically relevant genes.

    abstract:BACKGROUND:The autism spectrum encompasses a set of complex multigenic developmental disorders that severely impact the development of language, non-verbal communication, and social skills, and are associated with odd, stereotyped, repetitive behavior and restricted interests. To date, diagnosis of these neurologically...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-7-118

    authors: Hu VW,Frank BC,Heine S,Lee NH,Quackenbush J

    update_date:2006-05-18 00:00:00

  • Xylem transcription profiles indicate potential metabolic responses for economically relevant characteristics of Eucalyptus species.

    abstract:BACKGROUND:Eucalyptus is one of the most important sources of industrial cellulose. Three species of this botanical group are intensively used in breeding programs: E. globulus, E. grandis and E. urophylla. E. globulus is adapted to subtropical/temperate areas and is considered a source of high-quality cellulose; E. gr...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-201

    authors: Salazar MM,Nascimento LC,Camargo EL,Gonçalves DC,Lepikson Neto J,Marques WL,Teixeira PJ,Mieczkowski P,Mondego JM,Carazzolle MF,Deckmann AC,Pereira GA

    update_date:2013-03-22 00:00:00

  • Reverse transcriptional profiling: non-correspondence of transcript level variation and proximal promoter polymorphism.

    abstract:BACKGROUND:Variation in gene expression between two Drosophila melanogaster strains, as revealed by transcriptional profiling, seldom corresponded to variation in proximal promoter sequence for 34 genes analyzed. Two sets of protein-coding genes were selected from pre-existing microarray data: (1) those whose expressio...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-6-110

    authors: Brown RP,Feder ME

    update_date:2005-08-17 00:00:00

  • Comparative analysis of function and interaction of transcription factors in nematodes: extensive conservation of orthology coupled to rapid sequence evolution.

    abstract:BACKGROUND:Much of the morphological diversity in eukaryotes results from differential regulation of gene expression in which transcription factors (TFs) play a central role. The nematode Caenorhabditis elegans is an established model organism for the study of the roles of TFs in controlling the spatiotemporal pattern ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-399

    authors: Haerty W,Artieri C,Khezri N,Singh RS,Gupta BP

    update_date:2008-08-27 00:00:00

  • An analysis of the transcriptome of Teladorsagia circumcincta: its biological and biotechnological implications.

    abstract:BACKGROUND:Teladorsagia circumcincta (order Strongylida) is an economically important parasitic nematode of small ruminants (including sheep and goats) in temperate climatic regions of the world. Improved insights into the molecular biology of this parasite could underpin alternative methods required to control this an...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-S7-S10

    authors: Menon R,Gasser RB,Mitreva M,Ranganathan S

    update_date:2012-01-01 00:00:00

  • Information-theoretic gene-gene and gene-environment interaction analysis of quantitative traits.

    abstract:BACKGROUND:The purpose of this research was to develop a novel information theoretic method and an efficient algorithm for analyzing the gene-gene (GGI) and gene-environmental interactions (GEI) associated with quantitative traits (QT). The method is built on two information-theoretic metrics, the k-way interaction inf...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-10-509

    authors: Chanda P,Sucheston L,Liu S,Zhang A,Ramanathan M

    update_date:2009-11-04 00:00:00

  • ATP-binding cassette systems in Burkholderia pseudomallei and Burkholderia mallei.

    abstract:BACKGROUND:ATP binding cassette (ABC) systems are responsible for the import and export of a wide variety of molecules across cell membranes and comprise one of largest protein superfamilies found in prokarya, eukarya and archea. ABC systems play important roles in bacterial lifestyle, virulence and survival. In this s...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-8-83

    authors: Harland DN,Dassa E,Titball RW,Brown KA,Atkins HS

    update_date:2007-03-28 00:00:00

  • Heritability and genome-wide association analyses of fasting plasma glucose in Chinese adult twins.

    abstract:BACKGROUND:Currently, diabetes has become one of the leading causes of death worldwide. Fasting plasma glucose (FPG) levels that are higher than optimal, even if below the diagnostic threshold of diabetes, can also lead to increased morbidity and mortality. Here we intend to study the magnitude of the genetic influence...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-06898-z

    authors: Wang W,Zhang C,Liu H,Xu C,Duan H,Tian X,Zhang D

    update_date:2020-07-18 00:00:00