Summarizing and correcting the GC content bias in high-throughput sequencing.

Abstract:

GC content bias describes the dependence between fragment count (read coverage) and GC content found in Illumina sequencing data. This bias can dominate the signal of interest for analyses that focus on measuring fragment abundance within a genome, such as copy number estimation (DNA-seq). The bias is not consistent between samples, and there is no consensus as to the best methods to remove it in a single sample. We analyze regularities in the GC bias patterns, and find a compact description for this unimodal curve family. It is the GC content of the full DNA fragment, not only the sequenced read, that most influences fragment count. This GC effect is unimodal: both GC-rich fragments and AT-rich fragments are underrepresented in the sequencing results. This empirical evidence strengthens the hypothesis that PCR is the most important cause of the GC bias. We propose a model that produces predictions at the base pair level, allowing strand-specific GC-effect correction regardless of the downstream smoothing or binning. These GC modeling considerations can inform other high-throughput sequencing analyses such as ChIP-seq and RNA-seq.
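The idea of relating fragment count to the GC content of the full fragment can be illustrated with a short Python sketch. This is not the authors' model: it is a simplified illustration that assumes a single fixed fragment length, equal-width GC bins, and fragments identified only by their start position. The function names `gc_fraction` and `gc_rate_curve` are hypothetical.

```python
from collections import defaultdict


def gc_fraction(seq):
    """Fraction of G/C bases among A/C/G/T; ambiguous bases such as N are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc_rate_curve(genome, fragment_starts, fragment_length, n_bins=10):
    """Mean number of observed fragments per candidate start position, by GC bin.

    genome           reference sequence as a string
    fragment_starts  0-based start positions of observed fragments
    fragment_length  single fragment length (a simplifying assumption)
    n_bins           number of equal-width GC bins (a simplifying assumption)
    """
    # Count how many observed fragments start at each position.
    counts = defaultdict(int)
    for start in fragment_starts:
        counts[start] += 1

    positions = defaultdict(int)  # candidate start positions per GC bin
    fragments = defaultdict(int)  # observed fragments per GC bin
    for start in range(len(genome) - fragment_length + 1):
        # GC content is taken over the whole fragment, not only the read.
        gc = gc_fraction(genome[start:start + fragment_length])
        gc_bin = min(int(gc * n_bins), n_bins - 1)
        positions[gc_bin] += 1
        fragments[gc_bin] += counts[start]

    return {b: fragments[b] / positions[b] for b in positions}
```

Under these assumptions, a unimodal GC effect would appear as low rates in both the lowest and the highest GC bins. One possible correction, again only a sketch, divides the observed count in a region by the count predicted from the rates of the positions it contains.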

journal_name

Nucleic Acids Res

journal_title

Nucleic acids research

authors

Benjamini Y,Speed TP

doi

10.1093/nar/gks001

subject

Has Abstract

pub_date

2012-05-01 00:00:00

pages

e72

issue

10

eissn

1362-4962

issn

0305-1048

pii

gks001

journal_volume

40

pub_type

Journal Article
  • ChimeRScope: a novel alignment-free algorithm for fusion transcript prediction using paired-end RNA-Seq data.

    abstract::The RNA-Seq technology has revolutionized transcriptome characterization not only by accurately quantifying gene expression, but also by the identification of novel transcripts like chimeric fusion transcripts. The 'fusion' or 'chimeric' transcripts have improved the diagnosis and prognosis of several tumors, and have...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkx315

    authors: Li Y,Heavican TB,Vellichirammal NN,Iqbal J,Guda C

    update_date: 2017-07-27 00:00:00

  • Interspecific adaptation by binary choice at de novo polyomavirus T antigen site through accelerated codon-constrained Val-Ala toggling within an intrinsically disordered region.

    abstract::It is common knowledge that conserved residues evolve slowly. We challenge generality of this central tenet of molecular biology by describing the fast evolution of a conserved nucleotide position that is located in the overlap of two open reading frames (ORFs) of polyomaviruses. The de novo ORF is expressed through e...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkv378

    authors: Lauber C,Kazem S,Kravchenko AA,Feltkamp MC,Gorbalenya AE

    update_date: 2015-05-26 00:00:00

  • Identifying DNA-binding proteins using structural motifs and the electrostatic potential.

    abstract::Robust methods to detect DNA-binding proteins from structures of unknown function are important for structural biology. This paper describes a method for identifying such proteins that (i) have a solvent accessible structural motif necessary for DNA-binding and (ii) a positive electrostatic potential in the region of ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkh803

    authors: Shanahan HP,Garcia MA,Jones S,Thornton JM

    update_date: 2004-09-08 00:00:00

  • TFIIH is an elongation factor of RNA polymerase I.

    abstract::TFIIH is a multisubunit factor essential for transcription initiation and promoter escape of RNA polymerase II and for the opening of damaged DNA double strands in nucleotide excision repair (NER). In this study, we have analyzed at which step of the transcription cycle TFIIH is essential for transcription by RNA poly...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkr746

    authors: Assfalg R,Lebedev A,Gonzalez OG,Schelling A,Koch S,Iben S

    update_date: 2012-01-01 00:00:00

  • Interdependence between DNA template secondary structure and priming efficiencies of short primers.

    abstract::Here we analyze the effect of DNA folding on the performance of short primers and describe a simple technique for assessing hitherto uncertain values of thermodynamic parameters that determine the folding of single-stranded DNA into secondary structure. An 8mer with two degenerate positions is extended simultaneously ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/26.23.5525

    authors: Lvovsky L,Ioshikhes I,Raja MC,Zevin-Sonkin D,Sobolev IA,Liberzon A,Shwartzburd J,Ulanovsky LE

    update_date: 1998-12-01 00:00:00

  • SPD--a web-based secreted protein database.

    abstract::With the improved secreted protein prediction approach and comprehensive data sources, including Swiss-Prot, TrEMBL, RefSeq, Ensembl and CBI-Gene, we have constructed secretomes of human, mouse and rat, with a total of 18 152 secreted proteins. All the entries are ranked according to the prediction confidence. They we...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gki093

    authors: Chen Y,Zhang Y,Yin Y,Gao G,Li S,Jiang Y,Gu X,Luo J

    update_date: 2005-01-01 00:00:00

  • Using iodinated single-stranded M13 probes to facilitate rapid DNA sequence analysis--nucleotide sequence of a mouse lysine tRNA gene.

    abstract::From a recombinant lambda phage, we have determined a 387 bp sequence containing a mouse lysine tRNA gene. The putative lys tRNA (anticodon UUU) differs from rabbit liver lys tRNA at five positions. The flanking regions of the mouse gene are not generally homologous to published human and Drosophila lys tRNA genes. Ho...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/11.7.2053

    authors: Han JH,Harding JD

    update_date: 1983-04-11 00:00:00

  • Alu repeats as transcriptional regulatory platforms in macrophage responses to M. tuberculosis infection.

    abstract::To understand the epigenetic regulation of transcriptional response of macrophages during early-stage M. tuberculosis (Mtb) infection, we performed ChIPseq analysis of H3K4 monomethylation (H3K4me1), a marker of poised or active enhancers. De novo H3K4me1 peaks in infected cells were associated with genes implicated i...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkw782

    authors: Bouttier M,Laperriere D,Memari B,Mangiapane J,Fiore A,Mitchell E,Verway M,Behr MA,Sladek R,Barreiro LB,Mader S,White JH

    update_date: 2016-12-15 00:00:00

  • SW-ARRAY: a dynamic programming solution for the identification of copy-number changes in genomic DNA using array comparative genome hybridization data.

    abstract::Comparative genome hybridization (CGH) to DNA microarrays (array CGH) is a technique capable of detecting deletions and duplications in genomes at high resolution. However, array CGH studies of the human genome noting false negative and false positive results using large insert clones as probes have raised important c...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gki643

    authors: Price TS,Regan R,Mott R,Hedman A,Honey B,Daniels RJ,Smith L,Greenfield A,Tiganescu A,Buckle V,Ventress N,Ayyub H,Salhan A,Pedraza-Diaz S,Broxholme J,Ragoussis J,Higgs DR,Flint J,Knight SJ

    update_date: 2005-06-16 00:00:00

  • Analysis of the attachment of replicating DNA to a nuclear matrix in mammalian interphase nuclei.

    abstract::The attachment of replicating DNA to a rapidly sedimenting nuclear structure was investigated by digestion with various nucleases. When DNA was gradually removed by DNase I, pulse label incorporated during either 1 min or during 1 hour in the presence of arabinosylcytosine, remained preferentially attached to the nucl...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/6.1.219

    authors: Dijkwel PA,Mullenders LH,Wanka F

    update_date: 1979-01-01 00:00:00

  • The nucleotide sequence of the nitrogen-regulation gene ntrA of Klebsiella pneumoniae and comparison with conserved features in bacterial RNA polymerase sigma factors.

    abstract::The nucleotide sequence of the Klebsiella pneumoniae ntrA gene has been determined. NtrA encodes a 53,926 Dalton acidic polypeptide; a calculated molecular weight which is significantly lower than that determined by SDS polyacrylamide gel analysis. NtrA is followed by another open-reading frame (orf) of at least 75 am...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/13.21.7607

    authors: Merrick MJ,Gibbins JR

    update_date: 1985-11-11 00:00:00

  • Nascent RNA sequencing reveals mechanisms of gene regulation in the human malaria parasite Plasmodium falciparum.

    abstract::Gene expression in Plasmodium falciparum is tightly regulated to ensure successful propagation of the parasite throughout its complex life cycle. The earliest transcriptomics studies in P. falciparum suggested a cascade of transcriptional activity over the course of the 48-hour intraerythrocytic developmental cycle (I...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkx464

    authors: Lu XM,Batugedara G,Lee M,Prudhomme J,Bunnik EM,Le Roch KG

    update_date: 2017-07-27 00:00:00

  • Expression of herpes virus thymidine kinase in Neurospora crassa.

    abstract::The expression of thymidine kinase in fungi, which normally lack this enzyme, will greatly aid the study of DNA metabolism and provide useful drug-sensitive phenotypes. The herpes simplex virus type-1 thymidine kinase gene ( tk ) was expressed in Neurospora crassa. tk was expressed as a fusion to N.crassa arg-2 regula...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/25.12.2389

    authors: Sachs MS,Selker EU,Lin B,Roberts CJ,Luo Z,Vaught-Alexander D,Margolin BS

    update_date: 1997-06-15 00:00:00

  • Mechanical properties of DNA-like polymers.

    abstract::The molecular structure of the DNA double helix has been known for 60 years, but we remain surprisingly ignorant of the balance of forces that determine its mechanical properties. The DNA double helix is among the stiffest of all biopolymers, but neither theory nor experiment has provided a coherent understanding of t...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkt808

    authors: Peters JP,Yelgaonkar SP,Srivatsan SG,Tor Y,James Maher L 3rd

    update_date: 2013-12-01 00:00:00

  • A distinct class of homeodomain proteins is encoded by two sequentially expressed Drosophila genes from the 93D/E cluster.

    abstract::Homeodomains appear to be one of the most frequently employed DNA-binding domains in a superfamily of transacting factors. It is likely that during evolution several sub-types of homeodomain have evolved from a common ancestral domain, resulting in distinct but closely related DNA-binding preferences. Here we describe...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/22.7.1202

    authors: Jagla K,Stanceva I,Dretzen G,Bellard F,Bellard M

    update_date: 1994-04-11 00:00:00

  • Cloning and functional analysis of spliced isoforms of human nuclear factor I-X: interference with transcriptional activation by NFI/CTF in a cell-type specific manner.

    abstract::Previous studies of the epithelial specificity of the human papillomavirus type 16 (HPV-16) enhancer pointed out an important role of nuclear factor I (NFI). In epithelial cells, NFI proteins are derived from the NFI-C gene and referred to as NFI/CTF. In contrast, fibroblasts, where the enhancer is inactive, express h...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/22.19.3825

    authors: Apt D,Liu Y,Bernard HU

    update_date: 1994-09-25 00:00:00

  • Rapid and efficient cDNA library screening by self-ligation of inverse PCR products (SLIP).

    abstract::cDNA cloning is a central technology in molecular biology. cDNA sequences are used to determine mRNA transcript structures, including splice junctions, open reading frames (ORFs) and 5'- and 3'-untranslated regions (UTRs). cDNA clones are valuable reagents for functional studies of genes and proteins. Expressed Sequen...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gni184

    authors: Hoskins RA,Stapleton M,George RA,Yu C,Wan KH,Carlson JW,Celniker SE

    update_date: 2005-12-02 00:00:00

  • Expression of the E.coli ada gene in yeast protects against the toxic and mutagenic effects of N-methyl-N'-nitro-N-nitrosoguanidine.

    abstract::The E.coli ada gene protein coding region has been ligated into an extrachromosomally replicating yeast expression vector downstream of the yeast alcohol dehydrogenase gene promoter region to produce pADH06C. The yeast strains SX46A, 7799-4B and VV-6 are deficient in endogenous O6-alkylguanine-DNA-alkyltransferase and...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/18.2.331

    authors: Brozmanova J,Kleibl K,Vlckova V,Skorvaga M,Cernakova L,Margison GP

    update_date: 1990-01-25 00:00:00

  • Synthesis of backbone deuterium labelled [r(CGCGAAUUCGCG)]2 and HPLC purification of synthetic RNA.

    abstract::The chemical synthesis of backbone deuterium labelled [r(CGCGAAU*U*CGCG)]2 (U* = [5'-2H]U) is described. An efficient purification procedure was developed using a polymeric reverse phase (PRP) HPLC column at 60 °C. This procedure provided pure RNA dodecamer in the multi-milligram quantities (39% overall yield) ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/20.19.5131

    authors: Khare D,Orban J

    update_date: 1992-10-11 00:00:00

  • 2'-O-methyl, 2'-O-ethyl oligoribonucleotides and phosphorothioate oligodeoxyribonucleotides as inhibitors of the in vitro U7 snRNP-dependent mRNA processing event.

    abstract::We describe the synthesis of 2'-O-methyl, 2'-O-ethyl oligoribonucleotides and phosphorothioate oligodeoxyribonucleotides and demonstrate their utility as inhibitors of the in vitro U7 snRNP-dependent mRNA processing event. These 2'-O-modified compounds were designed to possess the binding affinity of an RNA molecule t...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/19.10.2629

    authors: Cotten M,Oberhauser B,Brunar H,Holzner A,Issakides G,Noe CR,Schaffner G,Wagner E,Birnstiel ML

    update_date: 1991-05-25 00:00:00

  • Selection of sequence elements that substitute for the standard AATAAA motif which signals 3' processing and polyadenylation of late simian virus 40 mRNAs.

    abstract::A method is described which allows selection of sequences which can substitute for the normal AATAAA hexanucleotide involved in polyadenylation of SV40 late mRNAs. Plaques were generated from viral DNA lacking the motif, forcing acquisition of substitute sequences. Four variants were characterized. All displayed wild-...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/13.22.8053

    authors: Swimmer C,Shenk T

    update_date: 1985-11-25 00:00:00

  • A novel technique for the identification of CpG islands exhibiting altered methylation patterns (ICEAMP).

    abstract::Aberrant CpG methylation changes occurring during tumour progression include the loss (hypomethylation) and gain (hypermethylation) of methyl groups. Techniques currently available for examining such changes either require selection of a region, then examination of methylation changes, or utilise methylation-sensitive...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/29.24.e123

    authors: Brock GJ,Huang TH,Chen CM,Johnson KJ

    update_date: 2001-12-15 00:00:00

  • Visualisation of extensive water ribbons and networks in a DNA minor-groove drug complex.

    abstract::The crystal structure is reported of a complex between an ethyl derivative of the minor-groove drug furamidine and the dodecanucleotide duplex d(CGCGAATTCGCG)2, which has been refined to 1.85 Å resolution and an R factor of 16.6% for data collected at -173 °C. An exceptionally large number (220) of water molecul...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/26.12.2873

    authors: Guerri A,Simpson IJ,Neidle S

    update_date: 1998-06-15 00:00:00

  • Action of pancreatic DNase: requirements for activation of DNA as a template-primer for DNA polymerase.

    abstract::Pancreatic DNase requires both Ca2+ and Mg2+ for its activity as measured by formation of an activated DNA template for in vitro DNA polymerase alpha assay and by the hyperchromic shift. Mn2+ can partially satisfy the Mg2+ requirement of the DNase for activation of DNA but the resulting template is only 50% as active ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/4.8.2641

    authors: Baril E,Mitchener J,Lee L,Baril B

    update_date: 1977-08-01 00:00:00

  • Detection of base analogs incorporated during DNA replication by nanopore sequencing.

    abstract::DNA synthesis is a fundamental requirement for cell proliferation and DNA repair, but no single method can identify the location, direction and speed of replication forks with high resolution. Mammalian cells have the ability to incorporate thymidine analogs along with the natural A, T, G and C bases during DNA synthe...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkaa517

    authors: Georgieva D,Liu Q,Wang K,Egli D

    update_date: 2020-09-04 00:00:00

  • Readthrough of stop codons under limiting ABCE1 concentration involves frameshifting and inhibits nonsense-mediated mRNA decay.

    abstract::To gain insight into the mechanistic link between translation termination and nonsense-mediated mRNA decay (NMD), we depleted the ribosome recycling factor ABCE1 in human cells, resulting in an upregulation of NMD-sensitive mRNAs. Suppression of NMD on these mRNAs occurs prior to their SMG6-mediated endonucleolytic cl...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkaa758

    authors: Annibaldis G,Domanski M,Dreos R,Contu L,Carl S,Kläy N,Mühlemann O

    update_date: 2020-10-09 00:00:00

  • Analysis of the interaction with the hepatitis C virus mRNA reveals an alternative mode of RNA recognition by the human La protein.

    abstract::Human La protein is an essential factor in the biology of both coding and non-coding RNAs. In the nucleus, La binds primarily to 3' oligoU containing RNAs, while in the cytoplasm La interacts with an array of different mRNAs lacking a 3' UUU(OH) trailer. An example of the latter is the binding of La to the IRES domain...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkr890

    authors: Martino L,Pennell S,Kelly G,Bui TT,Kotik-Kogan O,Smerdon SJ,Drake AF,Curry S,Conte MR

    update_date: 2012-02-01 00:00:00

  • Array-based analysis of genomic DNA methylation patterns of the tumour suppressor gene p16INK4A promoter in colon carcinoma cell lines.

    abstract::Aberrant DNA methylation at CpG dinucleotides can result in epigenetic silencing of tumour suppressor genes and represents one of the earliest events in tumourigenesis. To date, however, high-throughput tools that are capable of surveying the methylation status of multiple gene promoters have been restricted to a limi...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gni072

    authors: Mund C,Beier V,Bewerunge P,Dahms M,Lyko F,Hoheisel JD

    update_date: 2005-04-28 00:00:00

  • Comparing binding site information to binding affinity reveals that Crp/DNA complexes have several distinct binding conformers.

    abstract::We show that the cAMP receptor protein (Crp) binds to DNA as several different conformers. This situation has precluded discovering a high correlation between any sequence property and binding affinity for proteins that bend DNA. Experimentally quantified affinities of Synechocystis sp. PCC 6803 cAMP receptor protein ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkr369

    authors: Holmquist PC,Holmquist GP,Summers ML

    update_date: 2011-08-01 00:00:00

  • The initiator element of the Drosophila beta2 tubulin gene core promoter contributes to gene expression in vivo but is not required for male germ-cell specific expression.

    abstract::The tissue-specific expression of the Drosophila beta 2 tubulin gene ( B2t ) is accomplished by the action of a 14-bp activator element (beta2UE1) in combination with certain regulatory elements of the TATA-less, Inr-containing B2t core promoter. We performed an in vivo analysis of the Inr element function in the B2t ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/28.6.1439

    authors: Santel A,Kaufmann J,Hyland R,Renkawitz-Pohl R

    update_date: 2000-03-15 00:00:00