Whole genome shotgun sequencing of Brassica oleracea and its application to gene discovery and annotation in Arabidopsis.

Abstract:

Through comparative studies of the model organism Arabidopsis thaliana and its close relative Brassica oleracea, we have identified conserved regions that represent potentially functional sequences overlooked by previous Arabidopsis genome annotation methods. A total of 454,274 whole genome shotgun sequences covering 283 Mb (0.44x) of the estimated 650 Mb Brassica genome were searched against the Arabidopsis genome, and conserved Arabidopsis genome sequences (CAGSs) were identified. Of these 229,735 conserved regions, 167,357 fell within or intersected existing gene models, while 60,378 were located in previously unannotated regions. After removal of sequences matching known proteins, CAGSs that were close to one another were chained together as potentially comprising portions of the same functional unit. This resulted in 27,347 chains of which 15,686 were sufficiently distant from existing gene annotations to be considered a novel conserved unit. Of 192 conserved regions examined, 58 were found to be expressed in our cDNA populations. Rapid amplification of cDNA ends (RACE) was used to obtain potentially full-length transcripts from these 58 regions. The resulting sequences led to the creation of 21 gene models at 17 new Arabidopsis loci and the addition of splice variants or updates to another 19 gene structures. In addition, CAGSs overlapping already annotated genes in Arabidopsis can provide guidance for manual improvement of existing gene models. Published genome-wide expression data based on whole genome tiling arrays and massively parallel signature sequencing were overlaid on the Brassica-Arabidopsis conserved sequences, and 1399 regions of intersection were identified. Collectively our results and these data sets suggest that several thousand new Arabidopsis genes remain to be identified and annotated.
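
The chaining step and the coverage figure in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract gives neither the gap threshold used to chain nearby CAGSs nor the distance that made a chain "sufficiently distant" from existing annotations, so `max_gap` and `min_distance` below are hypothetical parameters, and the function names are invented for illustration only.

```python
from typing import List, Tuple

# (start, end) coordinates on a single chromosome, inclusive
Interval = Tuple[int, int]


def fold_coverage(sequenced_mb: float, genome_mb: float) -> float:
    """Sequenced bases divided by estimated genome size."""
    return sequenced_mb / genome_mb


def chain_regions(regions: List[Interval], max_gap: int) -> List[Interval]:
    """Merge conserved regions whose gap to the previous chain is <= max_gap.

    max_gap is a hypothetical threshold; the abstract does not state one.
    """
    chains: List[Interval] = []
    for start, end in sorted(regions):
        if chains and start - chains[-1][1] <= max_gap:
            prev_start, prev_end = chains[-1]
            chains[-1] = (prev_start, max(prev_end, end))
        else:
            chains.append((start, end))
    return chains


def is_novel(chain: Interval, genes: List[Interval], min_distance: int) -> bool:
    """True if the chain lies more than min_distance from every gene model.

    min_distance is a hypothetical cutoff; the abstract does not state one.
    """
    start, end = chain
    return all(
        g_end + min_distance < start or end + min_distance < g_start
        for g_start, g_end in genes
    )


# 283 Mb of reads over an estimated 650 Mb genome, as quoted in the abstract
coverage = round(fold_coverage(283, 650), 2)

# Toy coordinates, not data from the paper
chains = chain_regions([(100, 200), (250, 300), (1000, 1100)], max_gap=100)
novel = [c for c in chains if is_novel(c, genes=[(0, 500)], min_distance=200)]
```

With the toy coordinates above, the first two regions are merged into one chain because they lie 50 bp apart, and only the chain far from the single toy gene model is kept as a candidate novel unit.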

journal_name

Genome Res

journal_title

Genome research

authors

Ayele M,Haas BJ,Kumar N,Wu H,Xiao Y,Van Aken S,Utterback TR,Wortman JR,White OR,Town CD

doi

10.1101/gr.3176505

subject

Has Abstract

pub_date

2005-04-01 00:00:00

pages

487-95

issue

4

eissn

1549-5469

issn

1088-9051

pii

15/4/487

journal_volume

15

pub_type

Journal Article
  • Population genomics in a disease targeted primary cell model.

    abstract::The common genetic variants associated with complex traits typically lie in noncoding DNA and may alter gene regulation in a cell type-specific manner. Consequently, the choice of tissue or cell model in the dissection of disease associations is important. We carried out an expression quantitative trait loci (eQTL) st...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.095224.109

    authors: Grundberg E,Kwan T,Ge B,Lam KC,Koka V,Kindmark A,Mallmin H,Dias J,Verlaan DJ,Ouimet M,Sinnett D,Rivadeneira F,Estrada K,Hofman A,van Meurs JM,Uitterlinden A,Beaulieu P,Graziani A,Harmsen E,Ljunggren O,Ohlsson C,

    update date:2009-11-01 00:00:00

  • The evolution of evolvability in microRNA target sites in vertebrates.

    abstract::The lack of long-term evolutionary conservation of microRNA (miRNA) target sites appears to contradict many analyses of their functions. Several hypotheses have been offered, but an attractive one-that the conservation may be a function of taxonomic hierarchy (vertebrates, mammals, primates, etc.)-has rarely been disc...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.148916.112

    authors: Xu J,Zhang R,Shen Y,Liu G,Lu X,Wu CI

    update date:2013-11-01 00:00:00

  • A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction.

    abstract::The representation and discovery of transcription factor (TF) sequence binding specificities is critical for understanding gene regulatory networks and interpreting the impact of disease-associated noncoding genetic variants. We present a novel TF binding motif representation, the k-mer set memory (KSM), which consist...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.226852.117

    authors: Guo Y,Tian K,Zeng H,Guo X,Gifford DK

    update date:2018-06-01 00:00:00

  • Comparative analysis of mammalian Y chromosomes illuminates ancestral structure and lineage-specific evolution.

    abstract::Although more than thirty mammalian genomes have been sequenced to draft quality, very few of these include the Y chromosome. This has limited our understanding of the evolutionary dynamics of gene persistence and loss, our ability to identify conserved regulatory elements, as well our knowledge of the extent to which...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.154286.112

    authors: Li G,Davis BW,Raudsepp T,Pearks Wilkerson AJ,Mason VC,Ferguson-Smith M,O'Brien PC,Waters PD,Murphy WJ

    update date:2013-09-01 00:00:00

  • Time course regulatory analysis based on paired expression and chromatin accessibility data.

    abstract::A time course experiment is a widely used design in the study of cellular processes such as differentiation or response to stimuli. In this paper, we propose time course regulatory analysis (TimeReg) as a method for the analysis of gene regulatory networks based on paired gene expression and chromatin accessibility da...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.257063.119

    authors: Duren Z,Chen X,Xin J,Wang Y,Wong WH

    update date:2020-04-01 00:00:00

  • The genome-wide determinants of human and chimpanzee microsatellite evolution.

    abstract::Mutation rates of microsatellites vary greatly among loci. The causes of this heterogeneity remain largely enigmatic yet are crucial for understanding numerous human neurological diseases and genetic instability in cancer. In this first genome-wide study, the relative contributions of intrinsic features and regional g...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.7113408

    authors: Kelkar YD,Tyekucheva S,Chiaromonte F,Makova KD

    update date:2008-01-01 00:00:00

  • A simplified procedure for developing multiplex PCRs.

    abstract::We have developed a simplified method for multiplex PCR based on the use of chimeric primers. Each primer contains a 3' region complementary to sequence-specific recognition sites and a 5' region made up of an unrelated 20-nucleotide sequence. Identical reaction conditions, cycling times, and annealing temperatures ha...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.5.5.488

    authors: Shuber AP,Grondin VJ,Klinger KW

    update date:1995-12-01 00:00:00

  • Theories and applications for sequencing randomly selected clones.

    abstract::Theory is developed for the process of sequencing randomly selected large-insert clones. Genome size, library depth, clone size, and clone distribution are considered relevant properties and perfect overlap detection for contig assembly is assumed. Genome-specific and nonrandom effects are neglected. Order of magnitud...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.gr-1339r

    authors: Wendl MC,Marra MA,Hillier LW,Chinwalla AT,Wilson RK,Waterston RH

    update date:2001-02-01 00:00:00

  • Integrative functional genomics identifies an enhancer looping to the SOX9 gene disrupted by the 17q24.3 prostate cancer risk locus.

    abstract::Genome-wide association studies (GWAS) are identifying genetic predisposition to various diseases. The 17q24.3 locus harbors the single nucleotide polymorphism (SNP) rs1859962 that is statistically associated with prostate cancer (PCa). It defines a 130-kb linkage disequilibrium (LD) block that lies in an ∼2-Mb gene d...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.135665.111

    authors: Zhang X,Cowper-Sal lari R,Bailey SD,Moore JH,Lupien M

    update date:2012-08-01 00:00:00

  • Ancient duplicated conserved noncoding elements in vertebrates: a genomic and functional analysis.

    abstract::Fish-mammal genomic comparisons have proved powerful in identifying conserved noncoding elements likely to be cis-regulatory in nature, and the majority of those tested in vivo have been shown to act as tissue-specific enhancers associated with genes involved in transcriptional regulation of development. Although most...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.4143406

    authors: McEwen GK,Woolfe A,Goode D,Vavouri T,Callaway H,Elgar G

    update date:2006-04-01 00:00:00

  • Coding and noncoding variants in HFM1, MLH3, MSH4, MSH5, RNF212, and RNF212B affect recombination rate in cattle.

    abstract::We herein study genetic recombination in three cattle populations from France, New Zealand, and the Netherlands. We identify 2,395,177 crossover (CO) events in 94,516 male gametes, and 579,996 CO events in 25,332 female gametes. The average number of COs was found to be larger in males (23.3) than in females (21.4). T...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.204214.116

    authors: Kadri NK,Harland C,Faux P,Cambisano N,Karim L,Coppieters W,Fritz S,Mullaart E,Baurain D,Boichard D,Spelman R,Charlier C,Georges M,Druet T

    update date:2016-10-01 00:00:00

  • Genome-scale cloning and expression of individual open reading frames using topoisomerase I-mediated ligation.

    abstract::The in vitro cloning of DNA molecules traditionally uses PCR amplification or site-specific restriction endonucleases to generate linear DNA inserts with defined termini and requires DNA ligase to covalently join those inserts to vectors with the corresponding ends. We have used the properties of Vaccinia DNA topoisom...

    journal_title:Genome research

    pub_type: Journal Article

    doi:

    authors: Heyman JA,Cornthwaite J,Foncerrada L,Gilmore JR,Gontang E,Hartman KJ,Hernandez CL,Hood R,Hull HM,Lee WY,Marcil R,Marsh EJ,Mudd KM,Patino MJ,Purcell TJ,Rowland JJ,Sindici ML,Hoeffler JP

    update date:1999-04-01 00:00:00

  • A role for palindromic structures in the cis-region of maize Sirevirus LTRs in transposable element evolution and host epigenetic response.

    abstract::Transposable elements (TEs) proliferate within the genome of their host, which responds by silencing them epigenetically. Much is known about the mechanisms of silencing in plants, particularly the role of siRNAs in guiding DNA methylation. In contrast, little is known about siRNA targeting patterns along the length o...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.193763.115

    authors: Bousios A,Diez CM,Takuno S,Bystry V,Darzentas N,Gaut BS

    update date:2016-02-01 00:00:00

  • Massive turnover of functional sequence in human and other mammalian genomes.

    abstract::Despite the availability of dozens of animal genome sequences, two key questions remain unanswered: First, what fraction of any species' genome confers biological function, and second, are apparent differences in organismal complexity reflected in an objective measure of genomic complexity? Here, we address both quest...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.108795.110

    authors: Meader S,Ponting CP,Lunter G

    update date:2010-10-01 00:00:00

  • Structured nucleosome fingerprints enable high-resolution mapping of chromatin architecture within regulatory regions.

    abstract::Transcription factors canonically bind nucleosome-free DNA, making the positioning of nucleosomes within regulatory regions crucial to the regulation of gene expression. Using the assay of transposase accessible chromatin (ATAC-seq), we observe a highly structured pattern of DNA fragment lengths and positions around n...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.192294.115

    authors: Schep AN,Buenrostro JD,Denny SK,Schwartz K,Sherlock G,Greenleaf WJ

    update date:2015-11-01 00:00:00

  • Spidey: a tool for mRNA-to-genomic alignments.

    abstract::We have developed a computer program that aligns spliced sequences to genomic sequences, using local alignment algorithms and heuristics to put together a global spliced alignment. Spidey can produce reliable alignments quickly, even when confronted with noise from alternative splicing, polymorphisms, sequencing error...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.195301

    authors: Wheelan SJ,Church DM,Ostell JM

    update date:2001-11-01 00:00:00

  • Evolution and multilevel optimization of the genetic code.

    abstract::The discovery of the genetic code was one of the most important advances of modern biology. But there is more to a DNA code than protein sequence; DNA carries signals for splicing, localization, folding, and regulation that are often embedded within the protein-coding sequence. In this issue, Itzkovitz and Alon show t...

    journal_title:Genome research

    pub_type: Comment, Journal Article, Review

    doi:10.1101/gr.6144007

    authors: Bollenbach T,Vetsigian K,Kishony R

    update date:2007-04-01 00:00:00

  • Two contrasting classes of nucleolus-associated domains in mouse fibroblast heterochromatin.

    abstract::In interphase eukaryotic cells, almost all heterochromatin is located adjacent to the nucleolus or to the nuclear lamina, thus defining nucleolus-associated domains (NADs) and lamina-associated domains (LADs), respectively. Here, we determined the first genome-scale map of murine NADs in mouse embryonic fibroblasts (M...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.247072.118

    authors: Vertii A,Ou J,Yu J,Yan A,Pagès H,Liu H,Zhu LJ,Kaufman PD

    update date:2019-08-01 00:00:00

  • Retroposed copies of the HMG genes: a window to genome dynamics.

    abstract::Retroposed copies (RPCs) of genes are functional (intronless paralogs) or nonfunctional (processed pseudogenes) copies derived from mRNA through a process of retrotransposition. Previous studies found that gene families involved in mRNA translation or nuclear function were more likely to have large numbers of RPCs. He...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.893803

    authors: Strichman-Almashanu LZ,Bustin M,Landsman D

    update date:2003-05-01 00:00:00

  • Parente2: a fast and accurate method for detecting identity by descent.

    abstract::Identity-by-descent (IBD) inference is the problem of establishing a genetic connection between two individuals through a genomic segment that is inherited by both individuals from a recent common ancestor. IBD inference is an important preceding step in a variety of population genomic studies, ranging from demographi...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.173641.114

    authors: Rodriguez JM,Bercovici S,Huang L,Frostig R,Batzoglou S

    update date:2015-02-01 00:00:00

  • Extensive variation and low heritability of DNA methylation identified in a twin study.

    abstract::Disturbance of DNA methylation leading to aberrant gene expression has been implicated in the etiology of many diseases. Whereas variation at the genetic level has been studied extensively, less is known about the extent and function of epigenetic variation. To explore variation and heritability of DNA methylation, we...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.119685.110

    authors: Gervin K,Hammerø M,Akselsen HE,Moe R,Nygård H,Brandt I,Gjessing HK,Harris JR,Undlien DE,Lyle R

    update date:2011-11-01 00:00:00

  • Noncoding origins of anthropoid traits and a new null model of transposon functionalization.

    abstract::Little is known about novel genetic elements that drove the emergence of anthropoid primates. We exploited the sequencing of the marmoset genome to identify 23,849 anthropoid-specific constrained (ASC) regions and confirmed their robust functional signatures. Of the ASC base pairs, 99.7% were noncoding, suggesting tha...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.168963.113

    authors: del Rosario RC,Rayan NA,Prabhakar S

    update date:2014-09-01 00:00:00

  • Genome function and nuclear architecture: from gene expression to nanoscience.

    abstract::Biophysical, chemical, and nanoscience approaches to the study of nuclear structure and activity have been developing recently and hold considerable promise. A selection of fundamental problems in genome organization and function are reviewed and discussed in the context of these new perspectives and approaches. Advan...

    journal_title:Genome research

    pub_type: Journal Article, Review

    doi:10.1101/gr.946403

    authors: O'Brien TP,Bult CJ,Cremer C,Grunze M,Knowles BB,Langowski J,McNally J,Pederson T,Politz JC,Pombo A,Schmahl G,Spatz JP,van Driel R

    update date:2003-06-01 00:00:00

  • A GC-rich sequence feature in the 3' UTR directs UPF1-dependent mRNA decay in mammalian cells.

    abstract::Up-frameshift protein 1 (UPF1) is an ATP-dependent RNA helicase that has essential roles in RNA surveillance and in post-transcriptional gene regulation by promoting the degradation of mRNAs. Previous studies revealed that UPF1 is associated with the 3' untranslated region (UTR) of target mRNAs via as-yet-unknown sequ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.206060.116

    authors: Imamachi N,Salam KA,Suzuki Y,Akimitsu N

    update date:2017-03-01 00:00:00

  • Strategies for mutational analysis of the large multiexon ATM gene using high-density oligonucleotide arrays.

    abstract::Mutational analysis of large genes with complex genomic structures plays an important role in medical genetics. Technical limitations associated with current mutation screening protocols have placed increased emphasis on the development of new technologies to simplify these procedures. High-density arrays of >90,000-o...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.8.12.1245

    authors: Hacia JG,Sun B,Hunt N,Edgemon K,Mosbrook D,Robbins C,Fodor SP,Tagle DA,Collins FS

    update date:1998-12-01 00:00:00

  • The region surrounding the PKD1 gene: a 700-kb P1 contig from a YAC-deficient interval.

    abstract::As part of an effort to identify the gene responsible for the predominant form of polycystic kidney disease (PKD1), we used a gridded human P1 library for contig assembly. The interval of interest, a 700-kb segment on chromosome 16p13.3, can be physically delineated by the genetic markers D16S125 and D16S84 and chromo...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.6.6.515

    authors: Dackowski WR,Connors TD,Bowe AE,Stanton V Jr,Housman D,Doggett NA,Landes GM,Klinger KW

    update date:1996-06-01 00:00:00

  • Genome-wide mapping of human DNA-replication origins: levels of transcription at ORC1 sites regulate origin selection and replication timing.

    abstract::We report the genome-wide mapping of ORC1 binding sites in mammals, by chromatin immunoprecipitation and parallel sequencing (ChIP-seq). ORC1 binding sites in HeLa cells were validated as active DNA replication origins (ORIs) using Repli-seq, a method that allows identification of ORI-containing regions by parallel se...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.142331.112

    authors: Dellino GI,Cittaro D,Piccioni R,Luzi L,Banfi S,Segalla S,Cesaroni M,Mendoza-Maldonado R,Giacca M,Pelicci PG

    update date:2013-01-01 00:00:00

  • Telomeric organization of a variable and inducible toxin gene family in the ancient eukaryote Giardia duodenalis.

    abstract::Giardia duodenalis is the best-characterized example of the most ancient eukaryotes, which are primitively amitochondrial and anaerobic. The surface of Giardia is coated with cysteine-rich proteins. One family of these proteins, CRP136, varies among isolates and upon environmental stress. A repeat region within the CR...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.7.1.37

    authors: Upcroft P,Chen N,Upcroft JA

    update date:1997-01-01 00:00:00

  • Integration of the rat recombination and EST maps in the rat genomic sequence and comparative mapping analysis with the mouse genome.

    abstract::Inbred strains of the laboratory rat are widely used for identifying genetic regions involved in the control of complex quantitative phenotypes of biomedical importance. The draft genomic sequence of the rat now provides essential information for annotating rat quantitative trait locus (QTL) maps. Following the survey...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.2001604

    authors: Wilder SP,Bihoreau MT,Argoud K,Watanabe TK,Lathrop M,Gauguier D

    update date:2004-04-01 00:00:00

  • Next-generation tag sequencing for cancer gene expression profiling.

    abstract::We describe a new method, Tag-seq, which employs ultra high-throughput sequencing of 21 base pair cDNA tags for sensitive and cost-effective gene expression profiling. We compared Tag-seq data to LongSAGE data and observed improved representation of several classes of rare transcripts, including transcription factors,...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.094482.109

    authors: Morrissy AS,Morin RD,Delaney A,Zeng T,McDonald H,Jones S,Zhao Y,Hirst M,Marra MA

    update date:2009-10-01 00:00:00