Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context.

Abstract:

Gene order in prokaryotes is conserved to a much lesser extent than protein sequences. Only a few operons, primarily those that code for physically interacting proteins, are conserved in all or most of the bacterial and archaeal genomes. Nevertheless, even the limited conservation of operon organization that is observed can provide valuable evolutionary and functional clues through multiple genome comparisons. A program for constructing gapped local alignments of conserved gene strings in two genomes was developed. The statistical significance of the local alignments was assessed using Monte Carlo simulations. Sets of local alignments were generated for all pairs of completely sequenced bacterial and archaeal genomes, and for each genome a template-anchored multiple alignment was constructed. In most pairwise genome comparisons, <10% of the genes in each genome belonged to conserved gene strings. When closely related pairs of species (i.e., two mycoplasmas) were excluded, the total coverage of genomes by conserved gene strings ranged from <5% for the cyanobacterium Synechocystis sp. to 24% for the minimal genome of Mycoplasma genitalium and 23% for Thermotoga maritima. The coverage of the archaeal genomes was only slightly lower than that of bacterial genomes. The majority of the conserved gene strings are known operons, with the ribosomal superoperon being the top-scoring string in most genome comparisons. However, in some of the bacterial-archaeal pairs, the superoperon is rearranged to the extent that other operons, primarily those subject to horizontal transfer, show the greatest level of conservation, such as the archaeal-type H+-ATPase operon or ABC-type transport cassettes. The level of gene order conservation among prokaryotic genomes was compared to the cooccurrence of genomes in clusters of orthologous genes (COGs) and to the conservation of protein sequences themselves. Only limited correlation was observed between these evolutionary variables.
Gene order conservation shows a much lower variance than the cooccurrence of genomes in COGs, which indicates that intragenome homogenization via recombination occurs in evolution much faster than intergenome homogenization via horizontal gene transfer and lineage-specific gene loss. The potential of using template-anchored multiple-genome alignments for predicting functions of uncharacterized genes was quantitatively assessed. Functions were predicted or significantly clarified for approximately 90 COGs (approximately 4% of the total of 2414 analyzed COGs). The most significant predictions were obtained for the poorly characterized archaeal genomes; these include a previously uncharacterized restriction-modification system, a nuclease-helicase combination implicated in DNA repair, and the probable archaeal counterpart of the eukaryotic exosome. Multiple genome alignments are a resource for studies on operon rearrangement and disruption, which is central to our understanding of the evolution of prokaryotic genomes. Because of the rapid evolution of the gene order, the potential of genome alignment for prediction of gene functions is limited, but nevertheless, such predictions significantly complement the results obtained through protein sequence and structure analysis.
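
The abstract mentions gapped local alignment of gene strings and Monte Carlo significance testing but gives no implementation details. The following is only a minimal sketch of that general idea, not the authors' program. It assumes each genome is a list of orthology labels (for example COG identifiers) in chromosomal order, uses a Smith-Waterman-style recurrence with hypothetical scoring values, and estimates significance by shuffling one gene order. The function names and parameters are illustrative.

```python
import random

def local_align_gene_strings(a, b, match=2, mismatch=-1, gap=-1):
    """Best gapped local alignment score between two gene orders.

    a, b: lists of orthology labels, one per gene, in chromosomal order.
    The scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Local alignment: scores are never allowed to drop below zero.
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best

def monte_carlo_p_value(a, b, trials=200, seed=0):
    """Empirical p-value: how often a shuffled gene order scores as well."""
    rng = random.Random(seed)
    observed = local_align_gene_strings(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if local_align_gene_strings(a, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (trials + 1)

genome_a = ["COG1", "COG2", "COG3", "COG4"]
genome_b = ["COG9", "COG1", "COG2", "COG7", "COG3", "COG4"]
score, p_value = monte_carlo_p_value(genome_a, genome_b)
```

In this toy input the four shared genes align with one gap, so the score is 7 under the assumed scoring. A realistic analysis would also need to handle gene orientation, paralogs, and a null model suited to genome structure, none of which this sketch attempts.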

journal_name

Genome Res

journal_title

Genome research

authors

Wolf YI,Rogozin IB,Kondrashov AS,Koonin EV

doi

10.1101/gr.gr-1619r

subject

Has Abstract

pub_date

2001-03-01 00:00:00

pages

356-72

issue

3

eissn

1549-5469

issn

1088-9051

journal_volume

11

pub_type

Journal Article
  • Diversification of transcriptional modulation: large-scale identification and characterization of putative alternative promoters of human genes.

    abstract::By analyzing 1,780,295 5'-end sequences of human full-length cDNAs derived from 164 kinds of oligo-cap cDNA libraries, we identified 269,774 independent positions of transcriptional start sites (TSSs) for 14,628 human RefSeq genes. These TSSs were clustered into 30,964 clusters that were separated from each other by m...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.4039406

    authors: Kimura K,Wakamatsu A,Suzuki Y,Ota T,Nishikawa T,Yamashita R,Yamamoto J,Sekine M,Tsuritani K,Wakaguri H,Ishii S,Sugiyama T,Saito K,Isono Y,Irie R,Kushida N,Yoneyama T,Otsuka R,Kanda K,Yokoi T,Kondo H,Wagatsuma M

    update date:2006-01-01 00:00:00

  • Accurate detection and genotyping of SNPs utilizing population sequencing data.

    abstract::Next-generation sequencing technologies have made it possible to sequence targeted regions of the human genome in hundreds of individuals. Deep sequencing represents a powerful approach for the discovery of the complete spectrum of DNA sequence variants in functionally important genomic intervals. Current methods for ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.100040.109

    authors: Bansal V,Harismendy O,Tewhey R,Murray SS,Schork NJ,Topol EJ,Frazer KA

    update date:2010-04-01 00:00:00

  • The human homolog T of the mouse T(Brachyury) gene; gene structure, cDNA sequence, and assignment to chromosome 6q27.

    abstract::We have cloned the human gene encoding the transcription factor T. T protein is vital for the formation of posterior mesoderm and axial development in all vertebrates. Brachyury mutant mice, which lack T protein, die in utero with abnormal notochord, posterior somites, and allantois. We have identified human T genomic...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.6.3.226

    authors: Edwards YH,Putt W,Lekoape KM,Stott D,Fox M,Hopkinson DA,Sowden J

    update date:1996-03-01 00:00:00

  • Pathway Processor: a tool for integrating whole-genome expression results into metabolic networks.

    abstract::We have developed a new tool to visualize expression data on metabolic pathways and to evaluate which metabolic pathways are most affected by transcriptional changes in whole-genome expression experiments. Using the Fisher Exact Test, the method scores biochemical pathways according to the probability that as many or ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.226602

    authors: Grosu P,Townsend JP,Hartl DL,Cavalieri D

    update date:2002-07-01 00:00:00

  • Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm.

    abstract::Long sequencing reads generated by single-molecule sequencing technology offer the possibility of dramatically improving the contiguity of genome assemblies. The biggest challenge today is that long reads have relatively high error rates, currently around 15%. The high error rates make it difficult to use this data al...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.213405.116

    authors: Zimin AV,Puiu D,Luo MC,Zhu T,Koren S,Marçais G,Yorke JA,Dvořák J,Salzberg SL

    update date:2017-05-01 00:00:00

  • Nature and structure of human genes that generate retropseudogenes.

    abstract::The human genome is estimated to contain 23,000 to 33,000 retropseudogenes. To study the properties of genes giving rise to these retroelements, we compared the structure and expression of genes with or without known retropseudogenes. Four main features have emerged from the analysis of 181 genes associated to retrops...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.10.5.672

    authors: Gonçalves I,Duret L,Mouchiroud D

    update date:2000-05-01 00:00:00

  • Pervasive, genome-wide positive selection leading to functional divergence in the bacterial genus Campylobacter.

    abstract::An open question in bacterial genomics is the role that adaptive evolution of the core genome plays in diversification and adaptation of bacterial species, and how this might differ between groups of bacteria occupying different environmental circumstances. The genus Campylobacter encompasses several important human a...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.089250.108

    authors: Lefébure T,Stanhope MJ

    update date:2009-07-01 00:00:00

  • The landscape of histone modifications across 1% of the human genome in five human cell lines.

    abstract::We generated high-resolution maps of histone H3 lysine 9/14 acetylation (H3ac), histone H4 lysine 5/8/12/16 acetylation (H4ac), and histone H3 at lysine 4 mono-, di-, and trimethylation (H3K4me1, H3K4me2, H3K4me3, respectively) across the ENCODE regions. Studying each modification in five human cell lines including th...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.5704207

    authors: Koch CM,Andrews RM,Flicek P,Dillon SC,Karaöz U,Clelland GK,Wilcox S,Beare DM,Fowler JC,Couttet P,James KD,Lefebvre GC,Bruce AW,Dovey OM,Ellis PD,Dhami P,Langford CF,Weng Z,Birney E,Carter NP,Vetrie D,Dunham I

    update date:2007-06-01 00:00:00

  • The unusual phylogenetic distribution of retrotransposons: a hypothesis.

    abstract::Retrotransposons have proliferated extensively in eukaryotic lineages; the genomes of many animals and plants comprise 50% or more retrotransposon sequences by weight. There are several persuasive arguments that the enzymatic lynchpin of retrotransposon replication, reverse transcriptase (RT), is an ancient enzyme. Mo...

    journal_title:Genome research

    pub_type: Journal Article, Review

    doi:10.1101/gr.1392003

    authors: Boeke JD

    update date:2003-09-01 00:00:00

  • Evolutionary features of the 4-Mb Xq21.3 XY homology region revealed by a map at 60-kb resolution.

    abstract::Forty-three yeast artificial chromosomes (YACs) from the X chromosome have been overlapped across the 4-Mb Xq21.3 region, which is homologous to a segment in Yp11.1. The region is formatted to 60-kb resolution with 57 STSs and is merged at its edges with contigs specific for X. This allows a direct comparison of marke...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.7.4.307

    authors: Mumm S,Molini B,Terrell J,Srivastava A,Schlessinger D

    update date:1997-04-01 00:00:00

  • Most parsimonious reconciliation in the presence of gene duplication, loss, and deep coalescence using labeled coalescent trees.

    abstract::Accurate gene tree-species tree reconciliation is fundamental to inferring the evolutionary history of a gene family. However, although it has long been appreciated that population-related effects such as incomplete lineage sorting (ILS) can dramatically affect the gene tree, many of the most popular reconciliation me...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.161968.113

    authors: Wu YC,Rasmussen MD,Bansal MS,Kellis M

    update date:2014-03-01 00:00:00

  • Topologically associating domains and their long-range contacts are established during early G1 coincident with the establishment of the replication-timing program.

    abstract::Mammalian genomes are partitioned into domains that replicate in a defined temporal order. These domains can replicate at similar times in all cell types (constitutive) or at cell type-specific times (developmental). Genome-wide chromatin conformation capture (Hi-C) has revealed sub-megabase topologically associating ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.183699.114

    authors: Dileep V,Ay F,Sima J,Vera DL,Noble WS,Gilbert DM

    update date:2015-08-01 00:00:00

  • Comparing genomes within the species Mycobacterium tuberculosis.

    abstract::The study of genetic variability within natural populations of pathogens may provide insight into their evolution and pathogenesis. We used a Mycobacterium tuberculosis high-density oligonucleotide microarray to detect small-scale genomic deletions among 19 clinically and epidemiologically well-characterized isolates ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.166401

    authors: Kato-Maeda M,Rhee JT,Gingeras TR,Salamon H,Drenkow J,Smittipat N,Small PM

    update date:2001-04-01 00:00:00

  • Interaction between the X chromosome and an autosome regulates size sexual dimorphism in Portuguese Water Dogs.

    abstract::Size sexual dimorphism occurs in almost all mammals. In Portuguese Water Dogs, much of the difference in skeletal size between females and males is due to the interaction between a Quantitative Trait Locus (QTL) on the X-chromosome and a QTL linked to Insulin-like Growth Factor 1 (IGF-1) on the CFA 15 autosome. In fem...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.3712705

    authors: Chase K,Carrier DR,Adler FR,Ostrander EA,Lark KG

    update date:2005-12-01 00:00:00

  • Patterns of meiotic recombination on the long arm of human chromosome 21.

    abstract::In this study we quantify the features of meiotic recombination on the long arm of human chromosome 21. We constructed a 67.3-centimorgan (cM) high-resolution, comprehensive, and accurate genetic linkage map of chromosome 21q using 187 highly polymorphic markers covering almost the entire long arm; 46 loci, consistin...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.138100

    authors: Lynn A,Kashuk C,Petersen MB,Bailey JA,Cox DR,Antonarakis SE,Chakravarti A

    update date:2000-09-01 00:00:00

  • A tale of two templates: automatically resolving double traces has many applications, including efficient PCR-based elucidation of alternative splices.

    abstract::Trace Recalling is a novel method for deconvoluting double traces that result from simultaneously sequencing two DNA templates. Trace Recalling identifies up to two bases at each position of such a trace. The resulting ambiguity sequence is aligned to the genome, identifying one template sequence. A second template se...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.5661407

    authors: Tenney AE,Wu JQ,Langton L,Klueh P,Quatrano R,Brent MR

    update date:2007-02-01 00:00:00

  • Phylogeny-wide analysis of social amoeba genomes highlights ancient origins for complex intercellular communication.

    abstract::Dictyostelium discoideum (DD), an extensively studied model organism for cell and developmental biology, belongs to the most derived group 4 of social amoebas, a clade of altruistic multicellular organisms. To understand genome evolution over long time periods and the genetic basis of social evolution, we sequenced th...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.121137.111

    authors: Heidel AJ,Lawal HM,Felder M,Schilde C,Helps NR,Tunggal B,Rivero F,John U,Schleicher M,Eichinger L,Platzer M,Noegel AA,Schaap P,Glöckner G

    update date:2011-11-01 00:00:00

  • A method for detecting IBD regions simultaneously in multiple individuals--with applications to disease genetics.

    abstract::All individuals in a finite population are related if traced back long enough and will, therefore, share regions of their genomes identical by descent (IBD). Detection of such regions has several important applications-from answering questions about human evolution to locating regions in the human genome containing di...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.115360.110

    authors: Moltke I,Albrechtsen A,Hansen TV,Nielsen FC,Nielsen R

    update date:2011-07-01 00:00:00

  • Somatic rearrangements across cancer reveal classes of samples with distinct patterns of DNA breakage and rearrangement-induced hypermutability.

    abstract::Whole-genome sequencing using massively parallel sequencing technologies enables accurate detection of somatic rearrangements in cancer. Pinpointing large numbers of rearrangement breakpoints to base-pair resolution allows analysis of rearrangement microhomology and genomic location for every sample. Here we analyze 9...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.141382.112

    authors: Drier Y,Lawrence MS,Carter SL,Stewart C,Gabriel SB,Lander ES,Meyerson M,Beroukhim R,Getz G

    update date:2013-02-01 00:00:00

  • Birth and expression evolution of mammalian microRNA genes.

    abstract::MicroRNAs (miRNAs) are major post-transcriptional regulators of gene expression, yet their origins and functional evolution in mammals remain little understood due to the lack of appropriate comparative data. Using RNA sequencing, we have generated extensive and comparable miRNA data for five organs in six species tha...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.140269.112

    authors: Meunier J,Lemoine F,Soumillon M,Liechti A,Weier M,Guschanski K,Hu H,Khaitovich P,Kaessmann H

    update date:2013-01-01 00:00:00

  • The effect of genotype and in utero environment on interindividual variation in neonate DNA methylomes.

    abstract::Integrating the genotype with epigenetic marks holds the promise of better understanding the biology that underlies the complex interactions of inherited and environmental components that define the developmental origins of a range of disorders. The quality of the in utero environment significantly influences health o...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.171439.113

    authors: Teh AL,Pan H,Chen L,Ong ML,Dogra S,Wong J,MacIsaac JL,Mah SM,McEwen LM,Saw SM,Godfrey KM,Chong YS,Kwek K,Kwoh CK,Soh SE,Chong MF,Barton S,Karnani N,Cheong CY,Buschdorf JP,Stünkel W,Kobor MS,Meaney MJ,Gluckma

    update date:2014-07-01 00:00:00

  • The pig X and Y Chromosomes: structure, sequence, and evolution.

    abstract::We have generated an improved assembly and gene annotation of the pig X Chromosome, and a first draft assembly of the pig Y Chromosome, by sequencing BAC and fosmid clones from Duroc animals and incorporating information from optical mapping and fiber-FISH. The X Chromosome carries 1033 annotated genes, 690 of which a...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.188839.114

    authors: Skinner BM,Sargent CA,Churcher C,Hunt T,Herrero J,Loveland JE,Dunn M,Louzada S,Fu B,Chow W,Gilbert J,Austin-Guest S,Beal K,Carvalho-Silva D,Cheng W,Gordon D,Grafham D,Hardy M,Harley J,Hauser H,Howden P,Howe K,

    update date:2016-01-01 00:00:00

  • A transposon-based strategy for sequencing repetitive DNA in eukaryotic genomes.

    abstract::Repetitive DNA is a significant component of eukaryotic genomes. We have developed a strategy to efficiently and accurately sequence repetitive DNA in the nematode Caenorhabditis elegans using integrated artificial transposons and automated fluorescent sequencing. Mapping and assembly tools represent important compone...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.7.5.551

    authors: Devine SE,Chissoe SL,Eby Y,Wilson RK,Boeke JD

    update date:1997-05-01 00:00:00

  • The human obese (OB) gene: RNA expression pattern and mapping on the physical, cytogenetic, and genetic maps of chromosome 7.

    abstract::The recently identified mouse obese (ob) gene apparently encodes a secreted protein that may function in the signaling pathway of adipose tissue. Mutations in the mouse ob gene are associated with the early development of gross obesity. A detailed knowledge concerning the RNA expression pattern and precise genomic loc...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.5.1.5

    authors: Green ED,Maffei M,Braden VV,Proenca R,DeSilva U,Zhang Y,Chua SC Jr,Leibel RL,Weissenbach J,Friedman JM

    update date:1995-08-01 00:00:00

  • Systematic interrogation of human promoters.

    abstract::Despite much research, our understanding of the architecture and cis-regulatory elements of human promoters is still lacking. Here, we devised a high-throughput assay to quantify the activity of approximately 15,000 fully designed sequences that we integrated and expressed from a fixed location within the human genome...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.236075.118

    authors: Weingarten-Gabbay S,Nir R,Lubliner S,Sharon E,Kalma Y,Weinberger A,Segal E

    update date:2019-02-01 00:00:00

  • Sequence diversity and genomic organization of vomeronasal receptor genes in the mouse.

    abstract::The vomeronasal system of mice is thought to be specialized in the detection of pheromones. Two multigene families have been identified that encode proteins with seven putative transmembrane domains and that are expressed selectively in subsets of neurons of the vomeronasal organ. The products of these vomeronasal rec...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.10.12.1958

    authors: Del Punta K,Rothman A,Rodriguez I,Mombaerts P

    update date:2000-12-01 00:00:00

  • Whole genome shotgun sequencing of Brassica oleracea and its application to gene discovery and annotation in Arabidopsis.

    abstract::Through comparative studies of the model organism Arabidopsis thaliana and its close relative Brassica oleracea, we have identified conserved regions that represent potentially functional sequences overlooked by previous Arabidopsis genome annotation methods. A total of 454,274 whole genome shotgun sequences covering ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.3176505

    authors: Ayele M,Haas BJ,Kumar N,Wu H,Xiao Y,Van Aken S,Utterback TR,Wortman JR,White OR,Town CD

    update date:2005-04-01 00:00:00

  • High-salt-recovered sequences are associated with the active chromosomal compartment and with large ribonucleoprotein complexes including nuclear bodies.

    abstract::The mammalian cell nucleus contains numerous discrete suborganelles named nuclear bodies. While recruitment of specific genomic regions into these large ribonucleoprotein (RNP) complexes critically contributes to higher-order functional chromatin organization, such regions remain ill-defined. We have developed the hig...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.237073.118

    authors: Baudement MO,Cournac A,Court F,Seveno M,Parrinello H,Reynes C,Sabatier R,Bouschet T,Yi Z,Sallis S,Tancelin M,Rebouissou C,Cathala G,Lesne A,Mozziconacci J,Journot L,Forné T

    update date:2018-11-01 00:00:00

  • Unamplified cap analysis of gene expression on a single-molecule sequencer.

    abstract::We report the development of a simplified cap analysis of gene expression (CAGE) protocol adapted for single-molecule sequencers that avoids second strand synthesis, ligation, digestion, and PCR. HeliScopeCAGE directly sequences the 3' end of cap trapped first-strand cDNAs. As with previous versions of CAGE, we better...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.115469.110

    authors: Kanamori-Katayama M,Itoh M,Kawaji H,Lassmann T,Katayama S,Kojima M,Bertin N,Kaiho A,Ninomiya N,Daub CO,Carninci P,Forrest AR,Hayashizaki Y

    update date:2011-07-01 00:00:00

  • Fourfold faster rate of genome rearrangement in nematodes than in Drosophila.

    abstract::We compared the genome of the nematode Caenorhabditis elegans to 13% of that of Caenorhabditis briggsae, identifying 252 conserved segments along their chromosomes. We detected 517 chromosomal rearrangements, with the ratio of translocations to inversions to transpositions being approximately 1:1:2. We estimate that t...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.172702

    authors: Coghlan A,Wolfe KH

    update date:2002-06-01 00:00:00