A complexity reduction algorithm for analysis and annotation of large genomic sequences.

Abstract:

DNA is a universal language encrypted with biological instruction for life. In higher organisms, the genetic information is preserved predominantly in an organized exon/intron structure. When a gene is expressed, the exons are spliced together to form the transcript for protein synthesis. We have developed a complexity reduction algorithm for sequence analysis (CRASA) that enables direct alignment of cDNA sequences to the genome. This method features a progressive data structure in hierarchical orders to facilitate a fast and efficient search mechanism. CRASA implementation was tested with already annotated genomic sequences in two benchmark data sets and compared with 15 annotation programs (10 ab initio and 5 homology-based approaches) against the EST database. By the use of layered noise filters, the complexity of CRASA-matched data was reduced exponentially. The results from the benchmark tests showed that CRASA annotation excelled in both the sensitivity and specificity categories. When CRASA was applied to the analysis of human Chromosomes 21 and 22, an additional 83 potential genes were identified. With its large-scale processing capability, CRASA can be used as a robust tool for genome annotation with high accuracy by matching the EST sequences precisely to the genomic sequences.

journal_name

Genome Res

journal_title

Genome research

authors

Chuang TJ,Lin WC,Lee HC,Wang CW,Hsiao KL,Wang ZH,Shieh D,Lin SC,Ch'ang LY

doi

10.1101/gr.313703

subject

Has Abstract

pub_date

2003-02-01 00:00:00

pages

313-22

issue

2

eissn

1549-5469

issn

1088-9051

journal_volume

13

pub_type

Journal Article
  • Delineation of key regulatory elements identifies points of vulnerability in the mitogen-activated signaling network.

    abstract::Drug development efforts against cancer are often hampered by the complex properties of signaling networks. Here we combined the results of an RNAi screen targeting the cellular signaling machinery, with graph theoretical analysis to extract the core modules that process both mitogenic and oncogenic signals to drive c...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.116145.110

    authors: Jailkhani N,Ravichandran S,Hegde SR,Siddiqui Z,Mande SC,Rao KV

    update_date:2011-12-01 00:00:00

  • A biometrical genome search in rats reveals the multigenic basis of blood pressure variation.

    abstract::A genome-wide search for multiple loci influencing salt-loaded systolic blood pressure (NaSBP) variation among 188 F2 progeny from a cross between the Brown-Norway and spontaneously hypertensive rat strains was pursued in an effort to gain insight into the polygenic basis of blood pressure regulation. The results sugg...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.5.2.164

    authors: Schork NJ,Krieger JE,Trolliet MR,Franchini KG,Koike G,Krieger EM,Lander ES,Dzau VJ,Jacob HJ

    update_date:1995-09-01 00:00:00

  • High mutational rates of large-scale duplication and deletion in Daphnia pulex.

    abstract::Knowledge of the genome-wide rate and spectrum of mutations is necessary to understand the origin of disease and the genetic variation driving all evolutionary processes. Here, we provide a genome-wide analysis of the rate and spectrum of mutations obtained in two Daphnia pulex genotypes via separate mutation-accumula...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.191338.115

    authors: Keith N,Tucker AE,Jackson CE,Sung W,Lucas Lledó JI,Schrider DR,Schaack S,Dudycha JL,Ackerman M,Younge AJ,Shaw JR,Lynch M

    update_date:2016-01-01 00:00:00

  • Parente2: a fast and accurate method for detecting identity by descent.

    abstract::Identity-by-descent (IBD) inference is the problem of establishing a genetic connection between two individuals through a genomic segment that is inherited by both individuals from a recent common ancestor. IBD inference is an important preceding step in a variety of population genomic studies, ranging from demographi...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.173641.114

    authors: Rodriguez JM,Bercovici S,Huang L,Frostig R,Batzoglou S

    update_date:2015-02-01 00:00:00

  • Efficient identification of Y chromosome sequences in the human and Drosophila genomes.

    abstract::Notwithstanding their biological importance, Y chromosomes remain poorly known in most species. A major obstacle to their study is the identification of Y chromosome sequences; due to its high content of repetitive DNA, in most genome projects, the Y chromosome sequence is fragmented into a large number of small, unma...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.156034.113

    authors: Carvalho AB,Clark AG

    update_date:2013-11-01 00:00:00

  • Genome-wide A-to-I RNA editing in fungi independent of ADAR enzymes.

    abstract::Yeasts and filamentous fungi do not have adenosine deaminase acting on RNA (ADAR) orthologs and are believed to lack A-to-I RNA editing, which is the most prevalent editing of mRNA in animals. However, during this study with the PUK1(FGRRES_01058) pseudokinase gene important for sexual reproduction in Fusarium gramine...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.199877.115

    authors: Liu H,Wang Q,He Y,Chen L,Hao C,Jiang C,Li Y,Dai Y,Kang Z,Xu JR

    update_date:2016-04-01 00:00:00

  • A positive but complex association between meiotic double-strand break hotspots and open chromatin in Saccharomyces cerevisiae.

    abstract::During meiosis, chromatin undergoes extensive changes to facilitate recombination, homolog pairing, and chromosome segregation. To investigate the relationship between chromatin organization and meiotic processes, we used formaldehyde-assisted isolation of regulatory elements (FAIRE) to map open chromatin during the t...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.096297.109

    authors: Berchowitz LE,Hanlon SE,Lieb JD,Copenhaver GP

    update_date:2009-12-01 00:00:00

  • BRAFV600E remodels the melanocyte transcriptome and induces BANCR to regulate melanoma cell migration.

    abstract::Aberrations of protein-coding genes are a focus of cancer genomics; however, the impact of oncogenes on expression of the ~50% of transcripts without protein-coding potential, including long noncoding RNAs (lncRNAs), has been largely uncharacterized. Activating mutations in the BRAF oncogene are present in >70% of mel...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.140061.112

    authors: Flockhart RJ,Webster DE,Qu K,Mascarenhas N,Kovalski J,Kretz M,Khavari PA

    update_date:2012-06-01 00:00:00

  • The human phosphotyrosine signaling network: evolution and hotspots of hijacking in cancer.

    abstract::Phosphotyrosine (pTyr) signaling, which plays a central role in cell-cell and cell-environment interactions, has been considered to be an evolutionary innovation in multicellular metazoans. However, neither the emergence nor the evolution of the human pTyr signaling system is currently understood. Tyrosine kinase (TK)...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.128819.111

    authors: Li L,Tibiche C,Fu C,Kaneko T,Moran MF,Schiller MR,Li SS,Wang E

    update_date:2012-07-01 00:00:00

  • MicroRNAs reinforce repression of PRC2 transcriptional targets independently and through a feed-forward regulatory network.

    abstract::Gene expression can be regulated at multiple levels, but it is not known if and how there is broad coordination between regulation at the transcriptional and post-transcriptional levels. Transcription factors and chromatin regulate gene expression transcriptionally, whereas microRNAs (miRNAs) are small regulatory RNAs...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.238311.118

    authors: Shivram H,Le SV,Iyer VR

    update_date:2019-02-01 00:00:00

  • Enzymatic regional methylation assay: a novel method to quantify regional CpG methylation density.

    abstract::We have developed a novel quantitative method for rapidly assessing the CpG methylation density of a DNA region in mammalian cells. After bisulfite modification of genomic DNA, the region of interest is PCR amplified with primers containing two dam sites (GATC). The purified PCR products are then incubated with 14C-la...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.202501

    authors: Galm O,Rountree MR,Bachman KE,Jair KW,Baylin SB,Herman JG

    update_date:2002-01-01 00:00:00

  • Time course regulatory analysis based on paired expression and chromatin accessibility data.

    abstract::A time course experiment is a widely used design in the study of cellular processes such as differentiation or response to stimuli. In this paper, we propose time course regulatory analysis (TimeReg) as a method for the analysis of gene regulatory networks based on paired gene expression and chromatin accessibility da...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.257063.119

    authors: Duren Z,Chen X,Xin J,Wang Y,Wong WH

    update_date:2020-04-01 00:00:00

  • A genome-wide study of dual coding regions in human alternatively spliced genes.

    abstract::Alternative splicing is a major mechanism for gene product regulation in many multicellular organisms. By using different exon combinations, some coding regions can encode amino acids in multiple reading frames in different transcripts. Here we performed a systematic search through a set of high-quality human transcri...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.4246506

    authors: Liang H,Landweber LF

    update_date:2006-02-01 00:00:00

  • Species-specific class I gene expansions formed the telomeric 1 mb of the mouse major histocompatibility complex.

    abstract::We have determined the complete sequence of 951,695 bp from the class I region of H2, the mouse major histocompatibility complex (Mhc) from strain 129/Sv (haplotype bc). The sequence contains 26 genes. The sequence spans from the last 50 kb of the H2-T region, including 2 class I genes and 3 class I pseudogenes, and i...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.975303

    authors: Takada T,Kumánovics A,Amadou C,Yoshino M,Jones EP,Athanasiou M,Evans GA,Fischer Lindahl K

    update_date:2003-04-01 00:00:00

  • Broad-spectrum respiratory tract pathogen identification using resequencing DNA microarrays.

    abstract::The exponential growth of pathogen nucleic acid sequences available in public domain databases has invited their direct use in pathogen detection, identification, and surveillance strategies. DNA microarray technology has offered the potential for the direct DNA sequence analysis of a broad spectrum of pathogens of in...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.4337206

    authors: Lin B,Wang Z,Vora GJ,Thornton JA,Schnur JM,Thach DC,Blaney KM,Ligler AG,Malanoski AP,Santiago J,Walter EA,Agan BK,Metzgar D,Seto D,Daum LT,Kruzelock R,Rowley RK,Hanson EH,Tibbetts C,Stenger DA

    update_date:2006-04-01 00:00:00

  • Signatures of domain shuffling in the human genome.

    abstract::To elucidate the role of exon shuffling in shaping the complexity of the human genome/proteome, we have systematically analyzed intron phase distributions in the coding sequence of human protein domains. We found that introns at the boundaries of domains show high excess of symmetrical phase combinations (i.e., 0-0, 1...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.520702

    authors: Kaessmann H,Zöllner S,Nekrutenko A,Li WH

    update_date:2002-11-01 00:00:00

  • A scalable high-throughput chemical synthesizer.

    abstract::A machine that employs a novel reagent delivery technique for biomolecular synthesis has been developed. This machine separates the addressing of individual synthesis sites from the actual process of reagent delivery by using masks placed over the sites. Because of this separation, this machine is both cost-effective ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.359002

    authors: Livesay EA,Liu YH,Luebke KJ,Irick J,Belosludtsev Y,Rayner S,Balog R,Johnston SA

    update_date:2002-12-01 00:00:00

  • Complete genomic sequence and analysis of the prion protein gene region from three mammalian species.

    abstract::The prion protein (PrP), first identified in scrapie-infected rodents, is encoded by a single exon of a single-copy chromosomal gene. In addition to the protein-coding exon, PrP genes in mammals contain one or two 5'-noncoding exons. To learn more about the genomic organization of regions surrounding the PrP exons, we...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.8.10.1022

    authors: Lee IY,Westaway D,Smit AF,Wang K,Seto J,Chen L,Acharya C,Ankener M,Baskin D,Cooper C,Yao H,Prusiner SB,Hood LE

    update_date:1998-10-01 00:00:00

  • A pooling-based approach to mapping genetic variants associated with DNA methylation.

    abstract::DNA methylation is an epigenetic modification that plays a key role in gene regulation. Previous studies have investigated its genetic basis by mapping genetic variants that are associated with DNA methylation at specific sites, but these have been limited to microarrays that cover <2% of the genome and cannot account...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.183749.114

    authors: Kaplow IM,MacIsaac JL,Mah SM,McEwen LM,Kobor MS,Fraser HB

    update_date:2015-06-01 00:00:00

  • Genomic organization of TEL: the human ETS-variant gene 6.

    abstract::We have constructed a detailed map of the genomic region containing the ETS-variant gene 6 (ETV6), involved in translocations and deletions associated with hematologic malignancies. Thirty-eight cosmids were characterized belonging to two contigs spanning 340 kb, and an EcoRI restriction map was developed. The gap bet...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.6.5.404

    authors: Baens M,Peeters P,Guo C,Aerssens J,Marynen P

    update_date:1996-05-01 00:00:00

  • An analysis of the gene complement of a marsupial, Monodelphis domestica: evolution of lineage-specific genes and giant chromosomes.

    abstract::The newly sequenced genome of Monodelphis domestica not only provides the out-group necessary to better understand our own eutherian lineage, but it enables insights into the innovative biology of metatherians. Here, we compare Monodelphis with Homo sequences from alignments of single nucleotides, genes, and whole chr...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.6093907

    authors: Goodstadt L,Heger A,Webber C,Ponting CP

    update_date:2007-07-01 00:00:00

  • Translation initiation downstream from annotated start codons in human mRNAs coevolves with the Kozak context.

    abstract::Eukaryotic translation initiation involves preinitiation ribosomal complex 5'-to-3' directional probing of mRNA for codons suitable for starting protein synthesis. The recognition of codons as starts depends on the codon identity and on its immediate nucleotide context known as Kozak context. When the context is weak ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.257352.119

    authors: Benitez-Cantos MS,Yordanova MM,O'Connor PBF,Zhdanov AV,Kovalchuk SI,Papkovsky DB,Andreev DE,Baranov PV

    update_date:2020-07-01 00:00:00

  • Independent evolution of transcript abundance and gene regulatory dynamics.

    abstract::Changes in gene expression drive novel phenotypes, raising interest in how gene expression evolves. In contrast to the static genome, cells modulate gene expression in response to changing environments. Previous comparative studies focused on specific conditions, describing interspecies variation in expression levels,...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.261537.120

    authors: Krieger G,Lupo O,Levy AA,Barkai N

    update_date:2020-07-01 00:00:00

  • The effect of genotype and in utero environment on interindividual variation in neonate DNA methylomes.

    abstract::Integrating the genotype with epigenetic marks holds the promise of better understanding the biology that underlies the complex interactions of inherited and environmental components that define the developmental origins of a range of disorders. The quality of the in utero environment significantly influences health o...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.171439.113

    authors: Teh AL,Pan H,Chen L,Ong ML,Dogra S,Wong J,MacIsaac JL,Mah SM,McEwen LM,Saw SM,Godfrey KM,Chong YS,Kwek K,Kwoh CK,Soh SE,Chong MF,Barton S,Karnani N,Cheong CY,Buschdorf JP,Stünkel W,Kobor MS,Meaney MJ,Gluckma

    update_date:2014-07-01 00:00:00

  • Asymmetric nucleosomes flank promoters in the budding yeast genome.

    abstract::Nucleosomes in active chromatin are dynamic, but whether they have distinct structural conformations is unknown. To identify nucleosomes with alternative structures genome-wide, we used H4S47C-anchored cleavage mapping, which revealed that 5% of budding yeast (Saccharomyces cerevisiae) nucleosome positions have asymme...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.182618.114

    authors: Ramachandran S,Zentner GE,Henikoff S

    update_date:2015-03-01 00:00:00

  • DIG-seq: a genome-wide CRISPR off-target profiling method using chromatin DNA.

    abstract::To investigate whether and how CRISPR-Cas9 on-target and off-target activities are affected by chromatin in eukaryotic cells, we first identified a series of identical endogenous DNA sequences present in both open and closed chromatin regions and then measured mutation frequencies at these sites in human cells using C...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.236620.118

    authors: Kim D,Kim JS

    update_date:2018-12-01 00:00:00

  • Nutritional control of mRNA isoform expression during developmental arrest and recovery in C. elegans.

    abstract::Nutrient availability profoundly influences gene expression. Many animal genes encode multiple transcript isoforms, yet the effect of nutrient availability on transcript isoform expression has not been studied in genome-wide fashion. When Caenorhabditis elegans larvae hatch without food, they arrest development in the...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.133587.111

    authors: Maxwell CS,Antoshechkin I,Kurhanewicz N,Belsky JA,Baugh LR

    update_date:2012-10-01 00:00:00

  • Transcriptional enhancement by GATA1-occupied DNA segments is strongly associated with evolutionary constraint on the binding site motif.

    abstract::Tissue development and function are exquisitely dependent on proper regulation of gene expression, but it remains controversial whether the genomic signals controlling this process are subject to strong selective constraint. While some studies show that highly constrained noncoding regions act to enhance transcription...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.083089.108

    authors: Cheng Y,King DC,Dore LC,Zhang X,Zhou Y,Zhang Y,Dorman C,Abebe D,Kumar SA,Chiaromonte F,Miller W,Green RD,Weiss MJ,Hardison RC

    update_date:2008-12-01 00:00:00

  • A predictive model for regulatory sequences directing liver-specific transcription.

    abstract::The identification and interpretation of the regulatory signals within the human genome remain among the greatest goals and most difficult challenges in genome analysis. The ability to predict the temporal and spatial control of transcription is likely to require a combination of methods to address the contribution of...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.180601

    authors: Krivan W,Wasserman WW

    update_date:2001-09-01 00:00:00

  • Prioritizing candidate disease genes by network-based boosting of genome-wide association data.

    abstract::Network "guilt by association" (GBA) is a proven approach for identifying novel disease genes based on the observation that similar mutational phenotypes arise from functionally related genes. In principle, this approach could account even for nonadditive genetic interactions, which underlie the synergistic combinatio...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.118992.110

    authors: Lee I,Blom UM,Wang PI,Shim JE,Marcotte EM

    update_date:2011-07-01 00:00:00