A deep learning framework for modeling structural features of RNA-binding protein targets.

Abstract:

RNA-binding proteins (RBPs) play important roles in the post-transcriptional control of RNAs. Identifying RBP binding sites and characterizing RBP binding preferences are key steps toward understanding the basic mechanisms of post-transcriptional gene regulation. Though numerous computational methods have been developed for modeling RBP binding preferences, discovering a complete structural representation of the RBP targets by integrating their available structural features in all three dimensions is still a challenging task. In this paper, we develop a general and flexible deep learning framework for modeling structural binding preferences and predicting binding sites of RBPs, which takes (predicted) RNA tertiary structural information into account for the first time. Our framework constructs a unified representation that characterizes the structural specificities of RBP targets in all three dimensions, which can be further used to predict novel candidate binding sites and discover potential binding motifs. By testing on real CLIP-seq datasets, we have demonstrated that our deep learning framework can automatically extract effective hidden structural features from the encoded raw sequence and structural profiles, and accurately predict RBP binding sites. In addition, we have conducted the first study to show that integrating the additional RNA tertiary structural features can improve the model performance in predicting RBP binding sites, especially for the polypyrimidine tract-binding protein (PTB), which also provides new evidence to support the view that RBPs may have specific tertiary structural binding preferences. In particular, the tests on the internal ribosome entry site (IRES) segments yield satisfactory results with experimental support from the literature and further demonstrate the necessity of incorporating RNA tertiary structural information into the prediction model.
The source code of our approach is available at https://github.com/thucombio/deepnet-rbp.

journal_name

Nucleic Acids Res

journal_title

Nucleic acids research

authors

Zhang S,Zhou J,Hu H,Gong H,Chen L,Cheng C,Zeng J

doi

10.1093/nar/gkv1025

subject

Has Abstract

pub_date

2016-02-29 00:00:00

pages

e32

issue

4

eissn

1362-4962

issn

0305-1048

pii

gkv1025

journal_volume

44

pub_type

Journal Article
  • Database resources of the National Center for Biotechnology Information.

    abstract::The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. The Entrez system provides search and re...

    journal_title:Nucleic acids research

    pub_type: Journal Article, Review

    doi:10.1093/nar/gkaa892

    authors: Sayers EW,Beck J,Bolton EE,Bourexis D,Brister JR,Canese K,Comeau DC,Funk K,Kim S,Klimke W,Marchler-Bauer A,Landrum M,Lathrop S,Lu Z,Madden TL,O'Leary N,Phan L,Rangwala SH,Schneider VA,Skripchenko Y,Wang J,Ye J,

    update_date:2021-01-08 00:00:00

  • Elongation complexes of Thermus thermophilus RNA polymerase that possess distinct translocation conformations.

    abstract::We have characterized elongation complexes (ECs) of RNA polymerase from the extremely thermophilic bacterium, Thermus thermophilus. We found that complexes assembled on nucleic acid scaffolds are transcriptionally competent at high temperature (50-80 degrees C) and, depending upon the organization of the scaffold, pos...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkl559

    authors: Kashkina E,Anikin M,Tahirov TH,Kochetkov SN,Vassylyev DG,Temiakov D

    update_date:2006-01-01 00:00:00

  • Determining selection free energetics from nucleotide pre-insertion to insertion in viral T7 RNA polymerase transcription fidelity control.

    abstract::An elongation cycle of a transcribing RNA polymerase (RNAP) usually consists of multiple kinetics steps, so there exist multiple kinetic checkpoints where non-cognate nucleotides can be selected against. We conducted comprehensive free energy calculations on various nucleotide insertions for viral T7 RNAP employing al...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkz213

    authors: Long C,E C,Da LT,Yu J

    update_date:2019-05-21 00:00:00

  • Physical map of Neurospora crassa mitochondrial DNA and its transcription unit for ribosomal RNA.

    abstract::A circular denaturation and restriction map of mitochondrial DNA from Neurospora crassa is presented. The map shows the position of all twelve fragments produced by restriction endonuclease Eco R I and the position of the largest Hin III fragment along the previously established map of AT-rich sequences. The two wild ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/3.11.3101

    authors: Bernard U,Goldthwaite C,Küntzel H

    update_date:1976-11-01 00:00:00

  • DNA stretching on functionalized gold surfaces.

    abstract::We describe a method for anchoring bacteriophage lambda DNA by one end to gold by Au-biotin-streptavidin-biotin-DNA bonds. DNA anchored to a microfabricated Au line could be aligned and stretched in flow and electric fields. The anchor was shown to resist a force of at least 11 pN, a linkage strong enough to allow DNA...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/22.3.492

    authors: Zimmermann RM,Cox EC

    update_date:1994-02-11 00:00:00

  • In situ hybridization with fluoresceinated DNA.

    abstract::We have used fluorescein-11-dUTP in a nick-translation format to produce fluoresceinated human nucleic acid probes. After in situ hybridization of fluoresceinated DNAs to human metaphase chromosomes, the detection sensitivity was found to be 50-100 kb. The feasibility and the increase in detection sensitivity of micro...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/19.12.3237

    authors: Wiegant J,Ried T,Nederlof PM,van der Ploeg M,Tanke HJ,Raap AK

    update_date:1991-06-25 00:00:00

  • Structure determination of a nucleoside Q precursor isolated from E. coli tRNA: 7-(aminomethyl)-7-deazaguanosine.

    abstract::A precursor of modified nucleoside Q isolated from E. coli methyl-deficient tRNA was determined to be 7-(aminomethyl)-7-deazaguanosine. The structure was deduced by means of its chromatographic and electrophoretic mobilities, and UV and mass spectra, in addition to comparison with the synthesized authentic compound. T...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/5.7.2289

    authors: Okada N,Noguchi S,Nishimura S,Ohgi T,Goto T,Crain PF,McCloskey JA

    update_date:1978-07-01 00:00:00

  • Subtle structural alterations in G-quadruplex DNA regulate site specificity of fluorescence light-up probes.

    abstract::G-quadruplex (G4) DNA structures are linked to key biological processes and human diseases. Small molecules that target specific G4 DNA structures and signal their presence would therefore be of great value as chemical research tools with potential to further advance towards diagnostic and therapeutic developments. Ho...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkz1205

    authors: Kumar R,Chand K,Bhowmik S,Das RN,Bhattacharjee S,Hedenström M,Chorell E

    update_date:2020-02-20 00:00:00

  • Genome engineering of isogenic human ES cells to model autism disorders.

    abstract::Isogenic pluripotent stem cells are critical tools for studying human neurological diseases by allowing one to study the effects of a mutation in a fixed genetic background. Of particular interest are the spectrum of autism disorders, some of which are monogenic such as Timothy syndrome (TS); others are multigenic suc...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkv164

    authors: Martinez RA,Stein JL,Krostag AR,Nelson AM,Marken JS,Menon V,May RC,Yao Z,Kaykas A,Geschwind DH,Grimley JS

    update_date:2015-05-26 00:00:00

  • Localized structural frustration for evaluating the impact of sequence variants.

    abstract::Population-scale sequencing is increasingly uncovering large numbers of rare single-nucleotide variants (SNVs) in coding regions of the genome. The rarity of these variants makes it challenging to evaluate their deleteriousness with conventional phenotype-genotype associations. Protein structures provide a way of addr...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkw927

    authors: Kumar S,Clarke D,Gerstein M

    update_date:2016-12-01 00:00:00

  • Rates of formation and thermal stabilities of RNA:DNA and DNA:DNA duplexes at high concentrations of formamide.

    abstract::The thermal stabilities of RNA:DNA hybrids are substantially greater than those of DNA:DNA duplexes in aqueous electrolyte solutions containing high concentrations of formamide. Association rates to form DNA:DNA duplexes and DNA:RNA hybrids have been measured in these solvents. There is a temperature range in which DN...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/4.5.1539

    authors: Casey J,Davidson N

    update_date:1977-01-01 00:00:00

  • Comparative sequence analysis as a tool for studying the secondary structure of mRNAs.

    abstract::Analysis of phylogenetically conserved secondary structure has been important in the development of models for the secondary structure of structural RNAs. In this paper, we apply this type of analysis to several families of informational RNAs to evaluate its usefulness in developing secondary structure models for mRNA...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/13.23.8645

    authors: Browner MF,Lawrence CB

    update_date:1985-12-09 00:00:00

  • Y box-binding protein-1 binds preferentially to single-stranded nucleic acids and exhibits 3'-->5' exonuclease activity.

    abstract::We have previously shown that Y box-binding protein-1 (YB-1) binds preferentially to cisplatin-modified Y box sequences. Based on structural and biochemical data, we predicted that this protein binds single-stranded nucleic acids. In the present study we confirmed the prediction and also discovered some unexpected fun...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/29.5.1200

    authors: Izumi H,Imamura T,Nagatani G,Ise T,Murakami T,Uramoto H,Torigoe T,Ishiguchi H,Yoshida Y,Nomoto M,Okamoto T,Uchiumi T,Kuwano M,Funa K,Kohno K

    update_date:2001-03-01 00:00:00

  • CHOP: visualization of 'wobbling' and isolation of highly conserved regions from aligned DNA sequences.

    abstract::The web software CHOP was developed to visualize the 'wobbling' in the third codon position of aligned DNA sequences. The simple features of this tool allow users to easily find regions suspected of containing coding sequences (CDSs). The program also allows visualization of the nucleotide diversity between two genomi...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkh448

    authors: Ohtsuka M,Horiuchi S,Kulski JK,Kimura M,Inoko H

    update_date:2004-07-01 00:00:00

  • The transcription factor Sox5 modulates Sox10 function during melanocyte development.

    abstract::The transcription factor Sox5 has previously been shown in chicken to be expressed in early neural crest cells and neural crest-derived peripheral glia. Here, we show in mouse that Sox5 expression also continues after neural crest specification in the melanocyte lineage. Despite its continued expression, Sox5 has litt...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkn527

    authors: Stolt CC,Lommes P,Hillgärtner S,Wegner M

    update_date:2008-10-01 00:00:00

  • Galahad: a web server for drug effect analysis from gene expression.

    abstract::Galahad (https://galahad.esat.kuleuven.be) is a web-based application for analysis of drug effects. It provides an intuitive interface to be used by anybody interested in leveraging microarray data to gain insights into the pharmacological effects of a drug, mainly identification of candidate targets, elucidation of m...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkv436

    authors: Laenen G,Ardeshirdavani A,Moreau Y,Thorrez L

    update_date:2015-07-01 00:00:00

  • The MH1 domain of Smad3 interacts with Pax6 and represses autoregulation of the Pax6 P1 promoter.

    abstract::Pax6 transcription is under the control of two main promoters (P0 and P1), and these are autoregulated by Pax6. Additionally, Pax6 expression is under the control of the TGFbeta superfamily, although the precise mechanisms of such regulation are not understood. The effect of TGFbeta on Pax6 expression was studied in t...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkl1105

    authors: Grocott T,Frost V,Maillard M,Johansen T,Wheeler GN,Dawes LJ,Wormstone IM,Chantry A

    update_date:2007-01-01 00:00:00

  • Massive gene acquisitions in Mycobacterium indicus pranii provide a perspective on mycobacterial evolution.

    abstract::Understanding the evolutionary and genomic mechanisms responsible for turning the soil-derived saprophytic mycobacteria into lethal intracellular pathogens is a critical step towards the development of strategies for the control of mycobacterial diseases. In this context, Mycobacterium indicus pranii (MIP) is of speci...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gks793

    authors: Saini V,Raghuvanshi S,Khurana JP,Ahmed N,Hasnain SE,Tyagi AK,Tyagi AK

    update_date:2012-11-01 00:00:00

  • High-throughput chromatin information enables accurate tissue-specific prediction of transcription factor binding sites.

    abstract::In silico prediction of transcription factor binding sites (TFBSs) is central to the task of gene regulatory network elucidation. Genomic DNA sequence information provides a basis for these predictions, due to the sequence specificity of TF-binding events. However, DNA sequence alone is an impoverished source of infor...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkn866

    authors: Whitington T,Perkins AC,Bailey TL

    update_date:2009-01-01 00:00:00

  • A novel family of retrotransposon-like elements in Xenopus laevis with a transcript inducible by two growth factors.

    abstract::A cDNA clone named 1A11 was isolated in a screen for genes that are activated by both mesoderm inducing factors FGF and activin in animal explants of Xenopus laevis embryos. In undisturbed embryos, 1A11 is expressed during the gastrula stage in the entire marginal zone where mesoderm originates, and later in the somit...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/21.10.2375

    authors: Greene JM,Otani H,Good PJ,Dawid IB

    update_date:1993-05-25 00:00:00

  • Interplay between GCN2 and GCN4 expression, translation elongation factor 1 mutations and translational fidelity in yeast.

    abstract::Genetic screens in Saccharomyces cerevisiae have identified the roles of ribosome components, tRNAs and translation factors in translational fidelity. These screens rely on the suppression of altered start codons, nonsense codons or frameshift mutations in genes involved in amino acid or nucleotide metabolism. Many of...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gki765

    authors: Magazinnik T,Anand M,Sattlegger E,Hinnebusch AG,Kinzy TG

    update_date:2005-08-12 00:00:00

  • An obligate intermediate along the slow folding pathway of a group II intron ribozyme.

    abstract::Most RNA molecules collapse rapidly and reach the native state through a pathway that contains numerous traps and unproductive intermediates. The D135 group II intron ribozyme is unusual in that it can fold slowly and directly to the native state, despite its large size and structural complexity. Here we use hydroxyl ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gki973

    authors: Su LJ,Waldsich C,Pyle AM

    update_date:2005-11-27 00:00:00

  • Oxidative damage diminishes mitochondrial DNA polymerase replication fidelity.

    abstract::Mitochondrial DNA (mtDNA) resides in a high ROS environment and suffers more mutations than its nuclear counterpart. Increasing evidence suggests that mtDNA mutations are not the results of direct oxidative damage, rather are caused, at least in part, by DNA replication errors. To understand how the mtDNA replicase, P...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkz1018

    authors: Anderson AP,Luo X,Russell W,Yin YW

    update_date:2020-01-24 00:00:00

  • Analysis of the proximal transcriptional element of the myelin basic protein gene.

    abstract::The gene encoding myelin basic protein (MBP) contains multiple activator sequences spanning upstream of its transcriptional initiation site which differentially promote transcription in glial cells. The proximal activator sequence, designated MB1, activates transcription in a glial cell type specific manner. This sequ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/20.3.545

    authors: Devine-Beach K,Haas S,Khalili K

    update_date:1992-02-11 00:00:00

  • A conformational study of some adenosines by use of nuclear Overhauser effect.

    abstract::Conformations of 8-bromo-2'-[unk]-triisopropylbenzenesulfonyladenosine ([unk]) and its 3'-[unk]-isomer ([unk]) in solution have been determined by the use of intramolecular nuclear Overhauser effects in (1)H NMR spectroscopy. Compound [unk] has been proved to have a conformation in which the adenosine and benzene ring...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/1.6.823

    authors: Ueyama M,Tori K,Ikehara M,Kaneko M

    update_date:1974-06-01 00:00:00

  • Mitome: dynamic and interactive database for comparative mitochondrial genomics in metazoan animals.

    abstract::Mitome is a specialized mitochondrial genome database designed for easy comparative analysis of various features of metazoan mitochondrial genomes such as base frequency, A+T skew, codon usage and gene arrangement pattern. A particular function of the database is the automatic reconstruction of phylogenetic relationsh...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkm763

    authors: Lee YS,Oh J,Kim YU,Kim N,Yang S,Hwang UW

    update_date:2008-01-01 00:00:00

  • Primer specific and mispair extension analysis (PSMEA) as a simple approach to fast genotyping.

    abstract::A simple method, primer specific and mispair extension analysis (PSMEA) with pfu DNA polymerase was developed for genotyping. PSMEA is based on the unique properties of 3'-->5' exonuclease proofreading activity. In the presence of an incomplete set of dNTPs, pfu was found to be extremely discriminative in nucleotide i...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/26.21.5013

    authors: Hu YW,Balaskas E,Kessler G,Issid C,Scully LJ,Murphy DG,Rinfret A,Giulivi A,Scalia V,Gill P

    update_date:1998-11-01 00:00:00

  • REDIdb: the RNA editing database.

    abstract::The RNA Editing Database (REDIdb) is an interactive, web-based database created and designed with the aim to allocate RNA editing events such as substitutions, insertions and deletions occurring in a wide range of organisms. The database contains both fully and partially sequenced DNA molecules for which editing infor...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkl793

    authors: Picardi E,Regina TM,Brennicke A,Quagliariello C

    update_date:2007-01-01 00:00:00

  • Involvement of a nuclear matrix association region in the regulation of the SPRR2A keratinocyte terminal differentiation marker.

    abstract::The small proline-rich protein genes ( SPRRs ) code for precursors of the cornified cell envelope, and are specifically expressed during keratinocyte terminal differentiation. The single intron of SPRR2A enhanced the activity of the SPRR2A promoter in transient transfection assays. This enhancement was position depend...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/26.23.5288

    authors: Fischer DF,van Drunen CM,Winkler GS,van de Putte P,Backendorf C

    update_date:1998-12-01 00:00:00

  • CNIT: a fast and accurate web tool for identifying protein-coding and long non-coding transcripts based on intrinsic sequence composition.

    abstract::As more and more high-throughput data has been produced by next-generation sequencing, it is still a challenge to classify RNA transcripts into protein-coding or non-coding, especially for poorly annotated species. We upgraded our original coding potential calculator, CNCI (Coding-Non-Coding Index), to CNIT (Coding-No...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkz400

    authors: Guo JC,Fang SS,Wu Y,Zhang JH,Chen Y,Liu J,Wu B,Wu JR,Li EM,Xu LY,Sun L,Zhao Y

    update_date:2019-07-02 00:00:00