Thursday, November 6, 2014

A census of human RNA-binding proteins

Nature Reviews Genetics | Analysis

 Analyses of post-transcriptional gene regulation and the protein factors involved have been substantially driven forward by technological advances such as next-generation sequencing and modern protein mass spectrometry. This Analysis provides a census of 1,542 manually curated RNA-binding proteins, for which the authors have investigated interactions with different classes of RNA, evolutionary conservation, abundance and tissue-specific expression.
 

Abstract
 
Post-transcriptional gene regulation (PTGR) concerns processes involved in the maturation, transport, stability and translation of coding and non-coding RNAs. RNA-binding proteins (RBPs) and ribonucleoproteins coordinate RNA processing and PTGR. The introduction of large-scale quantitative methods, such as next-generation sequencing and modern protein mass spectrometry, has renewed interest in the investigation of PTGR and the protein factors involved at a systems-biology level. Here, we present a census of 1,542 manually curated RBPs that we have analysed for their interactions with different classes of RNA, their evolutionary conservation, their abundance and their tissue-specific expression. Our analysis is a critical step towards the comprehensive characterization of proteins involved in human RNA metabolism. 

Key points

  • Recent advances in next-generation sequencing methods and quantitative mass spectrometry have renewed interest in RNA biology and in the genome-wide investigation of post-transcriptional gene regulatory proteins. A global census that systematically lists the factors involved in post-transcriptional gene regulation (PTGR) is currently not available. Here, we provide an overall summary of the proteins involved in interactions with all classes of RNAs based on our current knowledge of PTGR; this will guide future systems-wide studies of PTGR.
  • RNA-binding proteins (RBPs) are evolutionarily deeply conserved, and their structural domains diversified early in evolution.
  • RBPs are among the most abundant proteins in the cell and are generally ubiquitously expressed, which mirrors their central and conserved role in gene regulation.
  • Only ~2% of RBPs are tissue-specific, and most of these are mRNA- and non-coding RNA-binding proteins.
  • Diseases involving RBPs show characteristic phenotypes depending on the type of RNA (for example, mRNA, ribosomal RNA and tRNA) predominantly bound by the RBPs.
  • Correlated expression of RBPs across developmental processes can identify factors in shared PTGR pathways.

Introduction

Post-transcriptional gene regulation (PTGR) is essential to sustain cellular metabolism, coordinating maturation, transport, stability and degradation of all classes of RNAs (Fig. 1). Mechanistically, each of these events is regulated by the formation of different ribonucleoprotein (RNP) complexes with RNA-binding proteins (RBPs) at their core. Initially, it was thought that RNA mainly served either as the template, in the form of mRNA, or as an adaptor or a structural component during protein synthesis, provided by tRNAs and ribosomal RNAs. With the discovery of catalytic RNAs and a multitude of non-coding RNA (ncRNA) species, it was recognized that RNA is a highly versatile molecule that carries out many regulatory functions in the cell, either by acting as a guide to recognize RNA sequence motifs or RNA recognition elements present in their target RNAs, or by functioning as a scaffold and assembly platform for recruiting proteins to act synergistically1. The characterization of the proteins transiently or stably interacting with RNAs is a prerequisite for the dissection of RNA regulatory processes.
Figure 1: Overview of the main post-transcriptional gene regulation pathways in eukaryotes.
An overview is given for the biogenesis, decay and function of the most abundant RNAs: tRNAs, ribosomal RNAs, small nuclear RNAs (snRNAs), small nucleolar RNAs (snoRNAs), mRNAs, microRNAs (miRNAs), PIWI-interacting RNAs (piRNAs) and long non-coding RNAs (lncRNAs). Processes are described from left to right. Referenced gene names and complexes in the figure are listed in Supplementary information S3 (table) and within the listed references. a | tRNAs are transcribed by RNA polymerase III (Pol III); the 5′ leader and 3′ trailer sequences are removed, introns are spliced, and the ends are joined. CCA nucleotides are added to 3′ ends, and nucleotide modifications — such as methylation (M), pseudouridylation (ψ) and deamination of adenosines to inosines (I) — are introduced before tRNA aminoacylation195. b | The 5S rRNA is transcribed by Pol III, whereas 28S, 18S and 5.8S rRNAs are transcribed as one transcript by Pol I. The precursor is processed by RNA exonucleases, endonucleases and the ribonucleoprotein (RNP) RNase MRP, guided by U3 small nucleolar RNP (snoRNP). Nucleotide modifications are introduced by snoRNPs. rRNAs are assembled together with ribosomal proteins into ribosomal precursor complexes in the nucleus and transported to the cytoplasm, where they mature to functional ribosomes92, 196, 197. c | Most snRNAs are transcribed by Pol II, capped and processed in the nucleus. When exported to the cytoplasm, they undergo methylation and assemble with LSM proteins into small nuclear ribonucleic particles (snRNPs) in a process aided by the survival motor neuron 1 (SMN1). These snRNPs are re-imported into the Cajal body (CB) within the nucleus, where they undergo final maturation and snRNP assembly81. U6 and U6atac snRNAs are transcribed by Pol III and are alternatively processed in the nucleus and the nucleolus198. Mature snRNPs form the core of the spliceosome. 
d | snoRNAs and small Cajal body-specific RNAs (scaRNAs) are processed from mRNA introns, capped and modified before they assemble into snoRNPs or scaRNPs in the CB. snoRNPs and scaRNPs carry out methylation and pseudouridylation of rRNAs, snoRNAs and snRNAs, or function in rRNA processing (for example, processing of U3 snoRNA)81. e | mRNAs are transcribed by Pol II, capped, spliced, edited and polyadenylated in the nucleus. Correctly matured mRNAs are exported into the cytoplasm. Regulatory RNA-binding proteins (RBPs) control correct translation, monitor stability, decay and localization, and shuttle mRNAs between actively translating ribosomes, stress granules and P bodies37, 141, 142, 199, 200, 201, 202. f | miRNAs are either transcribed from separate genes by Pol II as long primary miRNA (pri-miRNA) transcripts or expressed from mRNA introns (mirtrons) and processed into hairpin pre-miRNAs in the nucleus. After transport into the cytoplasm, they are processed into 21-nucleotide-long double-stranded RNAs. One strand is incorporated into Argonaute (AGO) proteins (forming miRNA-containing RNPs (miRNPs)) and guides them to partially complementary target mRNAs to recruit deadenylases and repress translation203. g | piRNAs are ~28-nucleotides-long, germline-specific small RNAs. Primary piRNAs are directly processed and assembled from long, Pol II-transcribed precursor transcripts, whereas secondary piRNAs are generated in the 'ping pong' cycle by the cleavage of complementary transcripts by PIWI proteins. Mature piRNAs are 2′-O-methylated and incorporated into PIWI proteins. The piRNA–PIWI complexes (piRNPs) silence transposable elements (TEs) either by endonucleolytic cleavage in the cytoplasm or through transcriptional silencing at their genomic loci in the nucleus107. h | Most lncRNAs are transcribed and processed in a similar way to mRNAs. 
Nuclear lncRNAs play an active part in gene regulation by directing proteins to specific gene loci, where they recruit chromatin modification complexes and induce transcriptional silencing or activation185. Other non-coding RNAs (for example, 7SK RNA) regulate transcription elongation rates204 or induce the formation of paraspeckles (PS)205. Cytoplasmic non-coding RNAs can modulate mRNA translation206. i | Incorrectly processed RNAs are recognized by several complexes in the nucleus and cytoplasm that initiate and execute their degradation207, 208. CPSF, cleavage and polyadenylation specificity factor; EJC, exon junction complex; hnRNP, heterogeneous nuclear RNP; NGD, no-go decay; NMD, nonsense-mediated RNA decay; NSD, non-stop decay; PABP, poly(A)-binding protein.
The recent development of large-scale quantitative methods, especially next-generation sequencing and modern protein mass spectrometry2, 3, 4, 5, 6, facilitates genome-wide identification of RBPs, their protein cofactors and their RNA targets. Deep-sequencing approaches using immunoprecipitation of RBPs, with or without in vivo RNA–protein crosslinking (crosslinking and immunoprecipitation followed by sequencing (CLIP–seq) and RNA immunoprecipitation and sequencing (RIP-seq), respectively)2, 3, as well as in vitro evolution methods7, 8, revealed the binding ranges of RBPs and showed that many RBPs bind to thousands of transcripts in cells at defined binding sites.
 
Despite the growing amount of data collected on RBPs, many questions remain to be answered. Researchers still have an incomplete understanding of how binding specificity is achieved and how the regulatory function of an individual RBP is influenced by synergy and competition with other RBPs. We argue that a balanced approach of detailed biochemical and functional studies paired with complex systems-biology methods will ultimately lead to an understanding of the principles underlying PTGR networks.
 
Although much of the published research centres on mRNA-binding proteins (mRBPs) and messenger RNPs, PTGR is not limited to mRNA maturation and regulation; it also includes processes acting on ncRNAs. In this respect, it may not be surprising that, among the ~150 RBPs listed in the Online Mendelian Inheritance in Man (OMIM) database as being linked to human diseases, only one-third are described as directly binding mRNAs; the others target diverse ncRNAs9.
 
Here, we present a census of 1,542 human RBPs that interact with all known classes of RNAs, detail their families and evolutionary conservation across species, and analyse their expression across tissues and their potential roles in developmental processes. This catalogue of RBPs will guide future analyses of RBPs and provide an overview of known RNA pathways and their protein components.
 
......................................................................................................................................................................
...................................................................................................................................................................... 

Conclusions

A census of human RBPs is essential for organizing our current molecular and genetic understanding of the role of RNA in general gene expression and PTGR. This catalogue provides researchers with a newly curated resource to guide their investigations of PTGR processes and to systematically study RBPs. An analogous catalogue that assesses the abundance of all expressed RNAs (that is, the RBP targets) and that classifies them across tissues and cell types is still missing. Such a catalogue would be a useful complementary document to this census.
 
Of the ~20,500 protein-coding genes in humans, we determined that 7.5% are directly involved in RNA metabolism by binding to and/or processing RNA, or by constituting essential components of RNPs. RBPs are structurally diverse and include many distinct classes of RNA-binding domains (RBDs). Indeed, whereas the three most abundant DNA-binding domains account for 80% of all transcription factors (TFs)58, the three most abundant RBDs accounted for only 20% of all RBPs in our census. Based on target-RNA categorization, we found that nearly 50% of RBPs acted in mRNA metabolic pathways and 11% constituted ribosomal proteins, while the rest were involved in a diverse range of ncRNA metabolic processes. The target-based categorization of RBPs can assist interpretation of disease phenotypes and mutations emerging from rapidly increasing patient genome sequencing, and may guide future functional studies. When considering abundances, we found that ribosomal proteins and mRBPs were the most abundant RBPs in the cell. Nevertheless, most RBPs were ubiquitously expressed at higher levels than the residual protein-coding transcriptome, and up to 20% of the total expressed protein-coding transcripts encoded RBPs. Therefore, not only is RNA metabolism one of the most conserved cellular processes, but it also has one of the highest protein copy number demands.
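As a quick sanity check on the proportions quoted above, the following minimal Python sketch recomputes them from the figures given in the text. The variable and function names are illustrative only, and the "remaining" ncRNA share is a back-of-the-envelope upper-level estimate derived here from the rounded percentages ("nearly 50%" and 11%), not a number reported by the authors.

```python
# Figures quoted in the text; values marked "~" or "nearly" are approximate.
CENSUS_RBPS = 1_542            # manually curated RBPs in the census
PROTEIN_CODING_GENES = 20_500  # "~20,500" human protein-coding genes
MRNA_PCT = 50.0                # "nearly 50%" act in mRNA metabolic pathways
RIBOSOMAL_PCT = 11.0           # ribosomal proteins


def share(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Share of protein-coding genes that encode census RBPs.
rbp_share = share(CENSUS_RBPS, PROTEIN_CODING_GENES)

# Rough share left for ncRNA-related processes (an estimate, since the
# mRNA figure is only given as "nearly 50%").
remaining_pct = 100.0 - MRNA_PCT - RIBOSOMAL_PCT

print(f"RBPs as a share of protein-coding genes: {rbp_share:.1f}%")
print(f"Approximate share in ncRNA processes: {remaining_pct:.0f}%")
```

The first value agrees with the 7.5% stated in the text; the second is only indicative because it rests on rounded inputs.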
 
Many details of PTGR remain to be revealed, including the dissection of newly discovered RNA regulatory processes1, 184, 185. The investigation of PTGR networks is aided by the rapid development of next-generation sequencing-based methods, such as RIP- and CLIP-based methods2, 3, 14, ribosome profiling186, in vivo RNA secondary structure profiling187, 188, 189, small and long RNA-seq6, 190, 191, and 3′-end sequencing methods that profile alternative polyadenylation sites and poly(A) tail lengths158, 192, 193, 194. These studies reveal an unanticipated complexity in RBP binding and targeting, and highlight the need to experimentally dissect PTGR networks in various cellular systems.

Published online

http://www.nature.com/nrg/journal/vaop/ncurrent/full/nrg3813.html?WT.mc_id=FBK_NatureReviews

 


